Part 2: Classification Versus Categorization?

In Part 1 of our Classification Versus Categorization? blog, we looked at different words that mean the same thing. By the by, that’s one of the challenges with artificial intelligence. Since one of our core competencies is classification/categorization, I was curious what Microsoft was using for its classification technology. The technology was developed by Secure Islands, an Israeli-based startup purchased by Microsoft.

In a nutshell, the two brothers who founded Secure Islands developed a new solution for data protection – embedding security directly in data. The software is designed to classify sensitive information automatically, based on policies outlined by an enterprise, and then to wrap it in the appropriate level of digital rights management (DRM).

Secure Islands’ data immunization technology embeds protection within the information itself, at the moment of creation or initial organizational access. This process is automatic and accompanies sensitive information throughout its lifecycle, from creation through usage and collaboration to storage and archival. The software captures data at creation, in motion, in use, and at rest. It identifies data in need of protection based on content-driven, pre-defined policies, and submits it for automatic classification and protection.

I am assuming that Microsoft’s approach to integration was that the fewer the changes the better, at least where functionality is concerned. The heart of the software was called IQProtector. IQProtector attached metadata to each file or email message: the data sensitivity level, the data security type (such as customer information, personally identifiable information, financial information, or personal credit card information), and other security-related characteristics. It flagged documents based on words, phrases, regular expressions, credit card data, and other criteria. IQProtector also contained special logic to identify Payment Card Industry (PCI) data.
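To make the idea of policy-based flagging concrete, here is a minimal sketch in Python. It is not IQProtector's implementation, which I have not seen. The rule table, the label names, the card-number pattern, and the function names are all assumptions made for illustration. It flags text on keyword phrases and on candidate card numbers that pass the Luhn checksum, then returns the result as simple metadata.

```python
import re

# Hypothetical policy: phrase -> label. A real policy would come from the enterprise.
KEYWORD_RULES = {
    "confidential": "Confidential",
    "social security number": "PII",
}

# Candidate card numbers: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> dict:
    """Return illustrative metadata: the matched labels and a sensitivity flag."""
    labels = set()
    lowered = text.lower()
    for phrase, label in KEYWORD_RULES.items():
        if phrase in lowered:
            labels.add(label)
    for match in CARD_PATTERN.finditer(text):
        # The checksum filters out digit runs that merely look like card numbers.
        if luhn_valid(match.group()):
            labels.add("PCI")
    return {"labels": sorted(labels), "sensitive": bool(labels)}


if __name__ == "__main__":
    print(classify("Confidential: card 4111-1111-1111-1111 on file"))
```

The checksum step is the design choice worth noting: a bare digit pattern would flag order numbers and phone lists, so even a simple policy engine needs some validation beyond pattern matching.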

It does not appear that much changed in the classification process. On the downside, I did come across information indicating that document access is noticeably slowed, because the software is highly Windows-centric and relies on RMS technology. According to a software review, “IQProtector ‘paused’ the opening or creation of encrypted files by a few seconds even for small documents. For larger files, access was annoyingly slow. We concluded that IQProtector is inappropriate for files of about 25 MB or greater.” That’s about as close as I can get to the classification technology itself.

Now our classification, oops categorization, is quite different. Although we do specialize in data discovery and classification, content optimization, and file analytics, our insight engine automatically generates multi-term metadata and classifies it against one or more taxonomies. It brings value to any application that requires metadata, such as search, records management, migration, and data privacy and sensitive information detection.

It is a technology framework, not an application. Since we generate metadata in the form of subjects, topics, and concepts, the ambiguity in single words is eliminated. It will also identify similar content that shares the same theme, even if the exact terms do not match. It also eliminates end user tagging, even for records. It respects the inherent security of content, although it is outside the scope of the software to change the security settings. So, you can understand that our type of classification is quite different from that used by Microsoft.
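As a rough illustration of why multi-term metadata removes single-word ambiguity, here is a small Python sketch. It is not our insight engine. The taxonomy, its categories, and the phrases are hypothetical, and it uses plain phrase matching only, so it does not show the matching of similar content whose exact terms differ.

```python
import re

# Hypothetical taxonomy: each category is described by multi-term phrases.
TAXONOMY = {
    "Finance": ["bank account", "interest rate"],
    "Geography": ["river bank", "flood plain"],
}


def categorize(text: str) -> dict:
    """Count whole-phrase matches per category; categories with no hits are omitted."""
    # Lowercase and strip punctuation so phrases match on word boundaries.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    padded = f" {normalized} "
    scores = {}
    for category, phrases in TAXONOMY.items():
        hits = sum(padded.count(f" {phrase} ") for phrase in phrases)
        if hits:
            scores[category] = hits
    return scores


if __name__ == "__main__":
    print(categorize("The river bank eroded after the storm."))
    print(categorize("Open a bank account to earn a better interest rate."))
```

The single word "bank" on its own matches nothing here. Only the surrounding terms place a document under Finance or Geography, which is the point of working with phrases rather than isolated keywords.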

In future, I will concentrate on using the term categorization, as my technical guru says it’s a more appropriate word than classification. Until then, you say potato and I say potahto.

Concept Searching