As a marketing person, I had to create a PowerPoint for a speaking engagement on the importance of classification. Although we ‘do’ auto-classification, we typically use the term with no explanation, as it seems self-explanatory.
However, once I delved into the subject, I realized that not all auto-classification is created equal. To bolster my feelings, the AIIM white paper we co-sponsored, Automating Information Governance – Assuring Compliance, indicated that respondents felt that auto-classification was becoming extremely important. But the survey didn’t go into the details.
The two definitions I found (and I apologize, as I don’t have the sources) were these:
- A feature found in some content management systems (CMS) or records management applications that will scan the contents of a document and automatically assign categories and keywords based on the document contents
- Content-based assignment of one or more pre-defined categories to documents (records); usually machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically
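To make the first definition a little more concrete, here is a minimal sketch in Python of scanning a document and assigning pre-defined categories and keywords. It is only an illustration: the category names, the trigger terms, and the `classify` function are all hypothetical, and it uses plain keyword matching rather than the machine learning, statistical, or neural network approaches mentioned in the second definition.

```python
import re
from collections import Counter

# Hypothetical pre-defined categories, each with illustrative trigger terms.
# A real system would use a maintained taxonomy or a trained model instead.
CATEGORIES = {
    "contracts": {"agreement", "party", "termination", "liability"},
    "finance": {"invoice", "payment", "budget", "tax"},
    "hr": {"employee", "salary", "benefits", "hiring"},
}


def classify(text, min_hits=2, top_keywords=3):
    """Scan the text and assign categories and keywords based on its contents."""
    # Lowercase the text and split it into simple alphabetic words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)

    # Assign every category whose trigger terms appear often enough.
    categories = []
    for name, terms in CATEGORIES.items():
        hits = sum(counts[term] for term in terms)
        if hits >= min_hits:
            categories.append(name)

    # Keywords are the most frequent trigger terms found in the document.
    known_terms = set().union(*CATEGORIES.values())
    found = Counter({w: c for w, c in counts.items() if w in known_terms})
    keywords = [word for word, _ in found.most_common(top_keywords)]

    return {"categories": categories, "keywords": keywords}


if __name__ == "__main__":
    sample = "The invoice lists each payment due. Payment is late, so the budget changes."
    print(classify(sample))
```

With the sample sentence, only the hypothetical "finance" category collects enough hits to be assigned. Even a toy like this hints at why the choice of technology matters: a fixed word list misses synonyms and context, which is the gap the more sophisticated approaches try to close.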
Visually, what auto-classifiers do would look something like this.
Auto-classifiers rely on different types of technologies, and choosing among them can be difficult. It all depends on your specific requirements. The types include:
Some can be a combination of technologies. The first step is to determine the purpose of auto-classification: whether you have the ability to modify classifications, and whether it is to be used to solve a specific problem or can be used as an enterprise solution.
And don’t forget about addressing metadata. Regardless of the type of auto-classification, if it doesn’t include semantic metadata generation, walk away.
Any thoughts or experiences that would be helpful to share?