Taxonomies and Training Sets – Be Gone

Training Sets

Every taxonomy package has a range of features, some good and some not so good. Our software can now interact with artificial intelligence, so how well a package stays up to date is an important criterion. You may not need it now, but you probably will in the future.

Some packages require training sets, and training sets ad infinitum – every time you make changes you need a training set. There are a couple of issues with this.

First of all, scalability. If your training set is too small, you aren’t getting a good representation of the documents that contain the terms you defined. The larger the set the better, but in practice performance and time limit how large it can be. So you are sort of stuck. What are you going to run it against, 500 documents or a million documents?

Our solutions, by contrast, employ document movement feedback. So if I am creating a term or managing terms that aren’t classified correctly, I don’t have to wait until the classification is run. At the magic click of a button, I can see the effect of the changes to the term I am creating or managing. It shows me samples of documents that will no longer be classified, and documents that, due to the change in classification score, will be ranked higher or lower.

This is actually quite handy as it takes place in real time, so you don’t walk away and then forget, as I surely would, what the heck the change was. You can get it done in one fell swoop. As far as we know, we are alone in offering this functionality. If I am wrong, please feel free to let me know in no uncertain terms.
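
For the technically curious, here is a rough Python sketch of the idea behind document movement feedback. It is not our product code. The scoring rule (add up the weights of the clue words found in a document), the threshold, and every name in it (score, preview_movement, Movement, the sample documents) are assumptions made up for illustration. It only shows the general shape: score each document under the old and the new term definition, then report which documents drop out and which move up or down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movement:
    doc_id: str
    before: float
    after: float
    status: str  # "dropped", "added", "higher", "lower", or "unchanged"


def score(text: str, clues: dict[str, float]) -> float:
    """Hypothetical scoring rule: sum the weights of clue words present in the text."""
    words = set(text.lower().split())
    return sum(weight for clue, weight in clues.items() if clue in words)


def preview_movement(docs, old_clues, new_clues, threshold=1.0):
    """Compare each document's score before and after a term change.

    A document counts as classified when its score reaches the threshold
    (an assumed rule for this sketch).
    """
    movements = []
    for doc_id, text in docs.items():
        before = score(text, old_clues)
        after = score(text, new_clues)
        was_in = before >= threshold
        is_in = after >= threshold
        if was_in and not is_in:
            status = "dropped"    # will no longer be classified
        elif is_in and not was_in:
            status = "added"      # newly classified
        elif not is_in:
            status = "unchanged"  # not classified before or after
        elif after > before:
            status = "higher"     # still classified, ranked higher
        elif after < before:
            status = "lower"      # still classified, ranked lower
        else:
            status = "unchanged"
        movements.append(Movement(doc_id, before, after, status))
    return movements


# Made-up sample documents and term definitions.
docs = {
    "a": "jaguar car engine review",
    "b": "jaguar habitat in the rainforest",
    "c": "car engine repair manual",
}
old_clues = {"jaguar": 1.0}
new_clues = {"jaguar": 1.0, "engine": 0.5, "rainforest": -1.0}

for m in preview_movement(docs, old_clues, new_clues):
    print(m.doc_id, m.before, m.after, m.status)
```

In this toy run, document "a" moves higher, document "b" drops out of the classification, and document "c" stays unclassified. The point is that the comparison runs against documents you already have, so the feedback arrives while the change is still fresh in your mind.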

What do you think? Would this type of functionality be important to you?