The data goes back a few years, which is generations in technology, but text analytics projects usually failed (Altaplana’s “Text/Content Analytics 2011: User Perspectives on Solutions and Providers”). I don’t know what more recent statistics would reveal. I suspect not much of an improvement. One of the challenges is that in many, if not most, organizations, the content is a mess. Seriously a mess. How you can make heads or tails of that information is beyond me.
Text analytics is serious business. It can alter the course of your organization, for good or for bad. It’s not something to take lightly. According to our SharePoint and Office 365 Metadata Survey, an overwhelming majority of organizations wanted to do text analytics as a future project. What strikes me is this: how can you perform text analytics when most of your content is in shambles? These organizations did not have a metadata repository; 99% were using manual tagging, and the remaining 1% were using drop-down lists. Unfortunately, we didn’t ask how many had an information lifecycle plan for unstructured and semi-structured content. I would guesstimate that answer to be close to zero.
My assertion is that before you tackle text analytics, you should clean up your content. Doing so will reduce the size of the data set and filter out the garbage. That’s my thought for the day.