So You Want to Do Text Analytics? Step 1: Content Optimization
Based on our latest SharePoint and Office 365 State of the Market Survey Results, it appears that SharePoint organizations are thinking about text analytics and data analytics, but haven’t yet taken the plunge or moved them higher up their ‘must do’ list.
I attribute part of this interest to the media and analysts. Big data started out as a big deal, then sort of petered out, and now it is back in the news as a must-have application, even though organizations don’t really understand what it does or what benefits it brings. Based on our survey results, it is still a pipe dream for SharePoint organizations, and I don’t expect to see much movement for a few years.
Part of the adoption problem is that when big data was hyped up a couple of years ago it was expensive and needed expertise, either in-house or external. With IT teams already busy dealing with business challenges, it became a nice-to-have item. At the time, I also noticed that everyone was talking about doing it, but no one was telling you how to do it. Have times changed? I hope so.
Our focus is text analytics. The content exists; the challenge is getting to the right content to get the right answer. First and foremost on the preparation list is cleaning up your content. There is too much of it, you can’t manage it, and it just keeps growing. Do you really need everything you are saving? Your end users will say they do, just in case they need it.
We call it content optimization, which sounds a bit more interesting than cleaning up your old, never-used content. Some analysts recommend getting rid of at least 69 percent of your content, some even more. The immediate benefit, of course, is a smaller server footprint: one of our clients went from 57 servers to 3. Search improves significantly, and eliminating 50 gazillion revisions of the same document gives you a better chance of making a business decision using the right data.
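Finding those redundant revisions is the kind of job that gets automated. The sketch below shows one common technique, word shingling with Jaccard similarity, for flagging documents that are near-copies of each other. It assumes you have already extracted plain text from each file; the function names, the 3-word shingle size, and the 0.8 threshold are illustrative choices, not how any particular product works.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short documents: treat the whole word sequence as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(docs, threshold=0.8, k=3):
    """Return (name_a, name_b, score) for every pair at or above threshold.

    docs maps a document name to its extracted text. Exact copies score 1.0;
    lightly edited revisions score close to it. Comparing every pair is
    O(n^2), which is fine for a sketch but not for millions of documents.
    """
    sigs = {name: shingles(text, k) for name, text in docs.items()}
    results = []
    for a, b in combinations(sorted(sigs), 2):
        score = jaccard(sigs[a], sigs[b])
        if score >= threshold:
            results.append((a, b, round(score, 2)))
    return results


if __name__ == "__main__":
    # Hypothetical sample documents for illustration.
    docs = {
        "policy_v1.docx": "Employees must submit expense reports within thirty days of travel.",
        "policy_v2.docx": "Employees must submit expense reports within thirty days of travel completion.",
        "menu.docx": "The cafeteria serves lunch from noon until two.",
    }
    for a, b, score in find_near_duplicates(docs):
        print(f"{a} ~ {b}: {score}")  # prints: policy_v1.docx ~ policy_v2.docx: 0.89
```

At real repository scale, the pairwise loop is usually replaced with an approximation such as MinHash with locality-sensitive hashing, so that only likely matches are compared.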
What do you think of cleaning up your content? You could probably use a tool. No, this is not a vendor pitch, just good advice. Look for a tool that, right off the bat, identifies duplicate documents, similar documents or revisions, content that exposes private or sensitive information, records that were never declared, content that could be archived, and content relevant to eDiscovery and litigation support. You need a tool because it reduces human error and will surface content you probably never dreamed existed. And because it’s automated, it makes the job go quite a bit faster.
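To make two of those checks concrete, here is a minimal triage sketch that flags documents containing sensitive-looking patterns and documents old enough to be archive candidates. Everything in it is a hypothetical illustration: the `Document` class, the two regular expressions, and the three-year cutoff are assumptions, and real sensitive-data detection needs far more than a couple of patterns.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical patterns for illustration only; production PII detection also
# needs validation, surrounding context, and many more identifier formats.
SENSITIVE_PATTERNS = {
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass
class Document:
    name: str
    text: str
    last_modified: datetime


def triage(doc, now, archive_after=timedelta(days=3 * 365)):
    """Return flags for one document: matched sensitive patterns, then staleness."""
    flags = [label for label, pattern in SENSITIVE_PATTERNS.items()
             if pattern.search(doc.text)]
    # Content untouched for longer than archive_after is flagged for review.
    if now - doc.last_modified > archive_after:
        flags.append("archive_candidate")
    return flags


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    docs = [
        Document("hr_notes.txt", "Contact jane.doe@example.com re: SSN 123-45-6789",
                 datetime(2023, 6, 1)),
        Document("old_memo.txt", "Quarterly parking update.", datetime(2018, 3, 15)),
    ]
    for d in docs:
        print(d.name, triage(d, now))
```

The flags are meant for a human reviewer or a retention policy to act on, not for automatic deletion.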
Once your corpus of content is optimized, it’s time to move on to the next step: information extraction, or the creation of a content set that contains the answer to the problem you are trying to solve. Read more about that step in our next blog post.
You can hear real-life knowledge discovery scenarios, and the significant return on investment they achieved, in the expert webinar recording ‘What You Don’t Know May Hurt You – Achieving Insight and Knowledge Discovery’.