I realize I am starting to sound like a broken record with regard to my growing disdain for the hype about ‘Big Data’. We deal with big data only from the perspective of unstructured content, so my comments are limited to that use of the term. I read a very good article on Tech Republic the other day that made the point that before there is big data, there should be clean data. For the other side of the house, I can’t comment on whether that is done or not. From an unstructured content point of view, I can pretty much guarantee it’s not.
If we accept the assertions that 80% of enterprise data is unstructured (IDC), that 60% of documents are obsolete (eLaw), that approximately 69% of the data most organizations keep can, and should, be deleted (Lorrie Luelli, of counsel at Ryley Carlock & Applewhite, PC Information Governance), and that less than 50% of content is correctly indexed, meta-tagged, or efficiently searchable (IDC), then I would say the state of unstructured content in most organizations is a mess. It can’t even be found, let alone used for text analytics.
In our experience, it’s the exception, not the norm, for our clients to ‘cleanse’ their unstructured content. What about your organization? Do you have policies and procedures in place to keep your unstructured content squeaky clean?