Can Big Data and Text Analytics Be Friends? No, I Don’t Think So

Years ago, more years than I care to admit, data warehousing was all the rage. It was a precursor, if you will, to big data. Times have changed. Big data is, well, big data. So much so that a data warehousing solution can no longer choke down the data and spit out an answer. Hence, scalability is key, as is performance. And, of course, so is the ability to handle massive amounts of data, fast.

My focus is unstructured content, which falls into the realm of text analytics, mining, or whatever the mot du jour is. I am not going to argue about the best word for analyzing unstructured content. If we assume that analysts and pundits are correct, then 80 percent of all data is unstructured, which means companies jumping into big data will gain insight into only 20 percent of their data, at best. Added to that is the ‘fact’, as stated by Gartner, that 80 percent of business decisions are made using unstructured data. OK, that is the last time I am quoting the 80 percent rule; find out more about the truth in the statistics here.

The way I look at it, text analytics is more of an art, whereas big data is a science. With big data, you know where you are going, or you believe you do, and the problem is finite. It is a well-defined set of inputs. With text analytics, you may know you have a problem to solve, but you may not know all the dots that need to be connected, whether they exist, or where they exist. It lends itself to discovery and the hoped-for ‘aha’ moment. To me, it’s more interesting. That said, the two paths can diverge rather dramatically. I would assert that unstructured data offers the opportunity to chart the future, whereas big data shows where we have been, or where we are, not necessarily where we are going.

I don’t think text analytics carries the ‘status’ of big data. In our annual survey white papers, big data always tops the list as a desirable function, but hardly any survey respondents are using it. The current mode of operation appears to be software vendors retrofitting text into a database field, and then calling it a day.

Look at healthcare: it is rare that medical professionals can access unstructured data, and yet it holds a wealth of information. I do believe software vendors wear blinders. They are reluctant to change, and they do not consider text to be of much value. On the other hand, text analytics aficionados have spiraled downward into social and the voice of the customer. For both camps, what happened to solving business problems? Do you think there could be a balance where the best of both worlds can be combined?

