Text Mining – Getting to the Minutiae
Big data is still high on the radar for many organizations. Text mining, not so much. Yet 80 percent of business decisions are made using unstructured content, which holds a wealth of information that typically remains hidden.
Some experts state that text analytics and text mining are pretty much the same. According to Seth Grimes, in an article in the Huffington Post, “The terms ‘text analytics’ and ‘text mining’ are largely interchangeable. They name the same set of methods, software tools, and applications. Their distinction stems primarily from the background of the person using each — ‘text mining’ seems most used by data miners, and ‘text analytics’ by individuals and organizations in domains where the road to insight was paved by business intelligence tools and methods — so that the difference is largely a matter of dialect.”
Not so fast. Another expert weighs in: Linguamatics CTO David Milward offers this distinction: “There is certainly overlap, but I think there are cases of analytics that would not be classed as text mining and vice versa. Text analytics tends to be more about processing a document collection as a whole, text mining traditionally has more of the needle in a haystack connotation.”
The Oxford English Dictionary defines text mining as, “The process or practice of examining large collections of written resources in order to generate new information.” It goes on to say that the goal of text mining is to discover relevant information in text, by transforming the text into data that can be used for further analysis.
Now this definition raises a problem. The big data industry has consistently taken the approach that unstructured text must first be turned into data, and only then can it be analyzed. This is a suboptimal approach, because the insight and human knowledge held in the text are lost in translation.
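To make that concrete, here is a minimal Python sketch of the simplest way to turn text into data: a "bag of words" count. It is an illustration only, not a description of how any particular product works. The function name `bag_of_words` and the two sample sentences are made up for this example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into words, and count each word."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


# Two hypothetical sentences that say opposite things.
report_a = "The supplier sued the customer"
report_b = "The customer sued the supplier"

counts_a = bag_of_words(report_a)
counts_b = bag_of_words(report_b)

# Show the word counts for the first sentence.
print(counts_a)

# The counts are identical even though the sentences mean opposite things,
# because word order (who did what to whom) was discarded in the conversion.
print(counts_a == counts_b)
```

Once the sentences are reduced to word counts, nothing in the data distinguishes them. Real text mining tools use far richer representations than this, but the sketch shows the kind of context that can disappear when text is converted to data before it is analyzed.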
We could discuss all day whether it is mining or analytics. But one of our long-term clients, a multinational oil and gas company, successfully addressed the challenges of both. It needed to deploy a high-quality search solution across technical research data, to be used by over 5,000 geophysicists globally.
The challenge was poor search: users could not find all the relevant information they needed to make informed decisions and reduce costs. The objective was to improve geological decision making by identifying accurate, relevant, and related content in a large corpus of highly technical and specialized information.
The solution was to implement conceptClassifier for SharePoint, which provided the underlying semantic and geotagging framework needed to improve SharePoint search. The project saved millions of dollars by making relevant information accessible, in other words, by improving search.
Regardless of the definition you choose, the goal is to leverage information by transforming it into usable knowledge that affects the bottom line: delivering insight, improving decision making, reducing risk and costs, and increasing revenues.
What do you think? Our client was able to achieve its objectives. If you had a better search solution, could you achieve yours?