Many organizations still struggle with the most basic aspects of managing unstructured content, which includes free-form text, emails, documents, and social networking applications. Whether from a perceived lack of need or from seemingly overwhelming challenges, the failure to manage unstructured content has also led to poor information governance practices. This has far more immediate and serious implications for compliance and data privacy, which can lead to fines, sanctions, and loss of business.
What is the Problem?
The problem is that unstructured content has, for the most part, been ignored. Many organizations, if not most, have no plan for their unstructured content, nor do they proactively manage it. Worse, they are not using that content even at the most basic level to improve business processes. For example:
• 80% of enterprise data is unstructured
• 60% of documents are obsolete
• 50% of documents are duplicates
• A typical knowledge worker spends 2.5 hours per day searching for information
• 85% of relevant documents are never retrieved in search
• The average cost of manually tagging a single document runs from $4 to $7, a figure that does not account for the accuracy of the metadata tags or the repercussions of mistagged content
• 67% of data loss in records management is due to end user error
• 70% of data breaches are due to a mistake or malicious intent by end users
Ensuring that the right information is available to end users and decision makers is fundamental to trusting the accuracy of that information. Once this has been accomplished, unstructured content can be managed and put to use beyond improving business processes such as search, records management, and data privacy. Organizations can then find the descriptive needles in the haystack to gain competitive advantage and increase business agility.
From the Big Data view, turning everything into structured data is an option, but at the current maturity of text analysis tools, the certainty of the extracted information rates at less than 70%. And that applies only to data extraction, not to the concepts or ideas contained in the unstructured content. Unreliable information ultimately produces random garbage. A new and more accurate approach is needed.
Concept Searching’s technologies and framework analyze and extract highly correlated concepts from very large document collections. This enables organizations to attain an ecosystem of semantics that delivers understandable results.
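To make the idea of extracting correlated concepts from a document collection concrete, here is a minimal sketch of one common, generic technique: TF-IDF keyword ranking, which scores a term higher the more often it appears in a document and the rarer it is across the collection. This is an illustrative assumption using standard-library Python only; it is not Concept Searching's actual algorithm, and the sample documents are invented for the example.

```python
# Illustrative sketch of concept extraction via TF-IDF ranking.
# NOT Concept Searching's proprietary method - a generic baseline only.
import math
from collections import Counter

def tf_idf_concepts(documents, top_n=3):
    """Rank each document's terms by TF-IDF; return top candidates per doc."""
    doc_terms = [doc.lower().split() for doc in documents]
    n_docs = len(doc_terms)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for terms in doc_terms for term in set(terms))
    results = []
    for terms in doc_terms:
        tf = Counter(terms)
        scores = {
            term: (count / len(terms)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        ranked = sorted(scores, key=scores.get, reverse=True)
        results.append(ranked[:top_n])
    return results

# Hypothetical sample collection
docs = [
    "records management policy for records retention",
    "data privacy breach notification policy",
    "search relevance tuning for enterprise search",
]
print(tf_idf_concepts(docs))
```

Even this naive baseline surfaces "records" and "search" as the leading concepts of the first and third documents; production systems layer on linguistic analysis, taxonomies, and multi-word concept identification to get past single keywords.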
The valuable insight gained can be used to identify competitive advantages, customer perception, regional trends, and, perhaps more importantly, identify the internal knowledge capital that exists but is rarely used because it cannot be found.
I am always curious: Big Data, and to a much lesser extent Text Analytics, is enjoying considerable hype in the marketplace. Do you use Text Analytics tools? What do you see as the drawbacks?