Big Data – The Word of the Day
‘Big Data’ is the word of the day, much as knowledge management was ten years ago. Definitions of Big Data are still vague and elusive. Right now, there is no ‘accepted’ definition, and much of the focus has been on the infrastructure needed to support it. The evolving, if not de facto, definition used by IDC, Gartner Group, and IBM describes Big Data in terms of the three ‘V’s’: Volume, Velocity, and Variety.
Data Volume is the key attribute of Big Data. Organizations are trying to manage and analyze terabytes, petabytes, and even exabytes of information. Volume can also be defined as the number of records, transactions, or files. Every organization will have its own definition of volume, framing it in terms of how long the data must be stored and whether it is used in data warehousing or only for analytics.
Velocity, the second attribute of Big Data, refers to the speed at which information is consumed by an organization. Particularly challenging, although increasingly common, is the combination of growing data volumes, the speed at which data is delivered, and the need to analyze it in real time. For example, according to a blog by Forrester, Progress Apama has 100 microseconds to detect trades coming in at 5,000 orders per second.
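To put those two figures side by side, here is a rough back-of-the-envelope sketch. It assumes orders arrive evenly spaced, which real trading traffic does not, and the names used are purely illustrative; only the 5,000 orders per second and the 100 microseconds come from the example above.

```python
# Figures quoted in the Forrester example above.
ORDERS_PER_SECOND = 5_000
DETECTION_BUDGET_US = 100  # microseconds

MICROSECONDS_PER_SECOND = 1_000_000


def mean_interarrival_us(rate_per_second: int) -> float:
    """Average gap between events, assuming evenly spaced arrivals (a simplification)."""
    return MICROSECONDS_PER_SECOND / rate_per_second


gap_us = mean_interarrival_us(ORDERS_PER_SECOND)
budget_share = DETECTION_BUDGET_US / gap_us

print(f"Average gap between orders: {gap_us:.0f} microseconds")
print(f"Detection budget as a share of that gap: {budget_share:.0%}")
```

Under that simplifying assumption, an order arrives roughly every 200 microseconds, so the detection window is about half the gap between orders. Bursty arrivals would tighten that margin further.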
The third attribute, Variety, is made up of structured, unstructured, and semi-structured data, as well as unstructured content. A recent survey by Information Today indicated that unstructured content either has surpassed or will surpass the data in relational databases within the next thirty-six months. The same survey indicated that organizations still feel that unstructured content is ungovernable, regardless of Big Data entering the picture.
Unstructured content delivers a high level of value if the golden nuggets of information can be found. It differs from structured or unstructured data in that it contains insights, relationships among disparate pieces of content, and human sentiment. Unfortunately, as the survey above suggests, most organizations do not even understand the extent of their unstructured content, nor do they leverage it for business advantage.
Perhaps Big Data is just the big picture now. For those of us who focus on unstructured content, Big Data will be of value only to those organizations that have capitalized on their unstructured content, understand its value, and are actively managing it. For most organizations, even this is still far away.
More to come on Big Data and unstructured content…