Big Data is all the rage (when is enough enough?). Conceptual or semantic modeling is often viewed as taking information, regardless of where it exists, and forcing that information into a relational database for analysis. In an excellent article, Big Data and the Coming Conceptual Model Revolution, Malcolm Chisholm, Ph.D., discusses the importance of conceptual modeling and the change in approach it needs.
Where I don’t think he has gone far enough is that ‘concepts’ do not and will not fit logically into a database. He writes, “The success of columnar databases in ultra-large scale data environments has presented a challenge to the relational paradigm. Of course there is enormous hype about big data, but it is also enough of a reality to demand attention. To use the columnar databases successfully you have to unlearn the relational paradigm.” I agree with this statement.
He continues, “Conceptual models must capture all business concepts and all relevant relationships. If instances of things are also part of the business reality, they must be captured too. Unfortunately, there is no standard methodology and notation to do this. Conceptual models that communicate business reality effectively require some degree of artistic imagination. They are products of analysis, not of design.” This is where I disagree. The traditional approach won’t work, and people need to start thinking outside of the box.
From an outside-of-the-box approach, there is indeed a standard methodology and notation to do this: it is the purpose of semantic metadata, classification, and business taxonomies. Speaking on behalf of all of us vendors, there are products that work quite well in capturing the essence, meaning, concepts, and relationships between disparate pieces of data (content), and some have extreme scalability and performance capabilities. With over 80% of data in an organization categorized as ‘unstructured’ (IDC), these needles in the haystack are quite valuable to the organization and are not captured in a relational database.
Primarily relegated to ‘text analytics’, the analysis of unstructured content has become a specialized practice under the ‘Big Data’ umbrella. Can the two live as one? I don’t really see that happening. Can they co-exist? They have to, as the largest quantity of enterprise data is unavailable for analysis, decision making, and improved business processes.
WDYT?