Friday, January 11, 2013

Big Data, NoSQL, Now What?

In my previous post, I outlined three high-level BI use cases and their key attributes:

  1. Corporate/enterprise BI
  2. Data scientist
  3. Multi-tenant SaaS

In this post, I dive a little deeper into the corporate/enterprise BI use case and discuss the appropriate technologies.

Corporate/Enterprise BI

Let's first consider the case of corporate or enterprise BI.  Corporations have extremely complex structured data and derived measures.  Unique data elements are produced by product development, manufacturing, finance, demand planning, inventory management, sales operations and sales planning, order management and fulfillment, HR, etc.  There are also data that are shared among all of these departments.  The classic problem of multiple versions of the truth has not gone away, and vast sums of money are wasted on operational inefficiencies as armies of BI/IT/IS people (whatever you happen to call them) and analysts wrestle with reconciling these data while trying to present a coherent picture to management.

This problem has not been adequately solved, though a great deal of writing on BI, data architecture and data warehousing suggests that it has.  It's almost as if the industry decided the problem had magically gone away once it got distracted by the mountains of unstructured data piling up from clickstreams, Splunk, social media, etc.  It certainly hasn't gone away for the name brand companies you know.  Tons of operational dollars and insights remain on the table as long as these problems go unsolved.  We know how to solve them, but the discipline to do so is lacking and companies are getting distracted.  The derived measures in particular give BI departments headaches.

Can these problems be solved with Hadoop?  MapReduce?  Are the answers to these problems buried somewhere in mountains of unstructured data waiting to be discovered by advanced analytics?  Clearly not.  Do the analysts in your company know how to write MapReduce?  Do they know how to create dashboards and scorecards that retrieve their data from MongoDB?  Again, they most certainly do not.

The solution here lies in the traditional data warehouse space.  The power of the Teradata platform has been democratized through competition from Netezza, Greenplum, Vertica, Exadata and the like.  These technologies enable data warehouse specialists to store and manage structured data with clear lineage and auditable data definitions and calculations.  They allow analysts and BI specialists familiar with SQL to build powerful queries, either through BI tools such as Cognos, Business Objects, SAS and MicroStrategy or in plain SQL, and to ask novel questions without having to know how to write MapReduce jobs in JavaScript or Python.
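To make the contrast concrete, here is a minimal sketch of one derived measure (gross margin by region) computed two ways: once as a plain SQL query and once as hand-written map, shuffle and reduce steps.  Everything in it is an assumption for illustration: the orders table, its columns, the sample rows and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse platform.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, revenue, cost). Illustrative only.
ORDERS = [
    ("EMEA", 120.0, 80.0),
    ("EMEA", 200.0, 150.0),
    ("APAC", 300.0, 180.0),
]


def margin_by_region_sql(rows):
    """Gross margin per region, written the way a SQL-literate analyst would."""
    conn = sqlite3.connect(":memory:")  # stand-in for a warehouse connection
    conn.execute("CREATE TABLE orders (region TEXT, revenue REAL, cost REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT region, SUM(revenue - cost) / SUM(revenue) AS gross_margin "
        "FROM orders GROUP BY region ORDER BY region"
    )
    result = dict(cursor.fetchall())
    conn.close()
    return result


def margin_by_region_mapreduce(rows):
    """The same measure, spelled out as explicit map, shuffle and reduce steps."""
    # Map: emit a (key, value) pair for every input row.
    mapped = [(region, (revenue, cost)) for region, revenue, cost in rows]

    # Shuffle: group all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce: collapse each group into a single derived measure.
    result = {}
    for key, values in grouped.items():
        total_revenue = sum(revenue for revenue, _ in values)
        total_cost = sum(cost for _, cost in values)
        result[key] = (total_revenue - total_cost) / total_revenue
    return result


if __name__ == "__main__":
    print(margin_by_region_sql(ORDERS))
    print(margin_by_region_mapreduce(ORDERS))
```

Both functions return the same numbers.  The difference is that the definition of the measure sits in one auditable SQL statement in the first version, while in the second it is spread across procedural code that most analysts are not equipped to write or review.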

New 'Big Data' technologies have a place in the corporate and enterprise world, but they don't yet replace the old-school data warehouse.
