There are many new tools out there for managing Big Data, and new innovations are being delivered to the market every month, from big and small companies alike. I've actually been impressed with the progress that Microsoft has made with SQL Server of late, driven largely by Dave DeWitt, PhD, and his new MSFT Jim Gray Systems Lab at the University of Wisconsin. Most business folks don't realize that many of the technical principles behind systems at Teradata, Greenplum, Netezza and others were based on innovations such as DeWitt's Gamma parallel database system, as well as the dozen-plus systems that Mike Stonebraker and his vast network of database researchers have been churning out over the past 15+ years.
The challenge now for most commercial IT and database professionals is matching the right new tools to the appropriate workloads. If, as Mike and his team argue in their seminal paper, one size does not fit all for database systems, then one of the hardest next steps is figuring out which database system is right for which workload (a topic for another blog post). This problem is exacerbated by the tendency to over-promote the potential applications of any one of these new systems, but hey, that's what marketing people get paid to do ;)
Until recently, however, another key element has been missing: a focus on how data is going to be used once people implement their Big Data systems. Big Data is useless unless you architect your systems to support the questions that end users are going to ask. (Yet more fodder for another blog post.)
For many decades, there was no open, scalable, affordable way to do Big Analytics. So the kinds of capabilities that Tom Davenport talks about in Competing on Analytics were available only to companies with huge financial resources, either to pay companies like Teradata (which is where the Wal-Marts and eBays of the world ended up) or to hire tons of computer science PhDs and statisticians to build custom systems at large scale (Google, Yahoo, etc.). Even within those companies, the analytics themselves were restricted to professional analysts or to senior executives who had staff to make the results digestible and available to them. In that sense, these capabilities were an echo of the Executive Information Systems trend of the 1980s.
Today this is changing. Established companies and start-ups are
creating technologies that “democratize” Big Analytics, making large-scale analysis affordable
for even medium-sized businesses and usable by average people (instead of just business
analysts or professional statisticians). A great example is Google Analytics. Ten-plus years ago, the kind of analytics you get today with Google Analytics was available only to Webmasters who had implemented specialized logging systems and customized visualization. Now my son Jonah gets analysis of his Web site that would have cost big bucks a decade ago.
However, there are still missing pieces. I believe we need:
- Large-scale, multi-tenant analytic databases as a service, similar to Cloudant and Dynamo but tuned specifically for analytical workloads, with the network infrastructure needed to support large data loads
- Large-scale, multi-tenant statistics as a service: functionality equivalent to SPSS, R and SAS, but hosted and available as an affordable Web service. The best example of this right now is probably Revolution Analytics. I guess we'd call it Statistics as a Service, or Statistics as a Utility
- Radically better visualization tools and services: HTML5 is enabling this and making tools like Ben Fry's Processing more accessible, so that the masses can do "artful analytics"
Once analytic databases and statistical functionality are available as Web services, I believe we'll see a proliferation of new, affordable and sophisticated analytic services built on these capabilities.
One of the best examples of this that I’ve been working on is a product created by Recorded Future.* I believe it is one of the most advanced analytics companies in the world: Christopher Ahlberg, Staffan Truvé and the entire amazing team at Recorded Future are making some of the most sophisticated analytics available to the masses. Another example in Boston is Humedica, where my friend Paul Bleicher (founder of Phase Forward) is doing fantastic work on what may be some of the most advanced health care analytics in the world.
At this point there are still significant
technical hurdles. But in the very near future, the challenge for most
companies won’t be technology: it will be people, especially those who
will no longer be limited to static reporting.
To be successful, businesses will need to
build analytics-driven cultures: cultures where everyone believes it's his job to be information-seeking and to think analytically about the integrated data that can help him make better decisions and move faster every day.
The #1 Step Toward Building the Right Culture
So, how do you go about building an
analytics-driven culture?
This is obviously a long discussion; Tom’s book is a great primer, particularly Chapter 7, where he points out that it’s analytical people who make analytics work. But I think the most important step for businesses is to rethink the way we build systems and to respond to the call to action created by the “consumerization” of information technology in the enterprise.
Every day, new analytic tools are being made available to consumers over the Internet. Those consumers then walk into their workplaces and are faced with a pitiful cast of static, mundane and difficult-to-use tools provided by their IT organizations, most of which are 10+ years behind on analytics. (Sorry if it hurts to hear this, but it's true, and I include myself among the people who are under-delivering on enterprise end users' analytical expectations.)
To build analytics-driven cultures, businesses need to shift IT’s emphasis
from process automation/"reengineering" (popularized by the late Michael Hammer and others) to decision
automation. With process
automation, the average worker is treated as a programmable cog in a machine;
with decision automation, the average worker is treated as an individual and
an intelligent decision point.
This isn’t New-Age management theory or voodoo; there’s already a good track record for it. Southwest Airlines empowers its gate agents to do things that gate agents at American Airlines can’t even think of doing. Ditto for Nordstrom and Zappos in retail, where sales representatives have broad discretion in how they satisfy each customer. These companies believe in the individual identity of every person in their organizations and use data and systems to empower them. (At the opposite extreme are companies stuck in industrial-age employment models, most amusingly Charlie Chaplin’s employer in "Modern Times." And, yes, such companies still exist today; otherwise we wouldn't have shows like "The Office" or movies like "Office Space.")
Unfortunately, the technology people in most companies don’t think this way, and most of their vendors are still stuck in the process automation mindset of the '80s and '90s. If companies thought of decisions as their competitive edge, they would expect radically different systems and user experiences. In some ways, this means thinking about an organization's systems in the context of how data is going to be used and consumed, instead of how it is being created. Because we tend to build systems with a serial mindset, many systems in today's organizations were built to "catch" data as it is generated. But the most forward-thinking organizations design their systems in reverse: they start from the desired analytics and work back to the data that needs to be captured and managed to support the decisions of the people in their organizations.
Ironically, a bunch of us artificial intelligence (AI) people from the 1970s and 1980s lived through a technology trend called expert systems. Expert systems essentially took AI techniques and applied them to automating decision support for key experts. Many of the tools pioneered back in the expert systems days are still valid and have evolved significantly.
But we need to go one step further.
Today, consumer-based tools are providing data that empower people to
make better decisions in their daily lives. We now need business tools
that do the same: Big Analytics that enable every employee – not just the CEO
or CFO or other key expert – to make better everyday business decisions.
Here’s one great example of what can
happen when you do this.
Over the past decade, synthetic chemists have begun to adopt quantitative/computational chemistry methods and decision tools to work more efficiently in their wet labs. Tools such as Spotfire, RDKit and many others have begun to change the collaborative dynamic for chemists by enabling them to design libraries using quantitative tools and techniques and, perhaps most importantly, to use these quantitative tools collaboratively.
It’s very cool to see a bunch of chemists working together to design compounds, or libraries of compounds, that they wouldn’t otherwise have created. Modern chemists combine their remarkable intuition with incredibly powerful computational models running on high-performance cloud infrastructure. They analyze how active or greasy (lipophilic) a potential compound could be, how soluble, big, dense or heavy it is, and how synthetically tractable it might be. Teams of chemists spread across the globe use this data to make better decisions about which compounds are worth synthesizing and which are not as they seek to discover therapies that make a difference in the lives of patients.
This is where the magic comes from: being decision-oriented, not process-oriented. Big Analytics can turn average people into junior artists, and natural artists into wizards, by giving them the infrastructure to make sense of data and collaborate with other people. It makes art and magic more repeatable.
Google Analytics is a good example of what happens when a business adopts an analytics-driven culture. Building on Bigtable and the Google File System, Google packaged its analytics in a form that could be given to anyone who manages a Web site. Google then watched those analytics become richer, more statistical and more analytical.
I predict that this is what will happen in the rest of the business world as Big Analytics takes hold and analytics-driven cultures become the norm, with analytics expected of every enterprise system.
I have seen what can happen up close, many times. As the co-founder of Vertica, I was fortunate to have Mozilla and Zynga as two of our best early customers. Zynga thought of itself as an analytics company first and foremost. Yes, its business was providing compelling games, but its competitive edge came from analytics: making analytics-based recommendations to its customers in real time within games, and deciding where to place ads in those games. Another company I work with closely that provides this type of capability is Medio Systems.* Companies like Medio are democratizing Big Analytics for many companies and users.
By rethinking how you build systems (in the context of how the data in them will be analyzed and used) and by thinking of every person in the company as an intelligent decision point, you’ll smooth the path to Big Analytics and 21st-century competitive analytics.
Analytics as Oxygen
In the future, analytics won’t be something that only analysts do. Rather, analytics will be like oxygen for everyone in your organization, helping them make better decisions faster to outmaneuver the competition. Competing on analytics is not something to be done only in the boardroom: it’s most powerful when implemented from the bottom up.
By empowering the people who are closest to the action in their organizations, Big Analytics will have an impact that dwarfs the potential suggested by Executive Information Systems and Expert Systems. Companies that figure out how to leverage this trend will reap significant rewards, much as companies like i2 and Trilogy realized the value of artificial intelligence decades after AI was perceived to have failed.
*Disclosure: I am a board member,
investor or advisor to this company.