Showing posts with label Big Data. Show all posts

Friday, July 13, 2012

Vertica - Remembering the Early Days


I had an awesome visit @ Vertica earlier this week for lunch. Cool new space in Cambridge and so many fantastic new people. Thanks to Colin Mahony for the invite and to all the talented engineers and business people at Vertica for building such an amazing product and a great organization!

Couple of memories that came to life for me during my visit:

Reading the draft of Mike Stonebraker's  "One Size Does Not Fit All" paper and thinking:  "This is the mission of an important new company: to prove that One Size Does Not Fit All in Database Systems."

During my first meetings with the "Vertica professors" - Mike, Dave DeWitt, Mitch Cherniack, Stan Zdonik, Sam Madden and Dan Abadi - thinking "We have an unfair intellectual advantage." The technical hurdle was set early by this fantastic team.

Looking up at the new duct work from our original server room (at our original office), which Carty Castaldi vented into the conference room because the conf room was so cold and the servers were running so hot ;)

Inspired Duct Work by Carty Castaldi

The thrill I felt the first time that I watched SELECT work on the Vertica DB :)

Our first Purchase Order. Thanks to Emmanuelle Skala and Tom O'Connell for that one and the many more that followed and made it possible to build such a great product :)

At our first company summer picnic at Mike's place on Lake Winnipesaukee: taking Shilpa's husband Nitin Sonawane for a ride on the JetSki and him being thrown 10 feet in the air going over a wave. I thought he'd never talk to me again. So glad that he didn't get hurt and that he talks to me regularly ;) 

Our first big customer deal with Verizon and then the first repeat purchases by V. Thanks to Derek Schoettle and Rob O'Brien for building such a great telco vertical and for doing deals with integrity from Day One.

Sitting in the basement at Zip's house in Chicago with Stan, Zip and Mike as they jammed Bluegrass music and we all ate Chicago-style pizza until the wee hours.   Thanks to Zip and to everyone at Getco for being such a great early customer and partner.

Relief I felt when Sybase admitted that Vertica did not infringe on their IP :)  Thanks to Mike Hughes and everyone else involved for the truly awesome result.


Getting early offers from a bunch of big companies to buy Vertica and thinking "These guys don't realize how important Big Data is going to be and how great our product is and how incredibly talented our engineering team is."  Thank you to our BOD for resisting the early temptation in spite of tough economic conditions at the time and thanks to Chris Lynch for negotiating such a great deal with HP.  

During lunch on Wednesday, realizing that Vertica's product is truly world-class and has proven that one size does not fit all. Special thanks to every engineer at Vertica, especially Chuck B.: you all ROCK!

I have a much more detailed post in the works, about the early days of Vertica and what I as a founder learned from the experience.  Stay tuned for this post in the next few months.

Monday, July 9, 2012

Medio and The Future of Big Data Analytics

Why I Joined the Advisory BoD of Medio Systems

Today, I'm thrilled to announce that I've joined the advisory Board of Directors of Medio Systems.  

Over the past nine months, I've had a chance to work with Brian Lent, Rob Lilleness and many of the other folks on the team at Medio.  I've been approached by dozens of "Big Data" companies over the past two to three years as more people have begun to focus on large-scale database challenges.  Medio is one of the few companies that I've chosen to work with directly because they're creatively tackling some of the most pressing challenges, including: 
  • Big Data Analytics in Real Time:  Over the past 10 years I've watched as database systems have become more and more "real time": as data is generated, users expect to see that data and be able to analyze it immediately.  Medio has created a  platform that enables its customers to analyze their data in real time without the daily-latency constraints of traditional data warehouse solutions. Incredibly powerful. 
  • Mobile/Tablet Devices: As I sit here writing this blog post, my wife and father-in-law are playing Words with Friends on their respective iPads. Two years ago, my wife told me she would never use an iPad. Now, she's addicted. Mobile/tablet devices matter, and they generate a wealth of Big Data that can help businesses understand and get closer to their customers than ever before. The Medio team has uniquely deep experience with mobile app analytics.
  • Democratizing Analytics: As we were building Vertica back in 2005-2010, it became clear to me that once people had figured out how to represent their data more efficiently, the bottleneck would quickly become how to enable regular people (not just database experts or business analysts) to take advantage of the data and use the data plus statistics to ask complex questions and solve cool problems.  Medio's focus has been on empowering business people with the tools necessary to make better business decisions, truly democratizing analytics.  
  • Analytics Made Easy: I watched dozens of our customers at Vertica deploy solutions that required them to integrate database technologies with reporting and visualization tools and then tune/tweak those tools and the integrations. This turned out to be a  never-ending problem that distracted them from the actual problems and questions they were trying to analyze. Customers didn't want to be experts at database systems or visualization: what they wanted were tools to help them answer business questions faster and more effectively. Medio has created the infrastructure that enables  customers to focus on asking and getting answers to their questions without worrying about how the underlying technology works.  
  • Predictive Capabilities: I strongly believe that analytic systems must have forward-looking/predictive capabilities. Just analyzing history is no longer enough. The next wave of analytic systems will empower users to anticipate what might happen in the future and model the potential impact of decisions they make today, leveraging history as the basis for those models.  Medio CTO Brian Lent has been an early proponent and tireless champion of using information to help support decision-making with predictive capabilities. His vision and the technology the Medio team has created are unique.  
To see for yourself, check out Medio.  This is a company worth watching.

Thursday, April 19, 2012

Building an Analytics-Driven Culture

Turning Big Data and Big Analytics into Business Opportunity

If you haven’t read my friend Tom Davenport’s book Competing on Analytics (Harvard Business School Press, 2007), you should.  If you have read it, it’s a really good time to read it again. 

Why?  The Big Data revolution.  Or I should say, the Big Analytics revolution. (BTW, I think that "Big Analytics" isn't a great term either. But what the heck, let's just make it easy for the marketing folks to transition from Data to Analytics by using the same adjective ;) 

In his book, Tom talked about organizations that were using analytics – analyzing massive amounts of data – to gain a real competitive edge in their business performance.  Practitioners ranged from health care organizations and pharmaceutical companies to retailers (such as Best Buy) and the entertainment industry (Harrah’s Entertainment, whose CEO Gary Loveman, an MIT graduate, wrote the foreword to the book).  And we can’t forget the sports teams – not just the Oakland A’s, famously portrayed in Michael Lewis’ book and then the movie Moneyball, but our own Boston Red Sox and New England Patriots. Professional sports is being transformed radically by analytic tools, techniques and culture.  

Today, I think we’re on the edge of a secular shift in business: the ability of virtually any business – not just large and technically sophisticated businesses with big budgets – to get a competitive edge by using analytics every single day.

Big Analytics for the Rest of Us

Big Data is only part of the story. What matters more is what you do with Big Data: Big Analytics.

Big Data is a fact of life for almost every company today.  It’s not something you run out and buy.  You already have it, whether you want it or not – or whether you know it or not. It’s the large quantities of data that companies accumulate and save daily about their customers, their employees and their partners; plus the vast corpus of public data available to companies elsewhere on the Web. 

In years past, the constraints of traditional database technology – such as first-generation relational database architectures and the outdated business models of the companies who commercialized these systems – made it difficult and expensive for people and organizations to store and access such volumes of data for analysis. Today, the popular adoption of innovative technologies such as the Hadoop Distributed File System (HDFS)/MapReduce, as well as many other "built for purpose" database systems, enables data to be aggregated and made available for analysis cost-efficiently and with extreme performance.  (BTW, I believe that Hadoop/MR is way over-hyped right now: it's great and very useful, but only one piece of the Big Data/Big Analytics puzzle.)

Some of the innovative products/companies that I've had the privilege to be a part of include:
There are MANY others, which is good database innovation mojo, especially compared to seven years ago when Mike Stonebraker and I started Vertica.  At that point, the standard response from people when I said I was working with Mike on a new database company was "Why does the world need another database engine?  Who could possibly compete with the likes of Microsoft, IBM and Oracle?"  But the reality was that Oracle and the other large RDBMS vendors had significantly stifled innovation in database systems for 20+ years.  

Jit Saxena and the team at Netezza deserve huge kudos for proving that, starting in the early 2000s, the time was right for innovation in large-scale commercial database system architectures. Companies were starved for database systems that were built for analytical purposes.  I'm not a fan of using proprietary hardware to solve database problems (amazing how quickly people forgot about the Britton Lee experiment with "database machines").  But putting the proprietary hardware debate aside, thanks to innovators like Mike Stonebraker, Dave DeWitt, Stan Zdonik, Mitch Cherniack, Sam Madden, Dan Abadi, Jit Saxena and many others, now we're well on our way to making up for lost time.

Some other database start-ups of note include:
There are many new tools out there for managing Big Data, and new innovations are being delivered to the market every month, from big and small companies alike.  I've actually been impressed with the progress that Microsoft has made with SQL Server of late, mostly driven by Dave DeWitt, PhD, and his new MSFT Jim Gray Systems Lab at University of Wisconsin.  Most business folks don't realize that many of the technical principles behind systems at Teradata,  Greenplum, Netezza and others were based on innovations such as the Gamma parallel database system as well as the dozen+ systems that Mike Stonebraker and his vast network of database systems researchers have been churning out over the past 15+ years.  

The challenge now for most commercial IT and database professionals is the process of trying to match the right new tools with the appropriate workloads.  If, as Mike and his team say in their seminal paper "one size does not fit all for database systems," then one of the hardest next steps is figuring out which database system is right for which workload (a topic for another blog post).  This problem is exacerbated by the tendency to over-promote the potential applications for any one of these new systems, but hey, that's what marketing people get paid to do ;)

Until just recently, however, another key element that has been missing is the focus on how data is going to be used when people implement their Big Data systems.  Big Data is useless unless you architect your systems to support the questions that end users are going to ask. (Yet more fodder for another blog post.) 

For many decades, there was no open, scalable, affordable way to do Big Analytics. So the kind of capabilities that Tom Davenport talks about in Competing on Analytics were available only to companies with huge financial resources – either to pay companies like Teradata (which is where the Wal-Marts and eBays of the world ended up) or to hire tons of Computer Science PhDs and Stats professionals to build custom stuff at large scale (Google, Yahoo, etc.). The analytics themselves were even further restricted within those companies, to professional analysts or senior executives who had staff to make the results of these analytics digestible and available to them. These capabilities were kind of a shadow of the Executive Information Systems trend in the 1980s.

Today this is changing.  Established companies and start-ups are creating technologies that “democratize” Big Analytics, making large-scale analysis affordable for even medium-sized businesses and usable by average people (instead of just business analysts or professional statisticians). A great example is Google Analytics. Ten-plus years ago, the kind of analytics you get today with Google Analytics were only available to Webmasters who had implemented specialized logging systems and customized visualization. Now, my son Jonah gets analysis of his Web site that would have cost big bucks a decade ago.

However, there are still missing pieces. I believe we need:

  • Large-scale, multi-tenant analytic database as a service, similar to Cloudant and Dynamo, but tuned/configured specifically for analytical workloads with the appropriate network infrastructure to support large loads
  • Large-scale, multi-tenant statistics as a service – equivalent functionality to SPSS, R or SAS, but hosted and available as an affordable Web service. The best example of this right now is probably Revolution. I guess the acronym would be Statistics as a Service – or Statistics as a Utility
  • Radically better visualization tools and services: I think that HTML5 has clearly enabled this and is making tools like Ben Fry's Processing more accessible so that the masses can do "artful analytics"
Once analytic databases and statistical functionality are available as Web services, I believe we'll see the proliferation of many new affordable and sophisticated analytic services that leverage these capabilities.  One of the best examples of this that I’ve been working on is a product created by Recorded Future.*  I believe it is one of the most advanced analytics companies in the world.  Christopher Ahlberg, Staffan Truve and the entire amazing team at Recorded Future are making some of the most sophisticated analytics in the world available to the masses.  Another example in Boston is Humedica, where my friend Paul Bleicher (founder of Phase Forward) is doing fantastic work – perhaps some of the most advanced health care analytics in the world. 

At this point there are still significant technical hurdles. But in the very near future, the challenge for most companies won’t be technology: it will be people, especially those who will no longer be limited to static reporting.

To be successful, businesses will need to build analytics-driven cultures: cultures where everyone believes it's his job to be information-seeking and to think analytically about the integrated data that can help him make better decisions and move faster every day.  

The #1 Step Toward Building the Right Culture

So, how do you go about building an analytics-driven culture? 

This is obviously a long discussion; Tom’s book is a great primer on this, particularly Chapter 7 where he points out that it’s analytical people that make analytics work.  But I think that the most important step for businesses is to rethink the way we build systems and to respond to the call to action created by the “consumerization” of information technology in the enterprise.

Every day new analytic tools are being made available to consumers over the Internet.  Those  consumers then walk into their workplaces and are faced with a pitiful cast of static, mundane and difficult-to-use tools provided by their IT organizations – most of which are 10+ years behind on analytics.  (Sorry if it hurts to hear this, but it's true – and I include myself as one of the people who is under-delivering on meeting enterprise end users' analytical expectations).

To build analytics-driven cultures,  businesses need to shift IT’s emphasis from process automation/"reengineering" (popularized by the late Michael Hammer and others) to decision automation.  With process automation, the average worker is treated as a programmable cog in a machine; with decision automation, the average worker is treated like an individual and an intelligent decision point.

This isn’t New-Age management theory or voodoo; there’s already a good track record for it.  Southwest Airlines empowers its gate agents to do things that gate agents at American Airlines can’t even think of doing. Ditto for Nordstrom and Zappos in retail, where sales representatives have broad discretion on how they satisfy each customer. These companies believe in the individual identity of every person in that organization and use data and systems to empower them.  (The antithesis of these companies is the company stuck in a post-industrial employment model – most amusingly like Charlie Chaplin’s employer in "Modern Times." And, yes, such companies do exist even today; otherwise we wouldn't have shows like "The Office" or movies like "Office Space.")

Unfortunately, the technology people in most companies don’t think this way, and most of their vendors are still stuck in the process automation mindset of the 80s and 90s.  If companies thought of their competitive edge as being decisions, they would expect their systems and their user experiences to be radically different.  In some ways, this is the process of thinking about an organization's systems in context of how data is going to be used/consumed instead of how the data is being created.  Because we tend to build systems with a serial mindset, many systems in today's organizations were built to "catch" the data that is being generated. But the most forward-thinking organizations are designing their systems from the desired analytics back into the data that needs to be captured/managed to support the decisions of the people in their organizations.  
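To make the "analytics back into the data" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular company does it: the Question class, the two helper functions, and every question and field name are made up for the example. The point is simply that you start from the questions people need answered, derive what must be captured from them, and then check that against what you capture today.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Question:
    """A decision someone needs to make (hypothetical example type)."""
    text: str
    required_fields: FrozenSet[str]


def capture_schema(questions):
    """Return the sorted union of all fields the questions depend on."""
    fields = set()
    for q in questions:
        fields |= q.required_fields
    return sorted(fields)


def uncovered(questions, captured_fields):
    """Map each question to the fields it needs that are not captured yet."""
    captured = set(captured_fields)
    gaps = {}
    for q in questions:
        missing = sorted(q.required_fields - captured)
        if missing:
            gaps[q.text] = missing
    return gaps


# Illustrative questions and fields; all names here are assumptions.
questions = [
    Question("Which customers are likely to churn?",
             frozenset({"customer_id", "last_login", "plan"})),
    Question("Which ad placements convert best?",
             frozenset({"customer_id", "ad_slot", "clicked"})),
]

# What a "catch the data as it is generated" system happens to store today.
currently_captured = ["customer_id", "plan", "clicked"]

print(capture_schema(questions))
print(uncovered(questions, currently_captured))
```

In this toy case neither question can be answered with today's data: the churn question is missing last_login and the ad question is missing ad_slot. That gap is exactly what you find late, and expensively, when the system was designed around data creation rather than data use.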

Ironically, there are a bunch of us artificial intelligence (AI)  people from the 1970s and 1980s who experienced a technology trend called expert systems.  Expert systems really involved taking AI techniques and applying them to automating decision support for key experts. Many of the tools that were pioneered back in the expert systems days are still valid and have evolved significantly.

But we need to go one step further. Today, consumer-based tools are providing data that empower people to make better decisions in their daily lives. We now need business tools that do the same: Big Analytics that enable every employee – not just the CEO or CFO or other key expert – to make better everyday business decisions.

Here’s one great example of what can happen when you do this.

Over the past decade, synthetic chemists have begun to adopt quantitative/computational chemistry methods and decision tools to make them more efficient in their wet labs.  The use of tools such as Spotfire, RDKit and many others has begun to change the collaborative dynamic for chemists, by enabling them to design libraries using quantitative tools and techniques and, perhaps most importantly, to use these quantitative tools collaboratively.

It’s very cool to see a bunch of chemists working together to design compounds or libraries of compounds that they wouldn’t otherwise have created.  Modern chemists use their remarkable intuition along with incredibly powerful computational models running on high-performance cloud infrastructure.  They analyze how active or greasy a potential compound could be, how soluble, big, dense or heavy, and how synthetically tractable it might be. Teams of chemists spread across the globe use this data to make better decisions about which compounds are worth synthesizing and which are not as they seek to discover therapies that make a difference in the lives of patients.  

This is where the magic comes from – from being decision-oriented not process-oriented.  Big Analytics can make average people junior artists – and natural artists wizards – by giving them the  infrastructure to make sense of data and interact with people.  It makes art and magic more repeatable.

Google Analytics is a good example of what happens when a business adopts an analytics-driven culture.  By using Google Bigtable with the Google File System, Google expressed the value of its analytics in a way that could be given to anyone who manages a Web site. Google then watched the value of the analytics become richer, more statistical and more analytical.  

I predict that this is what will happen in the rest of the business world as Big Analytics takes hold and analytics-driven cultures become more the norm – and expected of every enterprise system.

I have seen what can happen close up, many times. As the co-founder of Vertica, I was fortunate in having Mozilla and Zynga as two of our best early customers.  Zynga thought of itself as an analytics company first and foremost.  Yes, their business was providing compelling games, but their competitive edge came from making analytics-based recommendations to their customers in real time within games and about where to place ads in their games.  Another company that I work with closely that is providing this type of capability is Medio Systems.*  Companies like Medio are democratizing Big Analytics for many companies and users.  

By rethinking how you build systems – within the context of how the data in the system will be analyzed and put to use, and thinking of every person in the company as an intelligent decision point – you’ll smooth the path to Big Analytics and 21st-century competitive analytics.

Analytics as Oxygen

In the future, analytics won’t be something that only analysts do. Rather, analytics will be like oxygen for everyone in your organization,  helping them make better decisions faster to out-maneuver competition.  Competing on analytics is not something to be done only in the boardroom: it’s most powerful when implemented from the bottom up.  

By empowering the people who are closest to the action in their organizations, Big Analytics will have an impact that dwarfs the potential suggested by Executive Information Systems and Expert Systems. Companies that figure out how to leverage this trend will reap significant rewards – not unlike how companies like I2 and Trilogy realized the value of artificial intelligence decades after AI was perceived to have failed.  

*Disclosure: I am a board member, investor or advisor to this company.