Thursday, April 19, 2012

Building an Analytics-Driven Culture

Turning Big Data and Big Analytics into Business Opportunity

If you haven’t read my friend Tom Davenport’s book Competing on Analytics (Harvard Business School Press, 2007), you should. If you have read it, now is a really good time to read it again.

Why?  The Big Data revolution.  Or I should say, the Big Analytics revolution. (BTW, I think that "Big Analytics" isn't a great term either. But what the heck, let's just make it easy for the marketing folks to transition from Data to Analytics by using the same adjective ;) 

In his book, Tom talked about organizations that were using analytics – analyzing massive amounts of data – to gain a real competitive edge in their business performance. Practitioners ranged from health care organizations and pharmaceutical companies to retailers (such as Best Buy) and the entertainment industry (Harrah’s Entertainment, whose CEO Gary Loveman, an MIT graduate, wrote the foreword to the book). And we can’t forget the sports teams – not just the Oakland A’s, famously portrayed in Michael Lewis’ book and then the movie Moneyball, but our own Boston Red Sox and New England Patriots. Professional sports is being transformed radically by analytic tools, techniques and culture.

Today, I think we’re on the edge of a secular shift in business: the ability of virtually any business – not just large and technically sophisticated businesses with big budgets – to get a competitive edge by using analytics every single day.

Big Analytics for the Rest of Us

Big Data is only part of the story. What matters more is what you do with Big Data: Big Analytics.

Big Data is a fact of life for almost every company today.  It’s not something you run out and buy.  You already have it, whether you want it or not – or whether you know it or not. It’s the large quantities of data that companies accumulate and save daily about their customers, their employees and their partners; plus the vast corpus of public data available to companies elsewhere on the Web. 

In years past, the constraints of traditional database technology – such as first-generation relational database architectures and the outdated business models of the companies who commercialized these systems – made it difficult and expensive for people and organizations to store and access such volumes of data for analysis. Today, the popular adoption of innovative technologies such as the Hadoop Distributed File System (HDFS)/MapReduce, as well as many other "built for purpose" database systems, enables data to be aggregated and made available for analysis cost-efficiently and with extreme performance. (BTW, I believe that Hadoop/MR is way over-hyped right now: it's great and very useful, but only one piece of the Big Data/Big Analytics puzzle.)

Some of the innovative products/companies that I've had the privilege to be a part of include:
There are MANY others – which is good database innovation mojo – especially compared to seven years ago when Mike Stonebraker and I started Vertica. At that point, the standard response from people when I said I was working with Mike on a new database company was "Why does the world need another database engine? Who could possibly compete with the likes of Microsoft, IBM and Oracle?" But the reality was that Oracle and the other large RDBMS vendors had significantly stifled innovation in database systems for 20+ years.

Jit Saxena and the team at Netezza deserve huge kudos for proving that, starting in the early 2000s, the time was right for innovation in large-scale commercial database system architectures. Companies were starved for database systems that were built for analytical purposes. I'm not a fan of using proprietary hardware to solve database problems (amazing how quickly people forgot about the Britton Lee experiment with "database machines"). But putting the proprietary hardware debate aside, thanks to innovators like Mike Stonebraker, Dave DeWitt, Stan Zdonik, Mitch Cherniack, Sam Madden, Dan Abadi, Jit Saxena and many others, now we're well on our way to making up for lost time.

Some other database start-ups of note include:
There are many new tools out there for managing Big Data, and new innovations are being delivered to the market every month, from big and small companies alike.  I've actually been impressed with the progress that Microsoft has made with SQL Server of late, mostly driven by Dave DeWitt, PhD, and his new MSFT Jim Gray Systems Lab at University of Wisconsin.  Most business folks don't realize that many of the technical principles behind systems at Teradata,  Greenplum, Netezza and others were based on innovations such as the Gamma parallel database system as well as the dozen+ systems that Mike Stonebraker and his vast network of database systems researchers have been churning out over the past 15+ years.  

The challenge now for most commercial IT and database professionals is the process of trying to match the right new tools with the appropriate workloads.  If, as Mike and his team say in their seminal paper "one size does not fit all for database systems," then one of the hardest next steps is figuring out which database system is right for which workload (a topic for another blog post).  This problem is exacerbated by the tendency to over-promote the potential applications for any one of these new systems, but hey, that's what marketing people get paid to do ;)

Until just recently, however, another key element that has been missing is the focus on how data is going to be used when people implement their Big Data systems.  Big Data is useless unless you architect your systems to support the questions that end users are going to ask. (Yet more fodder for another blog post.) 

For many decades, there was no open, scalable, affordable way to do Big Analytics. So the kind of capabilities that Tom Davenport talks about in Competing on Analytics were available only to companies with huge financial resources – either to pay companies like Teradata (which is where the Wal-Marts and eBays of the world ended up) or to hire tons of Computer Science PhDs and Stats professionals to build custom stuff at large scale (Google, Yahoo, etc.). The analytics themselves were even further restricted within those companies, to professional analysts or senior executives who had staff to make the results of these analytics digestible and available to them. These capabilities were kind of a shadow of the Executive Information Systems trend in the 1980s.

Today this is changing.  Established companies and start-ups are creating technologies that “democratize” Big Analytics, making large-scale analysis affordable for even medium-sized businesses and usable by average people (instead of just business analysts or professional statisticians). A great example is Google Analytics. Ten-plus years ago, the kind of analytics you get today with Google Analytics were only available to Webmasters who had implemented specialized logging systems and customized visualization. Now, my son Jonah gets analysis of his Web site that would have cost big bucks a decade ago.

However, there are still missing pieces. I believe we need:

  • Large-scale, multi-tenant analytic database as a service, similar to Cloudant and DynamoDB – but tuned/configured specifically for analytical workloads with the appropriate network infrastructure to support large loads
  • Large-scale, multi-tenant statistics as a service – equivalent functionality to SPSS, R, SAS, but hosted and available as an affordable Web service. The best example of this right now is probably Revolution. I guess the acronym would be Statistics as a Service – or Statistics as a Utility
  • Radically better visualization tools and services: I think that HTML5 has clearly enabled this and is making tools like Ben Fry's Processing more accessible so that the masses can do "artful analytics"
Once analytic databases and statistical functionality are available as Web services, I believe we'll see the proliferation of many new affordable and sophisticated analytic services that leverage these capabilities.  One of the best examples of this that I’ve been working on is a product created by Recorded Future.*  I believe it is one of the most advanced analytics companies in the world.  Christopher Ahlberg, Staffan Truve and the entire amazing team at Recorded Future are making some of the most sophisticated analytics in the world available to the masses.  Another example in Boston is Humedica, where my friend Paul Bleicher (founder of Phase Forward) is doing fantastic work – perhaps some of the most advanced health care analytics in the world. 

At this point there are still significant technical hurdles. But in the very near future, the challenge for most companies won’t be technology: it will be people, especially those who will no longer be limited to static reporting.

To be successful, businesses will need to build analytics-driven cultures: cultures where everyone believes it's his job to be information-seeking and to think analytically about the integrated data that can help him make better decisions and move faster every day.  

The #1 Step Toward Building the Right Culture

So, how do you go about building an analytics-driven culture? 

This is obviously a long discussion; Tom’s book is a great primer on this, particularly Chapter 7 where he points out that it’s analytical people that make analytics work.  But I think that the most important step for businesses is to rethink the way we build systems and to respond to the call to action created by the “consumerization” of information technology in the enterprise.

Every day new analytic tools are being made available to consumers over the Internet.  Those  consumers then walk into their workplaces and are faced with a pitiful cast of static, mundane and difficult-to-use tools provided by their IT organizations – most of which are 10+ years behind on analytics.  (Sorry if it hurts to hear this, but it's true – and I include myself as one of the people who is under-delivering on meeting enterprise end users' analytical expectations).

To build analytics-driven cultures,  businesses need to shift IT’s emphasis from process automation/"reengineering" (popularized by the late Michael Hammer and others) to decision automation.  With process automation, the average worker is treated as a programmable cog in a machine; with decision automation, the average worker is treated like an individual and an intelligent decision point.

This isn’t New-Age management theory or voodoo; there’s already a good track record for it. Southwest Airlines empowers its gate agents to do things that gate agents at American Airlines can’t even think of doing. Ditto for Nordstrom and Zappos in retail, where sales representatives have broad discretion on how they satisfy each customer. These companies believe in the individual identity of every person in the organization and use data and systems to empower them. (The antithesis of these companies is any company stuck in post-industrial employment models – most amusingly like Charlie Chaplin’s employer in "Modern Times." And, yes, such companies do exist even today – otherwise we wouldn't have shows like "The Office" or movies like "Office Space.")

Unfortunately, the technology people in most companies don’t think this way, and most of their vendors are still stuck in the process automation mindset of the 80s and 90s. If companies thought of their competitive edge as being decisions, they would expect their systems and their user experiences to be radically different. In some ways, this means thinking about an organization's systems in the context of how data is going to be used/consumed instead of how the data is being created. Because we tend to build systems with a serial mindset, many systems in today's organizations were built to "catch" the data that is being generated. But the most forward-thinking organizations are designing their systems from the desired analytics back into the data that needs to be captured/managed to support the decisions of the people in their organizations.

Ironically, there are a bunch of us artificial intelligence (AI)  people from the 1970s and 1980s who experienced a technology trend called expert systems.  Expert systems really involved taking AI techniques and applying them to automating decision support for key experts. Many of the tools that were pioneered back in the expert systems days are still valid and have evolved significantly.

But we need to go one step further. Today, consumer-based tools are providing data that empower people to make better decisions in their daily lives. We now need business tools that do the same: Big Analytics that enable every employee – not just the CEO or CFO or other key expert – to make better everyday business decisions.

Here’s one great example of what can happen when you do this.

Over the past decade, synthetic chemists have begun to adopt quantitative/computational chemistry methods and decision tools to make them more efficient in their wet labs. The use of tools such as Spotfire, RDKit and many others has begun to change the collaborative dynamic for chemists, by enabling them to design libraries using quantitative tools and techniques and, perhaps most importantly, to use these quantitative tools collaboratively.

It’s very cool to see a bunch of chemists working together to design compounds or libraries of compounds that they wouldn’t otherwise have created. Modern chemists use their remarkable intuition along with incredibly powerful computational models running on high-performance cloud infrastructure. They analyze how active or greasy a potential compound could be, or how soluble, big, dense, heavy or synthetically tractable it might be. Teams of chemists spread across the globe use this data to make better decisions about which compounds are worth synthesizing and which are not as they seek to discover therapies that make a difference in the lives of patients.

This is where the magic comes from – from being decision-oriented not process-oriented.  Big Analytics can make average people junior artists – and natural artists wizards – by giving them the  infrastructure to make sense of data and interact with people.  It makes art and magic more repeatable.

Google Analytics is a good example of what happens when a business adopts an analytics-driven culture. By using Google Bigtable with the Google File System, Google expressed the value of its analytics in a way that could be given to anyone who manages a Web site. Google then watched those analytics become richer, more statistical and more analytical.

I predict that this is what will happen in the rest of the business world as Big Analytics takes hold and analytics-driven cultures become more the norm – and expected of every enterprise system.

I have seen what can happen close up, many times. As the co-founder of Vertica, I was fortunate in having Mozilla and Zynga as two of our best early customers.  Zynga thought of itself as an analytics company first and foremost.  Yes, their business was providing compelling games, but their competitive edge came from making analytics-based recommendations to their customers in real time within games and about where to place ads in their games.  Another company that I work with closely that is providing this type of capability is Medio Systems.*  Companies like Medio are democratizing Big Analytics for many companies and users.  

By rethinking how you build systems – within the context of how the data in the system will be analyzed and used, and thinking of every person in the company as an intelligent decision point – you’ll smooth the path to Big Analytics and 21st-century competitive analytics.

Analytics as Oxygen

In the future, analytics won’t be something that only analysts do. Rather, analytics will be like oxygen for everyone in your organization,  helping them make better decisions faster to out-maneuver competition.  Competing on analytics is not something to be done only in the boardroom: it’s most powerful when implemented from the bottom up.  

By empowering the people who are closest to the action in their organizations, Big Analytics will have an impact that dwarfs the potential suggested by Executive Information Systems and Expert Systems. Companies that figure out how to leverage this trend will reap significant rewards – not unlike how companies like i2 and Trilogy realized the value of artificial intelligence decades after AI was perceived to have failed.

*Disclosure: I am a board member, investor or advisor to this company.

Friday, April 13, 2012

The Power of Mentoring

Paying Back by Paying It Forward


There are all kinds of reasons to mentor people in business.

It feels good to invest in the next generation. 

You get back more than you give. 
You'd want someone to help your own kids. 

But it’s also good business, and it pays forward more than you can ever possibly imagine – especially if you are (or aspire to be) an entrepreneur.

That’s why I’m psyched to be participating in a Fireside Chat on Mentorship at the Greener Ventures Conference at the Tuck School of Business this Saturday, April 14. Joining me will be my long-time friend and colleague Dave Girouard (Dartmouth ’88, Thayer School of Engineering ’89), founder and CEO of Upstart. Along with a number of other great folks, we will be judging the Greener Ventures Entrepreneurship Contest. We are proud to be sponsoring the first-ever business plan competition at Dartmouth/Thayer/Tuck and are looking forward to a weekend in Hanover.

I’ve been blessed over the past 20 years to have a fantastic series of mentors who have all given more to me than I can ever possibly describe or repay. 


In particular, Peter Barris (who I met in 1993 while a second-year student at Tuck) has been both consistent and thoughtful in his advice and support over the past 18+ years. One of the unique aspects of Peter’s advice over the years was that – consistently – he gave me advice that was as objective as possible. Peter was always able to abstract his own interests out from the situation. He gave me feedback and advice on my own professional development, and ideas that were the best for me in the long term – regardless of the short-term benefit or cost to himself.   This ability to focus on my long-term development regardless of short-term interests was more valuable for me than I can describe. 


Also, my friend Frank Moss has been an incredibly powerful influence on my professional development. Through a very critical time in my career, Frank taught me that – no matter what the reward – compromising your personal values in business is never worthwhile, and that true leadership in business is not about making money, but rather about being mission-driven, building valuable things and helping people.


I've been working with Dave Girouard on his new company, Upstart, which is focused on empowering young, smart and innovative people who are just beginning their careers and are interested in taking paths other than the traditional or conservative. 

Tuesday, April 10, 2012

Building New Systems for Scientists

Three Good Places to Start

In my last post, I wrote about the need to build new data and software systems for scientists – particularly those working in the life sciences. Chemists, biologists, pharmacologists, doctors and many other flavors of scientists working in life sciences/medical research are potentially the most important (and under-served) technology end-user group on the planet. (If you are looking for a great read on this, try Reinventing Discovery: The New Era of Networked Science.)

There are many ways to break down what needs to be done. But here are three ways to think about how we can help these folks:
  1. Support their basic productivity by giving them modern tools to move through their information-oriented workdays
  2. Help them improve their collaboration with others by bringing social tools to science/research
  3. Help them solve the hardest, most-challenging problems they are facing by using better information/technology/computer-science tools
While all three of these are worth doing, #3 is the most tempting to spend time on, as the problems are hard and interesting intellectually.  

A great example in the life sciences currently is Next Generation Sequencing/Whole Genome Sequencing, where (with all kinds of caveats) one instrument generates about 75TB of data a year.  The cost of these instruments has dropped by an order of magnitude over the past five years, resulting in many decentralized labs just going out and buying them and getting busy with their own sequencing.  

One of the many challenges that this poses is that often the labs don't realize the cost of the storage and data management required to process and analyze the output of these experiments (my back-of-the-envelope estimate is at least 2X-3X the cost of the instrument itself). The worst-case scenario is when the scientists point these instruments at public file shares and find those file shares filling up in a small number of days – to the dismay of other users who want to use the file shares. Don't laugh: it's happening every day in labs all around the world.
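
To make the file-share scenario concrete, here is a rough back-of-the-envelope sketch in Python. The 75 TB/year figure is the one quoted above; the 2 TB share size and the helper name `days_to_fill` are hypothetical, picked only for illustration, and the steady output rate is a simplifying assumption.

```python
# Back-of-the-envelope: how fast does one sequencer fill a shared drive?
# 75 TB/year comes from the post; the share size below is a made-up example.

TB_PER_YEAR = 75.0
DAYS_PER_YEAR = 365.0


def days_to_fill(share_tb, instruments=1):
    """Days until an initially empty file share of `share_tb` terabytes
    is full, assuming every instrument writes at a steady rate."""
    tb_per_day = TB_PER_YEAR * instruments / DAYS_PER_YEAR
    return share_tb / tb_per_day


daily_gb = TB_PER_YEAR / DAYS_PER_YEAR * 1000  # decimal gigabytes per day
print(f"One instrument writes about {daily_gb:.0f} GB per day")
print(f"A 2 TB share lasts about {days_to_fill(2.0):.1f} days")
```

Under these assumptions a single instrument writes a couple of hundred gigabytes a day, so a modest shared drive lasts days rather than months, and adding a second instrument halves that.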


The drop in cost-per-genome makes Moore's Law look incremental.

Anyway, there are hundreds of interesting scientific problems like NGS that require equally advanced computer science and engineering.  One effort towards providing a place where these things can develop and grow in an open environment is OpenScience.

But I'd like to focus more on #1 and #2 – because I believe that these are actually the problems that, when solved, can deliver the most benefit to scientists very quickly. They don't depend on innovation in either the fundamental science or information technology. Rather, they depend primarily on end users and their organizations implementing technologies that already exist and – if deployed with reasonable thought and discipline – can have a huge positive impact on scientific productivity.

Here are a few examples.

Collaboration Tools

Social networking, Wikis, Google Hangouts and other networking technologies make it easy and inexpensive to create more-flexible collaboration experiences for scientists. These aren’t your father’s collaboration tools – the kind of process-automation gizmos that most technology companies produced for businesses over the past 30 years. Rather, these new tools empower the individual to self-publish experimental design and results and share – with groups of relevant and interested people – early and often. Think radically more-dynamic feedback loops on experimentation, and much more granular publishing and review of results and conclusions.

Much of what scientists need in their systems is collaborative in nature.  If you are a researcher working in either a commercial organization or academic/philanthropic organization, how do you know that those experiments haven’t already been done by someone else – or aren’t being run now?  If scientists had the ability to "follow" and "share" their research and results as easily as we share social information on the Web, many of these questions would be as easy to answer as "Who has viewed my profile on LinkedIn?"  

Part of this depends on the clear definition of scientific "entities": Just as you have social entities on Facebook – Groups, Individuals, etc. – scientists have natural scientific entities. In the life sciences, it's the likes of Compounds, Proteins, Pathways, Targets, People, etc. If you do a decent job of defining these entities and representing them in digital form with appropriate links to scientific databases (both public and private), you can easily "follow compound X." This would enable a scientist not only to identify who is working on scientific entities that he's interested in, but also fundamentally to stand on the shoulders of others, avoid reinventing the wheel, and raise the overall level of global scientific knowledge and efficiency by sharing experiments with the community.
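
As a minimal sketch of what "follow compound X" could look like as a data structure, here is a toy Python model. Everything in it (the `Entity` and `FollowGraph` names, the fields, the sample identifier) is hypothetical and is not taken from any real product; it just shows how little machinery the basic idea needs once entities are clearly defined.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A scientific entity such as a compound or protein (hypothetical model)."""
    kind: str        # e.g. "Compound", "Protein", "Pathway", "Target"
    identifier: str  # ideally an ID that links into a public or private database


class FollowGraph:
    """Tracks which people follow which entities (illustrative only)."""

    def __init__(self):
        self._followers = defaultdict(set)  # Entity -> set of person names

    def follow(self, person, entity):
        self._followers[entity].add(person)

    def followers_of(self, entity):
        """Everyone currently following `entity`."""
        return set(self._followers.get(entity, set()))

    def shared_interest(self, person):
        """Other people who follow at least one entity that `person` follows."""
        others = set()
        for people in self._followers.values():
            if person in people:
                others |= people
        others.discard(person)
        return others


graph = FollowGraph()
compound_x = Entity("Compound", "compound-X")  # placeholder identifier
graph.follow("alice", compound_x)
graph.follow("bob", compound_x)
print(sorted(graph.followers_of(compound_x)))
print(sorted(graph.shared_interest("alice")))
```

The hard part in practice is not this bookkeeping but agreeing on the entity definitions and identifiers, which is exactly the point made above.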

Two start-ups that I mentioned in my earlier blog post – Wingu and Syapse – are creating Web services that enable this kind of increased collaboration and distribution of research in the pharma industry. Many large commercial organizations are attempting to provide this type of functionality using systems such as SharePoint and MediaWiki. Unfortunately, they are finding that traditional IT organizations and their technology suppliers lack the expertise to engage users in experiences that can compete for attention relative to consumer Internet and social tools.

I've watched this dynamic as various research groups have begun to adopt Google Apps. It's fascinating because companies and academic institutions that adopt Google Apps get the benefit of all the thousands of engineers who are working to improve Google's consumer experience and just apply these same improvements to their commercial versions – truly an example of the "consumerization of enterprise IT" (and credit to my friend Dave Girouard for the great job he did building the Enterprise group at Google over the past eight years).

One of the ways that Google might end up playing a large and important role in scientific information is due to the fact that many academic institutions are aggressively adopting Gmail and Google Apps in general as an alternative to their outdated email and productivity systems. They have skipped the Microsoft Exchange stage and gone right to multi-tenant hosted email and apps. The additional benefit of this is that many scientists will get used to using multi-user spreadsheets and editing docs collaboratively in real time instead of editing Microsoft Office documents, sending them via email, saving them, editing them, sending them back via email, blah...blah...blah.

If companies aren't doing the Google Apps thing, they are probably stuck with Microsoft – and locked into the three-to-five-year release cycles of Microsoft technology in order to get any significant improvements to systems like SharePoint. After a while, it becomes obvious that the change to Google Apps is worthwhile relative to the bottleneck of traditional third-party software release cycles, particularly for researchers, for whom these social features can have a transformational effect on their experimental velocity and personal productivity.

Another example of how this is playing out is the contest between innovators Yammer and Jive and Microsoft SharePoint. This is a great example of how innovators are driving existing enterprise folks to change, but ultimately we'll see how the big enterprise tech companies – Microsoft, IBM, etc. – can respond to the social networking and Internet companies stepping onto their turf. And we'll see if Microsoft can make Office 365 function like Google Apps. But if Azure is any indication, I'd be skeptical.

Open-Source Publishing

First – thank you, Tim Gowers, for the post on his blog – all I can say is YES!

In my opinion, the current scientific publishing model and ecosystem are broken (yet another topic for another post). But today new bottom-up publishing tools like ResearchGate let scientists self-publish their experiments without depending on the outdated scientific publishing establishment and broken peer-review model.  Scientists should be able to publish their experiments – sharing them granularly instead of being forced to bundle them into long-latency, peer-reviewed papers in journals. Peer review is critical, but should be a process of gradually opening up experimental results and findings through granular socialization.  Peer review should not necessarily be tied to the profit-motivated and technically antiquated publishing establishment.  I love the work that the folks at Creative Commons have done in beginning to drive change in scientific publishing.   

One of the most interesting experiments with alternative models has been done by the Alzheimer Research Forum, or Alzforum. The set of tools known as SWAN is a great example of the kind of infrastructure that could easily be reused across eScience. It makes sense that in the face of the huge challenge represented by treating Alzheimer's, people – patients, doctors, scientists, caregivers, engineers – would work together to develop the tools required to share the information and collaborate as required, regardless of the limitations of the established infrastructure and organizational boundaries. I know there are lots of other examples and am psyched to see these highlighted and promoted :)

Data As A Service

Switching to infrastructure for a second: One of the things that scientists do every day is create data from their experiments. Traditionally this data lives in lots of random locations: on the hard drives of scientific instruments, on shared hard drives within a lab, on stacks of external hard drives sitting on their lab benches, perhaps in a database that someone has set up within a lab.

I believe that one of the ways that we can begin to accelerate the pace of experimentation and collaboration of scientists is to enable them to put their data into a rational location for sharing their data, conducting collaborative analytics, sharing the derived results, and establishing the infrastructure and provenance required to begin producing reproducible results.  

And, as we've seen with Next-Generation Sequencing: One of the challenges of science that depends on Big Data is that scientists are traditionally conditioned to manage their data and their analytics locally. This conditioning creates problems when you have large-scale data (can't afford to keep it all locally) and when you want to collaborate on the analysis of the data (data is too big to copy around to many locations).  This also creates a challenge in terms of the ability to create the provenance required to reproduce the results.  
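
One way to picture the provenance piece is a small record that travels with each derived result instead of the data itself. The sketch below is an assumption-laden illustration in Python: the function names, the record fields and the toy "GC count" analysis are all hypothetical, and real provenance systems track far more (users, timestamps, software environments).

```python
import hashlib
import json


def fingerprint(data):
    """Content hash, so a large dataset can be referenced without copying it."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(input_bytes, step, params, output_bytes):
    """Describe one analysis step: what went in, what was done, what came out."""
    return {
        "input_sha256": fingerprint(input_bytes),
        "step": step,
        "params": params,
        "output_sha256": fingerprint(output_bytes),
    }


def same_result(record_a, record_b):
    """Two runs reproduce each other if input, step, params and output all match."""
    return json.dumps(record_a, sort_keys=True) == json.dumps(record_b, sort_keys=True)


reads = b"ACGTTGCA"  # tiny stand-in for a real sequencing file
gc_count = str(reads.count(b"G") + reads.count(b"C")).encode()

run_1 = provenance_record(reads, "gc_count", {"version": "0.1"}, gc_count)
run_2 = provenance_record(reads, "gc_count", {"version": "0.1"}, gc_count)
print(same_result(run_1, run_2))
```

Because the record holds hashes rather than the data, collaborators can check whether they reproduced a result without shipping terabytes between labs.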

One of the emerging trends in database systems is the "data-as-a-service" model. Data as a service essentially begins to eliminate the complexity and cost of setting up proprietary Big Data systems, replacing this with running databases as services on scale-out infrastructure in multi-tenant and single-tenant modes. The most high-profile example of these lately has been DynamoDB, Amazon's data-as-a-service key-value store.

One of the other well-developed data-as-a-service providers is Cloudant, which provides developers an easy-to-use, highly scalable data platform based on Apache CouchDB. Cloudant was founded in 2008 by three physicists from MIT, where they were responsible for managing multi-petabyte data sets for physics experiments run on infrastructure such as the Large Hadron Collider. Cloudant’s product was born out of its founders’ collective frustration with the tools available, specifically for serving science. (Yet again an example of the needs of science driving the development of technology that benefits the rest of us.)

One of the things that attracted me to working with the team at Cloudant was their continued interest in developing large-scale data systems for scientists and their commitment to putting the features and functions required by science end-users at the top of the company's priority list.

What other pockets of inspiration and innovation are you seeing in building new systems for scientists?  Please comment.

Tuesday, April 3, 2012

It’s Time to Build New Systems for Scientists (Particularly Life Scientists)

Society and the Cambridge Innovation Cluster Will Benefit

Scientists are potentially the most important technology end-users on the planet. They are the people who are conducting research that has the potential to improve and even save lives. Yet, for the most part, scientists have been almost criminally under-served by information technologists and the broad technology community. 

Interestingly, life sciences have had a tremendous impact on computer science.  Take, for example, Object Oriented Programming (OOP), developed by Dr. Alan Kay. A biologist by training, Dr. Kay based the fundamental concepts of OOP on microbiology: “I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages (so messaging came at the very beginning - it took a while to see how to do messaging in a programming language efficiently enough to be useful).” 

It’s time to return the favor. 

Pockets of Revolution and Excellence 

Of course, there are others who are trying to advance information technology for scientists, like visionaries Jim Gray, Mike Stonebraker, and Ben Fry. 

Throughout his career, Jim Gray, a brilliant computer scientist and database systems researcher, articulated the critical importance of building great systems for scientists. At the time of his disappearance, Jim was working with the astronomy community to build the WorldWide Telescope. Jim was a vocal proponent of getting all the world’s scientific information online in a format that could easily be shared to advance scientific collaboration, discourse and debate. The Fourth Paradigm is a solid summary of the principles that Jim espoused. 

My partner and Jim’s close friend Mike Stonebraker started a company – Paradigm4, based on an open source project called SciDB – to commercialize an “array-native” database system that is specifically designed for scientific applications and was inspired by Jim’s work. One of my favorite features in P4/SciDB is data provenance, which is essential for many scientific applications. If the business information community would wake up from its 30-year “one-size-fits-all” love affair with traditional DBMS, it would realize that new database engines that have provenance as an inherent feature can create better auditability than any of the many unnatural acts they currently perform with their old-school RDBMS. 

Another fantastic researcher who is working at the intersection of the life sciences and computer science is Ben Fry. Ben is truly a renaissance man whose work spans art, science and technology. He’s a role model for others coming out of the MIT Media Lab and a poster child for why the Lab is so essential. Cross-disciplinary application of information technology to applications in science is perhaps the most value-creating activity that smart young people can undertake over the next 20 years. (At least it’s infinitely better than going to Wall Street and making a ton of dough for a bunch of people who already have too much money.) 

Time to Step Up IT for Scientists 

But ambitious and visionary information technology projects that are focused on the needs of scientists are too rare. I think that we need to do more to help scientists – and I believe that consumer systems as well as traditional business applications also would benefit radically. 

As a technologist and entrepreneur working in the life sciences for the past 10+ years, I’ve watched the information technology industry spend hundreds of billions of dollars on building systems for financial people, sales people, marketers, manufacturing staff, and administrators. And, more recently, on systems that enable consumers to consume more advertising and retail products faster. Now the “consumerization of IT” that drove the last round of innovation in information technologies – Google, Twitter, Facebook, and so on – is being integrated into the mainstream of corporate systems. (In my opinion, however, this is taking 5 to 10 times longer than it should because traditional corporate IT folks can’t get their heads around the new tech.) 

Meanwhile, scientists have been stuck with information technologies that are ill-suited to their needs – retrofitted from non-science applications and use cases. For many years, scientists have been forced to write their own software and develop their own hardware in order to conduct their research. My buddy Remy Evard was faced with this problem – primarily in managing large-scale physics information – while he was the CIO at Argonne National Laboratory. Now he and I share the problem in our work together at the Novartis Institutes for Biomedical Research (NIBR). 

When I say “systems,” I am talking about systems that capture the data and information generated by scientists' instruments and let scientists electronically analyze this data as they conduct their experiments and also ensure the “repeatability” of their experiments. (Back to the value of provenance). With the right kind of systems, I believe we could: 
  • Radically increase the velocity of experimentation. Make it much easier for scientists to do their experiments quickly and move on to the next experiment sooner 
  • Significantly improve the re-usability of experimentation – and help eliminate redundancy in experiments 
  • Ensure that experiments – both computational and wet lab – can be easily replicated 
  • Radically improve the velocity of scientific publication 
  • Radically improve the ability of peers to review and test the reproducibility of their colleagues’ experiments 
Essentially, with radically better systems, we would vastly improve the productivity and creativity of science. I also believe there would be immeasurably large benefits to society as a whole – not only from the benefits of more effective and efficient science, but also as these systems improvements are applied to consumer and business information systems. 
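
To make the provenance and repeatability idea a bit more concrete, here is a minimal sketch in Python. It is purely illustrative, not how SciDB or any particular product does it: the function names, the toy "analysis" (counting readings above a threshold) and the record layout are all assumptions of mine. The point is simply that if a system stores a fingerprint of the input data, the parameters and the code version alongside each result, a colleague can later check whether re-running the analysis reproduces it.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact bytes analyzed."""
    return hashlib.sha256(data).hexdigest()


def run_with_provenance(raw: bytes, params: dict, code_version: str) -> dict:
    """Run a toy analysis and return the result together with its provenance."""
    # Toy "analysis" (hypothetical): count readings above a threshold.
    readings = [float(x) for x in raw.decode().split(",")]
    result = sum(1 for r in readings if r > params["threshold"])
    return {
        "result": result,
        "provenance": {
            "input_sha256": fingerprint(raw),
            "params": params,
            "code_version": code_version,
        },
    }


def is_reproduced(record: dict, raw: bytes) -> bool:
    """Re-run the analysis from the recorded provenance and compare results."""
    prov = record["provenance"]
    if fingerprint(raw) != prov["input_sha256"]:
        return False  # not the same input data, so not a valid replication
    rerun = run_with_provenance(raw, prov["params"], prov["code_version"])
    return rerun["result"] == record["result"]


# Example: record one run, then print what a system would need to store.
record = run_with_provenance(b"0.2,1.7,3.1,0.9", {"threshold": 1.0}, "v0.1")
print(json.dumps(record, indent=2))
```

A real system would also need to capture the instrument, the operator, timestamps and the full software environment; this sketch only shows the shape of the idea.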

How We Got Here 

So, what’s holding us back? 

Scientific applications are highly analytical, technical, variable and compute-intensive – making them difficult and expensive to build. 

Scientific processes don’t lend themselves to traditional process automation. Often the underlying ambiguity of the science creates ambiguity in the systems the scientists need, making development a real moving target. Developers need to practice extreme Agile/Scrum methods when building systems for scientists – traditional waterfall methods just won’t work. 

Developers need to treat the core data as the primary asset and the applications as disposable or transitory. They must think of content, data and systems as integrated and put huge effort into the management of meta-data and content. 

Great systems for scientists also require radical interoperability. But labs often operate as fiercely independent entities. They often don’t want to share. This is a cultural problem that can torpedo development projects. 

These challenges demand high-powered engineers and computer scientists. But the best engineers and computer scientists are attracted to organizations that value their skills. These organizations have traditionally not been in life sciences, where computer scientists usually rank near the bottom of the hierarchy of scientists. 

When drug companies have a few million dollars to invest in their lab programs, most will choose to invest it in several new chemists instead of an application that improves the productivity of their existing chemists. Which would you choose? Some companies have just given up entirely on information systems for scientists.

So, scientists are used to going without – or they resort to hacking things together on their own. Some of them are pretty good hackers. But clearly, hacking is a diversion from their day jobs – and a tremendous waste of productivity in an industry where productivity is truly a life-and-death matter. 

No More Excuses 

It’s also completely unnecessary, given the changes in technology. We can dramatically lower the cost and complexity of building great systems for scientists by using Web 2.0 technologies, HTML5, cloud computing, software-as-a-service application models, data-as-a-service, big data and analytics technologies, social networking and other technologies. 

For example, with the broad adoption of HTML5, apps that used to require thick clients can now be built with thin clients. So we can build them more quickly and maintain them more easily over time. We can dramatically lower the cost of developing and operating flexible tools that can handle the demanding needs of scientists. 

Using Web technologies, we can make scientific systems and data more interoperable. With more interoperable systems based on the Web, we can capitalize on thousands of scientists sharing their research and riffing off of it – whether inside one company or across many. There’s tremendous benefit to large scale in the scientific community – if you can make it easier for people to work together. 

Many of my friends have been asking me why I’ve been spending so much time at the Novartis Institutes for Biomedical Research (NIBR) over the past three years. The simple answer is that I believe that building systems for scientists is important – and I find it incredibly rewarding. 

One of the lessons I learned while we were starting Infinity Pharmaceuticals was that while scientists needed much better systems, small biotech companies didn’t have the resources to build these systems. So, every time we looked at spending $1 million on a software project at Infinity, we decided it was better to hire another five medicinal chemists or biologists. The critical mass required to build great systems for scientists is significant – and arguably even exceeds the resources of a 6,000+ person research group like NIBR. The quality of third-party solutions available for scientists and science organizations such as NIBR is pitiful: there have been only a handful of successful and widely adopted systems, Spotfire being the most notable example.  Scientists need better third-party systems, and delivering these systems as web services might finally provide a viable business model for "informatics" companies.  Two great new examples of these types of companies are Wingu and Syapse – check them out.  

Calling All Cambridge Technologists 

So here’s my challenge to the technology industry: How about investing $3 to $4 billion in systems for scientists? And having the Cambridge Innovation Cluster take the lead? 

If you work in Kendall or Harvard Squares, you can’t throw a Flour Bakery & Café sticky bun without hitting a scientist. It’s one huge research campus with millions of square feet of laboratory space and scientists everywhere. 

Get engaged with scientist-as-the-end-user – they’ll welcome you, trust me. Build something in one of these labs. If you build something compelling in a thoughtful way, it’s going to be noticed, sought and adopted by others. 

Since Cambridge has one of the highest concentrations of scientists in the world, technologists here should focus on building systems for scientists. I’m betting that they can do it better than anyone else in the world. 

What do you think?