Showing posts with label Life Sciences.

Tuesday, November 27, 2012

Open Patient Consent and Clinical Research Data

Overcoming Lingering Concerns

In my last several posts, I wrote about how crucial it is to increase the liquidity of clinical research data, particularly clinical trial data, and how we can achieve improved data liquidity with patient-centric systems and software.

But there’s a remaining (and not insignificant) challenge: patient consent.

Before sharing their health information, people want to know that doing so will be secure and beneficial.  And until people are willing to share their data, it’s tough to justify the investment in building secure systems. A classic chicken-and-egg problem.

This is a very similar dynamic to e-commerce back in the 1990s.  People were afraid to enter their credit card numbers into a Web site. Many people were even saying that no one would ever trust personal financial data to the Web. Today, you can take a picture of a check with your iPhone, deposit the check electronically and throw the physical check away.  The convenience of electronic financial transactions via the Web far outweighed the security risks (both real and perceived).  

I believe that we’re going through a similar transition with electronic health data that we went through with personal financial data back in the 1980s and 1990s.  It may take a decade or two for people to be comfortable sharing their anonymized and aggregated medical information to benefit research...but maybe not.  In my humble opinion, the benefits of portability of our medical information now far outweigh the security risks/concerns.  

Some of the benefits are very pragmatic:
  • Portability: When you switch doctors, you can bring your medical history with you electronically
  • Accessibility: In an emergency, the ER team can pull up your history immediately and quickly search for all your allergies
  • Reference: When your doctor asks you when you had your last immunizations
You might recall that some high-profile early efforts at personal patient record systems failed ― specifically, Google Health and Microsoft HealthVault.  I think of these not as failures, but as invaluable experiments that helped us all learn what works and doesn’t work in managing and sharing health data securely and efficiently.  And perhaps most importantly, these experiments began to socialize the ideas of medical information being represented electronically and of patients owning their medical records. 

Meet Project Green Button

Project Green Button is an important experiment in creating data liquidity and sharing medical information.
You may have heard of Project Blue Button.  This was a fantastic project launched by the US Department of Veterans Affairs, enabling VA patients to download a copy of their own health data from the VA’s systems with the click of a Blue Button.  This is a really great example of empowering patients to own their own medical data and improving the portability of their medical information.

Now let’s think about it the other way around.

Research has suggested that if people were presented with a Green Button that provided a single-click way to share their data with researchers, more than 80% of people would press it.  My close friend and trusted colleague John Wilbanks gave a great TED Talk about this a few months ago and has been doing ground-breaking work on open consent, the philosophy behind the Green Button.

I’m very hopeful that John will succeed in his mission to empower patients to share their own data (after all, it is their data) and that most people, when asked, will be willing to share their anonymized information to benefit research.
The team at the LIVESTRONG Foundation, led by Director of Evaluation and Research Ruth Rechlis-Oekler, Ph.D., did a great study a few years ago about cancer patients’ and survivors’ willingness to share their information in the interest of improving research.  The results were compelling: the majority of patients and survivors WERE willing to share their de-identified and aggregated health information with researchers in the interest of improving health care for others.

So, the willingness is there.  All we are missing are the systems to enable this. And the time to create those systems (similar to what I described in my last blog post) is NOW.   One of the most interesting companies working in this area is Avado; founder Dave Chase is one of the thought leaders in this space and has fantastic vision for where the industry needs to go over the next 10 years.  

Wanted: A Trusted “Zone”

In order to prime the pump of online personal health data, we need patient-controlled solutions and a trusted zone where we can connect patients securely with their data. This trusted zone would be a place where:
  • Each patient has a “dashboard” through which he can get access to all data about his own health, regardless of where the referential data is located.
  • Each patient can determine whether and with whom to share pieces of his health data – with his family doctor, relatives or loved ones.
  • Each patient can track his own health, and record and track his own experiences.
The result would be something that feels like a simple dashboard: a single environment, available to the patient, where he can access, share, and manage all his relevant health information.
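As a sketch of how such a trusted zone might represent patient-controlled sharing, here is a minimal Python model. All class, field, and grantee names here are illustrative assumptions, not any real system's design; the point is simply that the patient, not the institution, holds the sharing switch.

```python
from dataclasses import dataclass, field

@dataclass
class HealthRecord:
    record_id: str
    category: str     # e.g. "allergy", "immunization", "lab_result"
    source: str       # the system where the referential data actually lives
    payload: dict

@dataclass
class PatientDashboard:
    patient_id: str
    records: list = field(default_factory=list)
    grants: dict = field(default_factory=dict)  # grantee -> set of shared categories

    def add_record(self, record: HealthRecord) -> None:
        self.records.append(record)

    def share(self, grantee: str, categories: set) -> None:
        """The patient decides whether and with whom to share pieces of data."""
        self.grants.setdefault(grantee, set()).update(categories)

    def revoke(self, grantee: str) -> None:
        self.grants.pop(grantee, None)

    def visible_to(self, grantee: str) -> list:
        """Only categories the patient has explicitly shared are returned."""
        allowed = self.grants.get(grantee, set())
        return [r for r in self.records if r.category in allowed]
```

So sharing allergy data with a family doctor is one call (`dashboard.share("family_doctor", {"allergy"})`), and revoking it is one call too.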

Once we can “free the data” in a trusted environment like this, it’s possible to bolt on a whole battery of cool apps that you can’t even anticipate today, and that you don’t even have to develop yourself.

With today’s rapid app development technologies, we can build apps for physicians and for many other use cases, and simultaneously make the apps accessible via browser-based applications. 

A great example is the LIVESTRONG Cancer Guide and Tracker for iPad, which collects and combines patient-reported data for patients living with cancer.  We currently have a pilot project under way to connect the Tracker app with traditional clinical systems and a secure Web-based application, enabling the patient and his doctor to collaborate more effectively during office visits.  We're using the fantastic SMART Platform developed at Children's Hospital in Boston under the leadership of Zak Kohane and Ken Mandl. 
A Call to Action

So, here are a few action items:

First, we in the biopharmaceutical industry should ask ourselves:  “What data can we ‘free up’ about our products and our studies to better support patients and physicians?”  Let's start with the basic inclusion/exclusion criteria for our existing studies that we already share with our clinical partners.  As I mentioned in my previous post, if we would just increase the liquidity of this data in the biopharmaceutical industry, we could have a HUGE impact on the efficiency of research through the ability for the right patients to find the right studies at the right time.

Second, we should support the idea of open consent by supporting the Consent to Research project.  Let's give patients the flexibility to support research with their own medical information and, in the process, radically improve the efficiency of the healthcare system by simplifying the complex spider web of consents that our dysfunctional healthcare system has created.  The current system of consents does not protect patients; it confuses them.

This blog post came out of a presentation that I recently delivered at Rev Forum, a conference sponsored by Lance Armstrong’s LIVESTRONG Foundation and Genentech. I have worked with LIVESTRONG and various biopharmaceutical companies on new health care information products and apps that take advantage of data liquidity to help patients combat cancer and other difficult diseases.

Monday, November 19, 2012

Better Systems for Clinical Data Collaboration

Innovation in Systems and Software

In my last two posts, I wrote about the need for more liquidity in clinical research data.  As a foundation for sharing this new more-liquid clinical research data, we need more patient-centric systems, where patients can create, consume and maintain relevant medical information.

However, in the average hospital, most patient data is generated and organized in the clinic, and typically stored in a variety of different legacy hospital systems.  Pretty illiquid by definition. Therefore, we need fresh approaches to sharing that data across hospital systems, and then across multiple hospitals.

Fortunately, innovation in systems and software is beginning to happen on this front.

Several organizations, including Dana-Farber Cancer Institute, the LIVESTRONG Foundation, and Boston Children’s Hospital, are working to build a reference implementation.  This is a technology model that would describe how a hospital could publish the information contained in its systems easily, securely and efficiently to other institutions.

Another very interesting technology is the SMART Platform, developed under the leadership of Dr. Zak Kohane and Dr. Ken Mandl at Children’s Hospital in Boston.  The SMART Platform and i2b2 Analytical tool are truly a step in the right direction.   

These new technical approaches provide the ability to build cool new apps very quickly – apps that combine data reported by the patient in an interface like the LIVESTRONG Cancer Guide and Tracker iPad app alongside data collected by traditional clinical systems. Using the same approach, we can build apps for physicians and many other use cases and simultaneously make the apps accessible via browser-based applications.

Let’s think about what else we can do as an industry, and encourage many people and companies to start writing new apps quickly.

In my next post, I’ll talk about the big remaining challenge to the liquidity of clinical research data – patient consent – and how we win patient trust and cooperation.


Tuesday, November 13, 2012

Improving Data Liquidity in Clinical Research

Empowering Patients and Doctors

In my last post, I wrote about the need for data liquidity in clinical research – and the need for biopharmaceutical companies and healthcare institutions to take the lead by freeing up data about their studies, clinical trials and drugs.

To accelerate clinical research efforts for diseases like cancer, we need two things.  First, stakeholder institutions (like biopharmaceutical companies and healthcare institutions) need to free up their data.  Second, we need new systems and software that can share and manage that data – securely and at scale – across our complex healthcare ecosystem. 

Start-ups are being formed every day that are pushing the envelope on applications and technologies that take advantage of health care data liquidity. In Boston/Cambridge alone, we have three start-up incubators dedicated to health information technologies, each with 10+ start-ups.  That’s more than 30 top-tier, vetted start-ups focused on health information technologies in development at any given point in time.
Pharma companies have long been proponents of “free your data” – as long as it means freeing data from patients, from claims clearinghouses, from pharmacies and so on.  They have been less enthusiastic about freeing their own data about their studies, clinical trials and drugs.  But the logjam is starting to break.

As I mentioned in my previous post, GlaxoSmithKline (GSK) recently announced publicly that it would make detailed data from clinical trials available to researchers: the patient-level data that forms the basis of trials of approved drugs as well as discontinued investigational drugs.  Researchers’ requests for access will be reviewed by an independent panel of experts, and the patient data will be anonymized.
Last year, one of the large biopharmaceutical companies, in collaboration with the electronic health record company Cerner, began an initiative to build an open interface that will enable sponsors of clinical research studies such as Novartis, Genentech, GSK or Pfizer to publish inclusion/exclusion criteria about their clinical studies to specific clinical partners – in much greater detail than what’s available on the National Institutes of Health's today.

The goal of this project is to create an electronic mechanism to ensure that all eligible patients are identified for appropriate studies via their doctors.  This mechanism would be able – without changing any data privacy protections – to flag patients’ records when they are diagnosed or when new studies or updates to studies become available, and to dynamically notify doctors and match patients with new or evolving studies based on patients' clinical profiles and the inclusion/exclusion criteria of the trial.

Under this new electronic standard for study information, study eligibility criteria are expressed in a standardized, machine-readable format. Any EHR system can ingest the study data automatically as new studies are created and as existing studies change.  With this more-liquid data, providers can then match the inclusion/exclusion criteria against the health records in their systems.
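To make the idea concrete, here is a sketch of what machine-readable eligibility criteria and a matching check might look like, in Python. The field names, study ID and diagnosis codes are illustrative assumptions on my part, not the actual format used in the initiative.

```python
# Hypothetical machine-readable study criteria. Field names are illustrative,
# not an actual published standard; the codes are example ICD-10 diagnoses.
study = {
    "study_id": "EXAMPLE-001",
    "inclusion": {
        "diagnosis_codes": {"C34.1", "C34.9"},  # e.g. lung cancer diagnoses
        "min_age": 18,
        "max_age": 75,
    },
    "exclusion": {
        "diagnosis_codes": {"I50.9"},           # e.g. heart failure
    },
}

def eligible(patient: dict, study: dict) -> bool:
    """Match a patient record against inclusion/exclusion criteria."""
    inc, exc = study["inclusion"], study["exclusion"]
    if not (inc["min_age"] <= patient["age"] <= inc["max_age"]):
        return False
    codes = set(patient["diagnosis_codes"])
    if not codes & inc["diagnosis_codes"]:   # must have a qualifying diagnosis
        return False
    if codes & exc["diagnosis_codes"]:       # must have no excluding diagnosis
        return False
    return True
```

Because the criteria are structured data rather than free text, any EHR can evaluate them against its own records without human transcription.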

A provider configures its systems to flag records of potentially qualifying patients using a form of research-study-recommendation engine. The provider runs and tunes this engine so that the next time a clinician pulls up a patient’s health record (or the patient’s health record changes), the EHR system will suggest to the doctor that his patient may qualify for a clinical trial – both local and not-so-local trials (think truly global patient recruitment with little or no extra effort). 
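The flagging step can be sketched in a few lines. This is a toy model with assumed field names and an assumed matching rule (nothing here describes Cerner's actual interface): the provider evaluates published study criteria against its own records, and only the resulting flags surface in the EHR.

```python
def matches(patient: dict, study: dict) -> bool:
    """Toy eligibility rule: a shared diagnosis code and age within range.
    (A real engine would evaluate the full published criteria.)"""
    if not set(patient["diagnosis_codes"]) & study["diagnosis_codes"]:
        return False
    return study["min_age"] <= patient["age"] <= study["max_age"]

def flag_candidates(patient_records: list, studies: list) -> dict:
    """Return {patient_id: [study_id, ...]} for potentially qualifying
    patients, so the EHR can surface the flag the next time a clinician
    pulls up one of these records (or the record changes)."""
    flags = {}
    for patient in patient_records:
        hits = [s["study_id"] for s in studies if matches(patient, s)]
        if hits:
            flags[patient["patient_id"]] = hits
    return flags
```

The key design point is that the engine runs inside the provider's systems: study criteria flow in, but no patient data ever flows out.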

In addition, when a new trial is published, doctors can be notified that it may be of interest, and relevant, to specific patients in their care.

Through this type of simple standard for study information exchange – one that empowers doctors and is run by providers – doctors and patients could be automatically made aware of trials regardless of where the study is being run or when a new study starts.

Using this simple standard, Cerner has worked with various large biopharmaceutical companies to build and test end-to-end proofs of concept, in the process successfully demonstrating that this approach can work very well with relatively little extra effort on the part of the study sponsors or the providers.   There is no additional privacy risk, since these criteria have already been published to the providers, and the patient data does not have to be shared at all. The standard just gets better data on studies to providers in a more targeted way.

In short, starting to improve liquidity of clinical research data just requires leadership from pharma companies and cooperation from health care providers to prime the pump and adopt standards – and perhaps the encouragement of trusted brokers such as LIVESTRONG to bring the parties together.

Increasing the liquidity of data in ways like this could improve enrollment in studies, especially for rare diseases with small patient populations. Doctors and patients would have a proactive monitoring system that reminds them about all relevant research studies – especially new studies in rare indications – regardless of geography or the distractions in their daily lives.


Wednesday, November 7, 2012

The Need for Data Liquidity in Clinical Research

Much of the content in the next few posts was developed jointly with my close friend and trusted colleague Joris Van Dam.  Joris is truly a superstar and is doing fantastic work around the world related to eHealth and improving the liquidity of data in healthcare.  

In the course of creating new drugs and therapies, organizations in the biopharmaceutical and healthcare industries amass huge amounts of clinical trial data.  Unfortunately, much of that data remains locked up in individual IT systems, making key data unavailable to many of the participants in clinical trials: physicians and their patients.  As our society intensely seeks cures for cancer and other diseases, this is nuts – and completely unnecessary.  It's time for a change.  The data required to empower researchers can be shared securely and appropriately.

In the Information Age where we can do so much via the web, our smartphones, our iPads and the "Cloud," we shouldn’t accept word of mouth as the best tool for patients to find the right studies.  It’s clearly not.  Nor should we accept data illiquidity as an obstacle to timely, broad availability of information about clinical trials. 

The Story of Melissa

Meet Melissa.  Melissa isn’t her real name, but this is a true story.

Earlier this year Melissa was diagnosed with one of the deadliest forms of cancer.  Despite her predicament, Melissa is one of those patients (like many of those involved in the LIVESTRONG community) who decided to take an active interest in her own care. 

She wanted to proactively explore any and all kinds of treatment opportunities, including experimental treatment in a clinical research study.

Melissa was smart enough to understand that clinical trials offer no guarantee of improving her condition, let alone a cure.  But the opportunity to participate in a clinical trial would give her hope. It would give her the ability to fight, and the satisfaction that, through her disease, she might be able to contribute to better treatment and ultimately perhaps even a cure – if not for herself, then potentially for others like her. It would give her the feeling that her pain and suffering mattered and that she could make a difference in the world.

Melissa believed that it was important for her to have the opportunity to join a clinical research study. So she spent a lot of time on the Internet, educating herself about her disease and treatment opportunities. One day, using the U.S. National Institutes of Health’s, she found a study for which she appeared to qualify, one being run not too far from her home.

 didn’t list the name or contact number for the investigative site, unfortunately.  It just said that it was a study being run by a large biopharmaceutical company and that she could call the main switchboard number. She called, and they really couldn’t help her, so Melissa was stuck.  Next, she turned to a clinical-trials matching web site and asked if there was anything they could do to help her.

Fortunately, by chance, the research team at the large biopharmaceutical company happened to know the person who runs that matching web site. That person connected Melissa with someone who offered to help coordinate.

The folks at the biopharmaceutical company went into their clinical trials database to identify the study manager. They then contacted the study manager, who asked the matching web site whether it was okay to share contact details with Melissa.  A few weeks and many phone calls later, Melissa finally had a screening appointment at the clinic. Within a short time, Melissa – through her own perseverance and a lot of luck – was screened and enrolled in the study.

Now the shocker in this story is that... 

....the study investigator was Melissa’s own doctor!  

This was the very doctor who had diagnosed Melissa just a few months earlier.

It’s tempting to think that this doctor dropped the ball.  But in fact he hadn’t. He’s an extremely competent and compassionate physician, not to mention a great study investigator.  He just had a lot going on, and the start of the trial hadn’t quite lined up with the timing of Melissa’s diagnosis.

It might also be tempting to say that the biopharmaceutical company was at fault for not listing the investigator’s contact details on But the clinic in question is based in Europe, where regulations require pharmaceutical companies to obtain explicit consent from each individual investigative site before its contact details can be listed on  The company just hadn’t dealt with all that red tape yet, and there are no systems set up to let the information flow more easily.

So, this situation was no one’s fault in particular, but rather a matter of circumstances, bad timing and the lack of data liquidity.

Time for Big Pharma and Healthcare Institutions to Step Up to the Plate was an important milestone and a catalyst when it was launched. On the back of emerged a slew of applications that help patients and doctors navigate the data and find studies that are particular to a condition.

These include a range of Web-based search and matching tools and, more recently, smartphone apps such as TrialX and CoActive.

This is what data liquidity is all about: making data appropriately available to encourage an ecosystem of applications that help patients and physicians, and ultimately help drive down the cost of health care and improve outcomes.

But was launched 12+ years ago ― 7 years before the first iPhone.  We now need to go much further and faster in liberating clinical research data, and I believe that the large biopharmaceutical companies and healthcare institutions have the opportunity to take the lead.

GlaxoSmithKline (GSK) recently announced that it will open up access to its clinical trial data as appropriate to support open collaboration among researchers.  (You can read more about this decision in the Wall Street Journal.)

To accelerate clinical research efforts, we also need new systems and software that can improve liquidity of clinical trial data across the complex healthcare ecosystem.

By increasing the liquidity of clinical trial data this way, we could both improve the lives of patients and reduce overall health care costs. 

The information technologies exist.  Attitudes toward information-sharing are changing.   And cost reduction and better outcomes are compelling motivators. 

In my next posts, I'll talk about some specific initiatives that could have a big impact on the liquidity of data in the healthcare industry as well as a number of issues related to consent, where my great friend John Wilbanks is leading the charge. Check out his TED talk.


Tuesday, April 10, 2012

Building New Systems for Scientists

Three Good Places to Start

In my last post,  I wrote about the need to build new data and software systems for scientists – particularly those working in the life sciences.  Chemists, biologists, pharmacologists, doctors and many other flavors of scientists working in life sciences/medical research are potentially the most important (and under-served) technology end-user group on the planet.  (If you are looking for a great read on this, try Reinventing Discovery: The Age of Networked Science.)

There are many ways to break down what needs to be done. But here are three ways to think about how we can help these folks:
  1. Support their basic productivity, by giving them modern tools to move through their information-oriented workdays
  2. Help them improve their collaboration with others, by bringing social tools to science/research
  3. Help them solve the hardest, most challenging problems they are facing, by using better information/technology/computer-science tools
While all three of these are worth doing, #3 is the most tempting to spend time on, as the problems are hard and interesting intellectually.  

A great example in the life sciences currently is Next Generation Sequencing/Whole Genome Sequencing, where (with all kinds of caveats) one instrument generates about 75TB of data a year.  The cost of these instruments has dropped by an order of magnitude over the past five years, resulting in many decentralized labs just going out and buying them and getting busy with their own sequencing.  

One of the many challenges this poses is that labs often don't realize the cost of the storage and data management required to process and analyze the output of these experiments (my back-of-the-envelope estimate is at least 2x-3x the cost of the instrument itself). The worst-case scenario is when scientists point these instruments at public file shares and find those file shares filling up within days, to the dismay of the other users of those shares.  Don't laugh: it's happening every day in labs all around the world.
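Here's a rough version of that back-of-the-envelope arithmetic. Every price below is a purely illustrative assumption (only the 75TB/year figure comes from the discussion above); the point is the shape of the math, not the exact numbers.

```python
# Back-of-the-envelope: the storage bill hiding behind one sequencer.
# All prices are illustrative assumptions, not quotes.
instrument_cost = 500_000   # assumed instrument price, USD
tb_per_year = 75            # raw output figure cited above
years = 3                   # assumed period of accumulation
cost_per_tb_year = 2_000    # assumed managed, backed-up storage, USD/TB/year

# Data produced in year y must then be kept for the remaining (years - y) years:
tb_years = sum(tb_per_year * (years - y) for y in range(years))  # 450 TB-years
storage_cost = tb_years * cost_per_tb_year                       # 900,000 USD

print(f"storage ~ {storage_cost / instrument_cost:.1f}x the instrument cost")
```

Under these assumptions the storage alone approaches 2x the instrument, before counting any data-management staff or analysis compute.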

The drop in cost-per-genome makes Moore's Law look incremental.

Anyway, there are hundreds of interesting scientific problems like NGS that require equally advanced computer science and engineering.  One effort towards providing a place where these things can develop and grow in an open environment is OpenScience.

But I'd like to focus more on #1 and #2 – because I believe that these are actually the problems that, when solved, can deliver the most benefit to scientists very quickly. They don't depend on innovation in either the fundamental science or information technology. Rather, they depend primarily on end users and their organizations implementing technologies that already exist and that, if deployed with reasonable thought and discipline, can have a huge positive impact on scientific productivity.

Here are a few examples.

Collaboration Tools

Social networking, Wikis, Google Hangouts and other networking technologies make it easy and inexpensive to create more-flexible collaboration experiences for scientists.  These aren’t your father’s collaboration tools – the kind of process-automation gizmos that most technology companies produced for businesses over the past 30 years. Rather, these new tools empower the individual to self-publish experimental design and results and share them, with groups of relevant and interested people, early and often. Think radically more-dynamic feedback loops on experimentation, and much more granular publishing and review of results and conclusions.

Much of what scientists need in their systems is collaborative in nature.  If you are a researcher working in either a commercial organization or academic/philanthropic organization, how do you know that those experiments haven’t already been done by someone else – or aren’t being run now?  If scientists had the ability to "follow" and "share" their research and results as easily as we share social information on the Web, many of these questions would be as easy to answer as "Who has viewed my profile on LinkedIn?"  

Part of this depends on the clear definition of scientific "entities": just as you have social entities on Facebook (Groups, Individuals, etc.), scientists have natural scientific entities. In the life sciences, it's the likes of Compounds, Proteins, Pathways, Targets, People, etc.  If you do a decent job of defining these entities and representing them in digital form with appropriate links to scientific databases (both public and private), you can easily "follow compound X." This would enable a scientist not only to identify who is working on the scientific entities he's interested in, but also to stand on the shoulders of others, avoid reinventing the wheel, and raise the overall level of global scientific knowledge and efficiency by sharing experiments with the community.
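A minimal sketch of what "follow compound X" could look like under the hood, assuming each entity has a stable ID (the class, names and IDs here are illustrative, not any real service's API):

```python
from collections import defaultdict

class EntityFeed:
    """Follow/notify fan-out keyed by stable scientific-entity IDs."""

    def __init__(self):
        self.followers = defaultdict(set)   # entity_id -> {scientist_id}
        self.inbox = defaultdict(list)      # scientist_id -> [(entity_id, update)]

    def follow(self, scientist_id: str, entity_id: str) -> None:
        self.followers[entity_id].add(scientist_id)

    def publish(self, entity_id: str, update: str) -> None:
        """A new experiment touching this entity notifies every follower."""
        for scientist_id in self.followers[entity_id]:
            self.inbox[scientist_id].append((entity_id, update))

feed = EntityFeed()
feed.follow("alice", "compound:X")
feed.follow("bob", "compound:X")
feed.publish("compound:X", "new binding-assay result shared")
```

Link the entity IDs to public and private scientific databases, and the same fan-out answers "who else is working on this?" as easily as LinkedIn answers "who viewed my profile?"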

Two start-ups that I mentioned in my earlier blog post, Wingu and Syapse, are creating Web services that enable this kind of increased collaboration and distribution of research in the pharma industry.  Many large commercial organizations are attempting to provide this type of functionality using systems such as SharePoint and MediaWiki.  Unfortunately, they are finding that traditional IT organizations and their technology suppliers lack the expertise to engage users in experiences that can compete for attention with consumer Internet and social tools.

I've watched this dynamic as various research groups have begun to adopt Google Apps.  It's fascinating: companies and academic institutions that adopt Google Apps get the benefit of the thousands of engineers working to improve Google's consumer experience, with those same improvements applied to the commercial versions – truly an example of the "consumerization of enterprise IT" (and credit to my friend Dave Girouard for the great job he did building the Enterprise group at Google over the past eight years).

One of the ways that Google might end up playing a large and important role in scientific information is due to the fact that many academic institutions are aggressively adopting Gmail and Google Apps in general as an alternative to their outdated email and productivity systems. They have skipped the Microsoft Exchange stage and gone right to multi-tenant hosted email and apps. The additional benefit of this is that many scientists will get used to using multi-user spreadsheets and editing docs collaboratively in real time instead of editing Microsoft Office documents, sending them via email, saving them, editing them, sending them back via email, blah...blah...blah.

If companies aren't doing the Google Apps thing, they are probably stuck with Microsoft, and locked into the three-to-five-year release cycles of Microsoft technology to get any significant improvements to systems like SharePoint.  After a while, it becomes obvious that the change to Google Apps is worthwhile relative to the bottleneck of traditional third-party software release cycles – particularly for researchers, for whom these social features can have a transformational effect on their experimental velocity and personal productivity.

Another example of this dynamic is the competition between innovators Yammer and Jive and Microsoft SharePoint. This is a great example of how innovators are driving the enterprise incumbents to change, but ultimately we'll see how the big enterprise tech companies (Microsoft, IBM, etc.) respond to the social networking and Internet companies stepping onto their turf. And we'll see if Microsoft can make Office 365 function like Google Apps. But if Azure is any indication, I'd be skeptical.

Open-Source Publishing

First: thank you, Tim Gowers, for the post on your blog – all I can say is YES!

In my opinion, the current scientific publishing model and ecosystem are broken (yet another topic for another post). But today new bottom-up publishing tools like ResearchGate let scientists self-publish their experiments without depending on the outdated scientific publishing establishment and broken peer-review model.  Scientists should be able to publish their experiments – sharing them granularly instead of being forced to bundle them into long-latency, peer-reviewed papers in journals. Peer review is critical, but should be a process of gradually opening up experimental results and findings through granular socialization.  Peer review should not necessarily be tied to the profit-motivated and technically antiquated publishing establishment.  I love the work that the folks at Creative Commons have done in beginning to drive change in scientific publishing.   

One of the most interesting experiments with alternative models has been done by the Alzheimer Research Forum, or Alzforum. The set of tools known as SWAN is a great example of the kind of infrastructure that could easily be reused across eScience. It makes sense that in the face of the huge challenge represented by treating Alzheimer's, people - patients, doctors, scientists, caregivers, engineers - would work together to develop the tools required to share the information and collaborate as required, regardless of the limitations of the established infrastructure and organizational boundaries. I know there are lots of other examples and am psyched to see these highlighted and promoted :)

Data As A Service

Switching to infrastructure for a second: One of the things that scientists do every day is create data from their experiments. Traditionally this data lives in scattered locations - on the hard drives of scientific instruments, on shared drives within a lab, on stacks of external hard drives sitting on lab benches, perhaps in a database that someone has set up within a lab.

I believe that one of the ways we can accelerate the pace of experimentation and collaboration is to give scientists a rational place to put their data - for sharing, for collaborative analytics, for publishing derived results, and for establishing the infrastructure and provenance required to begin producing reproducible results.

And, as we've seen with next-generation sequencing, one of the challenges of science that depends on Big Data is that scientists are conditioned to manage their data and analytics locally. This conditioning creates problems when the data is large-scale (you can't afford to keep it all locally) and when you want to collaborate on its analysis (the data is too big to copy around to many locations). It also makes it much harder to maintain the provenance required to reproduce results.

One of the emerging trends in database systems is the "data-as-a-service" model, which eliminates much of the complexity and cost of setting up proprietary Big Data systems by running databases as services on scale-out infrastructure, in multi-tenant and single-tenant modes. The most high-profile recent example is DynamoDB, Amazon's data-as-a-service key-value store.
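The core idea that lets a key-value service like this scale out is consistent hashing: keys are placed on a hash ring, and each key belongs to the next node clockwise on the ring, so adding or removing a node moves only a small fraction of the keys. Here's a minimal sketch of that partitioning scheme - illustrative only, with made-up node and key names, and in no way Amazon's actual implementation:

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Minimal consistent-hash ring in the spirit of Dynamo-style
    key-value partitioning (a sketch, not a real client)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node gets several virtual points on the ring,
        # so keys rebalance smoothly when nodes join or leave.
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise from the key's hash to the next node point,
        # wrapping around the end of the ring.
        idx = bisect_right(self.points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("experiment-42/plate-7")
```

The same key always hashes to the same node, which is what lets a service route requests without a central directory.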

One of the other well-developed data-as-a-service providers is Cloudant, which gives developers an easy-to-use, highly scalable data platform based on Apache CouchDB. Cloudant was founded in 2008 by three MIT physicists who had been responsible for managing multi-petabyte data sets for physics experiments on infrastructure such as the Large Hadron Collider. Cloudant's product was born out of its founders' collective frustration with the tools available for data-intensive science. (Yet again an example of the needs of science driving the development of technology that benefits the rest of us.)
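What makes the CouchDB model a good fit for distributed collaboration is its revisioned documents: every update must cite the revision it was based on, and stale updates are rejected as conflicts rather than silently overwriting someone else's data. Here's a toy in-memory sketch of that behavior - the document contents are invented, and this stands in for the concept, not for Cloudant's actual HTTP API:

```python
import uuid

class DocStore:
    """In-memory sketch of CouchDB-style revisioned documents
    (the model Cloudant hosts), not a real client."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, doc, rev=None):
        current = self.docs.get(doc_id)
        # An update must cite the revision it was based on, or it is
        # rejected -- this is how lost updates are detected.
        if current is not None and current["_rev"] != rev:
            raise ValueError("conflict: stale _rev")
        new_rev = uuid.uuid4().hex
        self.docs[doc_id] = {**doc, "_id": doc_id, "_rev": new_rev}
        return new_rev

    def get(self, doc_id):
        return self.docs[doc_id]

db = DocStore()
rev1 = db.put("assay-001", {"compound": "X17", "ic50_nM": 42})
rev2 = db.put("assay-001", {"compound": "X17", "ic50_nM": 38}, rev=rev1)
```

A second writer still holding `rev1` would get a conflict on its next `put`, and would have to fetch the current document and reconcile - exactly the discipline collaborative science data needs.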

One of the things that attracted me to working with the team at Cloudant was their continued interest in developing large-scale data systems for scientists and their commitment to keeping the features and functions required by science end-users at the top of the company's priority list.

What other pockets of inspiration and innovation are you seeing in building new systems for scientists?  Please comment.

Tuesday, April 3, 2012

It’s Time to Build New Systems for Scientists (Particularly Life Scientists)

Society and the Cambridge Innovation Cluster Will Benefit

Scientists are potentially the most important technology end-users on the planet. They are the people who are conducting research that has the potential to improve and even save lives. Yet, for the most part, scientists have been almost criminally under-served by information technologists and the broad technology community. 

Interestingly, life sciences have had a tremendous impact on computer science.  Take, for example, Object Oriented Programming (OOP), developed by Dr. Alan Kay. A biologist by training, Dr. Kay based the fundamental concepts of OOP on microbiology: “I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages (so messaging came at the very beginning - it took a while to see how to do messaging in a programming language efficiently enough to be useful).” 

It’s time to return the favor. 

Pockets of Revolution and Excellence 

Of course, there are others who are trying to advance information technology for scientists, like visionaries Jim Gray, Mike Stonebraker, and Ben Fry. 

Throughout his career, Jim Gray, a brilliant computer scientist and database systems researcher, articulated the critical importance of building great systems for scientists. At the time of his disappearance, Jim was working with the astronomy community to build the WorldWide Telescope. Jim was a vocal proponent of getting all the world's scientific information online in a format that could easily be shared to advance scientific collaboration, discourse and debate. The Fourth Paradigm is a solid summary of the principles that Jim espoused.

My partner and Jim's close friend Mike Stonebraker started a company, Paradigm4, based on an open-source project called SciDB, to commercialize an "array-native" database system that is specifically designed for scientific applications and was inspired by Jim's work. One of my favorite features in P4/SciDB is data provenance, which is essential for many scientific applications. If the business information community would wake up from its 30-year "one-size-fits-all" love affair with the traditional DBMS, it would realize that new database engines with provenance as an inherent feature can deliver better auditability than the many unnatural acts they currently perform with their old-school RDBMSs.
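To make the provenance idea concrete: a database with provenance as an inherent feature records, for every derived dataset, the operation and inputs that produced it, so any result can be traced back to raw data. Here's a toy sketch of that bookkeeping - the class, operation, and sample data are all invented for illustration, and this is not SciDB's actual interface:

```python
class ProvenanceStore:
    """Toy illustration of provenance-as-a-feature: every derived
    dataset records the operation and inputs that produced it."""

    def __init__(self):
        self.data = {}     # name -> value
        self.lineage = {}  # name -> (operation name, input names)

    def load(self, name, value):
        # Raw data enters with an empty lineage.
        self.data[name] = value
        self.lineage[name] = ("load", ())

    def derive(self, name, op, *inputs):
        # Compute the result AND record how it was computed.
        self.data[name] = op(*(self.data[i] for i in inputs))
        self.lineage[name] = (op.__name__, inputs)

    def history(self, name):
        # Walk the lineage graph back to the raw inputs.
        op, inputs = self.lineage[name]
        trail = {name: op}
        for i in inputs:
            trail.update(self.history(i))
        return trail

def normalize(xs):
    m = max(xs)
    return [x / m for x in xs]

store = ProvenanceStore()
store.load("raw_counts", [4, 8, 15, 16])
store.derive("normalized", normalize, "raw_counts")
```

With the lineage stored alongside the data, reproducing "normalized" is just replaying its recorded history - which is exactly the audit trail an RDBMS has to fake with triggers and log tables.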

Another fantastic researcher who is working at the intersection of the life sciences and computer science is Ben Fry. Ben is truly a renaissance man who works at the intersection of art, science and technology. He’s a role model for others coming out of the MIT Media Lab and a poster child for why the Lab is so essential. Cross-disciplinary application of information technology to applications in science is perhaps the most value-creating activity that smart young people can undertake over the next 20 years. (At least it’s infinitely better than going to Wall Street and making a ton of dough for a bunch of people that already have too much money). 

Time to Step Up IT for Scientists 

But ambitious and visionary information technology projects that are focused on the needs of scientists are too rare. I think we need to do more to help scientists – and I believe that consumer systems, as well as traditional business applications, would also benefit radically.

As a technologist and entrepreneur working in the life sciences for the past 10+ years, I've watched the information technology industry spend hundreds of billions of dollars on building systems for financial people, sales people, marketers, manufacturing staff, and administrators – and, more recently, on systems that enable consumers to consume more advertising and retail products faster. Now the "consumerization of IT" that drove the last round of innovation in information technologies – Google, Twitter, Facebook, and so on – is being integrated into the mainstream of corporate systems. (In my opinion, however, this is taking 5 to 10 times longer than it should because traditional corporate IT folks can't get their heads around the new tech.)

Meanwhile, scientists have been stuck with information technologies that are ill-suited to their needs – retrofitted from non-science applications and use cases. For many years, scientists have been forced to write their own software and develop their own hardware in order to conduct their research. My buddy Remy Evard faced this problem – primarily in managing large-scale physics data – while he was the CIO at Argonne National Laboratory. Now he and I share the problem in our work together at the Novartis Institutes for BioMedical Research (NIBR).

When I say “systems,” I am talking about systems that capture the data and information generated by scientists' instruments and let scientists electronically analyze this data as they conduct their experiments and also ensure the “repeatability” of their experiments. (Back to the value of provenance). With the right kind of systems, I believe we could: 
  • Radically increase the velocity of experimentation. Make it much easier for scientists to do their experiments quickly and move on to the next experiment sooner 
  • Significantly improve the re-usability of experimentation – and help eliminate redundancy in experiments 
  • Ensure that experiments – both computational and wet lab – can be easily replicated 
  • Radically improve the velocity of scientific publication 
  • Radically improve the ability of peers to review and test the reproducibility of their colleagues’ experiments 
Essentially, with radically better systems, we would vastly improve the productivity and creativity of science. I also believe the benefits to society as a whole would be immeasurably large – not only from more effective and efficient science, but also as these systems improvements are applied to consumer and business information systems.

How We Got Here 

So, what’s holding us back? 

Scientific applications are highly analytical, technical, variable and compute-intensive – making them difficult and expensive to build. 

Scientific processes don't lend themselves to traditional process automation. Often the underlying ambiguity of the science creates ambiguity in the systems the scientists need, making development a real moving target. Developers need to practice extreme Agile/Scrum methods when building systems for scientists – traditional waterfall methods just won't work.

Developers need to treat the core data as the primary asset and the applications as disposable or transitory. They must think of content, data and systems as integrated and put huge effort into the management of metadata and content.

Great systems for scientists also require radical interoperability. But labs often operate as fiercely independent entities. They often don’t want to share. This is a cultural problem that can torpedo development projects. 

These challenges demand high-powered engineers and computer scientists. But the best engineers and computer scientists are attracted to organizations that value their skills. These organizations have traditionally not been in life sciences, where computer scientists usually rank near the bottom of the hierarchy of scientists. 

When drug companies have a few million dollars to invest in their lab programs, most will choose to invest it in several new chemists instead of an application that improves the productivity of their existing chemists. Which would you choose? Some companies have just given up entirely on information systems for scientists.

So, scientists are used to going without – or they resort to hacking things together on their own. Some of them are pretty good hackers [watch]. But clearly, hacking is a diversion from their day jobs – and a tremendous waste of productivity in an industry where productivity is truly a life-and-death matter. 

No More Excuses 

It’s also completely unnecessary, given the changes in technology. We can dramatically lower the cost and complexity of building great systems for scientists by using Web 2.0 technologies, HTML5, cloud computing, software-as-a-service application models, data-as-a-service, big data and analytics technologies, social networking and other technologies. 

For example, with the broad adoption of HTML5, apps that used to require thick clients can now be built with thin clients. So we can build them more quickly and maintain them more easily over time. We can dramatically lower the cost of developing and operating flexible tools that can handle the demanding needs of scientists. 

Using Web technologies, we can make scientific systems and data more interoperable. With more interoperable systems based on the Web, we can capitalize on thousands of scientists sharing their research and riffing off of it – whether inside one company or across many. There’s tremendous benefit to large scale in the scientific community – if you can make it easier for people to work together. 

Many of my friends have been asking me why I've been spending so much time at the Novartis Institutes for BioMedical Research (NIBR) over the past three years. The simple answer is that I believe that building systems for scientists is important – and I find it incredibly rewarding.

One of the lessons I learned while we were starting Infinity Pharmaceuticals was that while scientists needed much better systems, small biotech companies didn't have the resources to build them. So, every time we looked at spending $1 million on a software project at Infinity, we decided it was better to hire another five medicinal chemists or biologists. The critical mass required to build great systems for scientists is significant – and arguably even exceeds the resources of a 6,000+ person research group like NIBR. The quality of the third-party solutions available to scientists and science organizations such as NIBR is pitiful: there have been only a handful of successful and widely adopted systems, Spotfire being the most notable example. Scientists need better third-party systems, and delivering these systems as web services might finally provide a viable business model for "informatics" companies. Two great new examples of these types of companies are Wingu and Syapse – check them out.

Calling All Cambridge Technologists 

So here’s my challenge to the technology industry: How about investing $3 to $4 billion on systems for scientists? And have the Cambridge Innovation Cluster take the lead? 

If you work in Kendall or Harvard Squares, you can't throw a Flour Bakery & Café sticky-bun without hitting a scientist. It's one huge research campus with millions of square feet of laboratory space and scientists everywhere.

Get engaged with scientist-as-the-end-user – they’ll welcome you, trust me. Build something in one of these labs. If you build something compelling in a thoughtful way, it’s going to be noticed, sought and adopted by others. 

Since Cambridge has one of the highest concentrations of scientists in the world, technologists here should focus on building systems for scientists. I'm betting that they can do it better than anyone else in the world.

What do you think?