We are familiar with a core problem facing biopharma: it typically takes about 10 years and more than $2 billion to bring a new drug to market. Only 10 percent of molecules tested in clinical trials reach the market. What's more, this timeline and cost haven’t really budged for decades.
This problem is complex and needs to be addressed from multiple angles:
Target validation, lead identification, and lead optimization: We need to fail faster and increase the probability of success.
Clinical trials: We need to get better at finding target populations, running creative designs, and simplifying logistics––and thus move more quickly.
Clinicogenomics: We need access to new kinds of data so we can better understand patients and disease.
…and more.
Key to enabling each of these changes is the democratization of data. That is, we need to make the vast quantities of data that biopharma companies have broadly and easily available to researchers.
But that’s not just a matter of wrangling data or building new tools; it will require researchers to fundamentally change the way they work to be data-first. It will require organizations to embrace being digital (not just digitizing). And it will involve every part of an organization: people and culture, platforms, and (of course) data.
In this piece, I’ll speak to these key components of the transformation to a data-first approach. But first, let’s take a step back and look at the big picture.
What can data-first R&D look like in biopharma?
Let’s start with a very visible example: in the winter of 2020, scientists mapped the genome of a novel coronavirus. Within two days, researchers had used that map to design vaccines for the virus, and within 10 months, two of those vaccines had undergone clinical trials, received emergency authorization, and were being provided to healthcare workers and high-risk individuals.
Admittedly, the COVID vaccines also benefited from regulatory tailwinds, federal funding, and unusual urgency, none of which is the norm in day-to-day R&D. But even more crucial was the data-first approach behind the scenes. This ranged from using computational immunology models to rapidly identify targets on the virus to optimizing selection of clinical trial sites with Real World Data (RWD) to identify current virus hotspots to numerous other links in the chain of vaccine development.
This approach is markedly different from what happens today in most biopharma R&D settings. While research scientists are among the most data-driven people in the world, they tend to view data first through the lens of evidence generation rather than as the primary tool to fast-forward research by rapidly identifying and prioritizing hypotheses before going into a “wet lab.” This excerpt from Nature illustrates the difference:
“…Peter Howley, of Harvard University, … asked why certain strains of human papillomavirus are associated with cervical cancer and others are not. Bioinformatics data showed that equivalent proteins in different human papillomavirus strains bind to a different constellation of host proteins in the cell. Howley hypothesized that these differences could explain why some strains cause cancer. It allowed Howley and other immunologists to hone their hypotheses more finely before diving into expensive and time-consuming ‘wet lab work.’”
This success, like the success of COVID vaccines, was built on a foundation. The scientists were fluent in using data-first techniques (and in the latter case had even helped develop some of them). The computational models had been developed and evolved over a number of years. Pipelines existed that were already collecting and harmonizing massive datasets.
Digital-native biotechs (like the relative newcomers BioNTech and Moderna) tend to be ahead of traditional pharmas in building this foundation. But traditional pharma has an advantage or two as well, especially in the volumes of data they already have.
Let’s dive into what the transformation to a data-first biopharma R&D setup looks like. First, the foundation necessary for transformation.