How does data management disrupt the status quo in drug discovery?


Harnessing data value to manufacture next-generation medicines

Calithera Biosciences, a small immunotherapy company with drugs for cancer and cystic fibrosis in premarket development, struggled for years to manage its data efficiently. The business was generating detailed data on hundreds of patients, with datasets covering everything from lab test results to clinicians’ notes. As MIT Technology Review highlights, the company recently implemented Artificial Intelligence (AI) technology to manage its constant data growth and adhere to US Food and Drug Administration (FDA) regulations. By creating a protected area for regulated content, Calithera cut the costs of compliance-related data management and secured its data against ransomware attacks.

Calithera is just one of many organizations that have tapped into data science to push the boundaries of clinical development. In this article, we will explore how effective data management can simplify drug discovery, allowing companies to safeguard patients’ privacy and comply with relevant laws and regulations.

Why data management matters

Pharma companies often struggle to chart a reliable path to effective, high-quality medicines. According to a recent study, the median cost of developing a single commercial drug can reach $2.0 billion, a sum that covers all phases of research and development. But vast expenses are only the beginning of the challenge. The probability of ending up with a finished pharmaceutical product remains relatively low. A couple of years ago, prominent researchers declared that drug discovery had stagnated over the last couple of decades.




Many drug manufacturers have faced difficulties handling their data accurately and effectively, and more sophisticated data management approaches are now at the center of change. That’s why drug companies are likely to pay more attention to big data and predictive analysis in the near future. According to recent research by TechInsight360, AI spending in the healthcare and pharmaceuticals industry is predicted to increase from $463 million in 2019 to $2.45 billion by 2025.

When it comes to transforming implicit patterns into new, valuable, and, most importantly, actionable drug development plans, pharmaceutical companies still encounter hidden pitfalls. One of the major issues that can stall progress is inadequate data management. Clinical researchers can gather tens of terabytes of data in mere days. The sheer volume of accumulated data puts heavy pressure on biopharma companies: data overload makes it difficult to draw new knowledge from cross-study analysis and to find software that can automate manual operations. Data that hasn’t been processed for use is nothing more than an untapped asset.

How effective data management can accelerate drug discovery

A new generation of data management platforms uses AI and has distinct advantages in discovering promising therapeutic properties. With some creative maneuvering, pharma companies can overcome the existing uncertainties and difficulties. Here are some of the common challenges your company may face on its way to a data-driven culture.

  • Data is siloed and not ready for AI. Data integrity has the potential to greatly improve the effectiveness of clinical trials. However, companies often find their data scattered across disparate places, from internal archives to spaces managed by external partners. Chances are, each of these locations has its own storage practices and quality checking procedures, not to mention naming and labeling conventions.

According to a 2018 Pharma Intelligence survey, 50% of their clients use more than 5 data sources for a typical clinical trial. Below is a graph showcasing this trend.

Figure 1. Data sources used for a typical clinical trial by Pharma Intelligence

As staff and practices change over time, the situation becomes even more complex. Data normalization helps biopharma companies systematically move the right data to the right place at the right time, and make archived data available for new applications. Storing data in a shared, multifunctional data management platform remains a necessity for moving drug discovery forward.

  • Different data modalities require new data architectures. Drug discovery companies often store precious biomedical data in the form of magnetic resonance (MR), computed tomography (CT), or X-ray scans. Although these data types hold keys to discoveries, automating data capture and curation at scale remains a major undertaking for many organizations. Imaging data can be organized and analyzed manually, but nearly everybody agrees that this process is time-consuming and error-prone. Thanks to advancements in technology, AI and ML algorithms can help curate diverse data and work with files of multiple modalities. Still, automation relies heavily on a comprehensive infrastructure that brings imaging assets together, and it should be planned in advance.

  • Massive datasets create additional computational challenges. Pharma companies gather data from a variety of sources, including past and present clinical trials and materials from third-party partners. The sheer scope of these datasets is one of the most pervasive challenges confronting biopharma companies. Nearly two-thirds of respondents in the Pharma Intelligence survey said they experienced issues when aggregating, cleaning, and transforming clinical trial data. Computational demands in drug discovery are especially heavy, with a strong need for new cloud-scale resources and hybrid environments that support the various algorithmic workflows. Setting up data architectures that suit massive amounts of clinical and medical imaging data is an onerous and expensive task.



  • Compliance can be seen as a burden. Drug discovery companies face a high bar of regulatory requirements when working with sensitive biomedical data. In addition, with the increase in remote work and collaboration via individual networks, organizations also face cybersecurity challenges, so adhering to strict government protocols is often a top priority for manufacturers.

The graph below, by Pharma Intelligence, shows that 58% of its survey respondents were not confident in the quality or completeness of their clinical data from an audit and compliance perspective.

Figure 2. The level of survey participants’ confidence in the quality of data from the perspective of data auditing and regulatory compliance by Pharma Intelligence

Because the majority of respondents have little trust in the quality of their data, the need for more effective data management strategies is clear.
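To make the first challenge above more concrete (siloed data with differing naming and labeling conventions), here is a minimal Python sketch of one way to normalize records from several sources into a shared schema. It is an illustration only: the column names, the alias mapping, and the two example sources are hypothetical assumptions, not taken from any real trial, platform, or survey mentioned in this article.

```python
# Hypothetical mapping from a shared column name to the names each source might use.
COLUMN_ALIASES = {
    "patient_id": {"patient_id", "PatientID", "subj_id"},
    "visit_date": {"visit_date", "VisitDate", "date_of_visit"},
    "lab_value": {"lab_value", "LabResult", "result"},
}

# Reverse lookup: source column name -> shared column name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in COLUMN_ALIASES.items()
    for alias in aliases
}


def normalize_record(record, source):
    """Rename known columns to the shared schema and tag the record's origin."""
    normalized = {"source": source}
    unmapped = []
    for key, value in record.items():
        canonical = ALIAS_TO_CANONICAL.get(key)
        if canonical is None:
            unmapped.append(key)  # report unknown columns instead of silently dropping them
        else:
            normalized[canonical] = value
    normalized["unmapped_columns"] = sorted(unmapped)
    return normalized


def merge_sources(sources):
    """Combine records from every source into one list with a shared schema."""
    merged = []
    for source_name, records in sources.items():
        for record in records:
            merged.append(normalize_record(record, source_name))
    return merged


# Two made-up sources that describe the same kind of data with different labels.
internal_archive = [{"PatientID": "P-001", "VisitDate": "2020-03-01", "LabResult": 4.2}]
partner_cro = [{"subj_id": "P-002", "date_of_visit": "2020-03-04", "result": 5.1, "site": "B"}]

combined = merge_sources({"internal_archive": internal_archive, "partner_cro": partner_cro})
for row in combined:
    print(row)
```

Keeping a list of unmapped columns, rather than discarding them, is a deliberate choice in this sketch: it leaves a trail that a data steward could review, which matters when data quality is later examined from an audit and compliance perspective.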

The future of drug development

Improved data management strategies can enable tremendous progress in clinical development, and the demand for more efficient ways of analyzing massive datasets will only increase in the foreseeable future. What’s more, biopharma companies will capitalize on digitalization and manage clinical trials remotely. New data management tools will be based on Deep Learning (DL), Machine Learning (ML), and Natural Language Processing (NLP), which will serve as a foundation for more effective digital infrastructures. According to a recent Deloitte report, most AI startups working on biopharma R&D are currently focused on the drug discovery stage of the process. These advancements can result in higher drug approval rates, reduced development costs, and better patient outcomes.

The explosive growth of data increases the burden on companies to have sufficient storage space and to remain cost-efficient. As one of the industry’s leading forces, AI will have a disruptive impact on drug development, enabling manufacturers to centralize clinical development and accumulate valuable datasets. 

The graph below illustrates the digital technologies that will challenge the status quo in drug development from the short- and long-term perspectives.

Figure 3. AI-assisted technologies that are predicted to disrupt drug discovery in the next decade by the Deloitte Centre for Health Solutions

This graph from the 2020 Deloitte report highlights the continuous enhancement of data management practices: automated data capture, integration, and sharing, as well as workflow automation, are expected to change drug discovery in the next 5 years.




Data management in drug discovery is on the brink of major disruption, with innovative data environments catalyzing digital transformation. Yet numerous challenges, including poor data organization, different data modalities, new computational issues, and strict regulatory requirements, need to be adequately addressed. By adopting effective data management strategies, biopharma companies can stay compliant with relevant laws and regulations and accelerate drug discovery within just a few years.

Don’t hesitate to get in touch with us to discover how your organization can leverage the potential of new data management strategies.
