Narrow the evidence gap in patient care with faster access to real-world evidence

How data federation and data products can improve the curation, democratization, and application of real-world evidence data to provide near-real-time insights for payers, providers, and life sciences organizations

Last Updated: April 15, 2024

Would it be concerning to discover that many clinicians are resigned to making critical patient-care decisions based not on data-backed real-world evidence (RWE) but on a handful of anecdotal observations, personal experiences, and gut feelings?

This is not a new concern. In 2015, the National Academy of Medicine set a goal for 90% of clinical decisions by 2020 to be supported by accurate, timely, and up-to-date clinical information reflecting the best available evidence. However, while randomized controlled trials (RCTs) remain the gold standard for generating clinical evidence, as of 2022 only 65%, and in some medical disciplines as little as 14%, of clinical decisions are supported by RCTs.

This isn’t surprising, since gathering clinical data across the healthcare and life sciences industry through formal clinical trials is very expensive and time-consuming. Running an RCT from phase 1 through phase 3 is estimated to cost about $364 million (based on 2014 figures adjusted for inflation) and to take 6.4 years. As the shift to precision medicine continues, the applicability of RCT data remains limited.

As such, organizations require the integration of other sources of real-world data to build up clinical repositories of evidence that can be used to support clinical decisions and innovations for patient point of care, as well as ongoing drug discovery and development. 

This blog explores capabilities like data federation and data products and how they can improve the curation, democratization, and application of real-world evidence data to provide near-real-time insights for payers, providers, and life sciences organizations. 

What is the availability of real-world evidence?

Digitization of healthcare data is becoming more prevalent across the industry. Industry consolidation and regulations are driving the digitization and standardization of electronic medical records (EMR), delivering a wealth of real-world clinical data that can be referenced for evidence-based medicine. The Mayo Clinic Platform, for example, improves care delivery through insights and knowledge derived from the de-identified data of 10 million patient records, which include laboratory values, diagnosis codes, vital signs, prescription medications, and clinical notes. This data is necessary to enhance the effectiveness of clinical data consults and provide highly personalized patient care plans.

The availability of data from patient claims and pharmacy prescriptions offers further insights into patient care for research and clinical evidence. Institutional data on clinical trials, biobanks, and other clinical data warehouses provide readily available third-party datasets for providers, payers, and life sciences organizations.

However, business, technology, and data privacy and security concerns make consolidating different data sources into a single location impossible.  The need for a single point of access to RWE data is incredibly compelling, and many organizations are already on the journey to tackle the evidence gap in patient care and clinical development. 

The evidence gap in patient point of care

Fewer than 20% of patients are linked to a standard care guideline in many therapeutic areas, and only 4% of patient care situations have guidance derived from RCTs. As a result, research-based guidance for clinical decisions is often lacking. It is not easy for clinicians to accurately determine patterns using existing data sets across care plans to answer questions like, “What happened to other patients like mine?” This evidence gap becomes a bigger issue for patients with multiple conditions, complex medical histories, and diverse ethnic backgrounds.

Organizations such as EMIS Health, a leading health-tech company based in the UK, are closing this gap by giving clinicians access to over 120 million patient records from disparate sources across the UK healthcare ecosystem, including the NHS, to empower integrated care and clinical research.

Enhanced clinical data access allows clinicians to make data-driven decisions with evidence that answers point-of-care questions such as:

  • What is the correct diagnosis? 
  • What diagnostic tests should be ordered?
  • What is the implication of this abnormal lab result or genomic marker?
  • What is the prognosis for this specific patient most likely to be?
  • What medications or other treatment modalities should be pursued, in what order, to optimize outcomes?
  • Will this procedure be worth the risk and/or cost for this patient?

The more data that becomes available through various sources of real-world evidence, the more accurately and efficiently clinicians can treat patients. 
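The question “What happened to other patients like mine?” can be framed as a cohort lookup over de-identified records. The following is a minimal sketch only: the record fields, the similarity rule (same condition, age within five years), and the synthetic records are all assumptions made for illustration, not a clinical method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical, de-identified fields chosen for illustration only.
    age: int
    condition: str
    treatment: str
    outcome: str  # e.g. "improved" or "not_improved"


def similar_patients(records, age, condition, age_window=5):
    """Return records with the same condition and an age within +/- age_window years."""
    return [
        r for r in records
        if r.condition == condition and abs(r.age - age) <= age_window
    ]


def outcomes_by_treatment(cohort):
    """Count outcomes per treatment for the matched cohort."""
    summary = {}
    for r in cohort:
        summary.setdefault(r.treatment, Counter())[r.outcome] += 1
    return summary


# Synthetic records, not real patient data.
records = [
    PatientRecord(62, "type2_diabetes", "drug_a", "improved"),
    PatientRecord(65, "type2_diabetes", "drug_a", "not_improved"),
    PatientRecord(60, "type2_diabetes", "drug_b", "improved"),
    PatientRecord(41, "type2_diabetes", "drug_b", "improved"),
    PatientRecord(63, "hypertension", "drug_c", "improved"),
]

cohort = similar_patients(records, age=63, condition="type2_diabetes")
summary = outcomes_by_treatment(cohort)
for treatment, counts in sorted(summary.items()):
    print(treatment, dict(counts))
```

With only a handful of records the summary is anecdotal at best, which is the point of the section: the matched cohort only becomes useful evidence when it draws on many more sources.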

The evidence gap in clinical development

The gap in real-world evidence is also significant when tackling patient care through faster drug development. Like practicing clinicians, clinical researchers can often base their decisions only on a handful of related RCT observations, personal experiences, and anecdotal evidence. This prevents many researchers from recommending optimal inclusion-exclusion criteria and protocols, which results in longer and more expensive clinical drug development cycles.

Life science companies are making rapid advancements in integrating RWE into their clinical development lifecycles. GSK’s collaboration with 23andMe has been a significant example of leveraging real-world data to accelerate clinical decision-making with more accurate answers to questions such as:

  • What are the impacts of various ethnic backgrounds on the clinical trial?
  • What are the optimal demographics for the inclusion and exclusion criteria?
  • How large is the patient population size available for this trial?
  • Where should the trial be held, and who are the best clinicians to administer the trial?

Many clinical datasets are used to enhance the efficiency and efficacy of clinical trials. RWE, through diverse examples of patient claims, further informs the view of population health. 

For example, payer organizations such as Optum enable users to securely access petabytes of Protected Health Information (PHI) datasets from many siloed data sources to create a complete Patient 360 that consistently delivers insights used for improving patient outcomes. Along with many other applications, patient demographics data can enrich evidence used in clinical drug development to inform the therapeutic impact and how to deliver access most efficiently and economically.

Closing the evidence gap with a clinical data repository

Healthcare and life sciences leaders see the value of delivering comprehensive access to real-world data to supplement traditional RCT data for clinicians and researchers.

For example, Stanford Healthcare built a very targeted clinical data warehouse as part of its learning health system for COVID-19 patient care. The effort took over 300 hours from a multidisciplinary team of 11 members, including clinical end-users, clinical informaticists, data scientists, and EHR data specialists, at a total estimated cost of $300,000 worth of resources. This clinical data warehouse significantly accelerated COVID-19 research and the development of patient care guidelines.

Whether it is defined as a learning health system, a clinical data warehouse, or a data enrichment engine, there are many ways organizations are developing clinical data repositories that make it possible to analyze care delivery and learn from institutional data.

As many organizations can attest, most of the effort in building a clinical data warehouse goes into upstream data analysis, such as population definition, data extraction, and data validation, which relies on supporting data extraction, data integrity, and data exploration capabilities.

Therefore, leveraging a tool that can provide a single point of secure access to interrogate, combine, and publish data from disparate sources is highly valuable in accelerating the development of a clinical data repository and enabling users to leverage historical and real-world evidence frictionlessly.

Making it happen: exploration and discovery of relevant evidence data

For clinicians to use real-world evidence data, they must at least be able to find and access the information they need in near-real time, whether it’s stored on-premises, in the cloud, or across multiple regions. Moreover, curating data for integrity and relevance makes it easier for clinicians to find data relevant to their case or project.

Data exploration and discovery across disparate sources via federation are necessary to curate evidence that can be easily used in practical decision-making. You can use capabilities such as data catalogs to profile and extract metadata from disparate sources to better organize and understand available data. 
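As a rough illustration of what profiling a source for a catalog involves, the sketch below collects column names, declared types, row counts, and null counts from a table. It uses Python's built-in sqlite3 module as a stand-in source; the table name, columns, and synthetic rows are assumptions for illustration, and a real data catalog would gather far richer metadata from many systems.

```python
import sqlite3


def profile_table(conn, table):
    """Collect basic catalog metadata: columns, declared types, row and null counts.

    The table name is interpolated into SQL, so it must come from a trusted list.
    """
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    profile = {"table": table, "row_count": row_count, "columns": []}
    for _cid, name, declared_type, *_rest in columns:
        nulls = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {name} IS NULL"
        ).fetchone()[0]
        profile["columns"].append(
            {"name": name, "type": declared_type, "null_count": nulls}
        )
    return profile


# Synthetic stand-in for one source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_results (patient_id TEXT, test_code TEXT, value REAL)")
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?, ?)",
    [("p1", "HBA1C", 6.9), ("p2", "HBA1C", None), ("p3", None, 5.4)],
)

profile = profile_table(conn, "lab_results")
print(profile)
```

Even this small profile tells a curator something useful, for example that some lab values and test codes are missing and would need attention before the data is used as evidence.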


In addition to relying on metadata, when a domain expert can participate further in the hands-on exploration of data curation, the resulting clinical evidence data sets become much more relevant and accurate. As domain experts are typically not data engineers, they become much more productive given a single point of access enabled with secure data federation to allow users to interact with disparate data across different domains, geographies, regions, and cloud environments. Then they can combine and interrogate the data in whatever tool they are most comfortable with.
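To make the idea of a single point of access concrete, here is a toy sketch in which one connection queries two independent databases and joins them in a single statement. SQLite's ATTACH DATABASE is only a local stand-in here; a federated query engine would connect to remote systems across regions and clouds instead. The schema names, tables, and synthetic rows are assumptions for illustration.

```python
import sqlite3

# One connection plays the role of the single point of access.
conn = sqlite3.connect(":memory:")

# Each attached database stands in for an independent source system.
conn.execute("ATTACH DATABASE ':memory:' AS emr")
conn.execute("ATTACH DATABASE ':memory:' AS claims")

conn.execute("CREATE TABLE emr.diagnoses (patient_id TEXT, diagnosis_code TEXT)")
conn.execute("CREATE TABLE claims.pharmacy (patient_id TEXT, drug TEXT, paid_usd REAL)")

# Synthetic rows, not real patient data.
conn.executemany(
    "INSERT INTO emr.diagnoses VALUES (?, ?)",
    [("p1", "E11"), ("p2", "I10"), ("p3", "E11")],
)
conn.executemany(
    "INSERT INTO claims.pharmacy VALUES (?, ?, ?)",
    [("p1", "metformin", 12.0), ("p3", "metformin", 15.0), ("p2", "lisinopril", 8.0)],
)

# One query spans both sources; no data is copied into a shared table first.
query = """
    SELECT d.diagnosis_code,
           COUNT(DISTINCT d.patient_id) AS patients,
           SUM(c.paid_usd) AS total_paid_usd
    FROM emr.diagnoses AS d
    JOIN claims.pharmacy AS c ON c.patient_id = d.patient_id
    GROUP BY d.diagnosis_code
    ORDER BY d.diagnosis_code
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The domain expert writes one query against logical names (emr, claims) and does not need to know where each source physically lives.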

Secure access control to sensitive intellectual and PHI data

Because healthcare is a highly regulated industry, security compliance policies are another challenge preventing clinicians from accessing the wealth of real-world evidence within institutions. In most cases, it takes weeks to months of data engineering effort behind the scenes to identify, curate, and migrate datasets while satisfying security compliance before clinicians gain the access they need to explore specific data sets or make data-driven clinical decisions. Data authorization features such as Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC) significantly simplify security governance by allowing IT and security teams to provide compliant access to data sets without data movement or duplication.
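The difference between the two models can be sketched in a few lines: RBAC asks whether the user's role grants an action, while ABAC asks whether attributes of the user and the data set match. The roles, attributes, and rules below are hypothetical and chosen only to illustrate the idea; they do not describe any particular product's policy model.

```python
from dataclasses import dataclass

# RBAC: roles map to the actions they may perform (hypothetical role names).
ROLE_PERMISSIONS = {
    "clinician": {"read"},
    "data_steward": {"read", "publish"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    region: str
    phi_cleared: bool


@dataclass(frozen=True)
class Dataset:
    name: str
    region: str
    contains_phi: bool


def is_allowed(user, dataset, action):
    """Allow only if the role grants the action (RBAC) and the attributes match (ABAC)."""
    role_ok = action in ROLE_PERMISSIONS.get(user.role, set())
    region_ok = user.region == dataset.region
    phi_ok = user.phi_cleared or not dataset.contains_phi
    return role_ok and region_ok and phi_ok


clinician_eu = User("dr_a", "clinician", "eu", phi_cleared=True)
steward_us = User("steward_b", "data_steward", "us", phi_cleared=False)
eu_phi = Dataset("eu_oncology_notes", "eu", contains_phi=True)
us_summary = Dataset("us_claims_summary", "us", contains_phi=False)
us_phi = Dataset("us_patient_notes", "us", contains_phi=True)

print(is_allowed(clinician_eu, eu_phi, "read"))       # role and attributes both pass
print(is_allowed(clinician_eu, eu_phi, "publish"))    # role does not grant publish
print(is_allowed(clinician_eu, us_summary, "read"))   # region attribute does not match
print(is_allowed(steward_us, us_phi, "read"))         # user is not cleared for PHI
```

Because the check is evaluated at query time against the data where it already lives, nothing has to be copied into a separate, pre-approved location first.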

Furthermore, use workflows to pre-aggregate and anonymize data sets being interrogated without physically moving or storing the data outside the authorized local region to adhere to geographic compliance requirements. 
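One way to picture this: each region computes aggregates locally, suppresses groups too small to share safely, and only those aggregates cross the regional boundary. The sketch below assumes one row per patient, a hypothetical minimum group size of 5, and synthetic data. Small-cell suppression is just one simple anonymization technique and is not, on its own, a guarantee of regulatory compliance.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # hypothetical threshold; real policies vary by regulation


def regional_aggregate(rows, min_cell_size=MIN_CELL_SIZE):
    """Run inside a region: count patients per diagnosis and drop small groups.

    Assumes one (patient_id, diagnosis) row per patient.
    """
    counts = Counter(diagnosis for _patient_id, diagnosis in rows)
    return {dx: n for dx, n in counts.items() if n >= min_cell_size}


def combine(regional_results):
    """Run centrally: sees only regional aggregates, never row-level data."""
    total = Counter()
    for result in regional_results.values():
        total.update(result)
    return dict(total)


# Synthetic row-level data that never leaves its region.
eu_rows = [(f"eu-{i}", "E11") for i in range(6)] + [(f"eu-x{i}", "I10") for i in range(2)]
us_rows = [(f"us-{i}", "E11") for i in range(5)] + [(f"us-x{i}", "I10") for i in range(7)]

regional = {
    "eu": regional_aggregate(eu_rows),
    "us": regional_aggregate(us_rows),
}
combined = combine(regional)
print(regional)
print(combined)
```

Note that the two EU patients with diagnosis I10 are suppressed before anything leaves the region, so the combined count for I10 reflects only the US group.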

For example, Sophia Genetics provides secure access to complex multimodal data sets gathered from millions of patients through a global data-sharing network. Operating with data sourced from countries worldwide, Sophia Genetics has many regulatory and data privacy compliance constraints. With Starburst, they track these constraints through a data mesh that can provide users with a single point of access to many distributed regional clusters of data aggregation. 

Productizing data for domain-driven use cases

Once relevant data is identified, the productization of data into domain-driven analytics becomes a process that encourages the interoperability and re-usability of the underlying collection of real-world data.

Whether data resides within a data lake or data lakehouse, distributed data often needs to be harmonized into specific standards to be useful. This is part of the discovery and data preparation process that transforms real-world data into harmonized data products. Real-world data can often be harmonized using OMOP standards, clinical trial data with SDTM, and EMR data with Fast Healthcare Interoperability Resources (FHIR).
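As a small illustration of harmonization, the sketch below maps records from two differently shaped sources into one common shape modeled on a few fields of the FHIR Patient resource (resourceType, id, gender, birthDate). The source layouts and field names are hypothetical, and real harmonization would follow the full specification and proper terminology mapping rather than this simplified subset.

```python
from datetime import datetime

# Source-specific gender encodings mapped to FHIR administrative gender codes.
GENDER_MAP = {"m": "male", "male": "male", "f": "female", "female": "female"}


def normalize_gender(raw):
    """Map a raw value to a FHIR gender code, falling back to 'unknown'."""
    return GENDER_MAP.get(str(raw).strip().lower(), "unknown")


def from_emr(row):
    """Hypothetical EMR export: keys 'pid', 'sex', 'dob' with dob as MM/DD/YYYY."""
    birth = datetime.strptime(row["dob"], "%m/%d/%Y").date()
    return {
        "resourceType": "Patient",
        "id": row["pid"],
        "gender": normalize_gender(row["sex"]),
        "birthDate": birth.isoformat(),
    }


def from_claims(row):
    """Hypothetical claims feed: keys 'member_id', 'gender', 'birth_date' (ISO date)."""
    return {
        "resourceType": "Patient",
        "id": row["member_id"],
        "gender": normalize_gender(row["gender"]),
        "birthDate": row["birth_date"],
    }


# Synthetic inputs, not real patient data.
emr_patient = from_emr({"pid": "p1", "sex": "F", "dob": "03/07/1961"})
claims_patient = from_claims(
    {"member_id": "m9", "gender": "Male", "birth_date": "1975-11-30"}
)
print(emr_patient)
print(claims_patient)
```

Once both sources share one shape, downstream data products can be built against that shape instead of against each source's quirks.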

While data harmonization is challenging, a wide range of programmatic approaches built on data transformation engines continues to improve the process.

To build a culture of self-enriching positive feedback processes, clinicians and researchers need the ability to both read and create curated data products that are interoperable and reusable across specific domains and problem spaces. 

Curating and publishing both underlying sources and analytics-ready data products for specific domains allows end-users to collaborate and share data curation activities in a managed way without the chaos that comes with duplication of data across disparate data stores – which creates a whole set of challenges around management, security, costs, and freshness. 

Real-time data-driven decisions

Beyond access, clinicians also need data in near real time. Near real-time or timely data can mean different things for different use cases, whether that’s having results within 10 seconds or once a day. When leveraging technology to accelerate the time to insight, the query engine needs great performance across analytic datasets, both raw and curated.

While many database engines can deliver fast results to already curated datasets, only a select few can shortcut the curation process to accelerate discovery and insights without the days to weeks of data pipeline development – this doesn’t factor in ongoing pipeline management and maintenance. 

Deliver the best end-user experience to analytics-ready data

As we’ve seen over and over again, end users prefer to continue using the tools that they are most comfortable with for data analytics. For clinicians, that may mean using applications. Analysts might like interacting with visualization and dashboarding tools like Power BI, Tableau, or ThoughtSpot. Data scientists and programmers may choose Python, R, or SAS.

To be effective at their jobs, users should not need to learn a new skill just to traverse a clinical data repository. Instead, the delivery of data should be able to integrate with BI tools and APIs to provide end-users with an experience where they do not need to understand data engineering to make data-driven decisions. 

Conclusion: Accelerating digital transformation initiatives

The healthcare and life sciences industry knows about the evidence gaps that limit clinicians’ abilities to enhance patient outcomes through more accurate and informed data-driven decision-making. Leaders in the space continue to find ways to accelerate their digital transformation initiatives. They are tackling the challenges of giving their clinicians and researchers access to real-world evidence to enhance their impact on patients.

An ideal initiative starts with high-level institutional support to invest in the technology and processes to make it work. This means empowering subject matter experts and clinicians to curate and re-use real-world evidence data sets and supporting the development and maintenance of the technology to accelerate the positive feedback lifecycle of data curation and productization to narrow the evidence gap in delivering patient care.
