Boxes - Unlocking the promise of UK health data

Box 4: Investments by the global biopharmaceutical industry in health data resources

In 2018, GSK invested in US company 23andMe and its database of genetic and phenotypic data.[9]

In 2015, Roche acquired a majority stake in molecular and genomic analysis business Foundation Medicine,[10] and took full ownership in June 2018.[11]

In 2012, Amgen acquired DeCODE Genetics which held genetic and clinical data on the Icelandic population. In June 2019, DeCODE announced a major collaboration with US-based healthcare delivery network Intermountain Healthcare, which aims to analyse the genomes of 500,000 people from Intermountain’s

Box 5: Guiding principles on the NHS’s uses of health data – our perspectives

Principles (July 2019 iteration)[35] and our perspectives

Principle: Any use of NHS data, including operational data, not available in the public domain must have an explicit aim to improve the health, welfare and/or care of patients in the NHS, or the operation of the NHS. This may include the discovery of new treatments, diagnostics, and other scientific breakthroughs, as well as additional wider benefits.

Where possible, the terms of any arrangements should include quantifiable and explicit benefits for patients which will be realised as part of the arrangement.

Our perspective: We strongly support this principle – given the biopharmaceutical industry exists to improve the health, welfare and/or care of patients – and aim to work with the NHS to design ways of quantifying the benefits for patients that the analysis and use of health data can deliver through the development of new medicines.

Principle: NHS data is an important resource and NHS organisations entering into arrangements involving their data, individually or as a consortium, should ensure they agree fair terms for their organisation and for the NHS as a whole. In particular, the boards of NHS organisations should consider themselves ultimately responsible for ensuring that any arrangements entered into by their organisation are fair, including recognising and safeguarding the value of the data that is shared and the resources which are generated as a result of the arrangement.

Our perspective: We recognise the good intent of this principle and support the concept of fair terms. However, there is no definition of ‘fair’, and there is a risk that this principle could be interpreted to mean that every NHS organisation will need to set up data commercialisation resources with specialist experience and expertise, and therefore that the UK data landscape could become more fragmented.

Principle: Any arrangements agreed by NHS organisations should not undermine, inhibit or impact the ability of the NHS, at national level, to maximise the value or use of NHS data. NHS organisations should not enter into exclusive arrangements for raw data held by the NHS, nor include conditions limiting any benefits from being applied at a national level, nor undermine the wider NHS digital architecture, including the free flow of data within health and care, open standards and interoperability.

Our perspective: We support this principle and would like to see it more patient-centred. Through further discussion with patient groups, we hope the principle can be further refined to reflect the fact that the sources of data are patients themselves, and therefore to reflect the importance to many patients that no party should lay claim to the exclusive use of data for research.

Principle: Any arrangements agreed by NHS organisations should be transparent and clearly communicated in order to support public trust and confidence in the NHS and wider government data policies.

Our perspective: We support this principle but note practical experience that the transparency of agreements reached to date has varied across NHS organisations. Central guidance informed by patients on what ‘transparency’ means in practice is needed.

Principle: Any arrangements agreed by NHS organisations should fully adhere to all applicable national level legal, regulatory, privacy and security obligations, including in respect of the National Data Guardian’s Data Security Standards, the General Data Protection Regulation (GDPR) and the Common Law Duty of Confidentiality.

Our perspective: We support this principle, and – given our members’ experience of both national and international obligations and data platforms and systems – commit to observe it and to support other stakeholders in the UK health data landscape in doing so.

Box 6: Examples of the ways in which biopharmaceutical companies use health data for research

  • The Salford Lung Study was a community-based, real-world Phase III randomised controlled trial (RCT) for a new treatment for COPD and asthma, sponsored by GSK. The RCT made use of electronic patient records, which allowed patients to be monitored during ‘normal’ clinical practice in near-real-time but with much less intrusion into their lives than typical RCTs.[13] The Salford Lung Study shows the potential of establishing virtual clinical trials using the UK’s health data resources.
  • Research undertaken by biopharmaceutical companies BioMarin and Alexion using health data gathered through the 100,000 Genomes Project has helped researchers better understand the clinical spectrum of symptoms that people living with rare genetic diseases show – and has also helped diagnose patients unknowingly living with rare genetic disorders.[39]
  • The BSRBR-RA study is a unique collaboration between the University of Manchester, the British Society for Rheumatology and the biopharmaceutical industry. It tracks the progress of over 20,000 people with rheumatoid arthritis (RA) who have been prescribed biologic medicines (including biosimilars) and other targeted therapies.[40]
  • AstraZeneca is working with NHS Scotland as part of its Global Genomics Initiative to make use of patients’ genetic information to develop new treatments.[41]

Box 7: Examples of the UK’s larger health datasets

  • The Clinical Practice Research Datalink (CPRD) collects data on patients from a network of GP practices across the UK (including 11 million currently registered patients).[42]
  • Wales’s Secure Anonymised Data Linkage (SAIL) Databank holds a wide range of de-identified health and care datasets, from primary care to outpatient data, which can be linked and accessed via a remote gateway for approved research projects.[13]
  • The 100,000 Genomes Project combines whole genome sequencing data with medical records from around 85,000 people.[43]
  • England’s Hospital Episode Statistics (HES) capture a wide range of clinical information on around 20 million patients admitted to hospital a year.[44]
  • The UK Biobank has been collecting increasingly detailed data on 500,000 people since 2006.[45]
  • England’s National Cancer Registration and Analytics Service (NCRAS) collects data on all cases of cancer that occur in people living in England.[46]
  • The Scottish Cancer Registry has been collecting population-based information on cancer since 1958 and now holds over 1.8 million records.[47]
  • The National Institute for Cardiovascular Outcomes Research (NICOR) collects clinical data on cardiovascular patients across the UK. It oversees the National Cardiac Audit Programme, which had over 380,000 patient records entered in 2016-17.[48]
  • The Systemic Anti-Cancer Therapy (SACT) dataset has been collecting data on the use of systemic anti-cancer therapies across all NHS trusts in England since 2012.[49]

Box 8: Feedback collected from the biopharmaceutical industry by Health Data Research UK [51]

In 2019, HDR UK led an engagement process with biopharmaceutical industry representatives to inform the specification of the Digital Innovation Hubs (DIHs) programme. The process collected the following feedback on the UK health data landscape:

  • Time delays and unpredictability prevent UK data access for many companies: their priorities are to see transparent, predictable, quick access to data.
  • Companies most frequently request health data that can support trial recruitment, help demonstrate value, and help understand and stratify disease.
  • Companies value health data services that assist with health data discovery, offer quick and predictable access to health data once discovered, provide data curation, and are underpinned by pre-approved contracts and models.
  • Gaps in the UK’s health data that companies want to see addressed are: direct linkage to secondary care data to understand treatment effectiveness in detail; quick assessments of patients presenting in each site for trial feasibility; and the ability to recruit patients in real time based on automated eligibility checks.

Box 9: Flatiron [52]

For maximum utility, cancer datasets need to capture each patient’s stage at diagnosis, every treatment cycle (including the specific treatments delivered) and each patient’s responses and outcomes. Few health datasets anywhere in the world capture this kind of detail.

US company Flatiron created a unique dataset of around two million patients with cancer, which was bought by biopharmaceutical company Roche in 2018.

Flatiron’s value was generated not by the sheer volume of information in its database, but instead by the way in which each entry in its database was meticulously curated to develop a clinical research-grade dataset, in an enormously labour-intensive process. [52]

Box 10: Examples of inefficient data access processes [56]

Delays in HES linkage to clinical data for a rare disease specialist centre, requiring further amendments to Confidentiality Advisory Group (CAG) and Health Research Authority approvals, led to an 18-month delay in a project for direct care supporting better disease detection and referrals.[57]

A global contract research organisation (CRO) reported that it had agreed and executed a data access arrangement in one EU country in eight months, but that the equivalent access arrangement in the UK was still under discussion two years after the CRO had first sought the data.

In 2018, a UK SME looking for linked genetic and clinical data to validate a suspected target association and raise funds to develop a new drug found a relevant dataset within two weeks but, after an unexplained delay of three months while the university concerned started the contracting process, had to give up working with that dataset.

A global company wanted to access national data on outcomes related to current treatment pathways to support a submission to NICE but found there was no way to access data across the country. After discussions with a number of trusts, the company eventually conducted a single-centre audit, which itself took six months to complete.

Box 11: AI and the need for access to high-quality data

There is much interest in the promise of AI to improve healthcare decision-taking and efficiency.[58] On 8 August 2019, for example, the Prime Minister announced £250 million of investment to help the NHS become a world leader in the use of AI.[59]

However, AI tools require access to high-quality data to learn from, and companies investing in AI therefore invest significantly in accessing and improving data – for example:

A joint report by the Medicines Discovery Catapult and the BioIndustry Association found that 75% of spending by companies in AI is actually on the upstream (often unseen) activities of data access, curation and data labelling, and not algorithm development and improvement.[50]

IBM has also reported that around 80% of the time spent by scientists developing AI technologies is spent finding, cleansing and organising data, rather than developing the algorithms which actually perform any analysis.[60]

If the NHS’s £250 million investment in AI is appropriately allocated, therefore, at least £185 million of the investment may need to be spent on accessing and improving data.
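The arithmetic behind this estimate can be sketched as follows. This is only an illustration, assuming the 75% and 80% shares quoted above apply directly to the whole £250 million; the IBM figure describes time rather than spending, so it is used here as a rough proxy. The function name `data_spend` is hypothetical.

```python
# Illustrative only: applies the reported data-work shares to the
# announced £250 million. Assumes the shares hold for the whole sum.

def data_spend(total_gbp_m: float, share_percent: float) -> float:
    """Return the portion of a budget (in £ million) going to data work."""
    return total_gbp_m * share_percent / 100

INVESTMENT_GBP_M = 250  # announced investment, in £ million

# 75%: share of AI spending on data access, curation and labelling.
low = data_spend(INVESTMENT_GBP_M, 75)
# 80%: share of scientists' time spent on data (assumed proxy for spend).
high = data_spend(INVESTMENT_GBP_M, 80)

print(f"£{low:.1f}m to £{high:.1f}m")
```

Under these assumptions both figures sit above the £185 million floor stated in the text.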

Box 12: Examples of the challenges in discovering UK health data [69]

In 2017, a global biopharmaceutical company looking for cancer-related health data was incorrectly informed that comprehensive, national health data was available. However, after six months it became clear that the data available was incomplete and of low quality, particularly regarding prescription data. The delay in accessing the data meant that it was nearly impossible to have the quality issues addressed.

In 2019, a global CRO requested data on the number of specific patients attending UK hospitals so that the UK could be included as a potential site for a global clinical trial. However, the data took so long to arrive that the UK was not included as a possible location.

Box 13: Sensyne Health

Sensyne Health is a unique partnership (initiated in February 2017) with a small number of NHS trusts. It develops digital health products and enables companies to analyse anonymised data. The data remains in NHS ownership.[75]

Each analysis of anonymised patient data is pre-approved for each programme on a case-by-case basis by the relevant NHS trusts. This is to ensure that the purpose of the anonymisation and the proposed analysis are subject to appropriate ethical oversight and information governance, including conformance with NHS principles, UK data protection law and applicable regulatory guidance.[76]

In August 2018 Sensyne Health floated on the London Stock Exchange.[75]

Box 14: HDR UK and the components of data access

1. The UK Health Data Research Alliance: The Alliance was established in December 2018 to bring together leading healthcare and research organisations and health leaders to establish best practice for the ethical use of UK health data at scale.

2. Seven Health Data Research Hubs: In September 2019, the following Hubs were announced (name, focus and aim):

  • DATA CAN (cancer): Enable UK-wide high-quality cancer data access to improve care, diagnosis and research.
  • INSIGHT (eye health): Use data, analytics and AI to develop insights into eye disease and wider health.
  • Gut Reaction (inflammatory bowel disease): Use data to better stratify Crohn’s Disease and ulcerative colitis patient responses.
  • PIONEER (acute care): Use linked data to enable companies to develop acute care products and services.
  • NHS Digital (clinical trials): Increase opportunities for patients to participate in clinical trials.
  • BREATHE (respiratory): Improve the lives of people with conditions such as asthma and COPD.
  • Discover-NOW (real-world data): Understand, develop treatments for and prevent long-term conditions such as type 2 diabetes.

3. UK Health Data Research Innovation Gateway: The purpose of the Innovation Gateway is to provide services to the Hubs, and others, so that data from the Alliance can be discovered and accessed safely and responsibly. The first phase, which went live in January 2020, includes a catalogue of metadata to facilitate discovery of relevant datasets.

Last modified: 20 September 2023

Last reviewed: 20 September 2023