Health Data from UK Biobank Exposed Online, Raising Privacy Concerns

Confidential health records from UK Biobank have been exposed online multiple times, prompting serious concerns about the security of sensitive patient data. An investigation by the *Guardian* has found that this flagship medical research project, which holds the health information of approximately 500,000 British volunteers, may not be adequately safeguarding its data.

UK Biobank is renowned for its extensive collection of health information, which has been instrumental in advancing research on diseases such as cancer, dementia, and diabetes. However, some researchers with approved access to this sensitive data appear to have been careless in protecting it. Although the exposed files do not contain names or addresses, they include extensive hospital diagnoses, with dates, for more than 400,000 participants, creating potential privacy risks.

In one instance, the *Guardian* was able to locate a volunteer's detailed hospital diagnosis records in an exposed dataset using only their month and year of birth and information about a major surgery they had undergone. A data expert expressed alarm, describing the scale of the problem as "shocking," particularly at a time when artificial intelligence and social media make it increasingly easy to cross-reference information.

UK Biobank has countered that no identifying data was given to researchers. Prof Sir Rory Collins, the chief executive of UK Biobank, said: "We have never seen any evidence of any UK Biobank participant being re-identified by others."

Founded in 2003 by the Department of Health and several medical research charities, UK Biobank collects genome sequences, scans, blood samples, and lifestyle information from its volunteers. The UK government recently extended Biobank's access to participants' general practitioner (GP) records. Researchers from universities and private companies apply for access to this wealth of data, which, until late 2024, they could download directly onto their own systems.

The problem has grown as academic journals and funding organizations increasingly require researchers to publish the code they use to analyze large datasets. In doing so, some researchers have accidentally uploaded Biobank datasets to GitHub, a popular code-sharing platform. UK Biobank prohibits researchers from sharing data outside their controlled environments and has introduced additional training to reduce these risks.

Concerns about data leaks intensified over the past year. Between July and December 2025, UK Biobank sent 80 legal notices to GitHub requesting the removal of its data, and GitHub complied. Despite these efforts, many files remain publicly accessible. Some contain only patient IDs or limited test results, while others are considerably more comprehensive.

One dataset found by the *Guardian* in January contained hospital diagnoses and their dates for about 413,000 participants, along with each participant's sex and month and year of birth. A data expert who reviewed the file remarked: "It sent shivers down my spine to even open. I deleted the file immediately. It was very detailed and felt like a gross invasion of privacy even to glance at."

To assess the risk of re-identification, the *Guardian* contacted several Biobank volunteers. Two who had undergone medical procedures during the period covered by the dataset agreed to share their details with an external data scientist. The first, who provided treatment dates for a fracture and a seizure, could not be found in the dataset. The second, a woman in her 70s, provided her month and year of birth and the month and year of her hysterectomy. The search returned a single matching record, and additional diagnoses in that record, which she had not initially disclosed, confirmed it was hers.

"Effectively you were rehearsing the main parts of my medical history to me without me having given you any information at all. I didn't expect that," she said. Although she is concerned about data security, she intends to remain a participant in UK Biobank, which she considers "extremely important." However, she added: "I'm more concerned about whether Biobank has broken its agreement with people. They said they would hold our data securely … I just feel as though that has to come into the equation."

In response, UK Biobank maintained that the re-identification scenario presented by the *Guardian* does not pose a privacy risk, arguing that individuals could not be identified without additional information. A spokesperson emphasized that participants are advised not to share personal health information online, because such information could be cross-referenced with Biobank data.

Privacy experts have criticized UK Biobank for failing to grasp the implications of sharing health data in today's digital landscape. "Are these people aware that the internet exists?" asked Prof Felix Ritchie, an economist at the University of the West of England. He argued that expecting volunteers to refrain from sharing information online is unrealistic.

Dr Luc Rocher, an associate professor at the Oxford Internet Institute, stressed that removing identifiers does not guarantee anonymity. He pointed out that knowing a person's birthday and a single significant medical event could be enough to pinpoint their record. "Once identified, that record could reveal sensitive information such as a psychiatric diagnosis, an HIV test result, or a history of drug abuse," he explained.

Niels Peek, professor of data science and healthcare improvement at the University of Cambridge, was concerned by the scale of the problem. "If it had happened once or 10 times I'd probably say: 'It's not great that it's happened but at the same time zero risk is impossible.' Hundreds. That's a little bit too much," he said.

While Peek acknowledged that UK Biobank has taken serious steps to address these issues, he noted that the repeated leaks reveal a real tension between advancing health research through broad data access and the ethical obligation to protect individual privacy. Experts remain uncertain whether UK Biobank can regain control over data that has already been released, as many files remain on various platforms despite removal efforts.

As scrutiny of data privacy grows, UK Biobank's difficulties underscore the need for stronger safeguards as health research increasingly depends on large-scale data sharing.