In the decade or so since genetics became a digital science called genomics, there have been various debates, and several laws framed, about the privacy of personal genomic data. That debate is set to intensify, as a study published today shows how vulnerable such data is.
Using only publicly accessible databases, researchers who simply wanted to conduct an exercise in “vulnerability research” were able to identify nearly 50 individuals who had submitted personal genetic material for genomic studies.
Published in Science today, the study shows that posting data from a single individual can reveal deep genealogical ties and lead to the identification of another person who might have no acquaintance with the person who released the data. “We show that if, for example, your Uncle Dave submitted his DNA to a genetic genealogy database, you could be identified,” says Melissa Gymrek, one of the authors of the study. “In fact, even your fourth cousin Patrick, whom you’ve never met, could identify you if his DNA is in the database, as long as he is paternally related to you.”
Because the information travels along shared male lines, nearly 135,000 records can potentially be used to target several million US males.
Why only males? Because the Y chromosome is transmitted from father to son, just as family surnames are in many human societies, and there is a strong correlation between surnames and the DNA on the Y chromosome. Yaniv Erlich of the Whitehead Institute and colleagues studied unique genetic markers (called short tandem repeats, or STRs) on the Y chromosome of men who had participated in a study and whose genomes were sequenced and made publicly available as part of the 1000 Genomes Project.
For commercial and recreational reasons, databases are available that house Y-STR data by surname. Genealogy companies provide services to male adoptees and descendants of anonymous sperm donors to trace their patrilineal relatives or biological fathers.
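To make the idea of surname inference concrete, here is a minimal Python sketch of how a Y-STR profile could be compared against a surname-indexed database. It is an illustration only, not the method used in the study: the surnames, the repeat counts and the function names (mismatches, infer_surname) are all invented, and the one-mismatch threshold is an arbitrary assumption. The marker labels (DYS19, DYS390 and so on) are commonly used Y-STR markers, chosen here only as plausible keys.

```python
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, int]  # marker name -> number of repeats

# Hypothetical toy database: every surname and repeat count is made up.
DATABASE: List[Tuple[str, Profile]] = [
    ("Smith", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS392": 13, "DYS393": 13}),
    ("Jones", {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS392": 11, "DYS393": 12}),
    ("Patel", {"DYS19": 16, "DYS390": 25, "DYS391": 10, "DYS392": 11, "DYS393": 14}),
]


def mismatches(query: Profile, record: Profile) -> int:
    """Count query markers whose repeat count differs from the record.

    A marker missing from the record is counted as a mismatch.
    """
    return sum(1 for marker, repeats in query.items()
               if record.get(marker) != repeats)


def infer_surname(query: Profile,
                  database: List[Tuple[str, Profile]],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the surname of the closest record, or None if nothing is close.

    The threshold is an assumption for this sketch, not a published value.
    """
    best_name: Optional[str] = None
    best_score: Optional[int] = None
    for surname, record in database:
        score = mismatches(query, record)
        if best_score is None or score < best_score:
            best_name, best_score = surname, score
    if best_score is not None and best_score <= max_mismatches:
        return best_name
    return None


if __name__ == "__main__":
    # An "anonymous" profile that differs from the Smith record at one marker.
    anonymous = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS392": 13, "DYS393": 13}
    print(infer_surname(anonymous, DATABASE))
```

Even in this toy form, the point is visible: the anonymous profile carries no name, yet a near match in a surname database is enough to suggest one. A real attack would need many more markers, a statistical measure of confidence, and additional details such as age and location to narrow a surname down to a person.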
This study shows that the risk of surname inference will grow in the future. Genetic genealogy enthusiasts add thousands of records to these databases every month. More importantly, with third-generation sequencing platforms, the ability to link haplotypes (groups of genes inherited together from one parent) and surnames will only get better.
What does this mean for people in India? As things stand today, this can well be termed a Western problem. We have neither a nationwide genetic genealogy database nor whole genome sequencing data available for multiple individuals. But things are changing fast, and while large-scale databases and such identity searches may be a few years away here, what is right at the doorstep is the need for something like the Genetic Information Nondiscrimination Act (GINA), which President Bush signed into law in the US in 2008. It makes genetic discrimination illegal in the US. It took nearly 12 years, as long as it took to sequence the human genome, to get there. Patient activists had a big role in that landmark regulation.
With DNA sequencing costs in free fall, genomics data exploding, and studies small and big being taken up here, it is high time India put a policy in place. Several diagnostics labs are already providing genetic testing. A quick Google search throws up dozens of such centres across the country. Last year one of the largest diagnostic labs in the country introduced something called the Universal Genetic Test: a drop of blood can test you for 100 genetic disorders. I’m wondering, in the absence of any law, what stops health insurance companies or employers from using genetic information on predisposition to certain disorders to discriminate against people?
(And if you thought insurance is still underpenetrated in India, look at this World Bank report, which says at least 300 million Indians have some kind of health insurance today and that number will cross 630 million by 2015.)
In India we have not seen the genomic data deluge yet, be it in recreational genomics or in medical, agricultural or forensic genomics, but as the old adage goes, “dig your well before you are thirsty”, says Binay Panda, who heads Ganit Labs. Ganit is a public-private partnership between Central and State IT and Biotech departments and Strand Lifesciences. It is involved in various genomic projects and recently completed the sequencing of the medicinal plant Neem.
This is the right time to put proper guidelines in place, before we see peta- or exabytes of data coming out of individual sequencing centres, says Panda. “We, the scientists, who produce and use whole genome data must prepare and educate both the government and the public on the genetic data fair-use policy.”
This is not to dissuade people from donating their genetic material, as that would be disastrous for medical advancement. The true promise of genomics resides in large numbers. Only by studying large samples, with thousands of people, can researchers detect subtle DNA variants that are linked to complex conditions like heart disease, diabetes or hypertension. Last year CSIR undertook a Rs 100-crore initiative to study type II diabetes in Indians. It will compare the genomes of nearly 22,000 Indians for variations in specific locations of their DNA.
Cardiac surgeon Devi Shetty has been saying for the last 13-14 years, well before personal genomics became popular, that Indians are genetically predisposed to cardiac disorders. In 2009, CCMB researchers in Hyderabad discovered the genetic mutation that puts 45 million people in India at risk of chronic heart failure. Imagine if such genetic information were to be misused!
We also need a fair-use and data sharing policy in place because the open data movement is getting stronger. Several Indian biomedical research institutions have huge data sets sitting in their repositories. Providing an open-access mechanism to that data for both basic and translational research will unleash many commercial and public health opportunities.
It’s time not to be paranoid about privacy but to find a way to balance societal benefits and individual needs.
PS: I’ve volunteered for full exome (the ~1 percent of the genome that codes for proteins) sequencing at a Bangalore life sciences company. As it gets going, I’ll share my experiences, learnings and all related issues here.