Genomics data – share and share alike
22 September, 2015 | Michael Markie
Today we are very excited to publish "The ICR1000 UK exome series: a resource of gene variation in an outbred population", a research article from Nazneen Rahman's lab at the Institute of Cancer Research, UK. The dataset described in the paper includes exome sequence data from 1,000 individuals from the general UK population. The samples were drawn from the 1958 Birth Cohort study, a population-based collection of all individuals born in the UK in one week in 1958. This dataset has high utility for genomics researchers worldwide, so Nazneen and her colleagues wanted to ensure that it does not languish in obscurity but is instead made publicly available for others to benefit from.
One of the foundational pillars of F1000Research is that all data underlying the reported results must be made available (with due sensitivity to genuine data protection concerns). We believe data sharing is good for science and society. The authors of the paper share this view: sharing genomic datasets can set others on a path that ultimately improves our existing knowledge and, hopefully, leads towards new and meaningful clinical applications.
To talk a little more about their paper, and to discuss why making genomics data openly accessible is the right thing to do, we spoke with Nazneen Rahman and Ann Strydom.
Interview
F1000Research: Today, DNA sequencing produces huge amounts of genetic data on both individuals and populations. How has this impacted the work that your team does?
Nazneen and Ann: It has had an extraordinary impact, both in the nature of what we do and how we do it. We both generate huge amounts of genetic data ourselves and utilise data generated by others. This has required us to develop a data strategy which includes the creation and expansion of an informatics team and the purchase and management of computing infrastructure. We have also spent considerable time and energy developing ways to use, describe and interrogate data in consistent, robust and relevant ways. The potential benefits are huge; we have been able to discover many new links between genetic variation and disease and to translate those findings into the clinic to deliver much faster and more comprehensive gene testing.
F1000Research: Can you give some background on the ICR1000 UK exome series study and what prompted you to collect this data?
Nazneen and Ann: If you are trying to work out whether genetic variation has a role in disease, it is essential to understand the pattern of variation in the general population. It is the baseline on which our discoveries about the genetic causes of disease are built. Ten years ago, for each study we would analyse the gene of interest in individuals with the disease and in the general population. New DNA sequencing technologies now make it possible to analyse all ~20,000 genes in one go (termed 'exome sequencing'). However, when we started doing these studies, baseline information about gene variation in the general population was not available, so we had to generate it ourselves. This is the ICR1000 UK series. We have analysed all the genes in 1,000 members of the general UK population. It has proved invaluable to us in both our disease gene discovery research and our work to translate gene testing into the clinic.
F1000Research: The data from this study was made available to researchers through your ICR research pages; why did you want to publish this particular dataset?
Nazneen and Ann: We have found this data to be of tremendous value, and it continues to be useful to us every day. There are hundreds of research and clinical laboratories doing similar work, and we thought they would also find it valuable. We have tried to make it as useful and user-friendly as possible. We have made all of our own data fully available, and people can also obtain the original raw data (through application to the 1958 Birth Cohort committee) to use in any other way that would be helpful to them. Generating the data was a long, hard task, and we very much want as much benefit as possible to come from our endeavours.
F1000Research: What was the reason(s) for publishing this data with F1000Research? Are there any particular aspects of the publishing model that you thought would be useful?
Nazneen and Ann: Our group is strongly supportive of the open science model that F1000Research is promoting. Also, the very rapid publication model, with transparent peer review, means we can get this data out to people very quickly. A more traditional publishing model would have kept the paper in a closed review system for months.
F1000Research: Genome sequencing is producing huge amounts of data; what do you feel the genomics community should be doing to make this information available?
Nazneen and Ann: It cannot be overstated how beneficial open research and open data can be to genomics, particularly as we move into an era of widespread clinical utility. Currently, vast quantities of genomic data reside in thousands of individual silos. With easy and effective mechanisms for sharing data, much of it could be made accessible without compromising individual privacy. I hope and believe the genomics community as a whole is strongly supportive of this ethos. There are many challenges to making it happen, but there are also many national and international initiatives underway to tackle them. With support and buy-in from funders, researchers, publishers and an increasingly informed and aware public, we will hopefully gain the impetus needed to make open access to genomic resources a matter of routine.