A year later: Low budget analysis of personal genomic data

16 July, 2013

A year ago today, F1000Research launched its preliminary site. Among the very first papers published on that day was Manuel Corpas’ paper “Low budget analysis of Direct-To-Consumer genomic testing familial data”, in which he evaluated the potential of public domain analysis tools for personal genomics. Now, a year later, we followed up with Manuel to find out how his publication was received, and what he’s currently working on.

Your paper described the use of third party public tools to analyse personal genome data. What is the benefit of these tools over commercial options for genome analysis?

Firstly, and obviously, the advantage is that they’re free. Secondly, you can extend these tools: you can reuse them and modify them according to what you like and don’t like. Then you can feed that new version back to the developers, and they can enhance it further.

It allows more collaboration, and anyone has the chance to extend the functionality in this crowdsourcing approach.

What has the response been to your paper and the data included in it?

The response to the paper has been very good. To some extent it has allowed the recognition of the work we’ve been doing. Our paper was the first published work on family genome analysis using open source tools. Quite a few people have reused our datasets, and sent us the results of their analyses. That has been really useful.

[Ed. – One of the datasets from this paper was also featured in the top ten most-cited datasets on DataCite in January.]

The field of personal genomics moves fast. Are there any new updates since you wrote your paper a year ago? Would you have done things differently now?

The biggest difference is the price of getting the data. It costs a lot less now than it did a few years ago when we first bought a kit, and there are more types of analyses. In terms of tools there are still very few, and there is still the same problem that not many tools are available freely.

There are a few new tools, but none that allow people with no computational or genomics background to analyse their genomic data. One reason for this is that the public need to be educated about what the possibilities are. Another reason is that there has hardly been any community effort to provide tools that are accessible. It works two ways: the tools need to be easier to operate and easier to access.

What have you been working on in the past year, with regards to personal genomics?

I have published several papers, including one called “A genome blogger manifesto”, in GigaScience. In it I raised awareness of the conceptions that the general public has about personal genomics. There is a sort of genetic exceptionalism where people are comfortable publishing personal profiles on Facebook, but if you publish your personal genome it is treated as if you were giving away banking data. I believe that is not reasonable. I believe that you give away more personal information on a Facebook profile page than you do by publishing your genetic profile.

I’ve also been working on a crowdfunding initiative, which was picked up by Science magazine, to raise funds to analyse the genomes of my family. We raised around $3,000, which was about 20% of the amount I requested. I had set an ambitious goal, though, and I was quite pleased with the money we got, which allowed us to get the genomes of three members of the family analysed.

The first analyses used microarrays from 23andMe. Then we used the $3,000 to sequence the exomes of three members of the family. I was also able to do metagenomic DNA sequencing of my own gut microbiome, and that was released quite recently. At the moment, all of the data are on Figshare.

Your paper was one of the very first to appear in F1000Research. Why did you decide to submit to a very new journal?

The reason was simple. I was working at the genome campus when [F1000Research Managing Director] Rebecca Lawrence was inviting people to submit papers. I was doing this project in my own time, and I found that F1000Research was very helpful, and I got a lot of support. I knew that this paper, which includes personal genomic data, would be difficult to publish in other places, because of the objections people have about genomic privacy. After submitting the paper to F1000Research, I was quite impressed with the way the review was done, and I discovered that it was very fast as well!

topics: Open data
