Data, what is it good for?

15 September, 2011

The tricky question of data sharing, reuse and openness is a familiar topic to regular readers of Naturally Selected, and we have covered it in several previous posts. So we were interested to see yesterday's news article in Nature by Zoë Corbyn, “Researchers failing to make raw data public.”

The article highlights a paper in PLoS ONE from John Ioannidis and colleagues that bemoans the poor public availability of raw data from published articles (10.1371/journal.pone.0024357). Essentially, they find that too few journals have data availability policies, and that where such policies exist, authors don’t adhere to them. This echoes an earlier PLoS ONE paper by Heather Piwowar on the deposition of gene expression microarray data (10.1371/journal.pone.0018657)*. Now, while some might find a rise of over 700% in eight years in the rate at which raw data are published somewhat encouraging, the general feeling I get is one of disappointment.

Ioannidis is quoted in the Nature article as saying that journals would need “an extra editorial office and maybe more” to implement and enforce data sharing. That is all very well, but article processing charges and subscriptions will no doubt have to take that into account. You read that here first.

F1000 Member Steven Wiley is also quoted in the news article. He says that the Ioannidis paper doesn’t address why “scientists might defy data-sharing policies,” adding that data sharing lacks both a stick and, perhaps more importantly, a carrot, or at least the perception of one. Peter Murray-Rust, for example, firmly believes in such a carrot.

Wiley goes on to add that he suspects the majority of scientists’ data will never be used by other scientists. Whether this is because there isn’t sufficient standardization of data formats, or because scientists are generally too busy generating their own data to bother looking at someone else’s, isn’t really clear to me.

As ever, we’re interested in your views.

* The submission, revision and publication dates on that article are interesting. Sounds like she had some trouble with it.

topics: Open data

1 thought on “Data, what is it good for?”

There is no single reason for the reluctance to make raw data public. One is surely laziness (of both authors and editors); another might be that data are somehow twisted to fit standards that are rarely met; yet another might be that the people who generate data are possessive of them and fear that somebody else might use them to publish papers that do not give enough credit to those who actually produced the data. I have reviewed an article complaining about the data deluge:
http://f1000.com/11369956
which complained about there being too much data. My concern is not only the quantity of data but also their quality. How many sequences in GenBank are ascribed to correctly identified species? Since taxonomic expertise is vanishing, I fear that many of these data, correct in terms of sequence, are ascribed to species that were not correctly identified. So one datum might be correct (the sequence) while another might not be (the identification of the species).
All this reminds me of a friend of mine who taught logic. He loved to use very cumbersome methods to assess the validity of even very simple sentences, generating equations for everything. Yet he did not use any of this logical paraphernalia when talking with his wife, or with “normal” people. In fact, he did not use it even with his fellow fanatics. If data are fake and the conclusions stemming from them are worth something, there will always be a way to prove that the data are false, because data must fit some “theory”, and theories can be tested using data other than those used to generate them. Many interesting theories were generated from wrong data. Darwin’s theory of evolution, for instance, was based on non-genetic data, and it needed to be reformed at least twice. I dare to say that the data Darwin offered in support of evolution were wrong or, at least, not sufficient (no genetics), whereas he built a beautiful theory of ecology (natural selection). But who cares if the data were wrong? The theory is right! And Darwin remains the greatest scientist of all time.
