Peer reviewing the Data Note – an interview with Andrew Maurer


Authors may wish to submit Data Notes for data no longer being actively used, giving other researchers the opportunity to work on the data for novel studies. Or researchers may wish to publish a Data Note to ensure that data continue to be accessible in the future. F1000Research is currently waiving APCs for Data Notes, so it’s the perfect time to trial the Data Note article type.

But how does one go about peer reviewing such a paper? We asked one of our reviewers, Dr Andrew Maurer, about his experience of providing an open peer review for Kenji Mizuseki et al.’s F1000Research Data Note “Neurosharing: large-scale data sets (spike, LFP) recorded from the hippocampal-entorhinal system in behaving rats [v1; ref status: indexed, https://f1000r.es/37v]”.

 

 

You provide a thought-provoking commentary on common fears about data sharing in the neurosciences. What has been your personal experience of accessing data for further research in this area?

I was fortunate that there was a precedent in the discussion of data sharing. As mentioned in the Mizuseki et al. review, Dr. Giorgio Ascoli outlined the impediments and trepidations that researchers have about sharing data (Ascoli, 2006). Some of these sentiments also appeared in Teeters et al. (2008), in which they document the early data-sharing and code-sharing platforms. While I personally think that these researchers were at the forefront of socio-scientific policy, addressing these issues head-on and advocating data sharing, I am concerned that scientists may be slow to adopt a similar mindset.
“Open data” versus “open everything else” may, as of right now, be somewhat disconnected in the field of in vivo freely-behaving physiology.
I have been fortunate in that there has been a significant amount of code sharing: spike-sorting algorithms/GUIs written by Ken Harris and David Redish (and numerous others who have contributed to this effort), the Chronux toolbox constructed by Partha Mitra et al., and the maintenance of multiple toolboxes by Michael Zugaro (please forgive me if I left someone off this list; by no means is it meant to be a complete compendium). These toolboxes are readily available for download. Moreover, in the spirit of the third industrial revolution (Jeremy Rifkin, 2011), we are 3D printing the microdrives that house our electrophysiology electrodes. I believe our current format is a heavily modified version of what Caleb Kemere provides, while other CAD files are available from Fabian Kloosterman et al. (2009) as well as the Open Ephys project (“flex drive”). I encourage everyone to take a moment and check out the Open Ephys project, co-founded by Josh Siegle and Jakob Voigts. It provides a relatively straightforward, cost-effective means of implementing high-density in vivo electrophysiology experiments, along with the community support platform (wiki) to back it up.
In terms of “Open data”, my first experience with in vivo electrophysiology data sharing was when Edvard and May-Britt Moser provided a grid cell dataset for download on their lab webpage. The phenomenon was incredibly novel, and the ability to record in the medial entorhinal cortex was something of a feat (it still is). Personally, I imagine that the Moser laboratory made the dataset available to satiate the curiosity of others. Massive data sharing on the level of Mizuseki and colleagues, on the other hand, is unprecedented in our field.

 

What were the differences (if any) in how you approached the peer review of this paper? If there were differences in your approach, were these due to the transparent nature of the peer review, the availability of the data, or the fact that it was a data paper?

I did approach this review differently, as it is a paper describing a database rather than reporting an empirical finding. It was a unique review. To answer a question with a question: does someone criticize the aesthetic beauty of a person who poses nude for an art class, or should they simply be happy to have a subject to paint? Providing such an extensive database comes with a level of exposure that is atypical for any scientist to undergo. We must appreciate it for all of the beautiful curves and ugly moles that come with it. Criticism seems to be a moot point when no one else is sharing their data. Having said that, perhaps this point did not need to be addressed at all.

 

In your opinion does data availability help or hinder the review process in your research area?

At this point, I think it remains to be determined, as there has been a dearth of open data up until now. Nonetheless, I believe that researchers have their own personal “stockpile” of data against which they can cross-validate recent discoveries if they wish. When it comes to reviewing a paper, however, I have concerns about how available data will be used. If the paper under review draws on an open database, would a reviewer-conducted “re-analysis” be fair? On the one hand, it could be a productive part of the peer-review process: science should strive for replication, and it is rare for the same database to be analyzed for the same effect with two independently authored algorithms. On the other hand, it may put the “analysis code on trial”. For example, if the reviewer and the reviewee arrive at different answers from their respective analyses of the same database, then the difference in analysis would be the primary factor, and in order to settle the dispute, who is to vet the code? Finally, if a reviewer actually improves significantly on a finding (we should all be so lucky!), do they themselves become an author (in either a closed or open review process)?

 

In your review, you say “I hope that these data facilitate cross-laboratory collaboration where two groups are reticent to share their own data.” What can journals do to help researchers share their data?

This may never come to fruition, but for the sake of “reproducibility”, journals may wish to provide the server space to upload and store the data from which the analyses were derived. With the decreasing cost of computer storage and the efficacy of compression algorithms, it seems like a tractable problem, especially in light of the difficulties in replicating previous results (e.g., Begley and Ellis, 2012). This would directly discourage scientific fraud (those who wish to perpetrate fraud would need to fabricate the raw data rather than just the statistics), provide a mechanism to speed up scientific “correction”, and allow direct comparisons to be made across contradictory experiments. If these publications also provided code and implementation instructions, it might lead to an increase in their impact relative to standard publications (“teach me” versus “tell me”). There is a huge upside to such an approach, but the activation barrier on the part of the journal (the need to set up and maintain a server) and the authors (the need to put the data into an accessible format, upload it, and provide code and implementation instructions) may be too high.

 

What has been your overall experience reviewing for F1000Research?

I enjoyed the transparency and the encouragement for a quick “turnaround”. The open format of journals seems to be gaining momentum, and I appreciate the effort to “keep science civil”.

 
