Isn’t all expert peer review subjective?

10 October, 2013

Analysis of methods for research assessment, particularly in light of the huge public investment in the current Research Excellence Framework (REF) 2014 exercise, is welcome. A new study published in PLOS Biology aiming to assess different methods of post-publication assessment of research, including an analysis of F1000Prime data from 2005, raises important questions – possibly more than it answers.

F1000 welcome research involving F1000Prime data and frequently make data that usually requires a subscription available for research, as we did for Eyre-Walker and Stoletzki, who demonstrate good reproducible research practice by making the (F1000) data supporting their analyses available in a public repository.

The F1000Prime data used in this study were from 2005, when the service was rapidly evolving from covering biological sciences to, in 2006, encompassing medicine. An analysis of more recent data might yield different results. Because F1000Prime has been established for more than a decade, it is probably one of the most studied alternative metrics to citations and Impact Factors, although the number of studies on F1000 data is still relatively low (while writing this I created a Mendeley group for them here – please add any that may be missing). One aspect of Eyre-Walker and Stoletzki’s study is the link between F1000Prime recommendations and future citation share, for which they report a weak correlation. At least one study, by the Medical Research Council, has found F1000Prime recommendations to be a good predictor of citation impact, but the evidence on whether F1000Prime recommendations predict citations is essentially divided.

But how much does this matter? While correlations between different systems and measures of “impact” are interesting, the most exciting discoveries about how research is used and regarded post-publication surely lie in the aggregate power of combining the increasingly diverse collection of article-level and alternative metrics data with “rich multidimensional assessment tools”. With this we may be able to further explore which science has led to impact in the real world: improvements in human health, society, economics or the ecosystem. It may also help identify important papers that traditional indicators might miss (as two studies have found for F1000Prime classifications and tags, another aspect of the recommendation system we use).

Another question raised by the paper is what scientific ‘merit’ actually is (the authors do not define merit but say scientists are not good at identifying it). Eyre-Walker and Stoletzki do, however, define impact: as citations. Like many studies of research assessment and impact, theirs relies on an assumption that citations (to papers), and therefore the Journal Impact Factor, are equal to scientific impact or quality (when we know some of the most cited papers are the most fraudulent). That said, before online publishing, citations were often the best, albeit flawed, surrogate for “impact” that we could measure.

Eyre-Walker and Stoletzki’s assertion that F1000 recommendations are “subjective post-publication peer review” is probably true. But if so, then all peer review, in the sense of a small number of experts assessing science, as employed by peer-reviewed journals and grant funding agencies, is similarly subjective. There is much evidence of various flaws in the (pre-publication) peer-review process, including bias, errors, anti-innovation and delays. For this reason we, like Jonathan Eisen and colleagues, don’t agree with the authors’ conclusion that pre-publication assessment is the solution to the problems with expert review post-publication. At F1000 we believe that transparent post-publication review, whether by a large Faculty of experts, journal-selected reviewers or the many eyes of the crowd, can help address some of these problems. Many papers in F1000Prime are evaluated multiple times (resulting in scores much greater than 1, 2 or 3), and our users frequently tell us that the “human readable” summary of why each paper has been selected is one of the service’s most valuable features.

We intend to keep innovating in post-publication review until we (or someone else) come up with a solution to these problems in scholarly communication.
