F1000 Rankings
5 October, 2011 | Richard P. Grant
We launched the F1000 Journal Rankings on Monday. We’ve taken the ratings given to articles by F1000 Faculty Members, and cooked up a way of ranking journals using those scores. This is a new venture for us, one that is both exciting and slightly terrifying.
(Updated: read Declan Butler’s piece on the rankings in Nature.)
First, why are we ranking journals? Aren’t the journal impact factor and the journal usage factor enough? We (along with many others) think there are serious problems with those metrics, and while we’re not aiming to replace them, we do hope that our rankings will be a valuable addition to the mix. But that’s not the primary reason for releasing our new calculations: we are more concerned with helping authors discover the best place to publish in their particular specialty. There are many journals out there, and most of us have not heard of most of them. But within the small, highly specialized communities of today’s researchers, there are hundreds, if not thousands, of journals with low readerships that are nevertheless of critical importance to their particular fields.
How do you find those journals? How do you find what people are reading, so you can publish in those places and reach your intended audience? We hope that the F1000 Journal Rankings will help you. By drilling down into Faculty and Section levels, you’ll be able to find the most important, the most read, journals in each specialty–as judged by experts in those fields. We’re also working on making the rankings searchable, so if you want to tell the researchers who need to know the latest about your particular cell surface receptor, or Alzheimer’s treatment, or deep-sea ecosystem, you’ll be able to find where to publish.
If we’re going to offer such a service, then we have to be very clear about how we go about it. One of the guiding precepts of the F1000 Journal Factor, or FFj as we’re calling it, is that of transparency. From the very start of the project over two years ago, we committed to making the process by which we would rank journals completely open: both in the formula we would use and in the auditability of the process. Users should always, we said, be able to see which articles were rated, by whom, and how those ratings translate into a ranking for journals. This transparency stands in stark contrast to certain other metrics, but also exposes us to a different sort of criticism. We’re standing in a field with our trousers round our ankles inviting everybody to have a go at us.
But we believe that this is the right approach. Faculty of 1000 is based entirely on qualitative judgements, and if we’re going to use those judgements to come up with a metric, we have to be totally, clearly and painfully honest. We have discussed the algorithm we are using with a fair number of people over the last couple of years, respectfully declining some suggestions and taking others on board. Yet already people have pointed out new issues that I will have to address and decide what to do about. This is peer review in action (and also why it says “Beta” on the rankings pages).
Now, we know that there are problems with the Journal Impact Factor, and it is easy to come up with a list of gotchas pertinent to download metrics. But without going into those here, let us make clear that the FFj is not, and is not intended to be, a replacement for either. It is not somehow “better”, but it is different. It is, after all, a qualitative measure, with its own peculiar set of problems and weaknesses. Some of those problems we will be able to mitigate; others arise from the very nature of the F1000 system and we’ll have to live with them. We hope that the FFj will complement those other metrics, and that people will take what is good from each of them.
We have already received some feedback on the algorithm. There are some good points made (the normalization factor–which accounts for the volume of articles published by a particular journal–was always the weakest link), and we’ll have to play with the numbers to see what kind of difference those suggestions make. Other comments point out things that we know already (such as the effects of using a log scale), but which we have kept for one reason or another.
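To make that discussion a little more concrete, here is a toy sketch in Python. It is emphatically not the published FFj formula: the score_journal function, its square-root normalization rule, and the sample numbers are all assumptions, chosen only to illustrate why the normalization factor (which accounts for article volume) and the log scale matter so much to the final ranking.

```python
import math

def score_journal(article_factors, articles_published):
    """Toy journal score: NOT the real FFj formula.

    Illustrates the two ingredients discussed above:
    - a normalization factor based on how many articles the journal
      published, so that sheer volume alone does not win, and
    - a log scale, which compresses differences at the top end.
    """
    if not article_factors or articles_published <= 0:
        return 0.0
    total = sum(article_factors)                        # sum of per-article factors
    normalized = total / math.sqrt(articles_published)  # assumed normalization rule
    return round(math.log10(1 + normalized), 2)         # assumed log scaling

# Hypothetical numbers: a large journal with many modest ratings
# versus a small specialist journal with a few strong ones.
print(score_journal([6, 6, 8, 6, 6, 8, 10, 6], 800))  # high-volume journal -> 0.47
print(score_journal([10, 8, 8], 40))                  # small specialist journal -> 0.71
```

Under these made-up numbers the small specialist journal comes out ahead, which is the kind of behaviour described further down; change the normalization rule (dividing by the raw article count, say, rather than its square root) and the balance shifts, which is exactly why that factor is the part we will be playing with first.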
Last year, at about this time, we merged the two services (F1000 Biology and F1000 Medicine) into one site. Unfortunately, the way we currently handle articles in the back end means that the same article, if it is evaluated in both the Medicine and Biology faculties, will appear twice in the listings for that journal at the top level of the rankings (of course, once you drill down into the Biology or Medicine levels, or into the Faculties and Sections, it only appears once). So, for example, Mutations in UBQLN2 cause dominant X-linked juvenile and adult-onset ALS and ALS/dementia appears in Biology with an article factor (FFa) of 14, and in Medicine with an FFa of 10. This contributes a total of 24 (and counts as two articles) for Nature, whereas it should be one article with an FFa of 17 (sum of the highest rating plus increments for the rest = 10 + 3 + 2 + 2 = 17). This would make very little difference to a highly ranked journal such as Nature, but might be quite significant to a low-volume, specialist journal.
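As a quick sanity check on that arithmetic, here is a minimal sketch of the de-duplication we intend to apply. The merge_ffa function is a hypothetical helper; the specific numbers (a highest rating of 10 and increments of 3, 2 and 2 for the remaining evaluations) are taken straight from the worked example above, and the exact rule for deriving those increments from the other ratings is not reproduced here.

```python
def merge_ffa(highest_rating, increments_for_rest):
    """Combine every evaluation of a single article into one FFa:
    the highest rating plus an increment for each additional evaluation,
    as described in the example above."""
    return highest_rating + sum(increments_for_rest)

# Today's duplicate listing: the same article counted twice for Nature.
duplicated_total = 14 + 10            # Biology FFa + Medicine FFa = 24, two "articles"

# What it should be: one article with FFa = 10 + 3 + 2 + 2 = 17.
corrected = merge_ffa(10, [3, 2, 2])

print(duplicated_total, corrected)    # 24 17
```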
Smaller journals are also sensitive to patchiness or over-representation in our Faculty. While this is something we are always trying to address, it won’t always be possible to cover everything–and of course, some topics go through phases of sensationalism or celebrity, so we might expect to see spikes in the data. Overall, because of the way we’ve constructed the rankings, small specialist journals that accrue a decent number of evaluations will do well out of the rankings. We contend this is a good thing, as there is a lot of good research published in such places, which you might miss if you were wedded only to the impact factor. We find that even the ranking of larger journals does not always correlate with the impact factor. PNAS, for example, scores very well at F1000, despite having a single-digit impact factor. While at the bench, I found PNAS a very valuable journal, so I am pleasantly surprised by how well it does. And of course, a small community is unlikely to publish in journals with high impact factors–but by drilling down through our rankings those journals will appear in their rightful place, according to their value to authors who want to reach those communities.
However, it is apparent that some journals might be ranked higher than they should be, perhaps through over-coverage, perhaps through the action of a single Faculty Member. On Monday, a journalist observed that one obscure journal was doing particularly well in one section. The journal’s ranking was due mainly to the Editor-in-Chief of that journal evaluating papers from his own journal. That’s clearly inappropriate behaviour, and we’d missed it until now. So, as a result of this feedback, we’re revising and strengthening the code of conduct, and working out how to police things so that we can pick up on similar cases and do something about them (for now we have removed that journal from the rankings; it is perhaps worth noting that an attempt to make a journal look good can backfire). On the other hand, we have a number of Editors-in-Chief on our Faculty–as you might expect–and there are other examples where one of them has submitted evaluations of articles from their own journal. But here, not only have they declared their conflict of interest, they have also evaluated articles from other journals (and indeed, have not evaluated one of ‘their’ articles for two years). So it is a tricky situation, and one that requires our honesty and transparency.
Another potential problem is that we have not yet excluded retracted articles from the ranking calculations. This is something we should be able to fix very soon now, although it does not make a huge amount of difference: the FFj for Nature in 2010, for example, is reduced by a mere 0.03 if we subtract the three retracted papers that were evaluated that year.
To summarize, we know there are problems with the F1000 Journal Factors. Equally, we think there are areas where it has clear advantages over other metrics. Some of the problems we can address; others we just have to live with and be frank about. We will work very hard to improve what we’re doing over the coming months and years, although it will probably never be completely watertight. Such is the nature of qualitative measurements.
We do actively encourage feedback from the scientific community: after all, if F1000 is to be of any use to researchers, we need to listen to what you say.
I’m really surprised that Editors are allowed to submit evaluations from their own journals – do you think this is a sound idea? I have concerns about this. I’d not, except in very rare circumstances, have used reviewers from the same departments/institutions as the authors for journal pre-publication review, and the considerations for evaluating for F1000 must be similar.
Thanks for your comment, Irene. That is something that’s caused a great deal of discussion in the office here over the last couple of days! We’re tending towards allowing it, but making it completely transparent–clearly stating that the evaluating Faculty Member is an editor of that journal–and excluding that evaluation from the rankings.
It will take a bit of work but we think that’s probably the best way of handling it. After all, editors should know what’s good in their journals, and it would be a shame to lose that input completely.
Thanks for the quick reply, Richard. I personally feel it could potentially damage F1000’s credibility – which would be a real shame – to include them in the mainstream listings even if they don’t get included in the rankings (will they also not give a rating, ‘must read’ etc?).
I’ll make sure your comments are noted. Thanks–it’s appreciated.
I don’t see why it’s a problem that editors pick articles in their journals, if they are not their own articles. F1000 used to be a tool to evaluate articles, not journals, and editors are best placed to recommend an article that they have already read, reviewed, and evaluated in addition to other reviewers. I believe editors evaluating articles from their own journals is much more ethically sound than groups of F1000 members – known friends or allies, yet not co-authors – repeatedly evaluating their friends’ articles, which happens all the time and cannot be tracked because “friendship” cannot be measured.
Hello James, thanks for that.
Have you got specific, concrete examples of groups of F1000 members only evaluating each others’ papers? Over and above what you’d expect in small fields where everybody knows everybody else? Rest assured, we’d take such accusations very seriously indeed.
As an F1000 Faculty member, I’ll be honest and say I don’t even know who is and isn’t an F1000 member. There is of course the declaration of interests box on the evaluation form. From the comments above, this is clearly being underused.
Thanks for taking the concern seriously, and I know you do take these accusations seriously but they are not specific to F1000 members. What I am referring to shouldn’t be surprising since it’s a common practice even in peer review, editorial boards, and study sections. There are groups of known allies, who informally plan to be represented in different journal boards and different study sections, and they are smart enough not to be caught. So, to answer your question, of course they do not “only” review each other, because they are smart enough.
Anyway, I was referring to a general problem in the scientific community, but I would be happy to privately email you specific examples that I have seen in the past years.
That’d be great, thanks James. We’ve heard the accusation many times, but nobody has been able to point to a specific instance.
David, thanks for that. Yes, we’re going to tighten up on adherence. We do have quite the code of conduct, as you know. We want to make it more explicit, and more apparent.