Refereeing guidelines

The current refereeing guidelines are provided at this link.

The main objective is to make refereeing a useful process for the authors in particular, and for everybody in general, clearly showing that good science is built on an informed and polite (sometimes tough, but always diplomatic) exchange of ideas.

Is there anything we should add or change to these guidelines in order to make the job of referees clearer, and the whole refereeing process more efficient at achieving its goals?

Let me contribute to the debate by recalling this March 2020 blog post on the subject. It is more about improving the procedure than just the guidelines.

I appreciate the idea of a Codebases journal which ‘publishes’ official documentation and repositories for scientific codebases of interest. This is, at the very least, a useful mechanism for archiving in the field and for allowing academic developers to get credit through the usual mechanisms of the academy – citations.

However, refereeing a codebase is an entirely different kettle of fish from a more traditional journal article. Most scientific codebases do not have professionally maintained build systems to guarantee easy installation and portability to platforms that the authors don’t use. It could easily take a potential referee half a day of hacking to simply get a large codebase to build. To check that every piece of a large library actually works as intended is clearly beyond the scope of voluntary review.

On the other hand, simply reviewing the associated documentation without trying to use the code makes little sense. For one thing, we are publishing the codebase in addition to the documentation, and, most likely, the documentation will make little sense without simultaneously working through example calculations.

I think we need specific guidelines for what we expect reviewers to actually check and do when they agree to review a Codebase – and these may in turn require us to adjust slightly what we expect of a Codebase submission. I had an informal discussion with Anton Akhmerov regarding this some time ago and he laid out various levels of expectation:

There can be different approaches to review and, accordingly, different things software review could certify. To name a few options, a software paper review could evaluate whether:

  • the code is state of the art and in demand,

  • the software is innovative in some aspects,

  • the software fulfills a minimal checklist of criteria making it somewhat maintainable and reusable (this approach is implemented by https://joss.theoj.org/),

  • it’s bug-free and the documentation is complete.

The referee instructions and invitations should be tailored to what we expect the referees to assess in review. JOSS' checklist-like review has no problem finding volunteer reviewers because the task is rather simple. As far as I know, the current instructions aren't anywhere close to making this clear, so that's something for the editorial college to work on.

My own take is that ‘publication’ in SciPost Physics Codebase should indicate:

  1. That the code has current scientific relevance in Physics;
  2. That the code can be compiled and executed without too much faffing by a technically competent practitioner in the area on at least one specified reference platform with appropriate dependencies;
  3. That the code has unit tests covering both its elementary functionality and some reasonable set of higher-level science applications, which the practitioner can easily run and check (see the sketch after this list);
  4. That the documentation is intelligible to said practitioner and provides at least one non-trivial scientific worked example which can be reproduced.
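
To make point 3 concrete, here is a minimal sketch (in Python, pytest style) of the kind of test suite a referee could reasonably be asked to run. The model and the analytic benchmark are placeholders of my own choosing; in a real submission the tested functions would be imported from the published package rather than defined inline, and the 'science' check would target a result the code is actually meant to produce.

```python
# Hypothetical test module (tests/test_basic.py) of the kind a referee runs.
# In a real submission these helpers would be imported from the published
# package; they are defined inline here only to keep the sketch self-contained.
import numpy as np

# --- minimal stand-ins for library functionality ---------------------------
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
ID = np.eye(2, dtype=complex)


def tfim_hamiltonian(J: float, h: float) -> np.ndarray:
    """Two-site transverse-field Ising Hamiltonian H = -J Z.Z - h (X.1 + 1.X)."""
    return (-J * np.kron(SZ, SZ)
            - h * (np.kron(SX, ID) + np.kron(ID, SX)))


# --- 'elementary functionality' test ----------------------------------------
def test_pauli_algebra():
    assert np.allclose(SX @ SX, ID)
    assert np.allclose(SZ @ SZ, ID)
    assert np.allclose(SX @ SZ + SZ @ SX, np.zeros((2, 2)))  # anticommutation


# --- 'higher-level science' test: compare to a known analytic result --------
def test_two_site_ground_state_energy():
    J, h = 1.0, 0.7
    energies = np.linalg.eigvalsh(tfim_hamiltonian(J, h))       # sorted ascending
    exact = -np.sqrt(J**2 + 4 * h**2)   # exact ground-state energy, two-site chain
    assert np.isclose(energies[0], exact, atol=1e-12)
```

Both checks run in well under a second; the referee only needs to execute the suite and confirm that it passes, not read the implementation.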

This is already a fairly significant level of review to ask of a volunteer referee, but it’s hard to imagine much less without giving up on evaluating the code at any level. On the other hand, asking referees to look at the source itself as part of code review is a non-starter – not that heroic referees shouldn’t if they want to, but we should make it clear that it’s not expected.

If we further ask the referee to judge whether the unit tests have sufficient coverage to meet scientific standards, we push much of the burden of demonstrating correctness onto the author and the test suite they develop.
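
For what it's worth, getting a quantitative picture of coverage is cheap and does not require reading the source. A rough sketch, assuming a Python package (here called mycode, a placeholder) with a pytest suite in a tests/ directory, using the coverage.py API:

```python
# Rough sketch: produce a line-coverage report for the author's test suite.
# "mycode" and "tests" are placeholders for the submitted package and suite.
import coverage
import pytest

cov = coverage.Coverage(source=["mycode"])  # measure only the submitted package
cov.start()
exit_code = pytest.main(["tests", "-q"])    # run the author's tests quietly
cov.stop()
cov.save()
cov.report(show_missing=True)               # per-module line-coverage summary
```

In practice one would simply run pytest --cov=mycode with the pytest-cov plugin; the hard part is not producing the number but judging whether the covered behaviour meets scientific standards, which is exactly the burden that lands on the author's test suite.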

Thoughts?