The refreshing power of preprints

The incidental journalist

By Albert Cardona, December 27th, 2016.

Preprint archives have made traditional academic journals obsolete.

A traditional academic journal serves several roles:

  1. To organize peer review.
  2. To archive academic work and distribute copies to repositories (libraries) both for local access and as a backup.
  3. To select manuscripts that are both relevant to the journal's readership (whatever that means, such as being the editor's pick) and able to pass peer review.

I argue that each of these roles is either better served by preprint archives, harmful and in need of dismissal, or carried out anyway, independently of the journals.

Peer review

It is true that peer review improves papers. Peer review takes place at many points throughout a manuscript's life. A manuscript will generally have been reviewed by peers multiple times before the official peer review commissioned by journal editors: informally, by the expedient method of giving talks on unpublished data, and more formally, by asking one's peers to read and comment on drafts. Peer review even continues after a paper has received a journal's stamp of approval, in what is nowadays called post-publication peer review: public comments by anybody who bothers to make them. And, of course, it continues as endorsements, reiterations and dismissals in papers published later on.

From the moment a manuscript is sent to a journal, what changes is not the "review" part but the "peer" part: the authors are no longer able to restrict who reviews their manuscript. This is not unique to journal-driven peer review; it is also the case in post-publication peer review. The work is out for universal scrutiny.

Soliciting unbiased reviews has been a major role of the journal route over the last 100 years or so. Whether these reviews are in fact unbiased, and whether they are a useful filter at all, is questionable [1]; but at least the reviewers are not chosen by the authors, which has merit. This role is in the process of being transferred to preprint servers such as the bioRxiv and others, which offer the means for hosting and moderating commentary on uploaded manuscripts.

In ages past, post-publication peer review was conducted mostly through private correspondence, verbal exchanges at conferences and, occasionally, letters to the editor in the journals that offered such a vehicle for commentary, in addition to references in manuscripts subsequently published by other authors. Only the last two can persist over time. Since the internet reached mass usage, PubMed in particular has also allowed commentary on published papers, although I had never heard of this until a colleague pointed it out.

The question, then, is: what is so special about journal-commissioned peer review?

There's nothing special about it, except that we academics have decided that anonymous peer review is a significant stamp of approval. This despite studies showing that the outcome of anonymous peer review is neither reproducible nor reliable [1], and despite my personal (and biased) impression that anonymity is easily unveiled.

Journal-commissioned peer review has one additional feature (or defect): it forces authors to modify their manuscripts, for better [2] or for worse. In almost all journals, these modifications take place in a completely opaque way, since readers get to see neither the original manuscript, nor the reviews, nor the authors' reply to the reviewers (eLife being a major exception, leading the way). With preprint servers, however, authors are now uploading both versions: from before and from after closed-door peer review. This increase in transparency, brought about by the manuscript versioning feature of preprint servers, is priceless. Moreover, authors are modifying their preprints in response to public and private feedback, so even this step is taken care of by preprints.

The issue remains for manuscripts in preprint servers that do not receive any public reviews or that are never cited. What to make of them? In my view, it is cheap to keep them and expensive to ban them. It is far better for a research finding articulated in a paper to be released publicly than never to be released at all. Either the manuscript will be ignored while remaining discoverable by search engines, or its time will come: some findings require a whole field of research to shift position before they can be understood or accepted by the members of that field. In other words, there is a lot more to gain from publishing all papers than there is to lose from publishing bad ones. Good papers have a way of floating to the top regardless, and bad ones of sinking into oblivion. This is also true of peer-reviewed papers published in traditional journals.

The persistence problem

Publishing in a journal was traditionally the means for a manuscript to seek asylum, so to speak, with a publisher that commits to ensuring its public availability for the foreseeable future, aided by libraries that act as independent repositories. Libraries act as backups, so that no modern equivalent of the burning of the Library of Alexandria can take down the collective knowledge of humanity.

Persisting data over time is hard and expensive; it is certainly a skill. This role has now been transferred to preprint servers, pioneered by the arXiv. Preprint archives also provide free access to a manuscript even when journals put up high barriers in the form of outrageous fees for readers.

The filter problem

A journal printed on paper has strong limits on the number of manuscripts it can accept and on the length of each. This pressure leads to accepting fewer papers and to requesting that they be written more succinctly.

The limit on article length is being addressed by new initiatives such as the STAR Methods section in Cell Press journals, which removes limits on the length of the methods by providing an online-only methods section. Likewise, in many journals, "supplementary material" is available only in the online version of a manuscript and not on the printed pages. These initiatives are at least a step in the right direction towards enabling the reproducibility of research findings, albeit at the cost of making the printed journals worthless for an academic [3] and the printed copies incomplete, which undermines their role in long-term storage [4]. Online-only journals like eLife and PLoS Biology, among others, avoid this problem altogether by design.

The limit on the number of papers is pretty much unjustifiable. It is a textbook example of the tragedy of the commons: all academics want their papers in Science and Nature, yet all academics would be better off if these glamour journals did not so strongly affect the decisions of search committees recruiting young academics to faculty positions.

Preventing a well-written paper from being published merely because an editor fails to recognize its value, because the reviewers fail (or refuse) to understand the implications of the reported findings, or because there is not enough space for all desirable papers, is one of the shames of modern academia. I am reminded of the work of Lynn Margulis on the endosymbiotic theory of the origin of some cellular organelles, such as mitochondria: her paper was rejected for years until it was published in the Journal of Theoretical Biology, after which it proceeded to demolish prior bodies of knowledge and opened vast new fields of inquiry. A preprint would have reached its eventual supporters just the same. Journal-based, closed-door peer review failed spectacularly.

Such delayed papers are the best argument for preprints: their suppression set back entire fields of research and should never have happened. Flawed papers in, for example, the physics arXiv, on the other hand, have yet to be shown to have done any harm beyond wasting an infinitesimal amount of storage space. Given the biased, poor judgment that even the best humans exhibit at times, particularly when riddled with latent conflicts of interest or plain envy, the best we academics can do as a collective is to avoid the worst of our responses by designing them out of the process. Preprints are an excellent building block of the new technology stack supporting modern academic publishing.

Preprints are the way forward

All of the above explains why, nowadays, I mostly read preprints: I would rather assess the value of a study myself than read it through the filter of a tiny set of editors or peers.

Don't forget that, ultimately, you will have to evaluate the papers you read by yourself, regardless of publication venue [5] or of whether they underwent peer review [6].

And perhaps best of all: preprints are freely available, paid for by national grants to preprint servers or generous philanthropic donations. There should never be the need to break a law just to read a scientific research finding.

If there is an air of inevitability to preprints, it is because they are a better way to disseminate research findings. Just as the coal industry will shrink to serve the needs of steel foundries and pencil makers [7], preprints are displacing all other ways to publish science. Which journals will adapt, how they will do so, and which will go the way of the dinosaurs remains to be seen, but I am rooting for Yann LeCun's Reviewer Entity model.

Notes

  1. An experiment on the reliability of peer review done by NIPS (the annual conference on Neural Information Processing Systems, where many computer scientists publish their research findings) found that whether a paper gathered positive support from reviewers was, to a large extent, arbitrary. Quote:
    "In particular, about 57% of the papers accepted by the first committee were rejected by the second one and vice versa. In other words, most papers at NIPS would be rejected if one reran the conference review process (with a 95% confidence interval of 40-75%)."
  2. A wise editor once sent one of Einstein's manuscripts out for peer review without asking Einstein's permission. Einstein exploded in fury; and yet, the reviewer's comments helped him find an error in his calculations. Scientists need more humility.
  3. One wonders who the target audience of a printed journal is, if not academics. For commentary-heavy journals such as Science and Nature, it is clear: for most readers, the papers add legitimacy, while the commentary adds value. These readers are academic administrators, government agents, medical doctors, and the broader public. It is truly calamitous that the constraints of a science outreach vehicle (a printed periodical) so strongly affect the lives of academics (by restricting the number of findings that can be reported in these journals, which are key to securing academic jobs).
  4. Sci-Hub's pirate access to academic journals, bypassing subscription barriers, has one negative side effect: scientists may read papers without access to the supplemental material. Banning supplemental material, as eLife does, is a good mitigating strategy (eLife's supplementary figures are an integral part of the manuscript). Other journals, like the Journal of Neuroscience, offer unlimited text for the results section and also ban supplemental figures.
  5. In his famous book "Surely You're Joking, Mr. Feynman!", Richard Feynman mocked those who judge books by their covers. The example he picked was of school board members who rated a book favorably even though, by a printing error, all its pages were blank, which led him to question whether the ratings for the other school books had been obtained by a similar method.
  6. Retraction Watch would like a word with those who think that peer review is a guarantee of anything.
  7. Solar and wind power are proving every day to be cheaper, better for the environment and better for the political landscape than any established means of energy production. Local energy production and consumption remove the need for large production and distribution installations (which depend on government-led initiatives, with their associated opportunities for lobbying and other forms of corruption), eliminate the maintenance costs of distribution networks, improve the landscape aesthetically (no more power lines), and remove the ransom power of large companies or governments (the ability to prevent the population from accessing energy sources). Not to mention the reduction in mining waste dumped into rivers and seas, mountaintop removal, and other environmental atrocities committed in the name of coal-based energy production. And I am not even getting into airborne particles causing lung disease, or the critical role of CO2 in our planet's so far moderate climate.