Millions of online papers have disappeared

Author: Martin Paul Eve (Crossref and Birkbeck, University of London)

ISSN: 2162-3309 | Published by Iowa State University Digital Press

Research Article

Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles

Abstract

Introduction: Digital preservation underpins the persistence of scholarly links and citations through the digital object identifier (DOI) system. We do not currently know, at scale, the extent to which articles assigned a DOI are adequately preserved. 

Methods: We construct a database of preservation information from original archival sources and then examine the preservation statuses of 7,438,037 DOIs in a random sample. 

Results: Of the 7,438,037 works examined, there were 5.9 million copies spread over the archives used in this work. Furthermore, a total of 4,342,368 of the works that we studied (58.38%) were present in at least one archive. However, this left 2,056,492 works in our sample (27.64%) that are seemingly unpreserved. The remaining 13.98% of works in the sample were excluded either for being too recent (published in the current year), not being journal articles, or having insufficient date metadata for us to identify the source. 

Discussion: Our study is limited by design in several ways. Among these are the facts that it uses only a subset of archives, it only tracks articles with DOIs, and it does not account for institutional repository coverage. Nonetheless, as an initial attempt to gauge the landscape, our results will still be of interest to libraries, publishers, and researchers. 

Conclusion: This work reveals an alarming preservation deficit. Only 0.96% of Crossref members (n = 204) can be confirmed to digitally preserve over 75% of their content in three or more of the archives that we studied. (Note that when, in this article, we write “preserved,” we mean “that we were able to confirm as preserved,” as per the specified limitations of this study.) A slightly larger proportion, 8.5% (n = 1,797), preserved over 50% of their content in two or more archives. However, the majority of members, 57.7% (n = 12,257), only met the threshold of having 25% of their material in a single archive. Most worryingly, 32.9% (n = 6,982) of Crossref members seem not to have any adequate digital preservation in place, which runs contrary to the recommendations of the Digital Preservation Coalition.
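The tiered thresholds in the conclusion can be illustrated with a minimal Python sketch. It is not the study's own code. It assumes that each member is represented by a list giving, for every one of its works, the number of archives in which that work was confirmed, and that "X% of content in N or more archives" means the share of works found in at least N archives. The function names and tier labels are hypothetical, chosen only for this illustration.

```python
from typing import Sequence


def share_in_at_least(archive_counts: Sequence[int], n: int) -> float:
    """Fraction of a member's works confirmed in at least n archives."""
    if not archive_counts:
        return 0.0
    return sum(1 for count in archive_counts if count >= n) / len(archive_counts)


def classify_member(archive_counts: Sequence[int]) -> str:
    """Assign a member to the highest tier whose threshold it meets.

    Tier labels are hypothetical; the thresholds follow the abstract:
    over 75% in three or more archives, over 50% in two or more,
    at least 25% in a single archive, otherwise unconfirmed.
    """
    if share_in_at_least(archive_counts, 3) > 0.75:
        return "three_or_more_archives"
    if share_in_at_least(archive_counts, 2) > 0.50:
        return "two_or_more_archives"
    if share_in_at_least(archive_counts, 1) >= 0.25:
        return "single_archive"
    return "unconfirmed"
```

Checking the tiers from strictest to weakest places each member in exactly one category, which matches the way the four percentages in the conclusion account for all members without overlap.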

Keywords: digital preservation, persistent identifiers, scholarly communications

How to Cite:

Eve, M. P. (2024) “Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles”, Journal of Librarianship and Scholarly Communication 12(1). doi: https://doi.org/10.31274/jlsc.16288

Rights: © 2024 The Author(s). License: CC BY 4.0
