Model code and data policy

GMD code and data policy

The GMD code and data policy is fully compliant with the Copernicus data policy. Here we explain in particular how its requirements apply to GMD's focus on code and data directly related to numerical model development.

In this document, code refers to computer instructions and algorithms made available as plain text. Data refers to any other information that is found outside the main body of the manuscript and is required either to fully appreciate or to reproduce the results presented in the manuscript.

Core principles

Every paper must include a section entitled "Code and data availability" at the end of the paper, before the "Acknowledgements".

  1. This section must include citations for the persistent public archives of the precise versions of all of the code and data associated with the paper. It should also explain the general means of accessing other versions of the code and data, and state the licence of the code. The licence should conform to the Open Source Definition [1]. Suitable licences [2] include, for example, GPL [3] and MIT [4].
  2. Where the authors cannot, for reasons beyond their control, publicly archive part or all of the code and data associated with a paper, they must clearly state the restrictions. They must also provide confidential access to the code and data for the editor and reviewers in order to enable peer review. The arrangements for this access must not compromise the anonymity of the reviewers. Manuscripts that do not make code and data available at this level will be rejected. Where only part of the code or data is subject to these restrictions, the remaining code and/or data must still be publicly archived. In particular, authors must make every effort to publish any code whose development is described in the manuscript.

Code and data access must be provided at the time that the discussion paper is submitted. Embargoes, whether pending acceptance or for a defined period, are not acceptable.

The code and data associated with a paper which are subject to the above requirements include, depending on the paper type, the following:

  1. the source code for the complete model or module or other coded product described in the paper (must be provided for model description, development and technical, and methods for assessment paper types);
  2. the manual and any other model documentation (applies to model description, development and technical, and methods for assessment papers, to the extent the editor considers applicable);
  3. all configuration files, boundary conditions, and input data (must be provided for experiment description papers and any other papers in which results from model runs are reported);
  4. data sets for forcing of models or comparison with model output (must be provided for papers describing such data sets or for papers in which model output are compared with such data);
  5. preprocessing, run control and postprocessing scripts covering every data processing action for all the results reported in the paper (applies to all papers, to the extent the editor considers applicable).

In every case, the citation from the paper must identify the exact version of the code and/or data used.

Although the code and data are not formally reviewed, the editor and reviewers are free to make general comments on any code or data if they wish. During the review process, they may also assess how easily the model can be downloaded, compiled, and run on test cases.

Archive standards

A frozen version of the code and data developed in the paper must be archived. A third-party archive is usually preferable, although in some cases, such as when the code is a fragment of a larger model, authors may include the code in the supplement to the paper. Third-party archives must provide the following:

  1. institutional support providing reasonable confidence that the material will remain available for many years or decades;
  2. mechanisms preventing the depositor of the material from unilaterally removing it from the archive;
  3. mechanisms for identifying, in a persistent way, the precise version of the material referred to. This will usually be a DOI.

Where code and data change during the revision process of the manuscript, the updated versions must also be archived. Authors must take care that the results in revised manuscripts are correctly associated with the corresponding archived data (with different DOIs referenced in the submitted and final manuscripts in cases where data have changed).

Many GMD authors find Zenodo [5] a suitable archival location. Zenodo's GitHub integration [6] makes archiving particularly easy for the large proportion of authors who manage their code using Git. Authors who need to archive a single documentation file, such as a technical report, may find the arXiv [7] suitable. Authors whose data are too large to be archived at Zenodo will need to identify a suitable alternative. Appropriate choices may depend on the topic of the paper, the funder of the research, and the country where the research was conducted. One of the repositories listed by Springer Nature [8], PLOS [9] or ESSD [10] may be suitable. In any case, the requirements above must be satisfied.

Project or institution websites and online revision control sites such as GitHub [11], GitLab [12] or Bitbucket [13] are designed for code development and are not suitable for archiving frozen code versions. Authors are encouraged to provide links to a website or revision control system as a preferred download location, so long as this is in addition to, and not instead of, the citation of an archive.

Template for code and data availability section

The following code and data availability section meets the requirements of this policy for papers focussed on development of models or development of methods for assessment of models. Other wordings are, of course, possible so long as the required information is all present. For larger models it is very helpful if authors can identify the location of the main parts of the code that are discussed in the manuscript. For experiment description papers, evaluation papers, and some technical and development papers where details for a variety of different data sets or models are required, the section will be considerably longer.

The current version of [model] is available from the project website: [url] under the [licence name] licence. The exact version of the model used to produce the results used in this paper is archived on Zenodo ([citation]), as are input data and scripts to run the model and produce the plots for all the simulations presented in this paper ([citation]).

In line with the FORCE11 Joint Declaration of Data Citation Principles, the data citations should appear in the bibliography and be referenced in the text in the same way as other publications (Martone, 2014).

Useful additions from Copernicus Publications' data policy

The output of research is not only journal articles but also data sets, model code, samples, etc. Only the entire network of interconnected information can guarantee integrity, transparency, reuse, and reproducibility of scientific findings. Moreover, all of these resources provide great additional value in their own right. Hence, it is particularly important that data and other information underpinning the research findings are "findable, accessible, interoperable, and reusable" (FAIR) not only for humans but also for machines.

Video supplements, video abstracts, International Geo Sample Numbers, and other digital assets should be linked to the article through DOIs in the assets tab. With Earth System Science Data (ESSD), Copernicus Publications provides a journal dedicated to the publication of data papers, including peer review of data sets. Authors seeking to publish novel data sets may find ESSD a more appropriate journal than GMD.

Copernicus Publications follows the best practice of the Joint Declaration of Data Citation Principles initiated by FORCE11.

In addition to promoting these data citation principles, Copernicus Publications is a signatory of the Coalition on Publishing Data in the Earth and Space Sciences (COPDESS) commitment statement and the Enabling FAIR Data Commitment Statement in the Earth, Space, and Environmental Sciences.

References

Martone, M. (Ed.): Data citation synthesis group: Joint declaration of data citation principles, FORCE11, https://doi.org/10.25490/a97f-egyk, 2014.

[1] http://www.opensource.org/docs/osd (last access: May 2019)

[2] http://www.opensource.org/licenses/alphabetical (last access: May 2019)

[3] https://opensource.org/licenses/GPL-3.0 (last access: May 2019)

[4] https://opensource.org/licenses/MIT (last access: May 2019)

[5] https://zenodo.org (last access: May 2019)

[6] https://guides.github.com/activities/citable-code/ (last access: May 2019)

[7] https://arxiv.org (last access: May 2019)

[8] https://doi.org/10.6084/m9.figshare.1434640.v11 (last access: May 2019)

[9] https://fairsharing.org/recommendation/PLOS (last access: May 2019)

[10] https://www.earth-system-science-data.net/for_authors/repository_criteria.html (last access: May 2019)

[11] https://github.com (last access: May 2019)

[12] https://gitlab.com (last access: May 2019)

[13] https://bitbucket.org (last access: May 2019)