In a three-part blog series, we introduce the concept of our FAIR Data Station. In the previous (first) part, we explained how to get started. This second part explains why you should bother being FAIR in the first place.
By the FAIR Data Platform / November 23, 2021
Why bother being FAIR
Beneficiaries of NWO, EU Horizon 2020 and many other subsidized projects are required to facilitate access to, reuse of and preservation of the research data generated during their work. To this end, they must develop a Data Management Plan and, where possible, provide open access to their research data. According to the EU, there are two main routes to open access to publications, both equally valid:
- Self-archiving, or green open access. The published work, or the final peer-reviewed manuscript accepted for publication, and its associated data are made freely and openly accessible in an online repository by the author or a representative.
- Open access publishing, or gold open access. The published work and its associated (meta)data are made available in open access mode by the publisher immediately upon publication.
Note that for both green and gold open access, the “publications” must be “machine-readable”: published in a format that a computer can process and interpret. They must therefore be stored in text file formats that are either standardised or otherwise publicly known, so that anyone can develop new tools for working with the documents and data.
From a structured overview to a Data Management Plan
To make data FAIR, a Data Management Plan needs to be developed and open access to research data needs to be provided. For omics-related publications, for instance, it is customary to publish an accession number for the data you deposited at the European Nucleotide Archive (ENA). ENA requires a minimum amount of information at registration, and all samples must conform to a defined checklist of expected metadata values. Which checklist suits a sample best depends on the type of sample, and ENA provides a list of suitable minimum information (MI) standards. Many of these are in fact the Minimum Information about any (x) Sequence (MIxS) standards developed by the Genomic Standards Consortium.
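The idea of a checklist can be made concrete with a small sketch. Assuming a checklist is simply a list of expected fields, each marked as mandatory or optional and optionally constrained to a format, validating a sample means checking every field in turn and collecting the problems. The field names, the date pattern and the `validate_sample` helper below are illustrative assumptions, not the actual ENA or MIxS checklist definitions and not part of ENA's or the FAIR Data Station's tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """One expected metadata field in a hypothetical sample checklist."""
    name: str
    mandatory: bool
    pattern: Optional[str] = None  # optional regex the whole value must match


# Illustrative checklist only; NOT the real ENA/MIxS checklist definition.
CHECKLIST = [
    Field("sample_name", mandatory=True),
    # Accepts YYYY, YYYY-MM or YYYY-MM-DD (assumed format for this sketch).
    Field("collection_date", mandatory=True, pattern=r"\d{4}(-\d{2}(-\d{2})?)?"),
    Field("geo_loc_name", mandatory=True),
    Field("env_medium", mandatory=False),
]


def validate_sample(sample: dict, checklist=CHECKLIST) -> list:
    """Return a list of problems in checklist order; an empty list means the sample passes."""
    problems = []
    for field in checklist:
        value = str(sample.get(field.name) or "").strip()
        if not value:
            # Empty or absent values only count as errors for mandatory fields.
            if field.mandatory:
                problems.append(f"missing mandatory field '{field.name}'")
            continue
        if field.pattern and not re.fullmatch(field.pattern, value):
            problems.append(f"'{field.name}' has unexpected value '{value}'")
    return problems


if __name__ == "__main__":
    samples = [
        {"sample_name": "S1", "collection_date": "2021-06-14", "geo_loc_name": "Netherlands"},
        {"sample_name": "S2", "collection_date": "14/06/2021"},
    ]
    for s in samples:
        print(s["sample_name"], validate_sample(s) or "OK")
```

Returning a list of problems rather than stopping at the first error mirrors what is useful when preparing many samples at once: a researcher can fix every issue in one pass before attempting a submission.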
In other words, the Excel files generated and validated by the FAIR Data Station not only give you a structured overview of your samples and assays; they can also be used directly as an implementation of your Data Management Plan (DMP) and help you submit your (omics) data to a public repository.
But there is more. With no extra effort, a research group can build its own semantic database in which members can search the Project (Investigation), Study and Assay metadata of their colleagues, opening the way to efficient reuse of the data generated in these studies. How? Read the next and final blog in this series.