In a three-part blog series, we introduce you to our FAIR Data Station. In this first part, we explain how to get started with Data FAIRification – the easy way.
By the FAIR Data Platform / November 16, 2021
The ‘FAIR Guiding Principles for scientific data management and stewardship’ are built upon the use of machine-actionable metadata to find, access, interoperate, combine, and reuse data with minimal human intervention. This metadata must be detailed and extensive enough to allow unambiguous interpretation of the associated data. To maximize the potential for reuse, the Life Sciences use Minimum Information (MI) standards, which consist of two parts:
- For each sample or assay and its associated data type, there is a community-accepted checklist of (mandatory) reporting requirements.
- To ensure that the metadata are machine-actionable, they should be reported in a community-accepted data format. For example, dates should be supplied in the ISO 8601 format (YYYY-MM-DD).
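As an illustration of the format requirement, the Python sketch below checks whether a value is a valid ISO 8601 calendar date in the YYYY-MM-DD form. It is a minimal example under our own assumptions (zero-padded month and day, real calendar dates only, no other ISO 8601 representations), not the validator the FAIR Data Station itself uses; the function name `is_iso8601_date` is hypothetical.

```python
import re
from datetime import datetime

# Layout check: four-digit year, two-digit month, two-digit day (assumed strictness).
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso8601_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        # strptime rejects impossible dates such as 2021-02-30.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


for candidate in ["2021-11-16", "16/11/2021", "2021-02-30", "2021-1-5"]:
    print(candidate, is_iso8601_date(candidate))
```

Only `2021-11-16` passes: `16/11/2021` uses a different layout, `2021-02-30` is not a real date, and `2021-1-5` lacks zero-padding. The regular expression enforces the layout, while `strptime` catches impossible dates.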
The data life cycle
In UNLOCK, data management is tightly linked with metadata management. We work according to the FAIR-by-design principles, which means that project metadata is added from the start and continuously updated. This is crucial not only to obtain reusable data objects, but also because we believe that capturing metadata from the start increases the quality and reproducibility of research. Figure 1 represents the data life cycle. Several steps, such as experiment planning, sample collection, processing, and analysis, typically require human involvement. It is therefore essential that the metadata associated with these steps can easily be captured and integrated into the research workflows. To this end, we have developed the FAIR Data Station, a metadata ingestion platform that helps researchers improve the quality of their experiment metadata and safeguards its machine-actionability (Figure 2): Data FAIRification the easy way!
Data FAIRification: A three-step process
The FAIR Data Station guides researchers in metadata management in a simple three-step process.
1. Selection of the appropriate metadata standard(s), resulting in the generation of a template spreadsheet in standard Excel format with a machine-actionable header.
- This Excel sheet is used to register sample and assay metadata. Registration is done offline.
2. Validation of the sample and assay metadata content (completeness and format) against the requirements of the selected MI standard.
- For this check, the Excel file is uploaded to the FAIR Data Station and automatically checked for inconsistencies.
3. Generation of FAIR machine-actionable metadata in the RDF data model.
- Accepted Excel files are automatically converted into a Resource Description Framework (RDF) data model that can be queried using the SPARQL query language.
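Steps 2 and 3 can be sketched in a few lines of Python. The example below assumes that each row of the Sample sheet has already been read into a dictionary keyed by column header; the checklist terms, regular expressions, and `example.org` namespaces are hypothetical placeholders, not the MIxS definitions or the IRIs the FAIR Data Station generates. Validation checks completeness (mandatory columns filled) and format (each value matches a pattern); only an error-free sheet is converted, here into plain N-Triples lines, one RDF serialization that a triple store can load and query with SPARQL.

```python
import re

# Hypothetical checklist fragment: column -> (mandatory?, pattern the value must match).
# Real MI checklists such as MIxS define their own terms, levels, and formats.
CHECKLIST = {
    "sample_id": (True, r"[A-Za-z0-9_]+"),
    "collection_date": (True, r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    "geo_loc_name": (True, r".+"),
    "depth": (False, r"[0-9]+(\.[0-9]+)?"),
}

SAMPLE_NS = "http://example.org/sample/"  # hypothetical namespace for sample IRIs
TERM_NS = "http://example.org/term/"      # hypothetical namespace for checklist terms


def validate(rows):
    """Step 2: report missing mandatory values and badly formatted values."""
    errors = []
    for n, row in enumerate(rows, start=1):
        for column, (mandatory, pattern) in CHECKLIST.items():
            value = row.get(column, "").strip()
            if not value:
                if mandatory:
                    errors.append(f"row {n}: missing mandatory '{column}'")
                continue
            if not re.fullmatch(pattern, value):
                errors.append(f"row {n}: invalid '{column}' value {value!r}")
    return errors


def escape_literal(text):
    """Escape backslashes, quotes, and line breaks for an N-Triples string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(rows):
    """Step 3: one triple per filled checklist column; sample_id becomes the subject IRI."""
    triples = []
    for row in rows:
        subject = f"<{SAMPLE_NS}{row['sample_id'].strip()}>"
        for column in CHECKLIST:
            value = row.get(column, "").strip()
            if column == "sample_id" or not value:
                continue
            triples.append(f'{subject} <{TERM_NS}{column}> "{escape_literal(value)}" .')
    return triples


# Example Sample sheet: the second row has a wrongly formatted date and no location.
sheet = [
    {"sample_id": "S1", "collection_date": "2021-11-16", "geo_loc_name": "Netherlands", "depth": "0.5"},
    {"sample_id": "S2", "collection_date": "16-11-2021", "geo_loc_name": ""},
]

errors = validate(sheet)
if errors:
    print("\n".join(errors))  # the sheet is rejected; fix it and upload again
else:
    print("\n".join(to_ntriples(sheet)))
```

With this example sheet, conversion is skipped and two errors are reported for row 2: an invalid date layout and a missing `geo_loc_name`. A production converter would use typed literals and an RDF library rather than string formatting; plain strings keep this sketch dependency-free.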
Minimum Information standards
The FAIR Data Station can be used with any MI standard. By default, it is loaded with the Minimum Information about any (x) Sequence (MIxS) standard developed by the Genomic Standards Consortium (see this Nature Biotechnology paper for the motivation). The latest version of the MIxS standards can be obtained here. As mentioned above, sample and assay registration is done offline using the preformatted Excel spreadsheet. This setup offers advantages over web-based methods when dealing with time series, when you have many samples or multiple assays, and when your experiments involve multiple researchers. Note that the workflow (Figure 2) allows for routine intermediate validation of registered samples and assays.
Accessing the FAIR Data Station
You can access the FAIR Data Station via this link. First, the project context metadata (project information, experimental aim and setup) is collected using the web interface (Figure 3) and converted into the Investigation/Study/Assay (ISA) data model. Next, the appropriate MI standard is selected. An MI standard often consists of a checklist of mandatory, conditional mandatory, environment-dependent, and optional reporting requirements. Mandatory items are selected by default and cannot be deselected; the researcher then selects the relevant optional reporting requirements. After the web form is completed, a preformatted multi-sheet Excel file is generated, with each sheet representing an ISA level. The Sample and Assay worksheets hold sample details and analytical measurement details, respectively, and have machine-actionable column headers representing the mandatory and user-selected optional elements. Note that, for convenience, free-text comment columns can be added to these worksheets.
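The template-generation step can be sketched as follows. In this hypothetical Python example, each checklist term carries a requirement level; mandatory terms always become Sample-sheet columns, and any other term (conditional mandatory, environment-dependent, or optional) appears only when the researcher selects it. The terms, level codes, and the Investigation, Study, and Assay headers are placeholders of our own, not the actual MIxS checklist or the FAIR Data Station's template layout, and the sketch returns header lists rather than writing an Excel file.

```python
# Assumed requirement-level codes, loosely modelled on MI checklists.
MANDATORY, CONDITIONAL, ENVIRONMENT, OPTIONAL = "M", "C", "E", "O"

# Hypothetical checklist fragment as (term, level); not the real MIxS content.
CHECKLIST = [
    ("sample_id", MANDATORY),
    ("collection_date", MANDATORY),
    ("geo_loc_name", MANDATORY),
    ("depth", ENVIRONMENT),
    ("temp", OPTIONAL),
    ("ph", OPTIONAL),
]


def build_template(checklist, selected, comment_columns=()):
    """Return sheet name -> column headers for a multi-sheet ISA-style template.

    Mandatory terms are always included and cannot be deselected; every other
    term is included only if it is in `selected`. Free-text comment columns
    are appended to the Sample sheet.
    """
    known = {term for term, _ in checklist}
    unknown = set(selected) - known
    if unknown:
        raise ValueError(f"not in checklist: {sorted(unknown)}")
    sample_columns = [
        term for term, level in checklist
        if level == MANDATORY or term in selected
    ]
    sample_columns += list(comment_columns)
    # Placeholder headers for the other ISA levels (hypothetical).
    return {
        "Investigation": ["investigation_identifier", "investigation_title"],
        "Study": ["study_identifier", "study_title"],
        "Sample": sample_columns,
        "Assay": ["assay_identifier", "sample_id", "assay_type"],
    }


template = build_template(CHECKLIST, selected={"depth"}, comment_columns=["comment"])
for sheet, headers in template.items():
    print(sheet, headers)
```

Treating every non-mandatory term as opt-in keeps the template short. A fuller version would write each header list to its own worksheet, for example with a spreadsheet library.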