The FAIR data platform

Monitoring the physiological states of a microbial community and exploring the interactions between its members requires a data management system that can share information and services at an advanced level and that allows tight integration of wet-lab and dry-lab approaches. The basic functionality of such a system comprises the (i) collection, (ii) integration and (iii) delivery of data and metadata.

Maintaining a high degree of data interoperability is key. It requires automatic integration of laboratory process execution (LIMS) data, collected (omics) assay data and the associated experimental metadata in a Findable, Accessible, Interoperable and Reusable (FAIR) format. Applying these four foundational principles allows researchers to extract maximum benefit from the research investments made.

Platform technical details

UNLOCK Knowledge management consists of four parts:

  • An Integrated Rule-Oriented Data System (iRODS) instance takes care of the collected (raw) assay data, transformed data and metadata.
  • In the UNLOCK iRODS implementation, data files and folders are organized hierarchically following the Investigation/Study/Assay (ISA) format, an open, general-purpose framework for collecting and communicating complex metadata. In this set-up, an ‘Investigation’ is a collection of experiments revolving around a set of common research questions. The Investigation folder thus forms the root of a set of hierarchically organized folders and files containing the data and metadata derived from experiments related to those research questions.
  • Experimental design metadata is used to (i) automatically create the appropriate ISA folder structure at the start of the Investigation and (ii) automatically start data processing as soon as raw data is obtained.
  • Standardized workflows and container technology are used to transform the raw data into information (see Figure below).
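
The way design metadata can drive the Investigation/Study/Assay folder layout may be easier to picture with a small sketch. The Python example below is illustrative only and is not the UNLOCK implementation: it creates folders on a local file system rather than in iRODS, and the function name `build_isa_tree`, the shape of the `design` dictionary and all folder names are assumptions made for this example.

```python
from pathlib import Path
import tempfile


def build_isa_tree(root: Path, design: dict) -> list[Path]:
    """Create Investigation/Study/Assay folders under `root`.

    `design` is a hypothetical, simplified stand-in for experimental
    design metadata, for example:
        {"investigation": "inv_demo",
         "studies": {"study_1": ["assay_a", "assay_b"]}}
    Returns the list of assay folders that now exist.
    """
    # The Investigation folder is the root of the hierarchy.
    investigation_dir = root / design["investigation"]
    assay_dirs = []
    for study, assays in design["studies"].items():
        # Create the Study folder even if it has no assays yet.
        study_dir = investigation_dir / study
        study_dir.mkdir(parents=True, exist_ok=True)
        for assay in assays:
            assay_dir = study_dir / assay
            assay_dir.mkdir(exist_ok=True)
            assay_dirs.append(assay_dir)
    return assay_dirs


if __name__ == "__main__":
    # Hypothetical design metadata; names are placeholders.
    example_design = {
        "investigation": "inv_demo",
        "studies": {"study_1": ["assay_a", "assay_b"], "study_2": ["assay_c"]},
    }
    with tempfile.TemporaryDirectory() as tmp:
        for path in build_isa_tree(Path(tmp), example_design):
            # Print each assay folder relative to the temporary root.
            print(path.relative_to(tmp))
```

Because every folder is created with `exist_ok=True`, the function can be called again when new Studies or Assays are added to the design without disturbing folders that already exist.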

Maintenance of the UNLOCK iRODS infrastructure and long-term preservation of the data generated within UNLOCK are outsourced to SURFsara.

A schematic representation of the data infrastructure used within UNLOCK. The iRODS data management system captures the experimental data streams. To enable the FAIR by Design principles, element- and data-wise experimental metadata generated by the lab equipment, together with other required experimental metadata, is automatically linked with the data streams and permanently stored within the iRODS infrastructure. High-throughput analysis of the data is done using a scalable cloud-based infrastructure and dockerized open-source applications. Compute results and corresponding metadata are stored in the iRODS platform using the ISA data model. Further post-processing can be done using structured data analysis processes integrated in Jupyter Notebooks.

Most used application pipelines. Examples and technical details