As a working example here we have used existing raw 16S rRNA gene data from the DIABIMMUNE project and NG-Tax 2.0 for data analyses.
Raw amplicon data of over 1800 samples was downloaded from the project and automatically ingested by the UNLOCK infrastructure and stored according to the ISA standard. Metadata of these samples was also captured.
The amplicon data was automatically analysed with NG-Tax 2.0 to generate Amplicon Sequence Variants (ASV). Using the RDF data model, these ASV’s are automatically stored in a graph database as objects that link ASV sequences with the full data-wise and element-wise provenance. The graph database can be directly queried, allowing for comparative analyses over thousands of samples. Examples are given below.
The queries used to generate the examples shown can be embedded in or be part of standard operating procedures (SOPs). For instance, further post-processing can be done using structured data analysis processes integrated in Jupyter Notebooks.