Technical details amplicon analysis

NG-Tax 2.0 allows FAIR high-throughput analysis and classification of marker gene amplicon sequences. In this article, we describe the performance of NG-Tax 2.0. Moreover, we demonstrate its use with exemplary data from the DIABIMMUNE project.

By Jasper Koehorst / July 7, 2021

KEY MESSAGES

Technical difficulty: 5/5

NG-Tax 2.0 is a semantic framework for FAIR high-throughput analysis and classification of marker gene amplicon sequences. These include bacterial and archaeal 16S ribosomal RNA (rRNA), as well as eukaryotic 18S rRNA and ribosomal intergenic transcribed spacer sequences. The framework can directly use single or merged reads, paired-end reads and unmerged paired-end reads from long range fragments as input. From these, it generates de novo Amplicon Sequence Variants (ASVs). Subsequently, using the RDF data model, ASVs can be automatically stored in a graph database as objects that link ASV sequences with the full data-wise and element-wise provenance. In this way, it achieves the level of interoperability required to use such data to its full potential.
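To make the idea of an "ASV object" in the RDF data model more concrete, here is a minimal sketch that writes one ASV as N-Triples lines using only the Python standard library. The namespace `http://example.org/ngtax#`, the predicate names and the `ASV` class are hypothetical and chosen for illustration; they are not the NG-Tax 2.0 ontology, and a real setup would more likely use an RDF library together with the published ontology.

```python
from dataclasses import dataclass

# Hypothetical namespace for illustration only, NOT the NG-Tax 2.0 ontology.
EX = "http://example.org/ngtax#"


@dataclass(frozen=True)
class ASV:
    asv_id: str
    sequence: str
    sample: str
    read_count: int


def literal(value):
    """Quote a value as a plain N-Triples string literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_ntriples(asv):
    """Serialize one ASV as three subject-predicate-object statements."""
    subject = f"<{EX}asv/{asv.asv_id}>"
    return [
        f"{subject} <{EX}sequence> {literal(asv.sequence)} .",
        f"{subject} <{EX}readCount> {literal(asv.read_count)} .",
        # Linking to the sample resource is what carries the provenance.
        f"{subject} <{EX}fromSample> <{EX}sample/{asv.sample}> .",
    ]


triples = to_ntriples(ASV("asv1", "ACGTACGT", "sampleA", 120))
print("\n".join(triples))
```

Because every statement points at a resource rather than a free-text label, the same sample or library can be referenced from many ASVs, which is what makes the stored data queryable across studies.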

Analysis of the data

The graph database can then be queried directly, allowing for comparative analyses across thousands of samples. It is also connected to an interactive Rshiny toolbox for analysis and visualization of (meta)data. Additionally, NG-Tax 2.0 exports an extended BIOM 1.0 (JSON) file as a starting point for further analyses by other means. This file contains new attribute types to include information about the command arguments used, the sequences of the ASVs formed, and classification confidence scores. Finally, the extended file is backwards compatible. The figure below summarizes the whole NG-Tax 2.0 workflow.

Figure: Schematic of the NG-Tax 2.0 pipeline. The workflow consists of four main steps. First, barcode and primer filtering (A), followed by de novo OTU-picking of ASV sequences, artefact filtering, correction for the impact of error reads on ASV relative abundance estimates, and taxonomic inference (B). The next step (C) is ASV object serialization and storage, in which ASV sequences, taxonomic inferences and data provenance (including library and sample names and the settings used) are exported and stored as ASV objects in an RDF triple store graph database. Optionally, these are also exported in the BIOM 1.0 file format. Finally (D), the downstream analysis toolbox is used to directly query and analyse the ASV data and metadata through the SPARQL endpoint. Additionally, the Rshiny toolbox directly provides standard statistics and visualizations using predefined SPARQL queries.
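For readers who want to work with the BIOM 1.0 (JSON) export outside the graph database, the sketch below builds a tiny sparse BIOM 1.0 table and reads it back with the standard `json` module. The top-level fields follow the BIOM 1.0 layout; the per-ASV metadata keys `sequence` and `confidence` are hypothetical names used here for illustration, and the actual attribute names in an NG-Tax 2.0 export may differ. All values are made up.

```python
import json

# A tiny BIOM 1.0 (JSON) table. "sequence" and "confidence" are hypothetical
# attribute names; every value here is illustrative, not real data.
biom_text = json.dumps({
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "illustrative example",
    "date": "2021-07-07T00:00:00",
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 2],
    "rows": [
        {"id": "ASV_1", "metadata": {"sequence": "ACGTACGT", "confidence": 0.98}},
        {"id": "ASV_2", "metadata": {"sequence": "TTGACCGA", "confidence": 0.71}},
    ],
    "columns": [
        {"id": "sample_A", "metadata": None},
        {"id": "sample_B", "metadata": None},
    ],
    # Sparse entries are [row index, column index, count].
    "data": [[0, 0, 120], [0, 1, 30], [1, 0, 15]],
})


def asv_counts(biom):
    """Map each ASV id to {sample id: count} from a sparse BIOM 1.0 table."""
    rows = [row["id"] for row in biom["rows"]]
    cols = [col["id"] for col in biom["columns"]]
    counts = {row_id: {} for row_id in rows}
    for i, j, value in biom["data"]:
        counts[rows[i]][cols[j]] = value
    return counts


biom = json.loads(biom_text)
counts = asv_counts(biom)
confidence = {row["id"]: row["metadata"]["confidence"] for row in biom["rows"]}
print(counts)
print(confidence)
```

Tools that only understand plain BIOM 1.0 can ignore the extra metadata keys, which is one way an extended file can stay backwards compatible.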

Performance and availability

We compared the performance of NG-Tax 2.0 with DADA2 using the plugin in the QIIME 2 analysis pipeline. To this end, we obtained and evaluated fourteen 16S rRNA gene amplicon mock community samples from the literature. NG-Tax 2.0 achieved significantly higher precision, with an average of 0.95 versus 0.58 for QIIME2-DADA2. Recall was comparable, with averages of 0.85 and 0.77, respectively. NG-Tax 2.0 is written in Java. Under the MIT License, you can freely access the code, ontology, a Galaxy platform implementation, the analysis toolbox, as well as tutorials and example SPARQL queries.
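As a reminder of what these two metrics measure on a mock community, the following sketch computes precision and recall from a set of expected taxa and a set of observed taxa. It assumes the standard definitions (true positives divided by all observed taxa, and true positives divided by all expected taxa); the exact matching rules and taxonomic level used in the paper are not described in this post. The taxa listed are hypothetical and do not come from the benchmark.

```python
def precision_recall(expected, observed):
    """Return (precision, recall) of observed taxa against a known mock community."""
    expected, observed = set(expected), set(observed)
    true_pos = len(expected & observed)
    precision = true_pos / len(observed) if observed else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical mock community and pipeline output (illustrative names only).
expected_taxa = {"Bacteroides", "Escherichia", "Lactobacillus", "Clostridium"}
observed_taxa = {"Bacteroides", "Escherichia", "Lactobacillus", "Staphylococcus"}

precision, recall = precision_recall(expected_taxa, observed_taxa)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A pipeline that reports many spurious taxa loses precision even if it finds every expected organism, which is why the two numbers are reported separately.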

Working example: DIABIMMUNE project

As a working example, we used existing raw 16S rRNA gene data from the DIABIMMUNE Microbiome project and NG-Tax 2.0 for data analyses. To this end, we downloaded raw amplicon data of over 1800 microbial samples from the project. Next, the data was automatically ingested by the UNLOCK infrastructure and stored according to the ISA standard. We also captured the metadata of these samples. Then, NG-Tax 2.0 automatically analysed the amplicon data to generate ASVs as described above. The queries used to generate the examples can also be embedded in, or be part of, standard operating procedures (SOPs). Further post-processing can take place using structured data analysis processes integrated in Jupyter Notebooks.
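To give an impression of how such a query could be reused in a notebook, the sketch below pairs a SPARQL query string with a small helper that flattens a response in the W3C SPARQL 1.1 JSON results format. The prefix `http://example.org/ngtax#` and the predicate `ex:fromSample` are hypothetical placeholders, not the NG-Tax 2.0 ontology, and the response document is a made-up stand-in for what an endpoint would return. No network call is made.

```python
import json

# Hypothetical prefix and predicate, for illustration only.
QUERY = """
PREFIX ex: <http://example.org/ngtax#>
SELECT ?sample (COUNT(?asv) AS ?asvCount)
WHERE { ?asv ex:fromSample ?sample . }
GROUP BY ?sample
"""


def rows_from_sparql_json(text):
    """Flatten a SPARQL 1.1 JSON results document into a list of dicts."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


# Made-up response in the standard results layout; values arrive as strings.
sample_response = json.dumps({
    "head": {"vars": ["sample", "asvCount"]},
    "results": {"bindings": [
        {
            "sample": {"type": "uri", "value": "http://example.org/ngtax#sample/sampleA"},
            "asvCount": {
                "type": "literal",
                "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                "value": "2",
            },
        },
    ]},
})

rows = rows_from_sparql_json(sample_response)
print(rows)
```

Keeping the query text in one named constant makes it easy to copy the same query into an SOP or a notebook cell without changes.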

Interesting links

Do you want to learn about more applications of NG-Tax 2.0? Via this link, you can read articles that have cited NG-Tax 2.0. This blogpost is a short summary of the research paper below, which you can access for more technical details or cite as:

Finally, this blogpost is linked to our FAIR Data Platform. Visit the platform page for more information.