Bioinformatic approaches for the prediction of seafloor ecological state from Nanopore 16S rRNA gene data
Doctoral thesis
Accepted version

Date
2024
Collections
- Doctoral theses (KBM) [136]
Abstract
Increasing anthropogenic pressures, particularly from industries and activities such as intensive aquaculture, affect marine benthic ecosystems. This calls for the prompt implementation of frequent and effective environmental DNA (eDNA) monitoring methods. Technologies such as Nanopore sequencing provide faster analysis than the methodologies currently employed. However, various challenges, such as the need for an environment-specific database, must be addressed before these monitoring strategies can be employed.
This thesis aims to mitigate these challenges and then predict the ecological state of seafloor samples collected across different ecological gradients. The study introduces METASEED, a novel method that combines amplicon and metagenome data to reconstruct 16S rRNA marker gene sequences, and which outperformed similar existing approaches (Paper 1). METASEED and other existing approaches were used to generate a targeted database for environmental samples, AQUAeD-DB, consisting of 14,545 16S rRNA gene sequences (Paper 2). The comparative database analysis shows that the strategies used for database creation were effective in enhancing the diversity of AQUAeD-DB, which produced consistent, positively correlated read counts for Illumina and Nanopore data, and did so better than other databases (median correlation coefficient: 0.50). Because it was developed from seafloor samples, the database also contained more seafloor-associated taxa than other databases.
The ecological status of seafloor sediments was predicted by using the AQUAeD-DB database to assign Illumina and Nanopore reads with two different tools and applying three machine learning (ML) algorithms (Paper 3). Predictions based on Illumina and Nanopore data were comparable, showing similar prediction errors. Statistical analysis supported these findings: the sequencing technology had no significant effect. The choice of ML algorithm did have a significant effect, with PLS outperforming Random Forest and LASSO. Feature selection using LASSO greatly reduced the matrix dimension and halved the prediction error. Observed and predicted nEQR values were highly correlated, with a Pearson correlation coefficient of 0.98 for Illumina data (mean prediction error: ±0.04) and 0.95 for Nanopore data (mean prediction error: ±0.06).
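As an illustration of the two evaluation figures quoted above, the following minimal sketch computes a Pearson correlation coefficient and a mean prediction error between observed and predicted nEQR values. It is not the thesis code: the function names and the example values are hypothetical, and treating "mean prediction error" as the mean absolute error is an assumption made here for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mean_abs_error(observed, predicted):
    """Mean absolute difference between observed and predicted values.

    Assumption: used here as a stand-in for the 'mean prediction error'.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical nEQR values, invented for illustration only (not thesis data)
observed = [0.20, 0.35, 0.50, 0.65, 0.80]
predicted = [0.25, 0.30, 0.55, 0.60, 0.85]

r = pearson_r(observed, predicted)
mae = mean_abs_error(observed, predicted)
print(f"Pearson r = {r:.3f}, mean absolute error = {mae:.3f}")
```

The two measures are complementary: the correlation describes how well the predictions follow the ranking and trend of the observed values, while the error describes how far individual predictions deviate on the nEQR scale.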
This study concludes that Nanopore-based analysis can be a viable approach for accelerating benthic monitoring practices. However, it also underlines the need to standardize bioinformatic approaches and databases, which may need to be tailored to environmental samples to achieve optimal performance. Although Nanopore here provides ecological predictions comparable to those from Illumina, it can still benefit from using Illumina as a reference.
The growing anthropogenic pressure, especially from industries and activities such as intensive aquaculture, affects marine benthic ecosystems. This requires the implementation of fast and effective eDNA monitoring methods. Technologies such as Nanopore sequencing offer faster analysis than current methods. However, this brings new challenges, such as the need for environment-specific databases.
This thesis addresses these challenges by determining the ecological state of seafloor samples across different ecological gradients. The thesis introduces METASEED, a new method that combines both amplicon and metagenome data to reconstruct 16S rRNA gene sequences (Paper 1). METASEED and other existing methods were implemented to generate a targeted database (AQUAeD-DB) for environmental samples, consisting of 14,545 16S rRNA gene sequences (Paper 2). We compared AQUAeD-DB with other existing databases. This analysis shows that the strategies we used to build AQUAeD-DB improve the database's species diversity and give consistent results for both Illumina and Nanopore signals compared with other databases (median correlation coefficient: 0.50).