dc.contributor.advisor: Lars-Gustav Snipen
dc.contributor.advisor: Karin Lagesen
dc.contributor.author: Andersen, Terese Ryan
dc.date.accessioned: 2024-08-23T16:27:38Z
dc.date.available: 2024-08-23T16:27:38Z
dc.date.issued: 2024
dc.identifier: no.nmbu:wiseflow:7110451:59111984
dc.identifier.uri: https://hdl.handle.net/11250/3147934
dc.description.abstract: In this study, source attribution was combined with machine learning to build models that can predict the source of new cases of infection caused by Listeria monocytogenes. L. monocytogenes causes listeriosis in humans, and food is the main route of infection. Although it is considered a bacterium of low pathogenicity, its mortality rate among infected humans makes it a public health concern. Listeriosis is particularly dangerous for the elderly, the immunosuppressed, and pregnant individuals. Rapid identification of the source of infection is key to stopping further spread of the bacterium. Using genomic data from L. monocytogenes isolates with known sources, a machine learning model can be trained to classify isolates by source and then predict the sources of new cases. The genomic data were also explored to determine whether they were diverse enough to partition isolates by source. Machine learning has already shown potential for source attribution of L. monocytogenes and other pathogens in studies from other countries. The data set in this study was Norwegian and consisted of whole-genome-sequenced L. monocytogenes isolates. Whether the genetic information could separate the isolates and predict their sources was explored with different machine learning methods, genome representations, and subsets of the genes in the data set. The results suggest that allelic profiles from both core genome and whole genome multilocus sequence typing (MLST) provide input data diverse enough for machine learning models to use for source attribution.
Random Forest could use the allelic profiles directly as input and predicted well for isolates with food-associated sources in this study, but performed worse for isolates with non-food-associated sources. Support Vector Machine required scaling of the input data to predict well and, with scaling, performed similarly to Random Forest. The last method was a neural network, which required the most time and computational resources. The neural network performed worse than the other methods but showed potential, and its predictive performance could possibly be improved with further tuning.
dc.language: eng
dc.publisher: Norwegian University of Life Sciences
dc.title: Using machine learning for source attribution of Listeria monocytogenes
dc.type: Master thesis
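
The abstract contrasts Random Forest, which could use allelic profiles directly, with a Support Vector Machine that needed scaled input. The sketch below illustrates that contrast with scikit-learn under stated assumptions: allelic profiles are held as an isolates × loci matrix of integer allele IDs with one source label per isolate. The data are randomly generated toy profiles with hypothetical source labels ("dairy", "meat", "fish"), not the Norwegian data set, and the hyperparameters are illustrative rather than those used in the thesis.

```python
# Hypothetical sketch: source attribution from MLST-style allelic profiles.
# All data below are randomly generated toy values, NOT the thesis data set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy matrix: 300 isolates x 20 loci, each cell an integer allele ID.
n_isolates, n_loci = 300, 20
sources = np.array(["dairy", "meat", "fish"])  # hypothetical source labels
y = rng.choice(sources, size=n_isolates)
X = rng.integers(1, 200, size=(n_isolates, n_loci))
for i, src in enumerate(sources):
    # Give isolates from each source a shared allele at loci 0-9 so there is signal.
    X[y == src, :10] = 500 + 100 * i

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Random Forest: each split thresholds one feature, so raw allele IDs can be used.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

# SVM with an RBF kernel: distance-based, so features are standardized first.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)

rf_acc = rf.score(X_test, y_test)
svm_acc = svm.score(X_test, y_test)
print(f"Random Forest accuracy: {rf_acc:.2f}")
print(f"SVM (scaled) accuracy:  {svm_acc:.2f}")

# Predict the source of a "new case" (here: one held-out toy isolate).
new_case = X_test[:1]
print("Predicted source:", rf.predict(new_case)[0])
```

Tree splits compare values within one feature at a time, so Random Forest is insensitive to feature scale, whereas the RBF kernel in SVC is computed from Euclidean distances and therefore benefits from standardization. Because allele IDs are nominal labels, treating them as numbers is itself a modelling choice; one-hot encoding is a common alternative.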