Tsetlin machine for classifying genetic data from sea-floor species
Master thesis
Permanent link
https://hdl.handle.net/11250/3147984
Publication date
2024
Collections
- Master's theses (RealTek) [2009]
Abstract
With the amount of genetic data that modern sequencing technology can extract from nature, there is a growing need for tools to help classify and analyze this data. Machine learning algorithms such as Random Forest and Artificial Neural Networks are already in use in this field of bioinformatics.

The Tsetlin Machine is a new type of machine learning algorithm that has shown much promise in DNA classification. It builds models using binary representations and logic that are close to how a computer operates. This thesis tests the Tsetlin Machine's ability to classify genetic data.

A database with the DNA of 709 species commonly found in deep-sea sediments, selected based on the results of the AQUAeD project, is split into different datasets. The Tsetlin Machine, together with a Random Forest model, a convolutional neural network, and a model that counts the number of GC bases, is given these datasets and tries to classify different classes on multiple taxonomic ranks. The models are then evaluated on the accuracy of their classification and the speed of training.

The results show that the Tsetlin Machine has great promise in this field, achieving scores similar to the Random Forest classifier and the convolutional neural network in accuracy and speed.
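
To make two ideas from the abstract concrete, the following minimal Python sketch shows one way a DNA sequence could be turned into the binary features a Tsetlin Machine expects, and one way a GC-counting baseline could compute its single feature. The function names, the 4-bit one-hot encoding, and the handling of unknown bases are assumptions made for illustration; the abstract does not state which encoding or GC feature the thesis actually uses.

```python
# Illustrative sketch only: names and encoding choices are assumptions,
# not taken from the thesis.

NUCLEOTIDES = "ACGT"


def one_hot_bits(sequence: str) -> list[int]:
    """Encode each base as 4 bits in the order A, C, G, T.

    Bases outside ACGT (for example N) become four zeros.
    """
    bits: list[int] = []
    for base in sequence.upper():
        bits.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return bits


def gc_fraction(sequence: str) -> float:
    """Return the fraction of bases that are G or C (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    print(one_hot_bits("ACGT"))  # binary feature vector, 4 bits per base
    print(gc_fraction("GGCA"))   # single numeric feature for a GC baseline
```

A binary vector like this could be passed to any classifier that takes Boolean inputs, while the GC fraction gives a deliberately simple one-feature baseline to compare against.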
