Show simple item record

dc.contributor.advisor: Tomic, Oliver
dc.contributor.advisor: Futsæther, Cecilia Marie
dc.contributor.author: Naqvi, Muhammad Muntazir
dc.date.accessioned: 2023-01-09T13:00:53Z
dc.date.available: 2023-01-09T13:00:53Z
dc.date.issued: 2022
dc.identifier.uri: https://hdl.handle.net/11250/3041992
dc.description.abstract: Alzheimer’s disease (AD) is a neurodegenerative disorder that progresses over time and results in a gradual loss of cognitive abilities. It affects patients to the extent that they become unable to perform routine daily tasks, and it eventually causes death. Alzheimer’s disease is a significant health issue and it has no cure. Early detection of AD at the preclinical or non-symptomatic stage allows for treatments that can slow down the progression of the disease. Among the biomarkers of AD measurable as early as the preclinical stage are deposits of amyloid beta peptides between neurons. In this study, our objective is to use machine learning to build a classification model that predicts the presence of amyloid beta deposits given the patients' health history and the results of medical and cognitive ability tests. We build on the work done previously with the same data, and we follow a data-centric approach. The data is divided into five blocks based on the similarities between features in each block. We then plan a set of 17 data iterations, with each iteration using a different combination of five data transformation steps, i.e. (i) standardization, (ii) data distribution transformation, (iii) feature selection, (iv) oversampling, and (v) manifold learning. We repeat these iterations on four data blocks and train a dynamic ensemble selection classifier for each resulting dataset. We use the Matthews correlation coefficient (MCC) as the primary metric of model performance. We also report five other performance metrics (accuracy, area under the ROC curve, F1 score, precision, and recall) to provide a comprehensive picture of model performance. We see that the data iterations giving the best performance on training data mostly include transformation of the data to a normal distribution, feature selection, and oversampling. However, the performance on test data varies greatly with the type of data (data block), and it is not clear which data iteration gives the best performance. In addition, the predictive performance is not satisfactory for nearly all of the models, and they suffer from overfitting. We believe that more research is needed on this data to determine the best performing classification approach. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Norwegian University of Life Sciences, Ås [en_US]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no [*]
dc.subject: Alzheimer's disease [en_US]
dc.subject: Data-centric machine learning [en_US]
dc.subject: Dynamic ensemble selection [en_US]
dc.title: Machine learning for detecting biomarkers of Alzheimer’s disease : data-centric approach with dynamic ensemble selection [en_US]
dc.type: Master thesis [en_US]
dc.subject.nsi: VDP::Technology: 500 [en_US]
dc.description.localcode: M-DV [en_US]
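
Note on the primary metric: the abstract names the Matthews correlation coefficient (MCC) as the main performance measure. The following is a minimal sketch of the standard MCC formula for binary labels, not code from the thesis. The function names, the 0/1 label encoding, and the toy data are assumptions made for illustration only. It shows why MCC can be more informative than accuracy when classes are imbalanced: a classifier that always predicts the majority class scores high accuracy but an MCC of 0.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0 and 1 (assumed encoding)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention for the
    case where a whole row or column of the confusion matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy, made-up labels: 90 negatives, 10 positives, classifier predicts all negative.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
score = mcc(tp, tn, fp, fn)
```

In this toy case the accuracy is 0.9 while the MCC is 0.0, because the classifier never identifies a positive case.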




Unless otherwise stated, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International.