Missing Value Imputation and Survival Analysis for Treatment Outcome Prediction in High-Grade GEP NEN
Master thesis
Permanent link
https://hdl.handle.net/11250/3147985
Publication date
2024
Collections
- Master's theses (RealTek) [2009]
Abstract
The objective of the thesis was to create a reliable model for outcome prediction in gastrointestinal cancers using data with incomplete variables. This was a two-step process. The first step involved analysis and review of the literature on missing values, with a separate experiment conducted to support decisions. The second step involved preprocessing the data with assistance from expert knowledge and conducting survival analysis. We found that imputing missing values is always better than discarding information in variables and samples. This is especially important with a small sample size. k-Nearest Neighbor imputation provided accurate single imputations in the experiment and had the most promising impact on the survival models. Two of the survival models, Coxnet and Component-Wise Gradient Boosting, provided the highest test concordance, with the latter having the lowest integrated Brier score and time-dependent Brier score. We argue that the choice of model depends on the application, as a model may excel in one metric and be less effective in another. If the intention is to accurately predict the order of events, the model that maximizes concordance should be used. Conversely, if accurate modelling of the survival times is of interest, then the model minimizing the integrated Brier score should be used. We also found that Number of Courses, Ki-67, and NSE had the highest average importance across the four survival models. Additionally, the features WHO Perf Stat, Ki-67, and Albumin were identified as equally important, consistent with the results reported by Jenul et al. (2023).
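The distinction between ranking accuracy (concordance) and accuracy of the predicted survival probabilities (Brier score) can be illustrated with a minimal sketch. This is not code from the thesis: the function names, the toy data, and the two hypothetical models are assumptions made purely for illustration, and the Brier score here assumes every event is observed. A real analysis with censored patients would need a censoring-adjusted version.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs that the risk scores order correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's recorded time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties count as half
    return concordant / comparable


def brier_score(times, surv_prob, t):
    """Brier score at time t, assuming no censoring (illustrative simplification)."""
    total = 0.0
    for time, p in zip(times, surv_prob):
        alive = 1.0 if time > t else 0.0  # true status at time t
        total += (alive - p) ** 2
    return total / len(times)


# Made-up toy data: five patients, all with observed events.
times = [2, 4, 6, 8, 10]
events = [1, 1, 1, 1, 1]
t = 5  # evaluation time for the Brier score

# Predicted survival probabilities at time t from two hypothetical models.
# Both rank the patients identically, but model B is far less informative.
surv_a = [0.10, 0.20, 0.80, 0.90, 0.95]
surv_b = [0.45, 0.46, 0.47, 0.48, 0.49]

# Use 1 - survival probability as the risk score for ranking.
c_a = concordance_index(times, events, [1 - p for p in surv_a])
c_b = concordance_index(times, events, [1 - p for p in surv_b])
brier_a = brier_score(times, surv_a, t)
brier_b = brier_score(times, surv_b, t)

print("concordance:", c_a, c_b)
print("Brier score:", brier_a, brier_b)
```

In this toy setting both models order the patients identically and so share the same concordance, while their Brier scores differ because only one of them assigns probabilities close to the observed outcomes. This is the sense in which a model can do well on one metric and worse on another.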
