dc.contributor.advisor: Liland, Kristian Hovde
dc.contributor.advisor: Indahl, Ulf Geir
dc.contributor.advisor: Andersen, Per-Arne
dc.contributor.author: Finnøy, Isak
dc.date.accessioned: 2023-07-13T16:27:34Z
dc.date.available: 2023-07-13T16:27:34Z
dc.date.issued: 2023
dc.identifier: no.nmbu:wiseflow:6839553:54592057
dc.identifier.uri: https://hdl.handle.net/11250/3078670
dc.description.abstract: In this thesis, we evaluated two generative models from the open-source Synthetic Data Vault (SDV) library, the Conditional Tabular Generative Adversarial Network (CTGAN) and the Tabular Variational Autoencoder (TVAE), for generating synthetic near-infrared (NIR) spectral data. The aim was to assess whether synthetic data from these models is viable for predicting dry matter content (DMC) in NIR spectroscopy. The fidelity and utility of the synthetic data were examined through a series of benchmarks, including statistical comparisons, dimensionality reduction, and machine learning tasks. Both CTGAN and TVAE generated synthetic data with statistical properties similar to those of the real data, but TVAE outperformed CTGAN in preserving the correlation structure of the data and the relationship between the features and the target variable, DMC. However, the synthetic data did not fool machine learning classifiers, which indicates a persisting challenge in synthetic data generation. With respect to utility, neither the CTGAN nor the TVAE synthetic dataset was a satisfactory substitute for real data when training machine learning models to predict DMC. Although TVAE-generated data showed some potential with Random Forest (RF) and K-Nearest Neighbors (KNN) classifiers, performance remained inadequate for practical use. This study offers insights into the use of generative models for synthetic NIR spectral data generation, highlighting their current limitations and potential areas for future research.
dc.language: eng
dc.publisher: Norwegian University of Life Sciences, Ås
dc.title: Can Tabular Generative Models generate realistic synthetic Near Infrared spectroscopic data?
dc.type: Master thesis
dc.description.localcode: M-TDV

