Comparison of multivariate methods to predict the quality of drinking water in Norway
Master thesis
Date
2015-08-03
Collections
- Master's theses (KBM) [927]
Abstract
Water quality in the Water Distribution System (WDS) varies over time. It is measured through the Heterotrophic Plate Count (HPC), which is used as an indicator. Parameters such as color, pH, turbidity, conductivity, temperature and organic matter, as well as components of the water distribution network such as generic pipes and their ages, lubricants and storage tanks, are linked with water quality. For multivariate modelling of these parameters, data were collected from the Norwegian Institute of Public Health (NIPH) as yearly averages of HPC together with physical, chemical and microbial water quality parameters.
Multivariate statistical methods have been applied to predict the quality of drinking water in the water distribution system. Models such as Multiple Linear Regression (MLR), Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR) were adopted to identify the factors that affect the HPC in the water distribution network and, consequently, the quality of the water. Because of the large number of insignificant variables, a subset model was chosen using Mallows' Cp and adjusted R² as criteria. The fitted models were validated through Leave One Out (LOO) cross-validation. The best subset model performed well on both the training and test data sets but still suffered from multicollinearity. As an alternative approach, a PLSR model with three latent components was fitted, and it predicted more closely than a PCR model with seven components. The number of components was chosen by the prediction error during cross-validation.
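The Leave One Out validation mentioned above can be illustrated with a minimal sketch. It is not the thesis code: the data are synthetic, the three predictor columns are hypothetical stand-ins, and the function names (`solve`, `fit_ols`, `loo_rmse`) are introduced here only for illustration. The sketch fits an ordinary least-squares model on all observations but one, predicts the held-out observation, and repeats this for every observation to obtain a cross-validated prediction error that can be compared across candidate subset models.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(beta, row):
    """Predicted response for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def loo_rmse(X, y):
    """Root mean squared error of leave-one-out predictions."""
    squared_error = 0.0
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]  # drop observation i
        y_train = y[:i] + y[i + 1:]
        beta = fit_ols(X_train, y_train)
        squared_error += (y[i] - predict(beta, X[i])) ** 2
    return (squared_error / len(y)) ** 0.5


# Synthetic data (assumption): the response depends on the first two
# columns only; the third column is pure noise with respect to y.
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 5), rng.uniform(6, 9)]
     for _ in range(30)]
y = [2.0 + 1.5 * a - 0.8 * b + rng.gauss(0, 0.5) for a, b, _ in X]

rmse_full = loo_rmse(X, y)                          # all three predictors
rmse_subset = loo_rmse([row[:2] for row in X], y)   # the two relevant ones
rmse_weak = loo_rmse([[row[2]] for row in X], y)    # irrelevant one only

print("LOO RMSE, full model:  ", round(rmse_full, 3))
print("LOO RMSE, subset model:", round(rmse_subset, 3))
print("LOO RMSE, weak model:  ", round(rmse_weak, 3))
```

The normal equations used in `fit_ols` become numerically unstable when predictors are strongly correlated, which is one practical reason to turn to component-based methods such as PCR or PLSR when multicollinearity is present.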