Orders of magnitude speed increase in Partial Least Squares feature selection with new simple indexing technique for very tall data sets

Stefansson, Petter; Indahl, Ulf Geir; Liland, Kristian Hovde; Burud, Ingunn

dc.contributor.author	Stefansson, Petter
dc.contributor.author	Indahl, Ulf Geir
dc.contributor.author	Liland, Kristian Hovde
dc.contributor.author	Burud, Ingunn
dc.date.accessioned	2020-08-04T13:43:57Z
dc.date.available	2020-08-04T13:43:57Z
dc.date.created	2020-01-15T11:30:10Z
dc.date.issued	2019
dc.identifier.citation	Journal of Chemometrics. 2019, 33 (11), 1-9.	en_US
dc.identifier.issn	0886-9383
dc.identifier.uri	https://hdl.handle.net/11250/2670824
dc.description.abstract	Feature selection is a challenging combinatorial optimization problem that tends to require a large number of candidate feature subsets to be evaluated before a satisfying solution is obtained. Because of the computational cost associated with estimating the regression coefficients for each subset, feature selection can be an immensely time-consuming process and is often left inadequately explored. Here, we propose a simple modification to the conventional sequence of calculations involved when fitting a number of feature subsets to the same response data with partial least squares (PLS) model fitting. The modification consists in establishing the covariance matrix for the full set of features by an initial calculation and then deriving the covariance of all subsequent feature subsets solely by indexing into the original covariance matrix. By choosing this approach, which is primarily suitable for tall design matrices with significantly more rows than columns, we avoid redundant (identical) recalculations in the evaluation of different feature subsets. By benchmarking the time required to solve regression problems of various sizes, we demonstrate that the introduced technique outperforms traditional approaches by several orders of magnitude when used in conjunction with PLS modeling. In the supplementary material, we provide code for implementing the concept with kernel PLS regression.	en_US
dc.language.iso	eng	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no	*
dc.title	Orders of magnitude speed increase in Partial Least Squares feature selection with new simple indexing technique for very tall data sets	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	acceptedVersion	en_US
dc.source.pagenumber	1-9	en_US
dc.source.volume	33	en_US
dc.source.journal	Journal of Chemometrics	en_US
dc.source.issue	11	en_US
dc.identifier.doi	10.1002/cem.3141
dc.identifier.cristin	1773517
cristin.unitcode	192,15,6,0
cristin.unitcode	192,15,1,0
cristin.unitcode	192,0,0,0
cristin.unitname	Seksjon for realfag og teknologi
cristin.unitname	Seksjon for anvendte matematiske fag
cristin.unitname	Norges miljø- og biovitenskapelige universitet
cristin.ispublished	true
cristin.qualitycode	1
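
The indexing idea described in the abstract can be illustrated with a minimal Python/NumPy sketch. This is not the authors' supplementary code. The function name `kernel_pls1`, the synthetic data, and the chosen subset are hypothetical, and the recursion is a generic kernel-style PLS1 that works only from the cross-products X'X and X'y. The point it illustrates is that the cross-products for any feature subset can be read out of the full matrices by indexing, so the tall matrix X is touched only once.

```python
import numpy as np

def kernel_pls1(XtX, Xty, n_components):
    """PLS1 regression coefficients computed from cross-products only.

    Illustrative kernel-style recursion (an assumption, not the paper's code):
    XtX is the (p, p) matrix X'X and Xty the length-p vector X'y, both from
    centered data.
    """
    p = XtX.shape[0]
    R = np.zeros((p, n_components))   # weights expressed in terms of the original X
    P = np.zeros((p, n_components))   # X loadings
    q = np.zeros(n_components)        # y loadings
    s = Xty.astype(float).copy()      # deflated X'y
    for a in range(n_components):
        w = s / np.linalg.norm(s)
        r = w.copy()
        for j in range(a):
            r -= (P[:, j] @ w) * R[:, j]
        XtX_r = XtX @ r
        tt = r @ XtX_r                # squared norm of the score vector t = X r
        P[:, a] = XtX_r / tt
        q[a] = (r @ s) / tt
        s = s - P[:, a] * (q[a] * tt) # deflate X'y
        R[:, a] = r
    return R @ q

# Synthetic tall data set (many more rows than columns); values are arbitrary.
rng = np.random.default_rng(0)
n, p = 10_000, 8
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

# Center once, then form the full cross-products once: the only O(n) step.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
XtX_full = Xc.T @ Xc
Xty_full = Xc.T @ yc

# Evaluate a candidate feature subset purely by indexing into the full matrices.
subset = np.array([0, 3, 5])
XtX_sub = XtX_full[np.ix_(subset, subset)]
Xty_sub = Xty_full[subset]
beta_sub = kernel_pls1(XtX_sub, Xty_sub, n_components=2)
```

Each further subset costs only an index operation plus a small p-by-p recursion, independent of the number of rows, which is why the approach suits tall design matrices.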

Associated file(s)

Filename: OrdersOfMagnitude.pdf
Size: 914.0Kb
Format: PDF

This item appears in the following collection(s)

Journal articles (peer reviewed) [4951]
Publications from Cristin - NMBU [5893]

Except where otherwise noted, this item is licensed as Attribution-NonCommercial-NoDerivatives 4.0 International