Orders of magnitude speed increase in Partial Least Squares feature selection with new simple indexing technique for very tall data sets

Stefansson, Petter; Indahl, Ulf Geir; Liland, Kristian Hovde; Burud, Ingunn

Peer reviewed, Journal article

Accepted version

URI

https://hdl.handle.net/11250/2670824

Date

2019

Original version

Journal of Chemometrics. 2019, 33 (11), 1-9. 10.1002/cem.3141

Abstract

Feature selection is a challenging combinatorial optimization problem that tends to require a large number of candidate feature subsets to be evaluated before a satisfying solution is obtained. Because of the computational cost associated with estimating the regression coefficients for each subset, feature selection can be an immensely time-consuming process and is often left inadequately explored. Here, we propose a simple modification to the conventional sequence of calculations involved when fitting a number of feature subsets to the same response data with partial least squares (PLS) model fitting. The modification consists in establishing the covariance matrix for the full set of features by an initial calculation and then deriving the covariance of all subsequent feature subsets solely by indexing into the original covariance matrix. By choosing this approach, which is primarily suitable for tall design matrices with significantly more rows than columns, we avoid redundant (identical) recalculations in the evaluation of different feature subsets. By benchmarking the time required to solve regression problems of various sizes, we demonstrate that the introduced technique outperforms traditional approaches by several orders of magnitude when used in conjunction with PLS modeling. In the supplementary material, we provide code for implementing the concept with kernel PLS regression.
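
The indexing idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' supplementary code. It assumes a single response, column-centered data, and a kernel-style PLS1 routine that works only from the cross-product matrices X'X and X'y. The function name `pls1_from_crossproducts`, the synthetic data, and the chosen subset are all hypothetical and serve illustration only. The cross-products are computed once for all features, and each candidate subset then needs only an index operation, because centering acts column by column and therefore commutes with column selection.

```python
import numpy as np


def pls1_from_crossproducts(xtx, xty, n_components):
    """Hypothetical helper: PLS1 coefficients from X'X and X'y only."""
    xty = xty.astype(float).copy()        # deflated in place below
    n_features = xtx.shape[0]
    R = np.zeros((n_features, n_components))  # weights that act on X directly
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for a in range(n_components):
        w = xty / np.linalg.norm(xty)     # single response: w is parallel to X'y
        r = w.copy()
        for j in range(a):                # make r act on the undeflated X
            r -= (P[:, j] @ w) * R[:, j]
        xtx_r = xtx @ r
        tt = r @ xtx_r                    # squared norm of the score vector
        P[:, a] = xtx_r / tt
        q[a] = (r @ xty) / tt
        xty -= P[:, a] * q[a] * tt        # deflate X'y only
        R[:, a] = r
    return R @ q


# Synthetic tall data set (assumption): many rows, few columns.
rng = np.random.default_rng(0)
n_samples, n_features = 10_000, 8
X = rng.standard_normal((n_samples, n_features))
y = X @ rng.standard_normal(n_features) + 0.1 * rng.standard_normal(n_samples)

# Center once; the cross-products of centered data are unscaled covariances.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# One-off calculation over all rows, cost grows with n_samples.
XtX_full = Xc.T @ Xc
Xty_full = Xc.T @ yc

# Evaluating a candidate subset needs only indexing, independent of n_samples.
subset = np.array([0, 3, 5])
XtX_sub = XtX_full[np.ix_(subset, subset)]
Xty_sub = Xty_full[subset]

beta_sub = pls1_from_crossproducts(XtX_sub, Xty_sub, n_components=2)
```

In a feature selection loop, only the last three statements would be repeated per candidate subset, so the per-subset cost depends on the subset size and the number of components rather than on the number of rows.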

Journal

Journal of Chemometrics

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International