dc.contributor.author | Stefansson, Petter | |
dc.contributor.author | Indahl, Ulf Geir | |
dc.contributor.author | Liland, Kristian Hovde | |
dc.contributor.author | Burud, Ingunn | |
dc.date.accessioned | 2020-08-04T13:43:57Z | |
dc.date.available | 2020-08-04T13:43:57Z | |
dc.date.created | 2020-01-15T11:30:10Z | |
dc.date.issued | 2019 | |
dc.identifier.citation | Journal of Chemometrics. 2019, 33 (11), 1-9. | en_US |
dc.identifier.issn | 0886-9383 | |
dc.identifier.uri | https://hdl.handle.net/11250/2670824 | |
dc.description.abstract | Feature selection is a challenging combinatorial optimization problem that tends to require a large number of candidate feature subsets to be evaluated before a satisfying solution is obtained. Because of the computational cost associated with estimating the regression coefficients for each subset, feature selection can be an immensely time-consuming process and is often left inadequately explored. Here, we propose a simple modification to the conventional sequence of calculations involved when fitting a number of feature subsets to the same response data with partial least squares (PLS) model fitting. The modification consists in establishing the covariance matrix for the full set of features by an initial calculation and then deriving the covariance of all subsequent feature subsets solely by indexing into the original covariance matrix. By choosing this approach, which is primarily suitable for tall design matrices with significantly more rows than columns, we avoid redundant (identical) recalculations in the evaluation of different feature subsets. By benchmarking the time required to solve regression problems of various sizes, we demonstrate that the introduced technique outperforms traditional approaches by several orders of magnitude when used in conjunction with PLS modeling. In the supplementary material, we provide code for implementing the concept with kernel PLS regression. | en_US |
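The indexing idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' supplementary code: the function name `kernel_pls1`, the synthetic data, and the variable names are hypothetical, and the fitting routine is a simplified single-response kernel-style PLS that works only from the cross-products X'X and X'y. The sketch assumes column-centered data and computes the full cross-products once, then obtains the cross-products of any feature subset by indexing alone.

```python
import numpy as np


def kernel_pls1(XtX, Xty, n_components):
    """Single-response PLS coefficients computed from X'X and X'y only.

    Hypothetical helper for illustration; assumes X and y were centered
    before the cross-products were formed.
    """
    p = XtX.shape[0]
    R = np.zeros((p, n_components))  # weights giving scores as T = X @ R
    P = np.zeros((p, n_components))  # X loadings
    q = np.zeros(n_components)       # y loadings
    s = Xty.astype(float).copy()     # deflated copy of X'y
    for a in range(n_components):
        w = s / np.linalg.norm(s)
        r = w.copy()
        for j in range(a):
            r -= (P[:, j] @ w) * R[:, j]
        v = XtX @ r
        tt = r @ v                   # t't without ever forming t = X r
        P[:, a] = v / tt
        q[a] = (r @ s) / tt
        s = s - P[:, a] * q[a] * tt  # deflate X'y
        R[:, a] = r
    return R @ q


# Synthetic tall data set (many more rows than columns); values are assumptions.
rng = np.random.default_rng(0)
n, p = 5000, 8
X = rng.standard_normal((n, p))
y = X[:, [0, 3, 5]] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

# Center once, then form the full cross-products once: the only O(n) step.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
XtX_full = Xc.T @ Xc
Xty_full = Xc.T @ yc

# Any candidate subset is now obtained by indexing, with no pass over the rows.
subset = [0, 3, 5]
XtX_sub = XtX_full[np.ix_(subset, subset)]
Xty_sub = Xty_full[subset]
b_sub = kernel_pls1(XtX_sub, Xty_sub, n_components=len(subset))
```

In a feature selection loop, only the two indexing lines and the call to the fitting routine would be repeated per candidate subset, so the per-subset cost depends on the subset size rather than on the number of rows. Using as many components as features, as done here, makes the result comparable to ordinary least squares on the same subset; fewer components would normally be used in practice.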
dc.language.iso | eng | en_US |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no | * |
dc.title | Orders of magnitude speed increase in Partial Least Squares feature selection with new simple indexing technique for very tall data sets | en_US |
dc.type | Peer reviewed | en_US |
dc.type | Journal article | en_US |
dc.description.version | acceptedVersion | en_US |
dc.source.pagenumber | 1-9 | en_US |
dc.source.volume | 33 | en_US |
dc.source.journal | Journal of Chemometrics | en_US |
dc.source.issue | 11 | en_US |
dc.identifier.doi | 10.1002/cem.3141 | |
dc.identifier.cristin | 1773517 | |
cristin.unitcode | 192,15,6,0 | |
cristin.unitcode | 192,15,1,0 | |
cristin.unitcode | 192,0,0,0 | |
cristin.unitname | Seksjon for realfag og teknologi | |
cristin.unitname | Seksjon for anvendte matematiske fag | |
cristin.unitname | Norges miljø- og biovitenskapelige universitet | |
cristin.ispublished | true | |
cristin.qualitycode | 1 | |