dc.contributor.author	Stefansson, Petter
dc.contributor.author	Indahl, Ulf Geir
dc.contributor.author	Liland, Kristian Hovde
dc.contributor.author	Burud, Ingunn
dc.date.accessioned	2020-08-04T13:43:57Z
dc.date.available	2020-08-04T13:43:57Z
dc.date.created	2020-01-15T11:30:10Z
dc.date.issued	2019
dc.identifier.citation	Journal of Chemometrics. 2019, 33 (11), 1-9.	en_US
dc.identifier.issn	0886-9383
dc.identifier.uri	https://hdl.handle.net/11250/2670824
dc.description.abstract	Feature selection is a challenging combinatorial optimization problem that tends to require a large number of candidate feature subsets to be evaluated before a satisfactory solution is obtained. Because of the computational cost of estimating the regression coefficients for each subset, feature selection can be an immensely time-consuming process and is often left inadequately explored. Here, we propose a simple modification to the conventional sequence of calculations involved when fitting a number of feature subsets to the same response data with partial least squares (PLS) regression. The modification consists of computing the covariance matrix for the full set of features once, up front, and then deriving the covariance matrix of every subsequent feature subset solely by indexing into this original matrix. This approach, which is primarily suitable for tall design matrices with significantly more rows than columns, avoids redundant (identical) recalculations when different feature subsets are evaluated. By benchmarking the time required to solve regression problems of various sizes, we demonstrate that the introduced technique outperforms traditional approaches by several orders of magnitude when used in conjunction with PLS modeling. In the supplementary material, we provide code for implementing the concept with kernel PLS regression.	en_US
dc.language.iso	eng	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no	*
dc.title	Orders of magnitude speed increase in Partial Least Squares feature selection with new simple indexing technique for very tall data sets	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	acceptedVersion	en_US
dc.source.pagenumber	1-9	en_US
dc.source.volume	33	en_US
dc.source.journal	Journal of Chemometrics	en_US
dc.source.issue	11	en_US
dc.identifier.doi	10.1002/cem.3141
dc.identifier.cristin	1773517
cristin.unitcode	192,15,6,0
cristin.unitcode	192,15,1,0
cristin.unitcode	192,0,0,0
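cristin.unitname	Seksjon for realfag og teknologi (Section for Science and Technology)
cristin.unitname	Seksjon for anvendte matematiske fag (Section for Applied Mathematics)
cristin.unitname	Norges miljø- og biovitenskapelige universitet (Norwegian University of Life Sciences)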
cristin.ispublished	true
cristin.qualitycode	1
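The indexing idea described in the abstract can be illustrated with a short Python/NumPy sketch. It is not the authors' supplementary code. As a stand-in, it pairs the indexing step with the improved kernel PLS algorithm of Dayal and MacGregor for a single response (PLS1), chosen because that algorithm needs only the cross-product matrices X^T X and X^T y of the column-centred data, which are the unscaled covariance matrices. The helper names (precompute_cross_products, kernel_pls1, fit_subset) and the synthetic data are hypothetical. The O(n p^2) products are formed once. Each subset then slices them with np.ix_, so the per-subset fitting cost in this sketch does not depend on the number of rows n.

```python
import numpy as np


def precompute_cross_products(X, y):
    """Centre X and y once and form X^T X and X^T y for ALL features.

    This is the only step whose cost grows with the number of rows n.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return Xc.T @ Xc, Xc.T @ yc


def kernel_pls1(XtX, Xty, n_components):
    """PLS1 regression coefficients computed from cross products alone.

    Stand-in algorithm (an assumption, not the paper's code): improved
    kernel PLS of Dayal and MacGregor for one response. The raw data
    matrix is never touched.
    """
    p = XtX.shape[0]
    R = np.zeros((p, n_components))   # weights expressed in terms of the original X
    P = np.zeros((p, n_components))   # X loadings
    q = np.zeros(n_components)        # y loadings
    xty = np.array(Xty, dtype=float)  # working copy of X^T y, deflated below
    for a in range(n_components):
        w = xty / np.linalg.norm(xty)
        r = w.copy()
        for j in range(a):            # make r valid for the undeflated X
            r -= (P[:, j] @ w) * R[:, j]
        tt = r @ XtX @ r              # t^T t without forming t = X r
        P[:, a] = XtX @ r / tt
        q[a] = (r @ xty) / tt
        xty -= tt * q[a] * P[:, a]    # deflate X^T y only
        R[:, a] = r
    return R @ q


def fit_subset(XtX_full, Xty_full, idx, n_components):
    """Fit one feature subset by INDEXING into the full cross products."""
    idx = np.asarray(idx)
    XtX_sub = XtX_full[np.ix_(idx, idx)]  # k x k block, no pass over the rows
    return kernel_pls1(XtX_sub, Xty_full[idx], n_components)


# Hypothetical tall data set: n = 10 000 rows, p = 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))
y = X[:, :5] @ np.ones(5) + 0.1 * rng.normal(size=10_000)

XtX, Xty = precompute_cross_products(X, y)  # done once for all subsets
subsets = [[0, 1, 2], [0, 1, 2, 3, 4], list(range(10))]
coefs = {tuple(s): fit_subset(XtX, Xty, s, n_components=2) for s in subsets}
```

Slicing the full centred cross products gives the same matrices as centring and multiplying each subset from scratch, because column means and column-pair dot products are computed independently of which other columns are present. The sketch returns only coefficients. How each candidate subset is then scored (for example by cross-validation) is not shown.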

