In machine learning, feature selection for generalized additive models (GAMs) is computationally expensive and challenging. The reason is that in GAMs the decision is not only whether to include or exclude a feature, as in linear models, but also which non-linear form each selected feature should take. This added complexity means that best subset methods can be computationally expensive even with a feature space of only 8 features: for example, if each feature can be excluded, entered linearly, or entered as a smooth term, 8 features already yield 3^8 = 6561 candidate models, each requiring a GAM fit.
One of the most important drawbacks of existing methods for GAM feature selection is the lack of parsimony. The models selected by existing methods usually exhibit a phenomenon called concurvity, which occurs when a non-linear term in the model can be approximated by one or more of the other non-linear terms. In our previous work, we proposed a hybrid genetic-improved harmony search algorithm (Hybrid Algorithm, HA) that applies thin plate splines to produce a best subset feature selector capable of finding concurvity-free models.
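As a minimal illustration, concurvity can be diagnosed with the concurvity() helper of the mgcv package (Wood, 2017); the simulated data below are purely illustrative and are not taken from our study.

library(mgcv)

set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- x1^2 + rnorm(n, sd = 0.05)            # x2 is almost a smooth function of x1
y  <- sin(2 * pi * x1) + rnorm(n, sd = 0.2)

# bs = "tp": thin plate regression splines, the default basis in mgcv
fit <- gam(y ~ s(x1, bs = "tp") + s(x2, bs = "tp"), method = "REML")

# Values close to 1 flag smooth terms that the other terms can approximate
round(concurvity(fit, full = TRUE), 2)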
Our previous research focused on developing the HA for GAMs and improving its expected runtime through parallelization. Recent algorithms such as mRMRe (De Jay et al., 2013) and the block HSIC Lasso (Climente-González et al., 2019) served as benchmarks. We showed on real-world datasets that the proposed HA yields more parsimonious models than mRMRe and the block HSIC Lasso. However, the expected runtime of the HA is considerably longer than that of the two benchmarks.
In this study, we investigate the performance of the HA against several other feature selection algorithms for GAMs. These algorithms can be divided into three groups. The first group consists of stepwise methods implemented following Wood (2017), with the GAMs applying thin plate splines. The second group comprises regularization methods such as the COSSO (Lin & Zhang, 2006) and penalized thin plate splines (Marra & Wood, 2011); a sketch of the latter is given below. The third group contains methods that utilize popular boosting techniques, such as the GAMBoost algorithm (Schmid & Hothorn, 2008) and the Modified Backfitting procedure (Belitz & Lang, 2008). We also apply Recursive Feature Elimination (RFE) combined with a Random Forest learner as a benchmark algorithm that is not based on GAM learners.
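As a hedged sketch of the second group: the double-penalty approach of Marra & Wood (2011) is available in mgcv through the select = TRUE argument, which additionally penalizes each smooth's null space so that superfluous terms can be shrunk entirely out of the model. The simulated data here are placeholders, not our study data.

library(mgcv)

set.seed(2)
n     <- 400
train <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
train$y <- sin(2 * pi * train$x1) + rnorm(n, sd = 0.3)   # x2 and x3 are pure noise

# select = TRUE adds the extra null-space penalty, so whole smooth
# terms can be penalized to zero during REML smoothing-parameter estimation
fit <- gam(y ~ s(x1, bs = "tp") + s(x2, bs = "tp") + s(x3, bs = "tp"),
           data = train, method = "REML", select = TRUE)

# Smooths shrunk to (near) zero effective degrees of freedom are deselected
summary(fit)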
The performance of these algorithms against the HA, mRMRe and the block HSIC Lasso is tested on two real-world datasets. On a smaller dataset with 8 features, we investigate which non-redundant features are most important in predicting the compressive strength of concrete girders. This dataset is mainly used to fine-tune the parameters of the examined algorithms. Next, a more realistic case with 27 features is used, where we investigate which features are most significant in predicting the default of credit card clients.
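To indicate how the non-GAM benchmark is set up, the following sketch wraps recursive feature elimination around a random forest via the caret package; the eight simulated predictors merely stand in for the real concrete features.

library(caret)
library(randomForest)

set.seed(3)
n <- 300
X <- as.data.frame(matrix(runif(n * 8), n, 8))
names(X) <- paste0("f", 1:8)
y <- 3 * X$f1 + sin(2 * pi * X$f2) + rnorm(n, sd = 0.3)   # only f1 and f2 matter

# rfFuncs makes rfe() refit a random forest at each candidate subset size,
# with 5-fold cross-validation guiding the elimination
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
sel  <- rfe(X, y, sizes = 2:7, rfeControl = ctrl)

predictors(sel)   # the feature subset retained after elimination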
We show that the proposed HA with thin plate splines results in more parsimonious models than those proposed by the other examined algorithms, without significantly lower predictive performance on a separate test set. Expected runtimes are generally shorter for the benchmark algorithms than for the HA on a large dataset. However, our results show that the expected runtime of the HA is shorter than that of GAMBoost and not significantly different from that of RFE combined with Random Forest.
Belitz, C., & Lang, S. (2008). Simultaneous selection of variables and smoothing parameters in structured additive regression models. Computational Statistics & Data Analysis, 53(1), 61-81.
Climente-González, H., Azencott, C. A., Kaski, S., & Yamada, M. (2019). Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data. Bioinformatics, 35(14), i427-i435.
De Jay, N., Papillon-Cavanagh, S., Olsen, C., El-Hachem, N., Bontempi, G., & Haibe-Kains, B. (2013). mRMRe: an R package for parallelized mRMR ensemble feature selection. Bioinformatics, 29(18), 2365-2368.
Lin, Y., & Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 34(5), 2272-2297.
Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis, 55(7), 2372-2387.
Schmid, M., & Hothorn, T. (2008). Boosting additive models using component-wise P-splines. Computational Statistics & Data Analysis, 53(2), 298-311.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). Chapman and Hall/CRC.
If you wish to receive a link to the Zoom meeting on the day of the event, please send an email to Tamás Solymosi (tamas dot solymosi at uni dash corvinus dot hu).