REGRESSION ESTIMATION AND FEATURE SELECTION USING MODIFIED CORRELATION-ADJUSTED ELASTIC NET PENALTIES

Authors

  • Olayiwola Babarinsa
Department of Mathematics, Federal University Lokoja, P.M.B. 1154, Lokoja, Nigeria
    https://orcid.org/0000-0002-3569-0828
  • Helen Edogbanya
    Department of Mathematics, Federal University Lokoja, P.M.B. 1154, Lokoja, Nigeria
  • Ovye Abari
    Department of Computer Science, Federal University Lokoja, P.M.B. 1154, Lokoja, Nigeria
  • Isaac Adeniyi
    Department of Statistics, Federal University Lokoja, P.M.B. 1154, Lokoja, Nigeria

Keywords:

Variable selection, Regularization, High-dimensional data, Grouping effect, Machine Learning, LASSO

Abstract

Regularized regression techniques such as the least absolute shrinkage and selection operator (LASSO), the elastic net, and the type 1 and type 2 correlation-adjusted elastic net (CAEN1 and CAEN2, respectively) carry out variable selection and coefficient estimation simultaneously in machine learning. This study proposes modified estimators based on CAEN1 and CAEN2 that rescale the estimates to undo the double shrinkage incurred by applying two penalties. The scale factors are derived by decomposing the correlation matrix of the predictors; because they depend on the magnitude of the correlations among the predictors, the elastic net is included as a special case. Estimation is carried out using a robust worst-case quadratic solver algorithm. Simulations show that the proposed estimators, referred to as the corrected correlation-adjusted elastic net (CCAEN1 and CCAEN2), perform competitively with CAEN1, CAEN2, LASSO, and the elastic net in terms of variable selection, estimation, and prediction accuracy: CCAEN1 yields the best results when the number of predictors exceeds the number of observations, and CCAEN2 performs best in the presence of a grouping effect, where highly correlated predictors tend to be included in or excluded from the model together. Applications to two real-life datasets further demonstrate the advantage of the proposed methods for machine learning.
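The abstract only summarizes the rescaling idea; the general CCAEN scale factors are derived in the full paper from a decomposition of the predictor correlation matrix. As a point of reference, the elastic-net special case that the proposal recovers is well known from Zou and Hastie (2005): the naive elastic-net estimate is shrunk twice, once by each penalty, and the corrected estimator undoes the extra ridge shrinkage by a constant rescaling,

    \hat{\beta}^{\text{naive}} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
        + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2,
    \qquad
    \hat{\beta}^{\text{EN}} = (1 + \lambda_2)\, \hat{\beta}^{\text{naive}}.

The CCAEN estimators replace the constant factor (1 + \lambda_2) with correlation-dependent scale factors, so the elastic net is recovered when those factors reduce to a constant.

The grouping effect mentioned in the abstract can likewise be illustrated with the baseline methods alone. The following minimal Python sketch (using scikit-learn, with illustrative data and tuning values that are not taken from the paper) shows LASSO tending to select only one of a pair of highly correlated predictors, while the elastic net shares weight between them:

    import numpy as np
    from sklearn.linear_model import ElasticNet, Lasso

    rng = np.random.default_rng(0)
    n = 100
    z = rng.normal(size=n)
    x1 = z + 0.01 * rng.normal(size=n)  # x1 and x2 are almost perfectly correlated
    x2 = z + 0.01 * rng.normal(size=n)
    X = np.column_stack([x1, x2])
    y = 3 * z + rng.normal(size=n)

    lasso = Lasso(alpha=0.1).fit(X, y)
    enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
    print("LASSO coefficients:      ", lasso.coef_)  # typically one of the pair is (near) zero
    print("Elastic net coefficients:", enet.coef_)   # weight is spread across the correlated pair

According to the abstract, CCAEN2 performs best precisely in this grouping-effect regime, with the rescaling correcting the attendant double shrinkage.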

Author Biography

Olayiwola Babarinsa

Department of Mathematics

Federal University Lokoja

P.M.B. 1154

References

Algamal, Z. Y. (2015). Penalized poisson regression model using adaptive modified elastic net penalty. Electronic Journal of Applied Statistical Analysis, 8(2), 236-245.

Anbari, M. E., & Mkhadri, A. (2014). Penalized regression combining the L1 norm and a correlation based penalty. Sankhya B, 76, 82-102.

Babarinsa, O., Sofi, A. Z. M., Mohd, A. H., Eluwole, A., Sunday, I., Adamu, W., & Daniel, L. (2022). Note on the history of (square) matrix and determinant. FUDMA Journal of Sciences, 6(3), 177-190.

Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183-202.

Biecek, P., & Burzykowski, T. (2021). Explanatory model analysis: Explore, explain, and examine predictive models. Chapman and Hall/CRC.

Bondell, H. D., & Reich, B. J. (2006). Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR.

Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6), 2350-2383.

Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92(438), 548-560.

Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Chapman and Hall/CRC.

Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348-1360.

Fan, J., & Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. arXiv preprint math/0602133.

Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3), 397-416.

Garba, W., Yahya, G., & Aremu, M. (2016). Multiclass sequential feature selection and classification method for genomic data. Blood, 7(10).

Grandvalet, Y., Chiquet, J., & Ambroise, C. (2012). Sparsity by Worst-Case Penalties. arXiv preprint arXiv:1210.2077.

Hanke, M., Dijkstra, L., Foraita, R., & Didelez, V. (2024). Variable selection in linear regression models: Choosing the best subset is not always the best choice. Biometrical Journal, 66(1), 2200209.

Hapfelmeier, A., Babatunde, W., Yahya, R. R., & Ulm, K. (2012). Predictive modeling of gene expression data. Handbook of Statistics in Clinical Oncology, 4, 71.

Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.

Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Paper presented at the International Joint Conference on Artificial Intelligence.

Ryan, T. (2008). Modern regression methods (Vol. 655). John Wiley & Sons.

Scheetz, T. E., Kim, K.-Y. A., Swiderski, R. E., Philp, A. R., Braun, T. A., Knudtson, K. L., . . . Casavant, T. L. (2006). Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences, 103(39), 14429-14434.

Stamey, T. A., Warrington, J. A., Caldwell, M. C., Chen, Z., Fan, Z., Mahadevappa, M., . . . Zhang, Z. (2001). Molecular genetic profiling of Gleason grade 4/5 prostate cancers compared to benign prostatic hyperplasia. The Journal of Urology, 166(6), 2171-2177.

Tan, Q. E. A. (2012). Correlation adjusted penalization in regression analysis (Ph.D. thesis). University of Manitoba, Canada.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1), 267-288.

Tutz, G., & Ulbricht, J. (2009). Penalized regression with correlation-based penalty. Statistics and Computing, 19, 239-253.

Wang, X., Dunson, D., & Leng, C. (2016). No penalty no tears: Least squares in high-dimensional linear models. Paper presented at the International Conference on Machine Learning.

Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2), 894-942.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2), 301-320.

Published

31-01-2025

How to Cite

REGRESSION ESTIMATION AND FEATURE SELECTION USING MODIFIED CORRELATION-ADJUSTED ELASTIC NET PENALTIES. (2025). FUDMA JOURNAL OF SCIENCES, 9(1), 29-40. https://doi.org/10.33003/fjs-2025-0901-2774

