MODIFIED ADAPTIVE LASSO FOR CLASSIFICATION OF HIGH DIMENSIONAL DATA

Emmanuel Lekwot; Tukur Dahiru; Husseini Garba Dikko; Enoch Yabkwa Yanshak

doi:10.33003/fjs-2025-0902-3248

Emmanuel Lekwot, Ahmadu Bello University, Zaria
Tukur Dahiru, Department of Community Medicine, Ahmadu Bello University, Zaria
Husseini Garba Dikko, Department of Statistics, Ahmadu Bello University, Zaria
Enoch Yabkwa Yanshak, Department of Statistics, Ahmadu Bello University, Zaria

Keywords: High-dimensional data, Modified Adaptive LASSO (MALASSO), Penalized logistic regression, Gene expression analysis, Variable selection

Abstract

High-dimensional classification problems, such as gene expression analysis in medical research, require effective variable selection techniques to improve predictive accuracy and interpretability. Traditional penalized logistic regression methods, such as LASSO and Elastic Net, have been widely applied for simultaneous variable selection and coefficient estimation. However, these methods suffer from limitations, including selection bias and inefficiencies in handling correlated predictors. This study introduces the Modified Adaptive LASSO (MALASSO), a novel approach that enhances high-dimensional classification by incorporating an improved weighting mechanism based on ridge regression estimates. The new weighting scheme mitigates the selection bias observed in LASSO-based methods and improves classification performance in datasets with highly correlated features. To evaluate MALASSO’s effectiveness, extensive simulations and real-world applications were conducted using leukemia and colon cancer gene expression datasets. Results indicate that MALASSO outperforms existing methods, achieving superior classification accuracy (98.45% for leukemia and 100% for colon cancer) while selecting fewer, more relevant variables. Compared to Adaptive LASSO (ALASSO) and Adaptive Elastic Net (AEnet), MALASSO demonstrated improved robustness and model sparsity, highlighting its potential for high-dimensional medical diagnostics and biomarker discovery. This study contributes to the advancement of penalized regression techniques by addressing critical shortcomings in existing methods. Future work will explore MALASSO’s applicability to multiclass classification and other high-dimensional domains.
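The abstract does not state MALASSO's exact weighting formula, so the following is only a generic sketch of the idea it describes: an adaptive LASSO for logistic regression whose penalty weights come from an initial ridge fit, here taken as w_j = 1 / |beta_ridge_j|^gamma. The function names, the synthetic data, and the values of `lam_ridge`, `lam_l1`, and `gamma` are illustrative assumptions, not the authors' procedure or settings.

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_penalized_logistic(X, y, l1_weights, lam_l1=0.0, lam_l2=0.0, n_iter=2000):
    """Proximal gradient descent for mean logistic loss
    + lam_l1 * sum_j l1_weights[j] * |beta_j| + (lam_l2 / 2) * ||beta||^2.
    No intercept is fitted, to keep the sketch short."""
    n, p = X.shape
    # Lipschitz bound for the gradient of the smooth part (loss + ridge term).
    lipschitz = 0.25 * np.linalg.norm(X, 2) ** 2 / n + lam_l2
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n + lam_l2 * beta
        z = beta - step * grad
        # Coordinate-wise soft-thresholding handles the weighted L1 penalty.
        thresh = step * lam_l1 * l1_weights
        beta = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return beta

def adaptive_lasso_logistic(X, y, lam_ridge=1.0, lam_l1=0.01, gamma=1.0):
    p = X.shape[1]
    # Stage 1: ridge-penalized logistic regression gives initial estimates.
    beta_ridge = fit_penalized_logistic(X, y, np.zeros(p), lam_l2=lam_ridge)
    # Stage 2: small ridge coefficients receive large L1 penalties (assumed form).
    weights = 1.0 / (np.abs(beta_ridge) ** gamma + 1e-8)
    beta = fit_penalized_logistic(X, y, weights, lam_l1=lam_l1)
    return beta_ridge, weights, beta

# Synthetic high-dimensional data (p > n); only the first three features matter.
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -2.0, 1.5]
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

beta_ridge, weights, beta = adaptive_lasso_logistic(X, y)
print("non-zero coefficients:", np.count_nonzero(beta), "of", p)
print("selected feature indices:", np.flatnonzero(beta))
```

In practice the penalty strengths would be chosen by cross-validation rather than fixed, and a real gene expression analysis would standardize the predictors and include an intercept.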

Author Biographies

Tukur Dahiru, Department of Community Medicine, Ahmadu Bello University, Zaria

Professor Dahiru Tukur, Senior Lecturer, Department of Community Medicine, Ahmadu Bello University, Zaria

Husseini Garba Dikko, Department of Statistics, Ahmadu Bello University, Zaria

Professor Husseini Garba Dikko, Senior Lecturer, Department of Statistics, Ahmadu Bello University, Zaria

Enoch Yabkwa Yanshak, Department of Statistics, Ahmadu Bello University, Zaria

Mr Enoch Yabkwa Yanshak, M.Sc. Student, Ahmadu Bello University, Zaria

Announcements

FUDMA Journal of Sciences - Vol. 9_2025 Call for Manuscript Submission