Research Article | Peer-Reviewed

Comparative Analysis of Machine Learning Algorithms for Predicting Under-Five Mortality: Evidence from Tanzania Demographic and Health Survey

Received: 9 July 2025     Accepted: 24 July 2025     Published: 20 August 2025
Abstract

Under-five mortality remains a global health challenge, with rates of 43 deaths per 1,000 live births in Tanzania and 37 deaths per 1,000 live births globally. Although child mortality has declined significantly over the last twenty years, current rates remain far from the Sustainable Development Goal target of at most 25 deaths per 1,000 live births by 2030. This study sought to identify the best-performing classifier of under-five mortality status by comparing ten supervised machine learning algorithms: Decision Trees, Random Forest, Support Vector Machines, SMOTE-Based Boosted Random Forest, XGBoost, LightGBM, CatBoost, Logistic Regression, K-Nearest Neighbors and Stacked Ensemble Methods. The class imbalance detected in the dataset during pre-processing was addressed using weighted categorical cross-entropy and SMOTE, with 5-fold cross-validation and an 80%/20% split between training and testing sets. Each of the nine individual algorithms was run in 20 experiments and the average results were reported, to reduce the likelihood that the findings arose by chance. A stacking ensemble model was then developed by integrating six of the best-performing algorithms, using an inclusion criterion of AUC > 0.97. The findings revealed that the ensemble consistently outperformed the other nine algorithms, achieving 100%, 100%, 99.97% and 99.24% for AUC, Accuracy, F1-Score and MCC respectively. This suggests that the stacking ensemble can uncover more insights than the individual algorithms in predicting under-five mortality status. The study recommends that policies on under-five mortality integrate insights from the stacking ensemble algorithm, which showed the highest predictive performance.
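
As an aid to reading the reported metrics, the following is a minimal sketch of how Accuracy, F1-Score and MCC follow from a binary confusion matrix. The function name `classification_metrics` and the counts used in the example are hypothetical illustrations and are not taken from the study; the formulas are the standard definitions.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, F1-score, MCC) from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    # MCC uses all four cells of the confusion matrix, which is why it is
    # often reported alongside accuracy for imbalanced outcomes.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return accuracy, f1, mcc


# Hypothetical counts for an imbalanced test set (assumed for illustration):
# 50 positives (deaths) and 950 negatives (survivors).
acc, f1, mcc = classification_metrics(tp=40, fp=20, tn=930, fn=10)
print(f"accuracy={acc:.4f} f1={f1:.4f} mcc={mcc:.4f}")
```

With these assumed counts the accuracy is high while F1-Score and MCC are noticeably lower, which illustrates why the three metrics are usually read together when the outcome classes are imbalanced.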

Published in Machine Learning Research (Volume 10, Issue 2)
DOI 10.11648/j.mlr.20251002.12
Page(s) 110-123
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Under-five, Mortality, Modeling, Machine Learning, Prediction, Classification

Cite This Article
    APA Style

    Mabula, S., Too, R., & Kerich, G. (2025). Comparative Analysis of Machine Learning Algorithms for Predicting Under-Five Mortality: Evidence from Tanzania Demographic and Health Survey. Machine Learning Research, 10(2), 110-123. https://doi.org/10.11648/j.mlr.20251002.12


    ACS Style

    Mabula, S.; Too, R.; Kerich, G. Comparative Analysis of Machine Learning Algorithms for Predicting Under-Five Mortality: Evidence from Tanzania Demographic and Health Survey. Mach. Learn. Res. 2025, 10(2), 110-123. doi: 10.11648/j.mlr.20251002.12


    AMA Style

    Mabula S, Too R, Kerich G. Comparative Analysis of Machine Learning Algorithms for Predicting Under-Five Mortality: Evidence from Tanzania Demographic and Health Survey. Mach Learn Res. 2025;10(2):110-123. doi: 10.11648/j.mlr.20251002.12


    @article{10.11648/j.mlr.20251002.12,
      author = {Salyungu Mabula and Robert Too and Gregory Kerich},
      title = {Comparative Analysis of Machine Learning Algorithms for Predicting Under-Five Mortality: Evidence from Tanzania Demographic and Health Survey},
      journal = {Machine Learning Research},
      volume = {10},
      number = {2},
      pages = {110-123},
      doi = {10.11648/j.mlr.20251002.12},
      url = {https://doi.org/10.11648/j.mlr.20251002.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20251002.12},
     year = {2025}
    }
    


    TY  - JOUR
    T1  - Comparative Analysis of Machine Learning Algorithms for Predicting Under-Five Mortality: Evidence from Tanzania Demographic and Health Survey
    
    AU  - Salyungu Mabula
    AU  - Robert Too
    AU  - Gregory Kerich
    Y1  - 2025/08/20
    PY  - 2025
    N1  - https://doi.org/10.11648/j.mlr.20251002.12
    DO  - 10.11648/j.mlr.20251002.12
    T2  - Machine Learning Research
    JF  - Machine Learning Research
    JO  - Machine Learning Research
    SP  - 110
    EP  - 123
    PB  - Science Publishing Group
    SN  - 2637-5680
    UR  - https://doi.org/10.11648/j.mlr.20251002.12
    VL  - 10
    IS  - 2
    ER  - 

