Research Article | Peer-Reviewed

Prognosticate the Analogous Region in Bangladesh Utilizing an Unsupervised Machine Learning Technique

Received: 28 March 2025     Accepted: 20 May 2025     Published: 13 June 2025
Abstract

Climate regionalization provides valuable insights into the climatic challenges a country faces, enabling better preparedness for climate change impacts and the development of targeted strategies. In this study, Bangladesh was regionalized on the basis of nine climatic factors recorded at 34 weather stations, using unsupervised machine learning techniques. Exploratory data analysis was first performed to characterize the parameters and their distributional patterns. Principal Component Analysis (PCA) was then applied to reduce the dimensionality of the data and extract significant climate patterns. Next, the non-hierarchical k-means clustering algorithm was used to group the locations into homogeneous clusters. Before clustering, the optimal number of clusters was estimated using three widely recognized methods: the average silhouette score, the gap statistic, and the elbow method. While the silhouette score and the gap statistic both suggested three clusters, the elbow method identified nine, which provided a more detailed regionalization. Barisal, Jessore, Khepupara, Khulna, Mongla, Potuakhali, and Satkhira, all in the south-west region, form a significant cluster together with Faridpur. The second-largest cluster comprises Bogra, Dinajpur, Ishurdi, Rajshahi, Rangpur, and Saidpur in the north-west region of Bangladesh. These findings demonstrate that clustering offers a systematic approach to understanding the spatial distribution of climatic characteristics, supporting informed decision-making, resource allocation, and the development of policies tailored to the specific needs of different geographic regions in Bangladesh.
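The cluster-count selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the 15 two-dimensional points stand in for standardized station features and are entirely made up, the names `stations`, `farthest_first_init`, `kmeans`, `wcss`, and `mean_silhouette` are hypothetical, and the deterministic farthest-first seeding is an assumption chosen only to make the run reproducible. PCA and the gap statistic are omitted for brevity. The sketch runs k-means for several values of k and reports the two quantities behind the elbow method (within-cluster sum of squares) and the silhouette method (mean silhouette width).

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption, not the paper's method):
    start at points[0], then repeatedly add the point farthest from
    every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no change means convergence
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def wcss(points, labels, centroids):
    """Within-cluster sum of squares, the quantity plotted for the elbow method."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


def mean_silhouette(points, labels):
    """Average silhouette width over all points (requires at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up data: three tight groups of five points each (NOT the study's data).
OFFSETS = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
CENTRES = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
stations = [(cx + dx, cy + dy) for cx, cy in CENTRES for dx, dy in OFFSETS]

results = {}
for k in range(2, 7):
    labels, centroids = kmeans(stations, k)
    results[k] = (wcss(stations, labels, centroids), mean_silhouette(stations, labels))

best_k = max(results, key=lambda k: results[k][1])
for k, (w, s) in results.items():
    print(f"k={k}  WCSS={w:8.2f}  silhouette={s:.3f}")
print("k with highest mean silhouette:", best_k)
```

On real station data the two criteria need not agree, as the abstract reports (three clusters from the silhouette and gap statistics versus nine from the elbow method), so the choice of k remains a judgment about how fine a regionalization is useful.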

Published in International Journal of Data Science and Analysis (Volume 11, Issue 3)
DOI 10.11648/j.ijdsa.20251103.11
Page(s) 46-62
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Climate, Principal Component, Silhouette, Gap Statistic, Elbow, k-means, Bangladesh

Cite This Article
  • APA Style

    Rahman, M. H., Sadia, H. (2025). Prognosticate the Analogous Region in Bangladesh Utilizing an Unsupervised Machine Learning Technique. International Journal of Data Science and Analysis, 11(3), 46-62. https://doi.org/10.11648/j.ijdsa.20251103.11


    ACS Style

    Rahman, M. H.; Sadia, H. Prognosticate the Analogous Region in Bangladesh Utilizing an Unsupervised Machine Learning Technique. Int. J. Data Sci. Anal. 2025, 11(3), 46-62. doi: 10.11648/j.ijdsa.20251103.11


    AMA Style

    Rahman MH, Sadia H. Prognosticate the Analogous Region in Bangladesh Utilizing an Unsupervised Machine Learning Technique. Int J Data Sci Anal. 2025;11(3):46-62. doi: 10.11648/j.ijdsa.20251103.11


  • @article{10.11648/j.ijdsa.20251103.11,
      author = {Md. Habibur Rahman and Humayra Sadia},
      title = {Prognosticate the Analogous Region in Bangladesh Utilizing an Unsupervised Machine Learning Technique},
      journal = {International Journal of Data Science and Analysis},
      volume = {11},
      number = {3},
      pages = {46-62},
      doi = {10.11648/j.ijdsa.20251103.11},
      url = {https://doi.org/10.11648/j.ijdsa.20251103.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20251103.11},
     year = {2025}
    }
    


  • TY  - JOUR
    T1  - Prognosticate the Analogous Region in Bangladesh Utilizing an Unsupervised Machine Learning Technique
    
    AU  - Md. Habibur Rahman
    AU  - Humayra Sadia
    Y1  - 2025/06/13
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ijdsa.20251103.11
    DO  - 10.11648/j.ijdsa.20251103.11
    T2  - International Journal of Data Science and Analysis
    JF  - International Journal of Data Science and Analysis
    JO  - International Journal of Data Science and Analysis
    SP  - 46
    EP  - 62
    PB  - Science Publishing Group
    SN  - 2575-1891
    UR  - https://doi.org/10.11648/j.ijdsa.20251103.11
    VL  - 11
    IS  - 3
    ER  - 

