Research Article
Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya
Issue:
Volume 11, Issue 6, December 2025
Pages:
158-170
Received:
2 October 2025
Accepted:
17 October 2025
Published:
7 November 2025
DOI:
10.11648/j.ijdsa.20251106.11
Abstract: HIV/AIDS remains a major global health challenge, with Sub-Saharan Africa carrying the highest burden. In Kenya, where adult prevalence is 4.3%, interruption in treatment (IIT) continues to undermine antiretroviral therapy (ART) outcomes. This study applied machine learning (ML) to identify predictors of IIT and guide interventions in Machakos County, where prevalence is 3.3% and patient follow-up relies on manual appointment management, physical tracing, and phone tracing. A retrospective cross-sectional study used secondary data from KenyaEMR covering 14,339 adults on ART between 2020 and 2024. Data preprocessing included cleaning, anonymization, imputation, encoding, LASSO feature selection, and SMOTE oversampling. Descriptive statistics and chi-square tests assessed associations, while Random Forest (RF), XGBoost, and Support Vector Machine (SVM) models were trained and validated to predict IIT. Overall, 910 patients (6%) experienced IIT. Risk was highest among adolescents and young adults (15-24 years), single individuals, urban residents, patients with viral load ≥1000 cps, those on ART <12 months, TB co-infected patients, and non-DTG regimen users. Poor adherence, unstable status, lack of phone ownership, and shorter refill durations also predicted IIT. Non-significant factors included sex, CD4 count, counseling, and clinic workload. Among the models, RF achieved the best performance (recall 0.97, precision 0.87, F1 0.92, AUROC 0.96, accuracy 0.91), outperforming XGBoost and SVM. IIT in Machakos County is shaped by demographic, clinical, socioeconomic, and health system factors. Random Forest showed the best predictive capacity, highlighting the value of ML for early identification of at-risk patients. Strategies should include DTG scale-up, early retention support, multi-month dispensing, and digital health interventions. Integrating predictive analytics into EMRs can strengthen HIV program outcomes.
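As a quick consistency check on the figures reported in this abstract, the F1 score can be recomputed from the stated precision and recall, and the IIT share from the stated patient counts. The sketch below is illustrative only; `f1_from` is a hypothetical helper name and is not taken from the study's code.

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the Random Forest model.
rf_f1 = f1_from(precision=0.87, recall=0.97)
print(round(rf_f1, 2))  # 0.92

# Share of the 14,339 adults on ART who experienced IIT (910 patients).
iit_share = 910 / 14339
print(f"{iit_share:.1%}")  # 6.3%
```

Both values agree with the abstract: the recomputed F1 matches the reported 0.92, and 6.3% rounds to the reported 6%.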
Research Article
Stacked Ensemble Classifier for Adoption of Point-of-Collection Water Treatment Technology Among Households in Western Kenya
Issue:
Volume 11, Issue 6, December 2025
Pages:
171-177
Received:
17 September 2025
Accepted:
29 September 2025
Published:
10 November 2025
DOI:
10.11648/j.ijdsa.20251106.12
Abstract: The Dispensers for Safe Water program under Evidence Action promotes point-of-collection water treatment through the installation of chlorine dispensers in rural parts of Kenya. Although the initiative has improved access to safe drinking water, monitoring household adoption remained a challenge during the COVID-19 pandemic, which limited field-based data collection and led to increased dependence on phone surveys. In addition, technology adoption data are often imbalanced, which poses difficulties for traditional classification methods. This study aimed to develop and implement a stacking ensemble classifier to model the adoption of chlorine dispensers among households in western Kenya. Data were collected from 27,457 households. The analysis used structured household, promoter, and spot check survey data. The key variables included chlorine availability, user knowledge, household demographics, and engagement with promoters. RF, ANN, and NB models were trained and evaluated individually, then combined using a stacked ensemble approach. The ensemble model outperformed all base learners, achieving the highest accuracy (69.1%) and AUC (0.6959). The variable importance analysis revealed that the presence of chlorine and user knowledge were the strongest predictors of adoption. In conclusion, ensemble learning provides a reliable method for modeling behavioral adoption in public health interventions. The findings offer practical insights for programs and demonstrate the potential of machine learning to improve the targeting and monitoring of safe water initiatives in low-resource settings.
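The stacking approach described in this abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the hyperparameters are arbitrary, and the logistic-regression meta-learner is an assumption, since the abstract does not name the meta-learner used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Synthetic, imbalanced stand-in data (NOT the household survey data).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners mirroring the RF, ANN, and NB families named in the abstract.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("ann", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)),
    ("nb", GaussianNB()),
]

# The meta-learner is fitted on out-of-fold predictions of the base learners.
# LogisticRegression is an assumed choice made for this illustration.
stack = StackingClassifier(
    estimators=base_learners, final_estimator=LogisticRegression(), cv=5
)
stack.fit(X_train, y_train)

accuracy = accuracy_score(y_test, stack.predict(X_test))
auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"accuracy={accuracy:.3f} auc={auc:.3f}")
```

Reporting AUC alongside accuracy matters for imbalanced adoption data, because accuracy alone can look acceptable when a model mostly predicts the majority class.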
Research Article
Rapid Data Sorting Technique with Efficient and Dynamic Approach
Issue:
Volume 11, Issue 6, December 2025
Pages:
178-185
Received:
16 September 2025
Accepted:
20 October 2025
Published:
12 November 2025
DOI:
10.11648/j.ijdsa.20251106.13
Abstract: In the data science world, massive amounts of data need to be processed efficiently as part of high-volume data processing. Because input data sets are often highly disordered, an appropriate algorithm must be embedded to arrange the data in the required order so that SQL queries can process it quickly. Processing billions or trillions of rows has become a common use case. Robust data management strategies are required to handle increasing data volumes. The main drivers of data growth are IoT devices, ERP platforms, social media apps, e-commerce platforms, streaming data, and AI/ML workloads, all of which generate more data for insights. A delay of a few milliseconds in each sort operation can add up to a difference of several minutes to hours when the system processes larger data sets. Sorting mechanisms are measured by their time complexity relative to input size, benchmarking the processing time and resources consumed on a specific system. Sorting performance can be improved by reducing the number of intensive operations (CPU cycles) and the memory used by each process while the data is sorted. “Rapid Data Sorting” makes the program more efficient and thereby helps to improve overall data processing speed. After extensive research and rigorous testing, the proposal below was formulated.
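To make the scaling claim concrete, the back-of-the-envelope sketch below multiplies a per-sort delay by the number of sort operations. The figures used (2 ms per sort, one million sort operations) are illustrative assumptions, not measurements from the article, and `added_minutes` is a hypothetical helper name.

```python
def added_minutes(delay_ms: float, sort_operations: int) -> float:
    """Total extra wall-clock time, in minutes, caused by a per-sort delay."""
    total_ms = delay_ms * sort_operations
    return total_ms / 1000 / 60


# Assumed workload: 2 ms of extra delay on each of 1,000,000 sort operations.
print(round(added_minutes(2, 1_000_000), 1))  # 33.3
```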
Research Article
Enhancing Early Tuberculosis Detection Using CGAN Augmentation and Deep Transfer Learning Models
Issue:
Volume 11, Issue 6, December 2025
Pages:
186-204
Received:
13 October 2025
Accepted:
29 October 2025
Published:
28 November 2025
DOI:
10.11648/j.ijdsa.20251106.14
Abstract: Tuberculosis (TB) remains a leading infectious disease worldwide, and early, reliable screening using chest X-rays (CXRs) is essential in low-resource settings. The scarcity of labeled TB-positive CXR images limits the effectiveness of deep learning models. This study investigates whether Conditional Generative Adversarial Networks (CGANs) can generate realistic TB-positive CXR images to balance training data and improve the classification performance of fine-tuned deep transfer learning (DTL) models. We trained a CGAN (LSGAN formulation) to synthesize class-conditional grayscale CXR images at 128x128 resolution and used the generated images to augment the Shenzhen TB dataset. Three pre-trained DTL architectures (DenseNet121, VGG16, and MobileNetV3Small) were fine-tuned on both original and CGAN-augmented datasets. Experiments used stratified 70/10/20 train/validation/test splits and a fixed random seed (random_state=42) to ensure reproducibility. Model performance was evaluated using accuracy, precision, recall (sensitivity), F1-score, confusion matrices, and ROC/AUC curves. The experiments were executed on an NVIDIA Tesla P100 GPU (16GB) in a Kaggle runtime environment; total CGAN+classifier processing reported a wall-clock runtime of 39 minutes 30 seconds for the baseline experimental run. CGAN augmentation produced consistent improvements across models: DenseNet121 improved from 93.0% to 94.6% test accuracy, VGG16 improved from 96.3% to 96.8%, and MobileNetV3Small improved from 93.0% to 93.5%. Class-conditional GAN augmentation can modestly but usefully improve DTL classifier performance in TB detection when labeled data are scarce, though further cross-dataset validation is required before clinical deployment.
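The stratified 70/10/20 split with a fixed seed mentioned in this abstract can be illustrated with a small standard-library sketch. The `random_state=42` notation in the abstract suggests a scikit-learn style API, but the study's actual splitting code is not shown, so the `stratified_split` helper below is a hypothetical stand-in, and the class counts are toy values rather than the Shenzhen dataset's.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=42):
    """Return (train, val, test) index lists, splitting each class separately."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels (hypothetical counts): 100 TB-negative (0) and 100 TB-positive (1).
labels = [0] * 100 + [1] * 100
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 140 20 40
```

Splitting each class separately keeps the TB-positive proportion the same in all three subsets, which matters when positive examples are scarce.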