Thesis Title: Determinants of Double and Triple Burden of Non-communicable Diseases and Its Prediction Using Machine Learning Techniques: Evidence from Bangladesh Demographic and Health Survey

Background and objectives: Globally, non-communicable diseases (NCDs) are a primary public health issue, and they have become the leading cause of death in Bangladesh. This study aimed to estimate the prevalence of the double burden of NCDs (DBNCDs) and the triple burden of NCDs (TBNCDs), considering hypertension, diabetes, and overweight and obesity; to explore the risk factors of DBNCDs and TBNCDs; and to predict DBNCDs and TBNCDs using machine learning techniques.
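The outcome definition can be illustrated with a minimal sketch. The abstract does not state the exact rule, so the sketch assumes that a respondent with exactly two of the three conditions is counted under DBNCDs and a respondent with all three under TBNCDs. The function name and the label strings are hypothetical.

```python
def burden_category(hypertension, diabetes, overweight_obesity):
    """Classify one respondent from three yes/no condition indicators.

    Assumption (not stated in the abstract): exactly two conditions
    means double burden, all three means triple burden.
    """
    count = int(bool(hypertension)) + int(bool(diabetes)) + int(bool(overweight_obesity))
    if count == 3:
        return "TBNCDs"
    if count == 2:
        return "DBNCDs"
    return "neither"


def prevalence(categories, label):
    """Percentage of respondents whose category equals `label`."""
    return 100.0 * sum(c == label for c in categories) / len(categories)
```

Applying `burden_category` to every respondent and then calling `prevalence` gives an unweighted percentage; any survey weighting used in the thesis is outside this sketch.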

 

Materials and methods: A total sample of 12,151 participants (5,238 males and 6,913 females) from the 2017-18 Bangladesh Demographic and Health Survey (BDHS) was included in the analysis. Descriptive statistics were used to calculate the distribution and prevalence of DBNCDs and TBNCDs. Bivariate and multilevel logistic regression analyses were used to assess the individual- and community-level determinants of DBNCDs and TBNCDs. Furthermore, the study adopted six classifiers, namely k-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), naive Bayes (NB), random forest (RF), and extreme gradient boosting (XGBoost), to predict DBNCDs and TBNCDs. Three partition protocols (K2, K5, and K10) were adopted to measure the performance of the six classifiers, and each protocol was repeated for 10 trials. Accuracy (ACC) and area under the curve (AUC) were used to assess the performance of the classifiers.
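The evaluation scheme can be sketched as follows. This is a minimal illustration, not the thesis code. It assumes that K2, K5, and K10 denote 2-, 5-, and 10-fold cross-validation, which the abstract does not spell out, and that predicted scores are thresholded at 0.5 to compute ACC. The `fit_score` callable is a hypothetical stand-in for any of the six classifiers.

```python
import random


def k_fold_indices(n, k, seed):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def repeated_k_fold(X, y, fit_score, k, trials=10):
    """Mean ACC and AUC over `trials` repetitions of k-fold cross-validation.

    fit_score(X_train, y_train, X_test) is a hypothetical callable that
    trains a classifier and returns one score in [0, 1] per test row.
    """
    accs, aucs = [], []
    for trial in range(trials):
        # A different seed per trial gives a different random partition.
        for test_idx in k_fold_indices(len(y), k, seed=trial):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            scores = fit_score(
                [X[i] for i in train_idx],
                [y[i] for i in train_idx],
                [X[i] for i in test_idx],
            )
            y_test = [y[i] for i in test_idx]
            accs.append(accuracy(y_test, [int(s >= 0.5) for s in scores]))
            aucs.append(auc(y_test, scores))
    return sum(accs) / len(accs), sum(aucs) / len(aucs)
```

Calling `repeated_k_fold(X, y, fit_score, k=10)` would correspond to the K10 protocol under these assumptions. The pairwise AUC is quadratic in the fold size, so a rank-based implementation would be preferable for a sample the size of the BDHS data.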

 

Results: The prevalence of DBNCDs and TBNCDs was 14.3% and 2.3%, respectively. At the individual level, respondents who were older, female, currently or formerly/ever married, in the richest wealth category, or more highly educated were more likely to suffer from DBNCDs and TBNCDs. At the community level, division was significantly associated with both DBNCDs and TBNCDs. In addition, family size had a significant effect on DBNCDs, while caffeinated drink consumption and community poverty had significant effects on TBNCDs. The RF-based classifier gave the highest ACC and AUC under all three partition protocols for both DBNCDs (for K2, ACC = 77.88% and AUC = 0.91; for K5, ACC = 80.19% and AUC = 0.91; for K10, ACC = 81.06% and AUC = 0.93) and TBNCDs (for K2, ACC = 87.39% and AUC = 0.96; for K5, ACC = 88.54% and AUC = 0.96; for K10, ACC = 88.61% and AUC = 0.97).

Conclusion: This study identifies several individual- and community-level factors, namely age, gender, marital status, wealth index, education level, and division, that are significantly associated with both DBNCDs and TBNCDs. Moreover, the RF-based classifier provided the best predictive performance. Governmental and non-governmental health organizations should pay proper attention to managing the burden of NCDs in Bangladesh.

Details
Role: Supervisor
Class / Degree: Masters
Students: Md. Akib Al-Zubayer (Student ID: MS 202013)
Start Date: 2019
End Date: 2022