dc.contributor.author: Leo, Judith
dc.date.accessioned: 2020-09-14T07:40:36Z
dc.date.available: 2020-09-14T07:40:36Z
dc.date.issued: 2020-08
dc.identifier.uri: http://doi.org/10.58694/20.500.12479/897
dc.description: A Thesis Submitted in Fulfilment of the Requirements for the Degree of Doctor of Philosophy in Information and Communication Science and Engineering of the Nelson Mandela African Institution of Science and Technology [en_US]
dc.description.abstract: Cholera epidemics have remained a public health threat throughout history, affecting vulnerable populations living with unreliable water supplies and sub-standard sanitary conditions. Studies have observed that the occurrence of cholera is also strongly linked to seasonal weather patterns. Over the past decades, there have been great achievements in developing cholera epidemic models, which have focused on mathematical techniques. However, most existing prediction systems face challenges: they lack flexibility, are not user friendly, are ineffective, and do not integrate essential weather variables. In addition, advanced technologies such as machine learning (ML) have not been explicitly deployed in modeling cholera epidemics in developing countries, including Tanzania, because of challenges in the available datasets such as missing information, data inconsistency, class imbalance and other uncertainties. The aim of this work was to overcome the challenges of existing cholera epidemic models, and to complement those models, by taking advantage of ML techniques: specifically, by developing an ML model capable of predicting cholera epidemic outbreaks in Tanzania based on their linkage with seasonal weather changes. Secondary datasets from the Tanzania Meteorological Agency (TMA), the Ministry of Health and Social Welfare, and the Dar es Salaam Water and Sewerage Authority (DAWASCO) were used. The Adaptive Synthetic Sampling Approach (ADASYN) and Principal Component Analysis (PCA) were then applied to rebalance the classes and reduce the dimensionality of the dataset. To determine which ML algorithms were best able to predict (yes/no) whether a cholera epidemic would occur given the weather variables, ten classification algorithms were evaluated using the F1-score, sensitivity and balanced accuracy metrics. The Friedman test was then used to determine whether the differences in performance among the models were statistically significant.
Results showed that the Random Forest, Bagging, and ExtraTree classifiers had the best performance, with 74%, 74.1% and 71.9% accuracy respectively. The ensemble method of model fine-tuning was then applied to obtain one model from the three, and an overall accuracy of 78.5% was achieved. Lastly, a model evaluation process was performed on the selected final model. This evaluation comprised four steps. The first re-ran the final model on the same dataset but without the weather variables, confirming that the model with weather variables performed better than the model without them. The second re-ran the model-development procedure using datasets from the Tanga and Songwe regions, to illustrate how the adaptive reference model can be used by other researchers. The third and fourth used a mixed-design approach of quantitative and qualitative methods, with focus group discussions and interviewer-administered questionnaires involving 500 and 20 stakeholders respectively (including medical officers, epidemiological analysts, nurses, environmental experts, ICT experts and cholera patients). The results of the third evaluation showed that 90% of the responses agreed that the developed model is robust and appropriate for use in least developed countries for effective prediction of cholera epidemics. The results of the fourth evaluation showed that the cholera ML model is better than cholera statistical models in terms of usability, expandability and computational complexity. Overall, the study improved our understanding of the significant role of ML strategies in health-care data. However, the problem could not be treated as a time series problem because of data collection bias, such as inconsistency of the data over time.
The study recommends a review of health-care systems in order to facilitate quality data collection and further deployment of ML techniques in the health sector in Tanzania. [en_US]
dc.language.iso: en [en_US]
dc.publisher: NM-AIST [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [*]
dc.subject: Research Subject Categories::NATURAL SCIENCES [en_US]
dc.title: A reference machine learning model for prediction of cholera epidemics based-on seasonal weather changes linkages in Tanzania [en_US]
dc.type: Thesis [en_US]
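
The abstract reports F1-score, sensitivity and balanced accuracy as the metrics used to compare classifiers on an imbalanced yes/no outbreak label. As a rough illustration only, the sketch below computes those metrics from binary predictions in plain Python. The function name `binary_metrics` and the example labels are hypothetical and are not taken from the thesis or its datasets.

```python
def binary_metrics(y_true, y_pred):
    """Compute common metrics for 0/1 labels (1 = outbreak, 0 = no outbreak)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # outbreaks correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # quiet periods correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed outbreaks

    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the outbreak class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the no-outbreak class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    # Balanced accuracy averages the per-class recalls, so the majority
    # class cannot dominate the score the way it does for plain accuracy.
    balanced_accuracy = (sensitivity + specificity) / 2

    # F1 is the harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


# Made-up example with 3 outbreak and 7 no-outbreak records (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as this made-up example, plain accuracy comes out higher than balanced accuracy because most records belong to the no-outbreak class, which is one common reason for preferring balanced accuracy and sensitivity when outbreaks are rare.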


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International