    Machine Learning Model for Imbalanced Cholera Dataset in Tanzania.

    Date
    2019-07-25
    Author
    Leo, Judith
    Luhanga, Edith
    Michael, Kisangiri
    Abstract
    Cholera epidemics have remained a public health threat throughout history, affecting vulnerable populations that live with unreliable water supplies and substandard sanitation. Various studies have observed that the occurrence of cholera is strongly linked to environmental factors such as climate change and geographical location. Climate change in particular has been linked to the seasonal occurrence and spread of cholera, because it creates weather patterns that favor the disease's transmission and infection and the growth of Vibrio cholerae, the bacterium that causes the disease. Over the past decades, there have been great achievements in developing epidemic models for the reliable prediction of cholera. However, weather variables and machine learning techniques have not been explicitly integrated into cholera epidemic modeling in Tanzania, owing to challenges in the available datasets such as class imbalance and missing information. This paper explores the use of machine learning techniques to model cholera epidemics in relation to seasonal weather changes while overcoming the data imbalance problem. The Adaptive Synthetic Sampling Approach (ADASYN) and Principal Component Analysis (PCA) were used, respectively, to restore the sampling balance and to reduce the dimensionality of the dataset. Sensitivity, specificity, and balanced accuracy were then used to evaluate the performance of seven candidate models. Based on the results of the Wilcoxon signed-rank test and the features of the models, the XGBoost classifier was selected as the best model for the study. Overall, the results improve our understanding of the significant role that machine learning strategies can play with health-care data. However, the problem could not be treated as a time series problem because of bias in the data collection. The study recommends a review of health-care systems in order to facilitate quality data collection and the deployment of machine learning techniques.
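    ADASYN, as named in the abstract, oversamples the minority class adaptively: minority points whose neighbourhoods contain more majority-class points receive more synthetic samples, each created by interpolating toward a random minority neighbour. The sketch below is a minimal, dependency-free illustration of that idea, not the authors' code; the function name `adasyn`, the parameters `k` and `beta`, and the use of Euclidean distance are assumptions chosen for illustration. In practice a maintained implementation such as `ADASYN` in the imbalanced-learn package would normally be used.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(X: Sequence[Point], i: int, candidates: Sequence[int], k: int) -> List[int]:
    """Indices of the k candidates closest to X[i] (Euclidean), excluding i itself."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def adasyn(X: Sequence[Point], y: Sequence[int], minority: int = 1,
           k: int = 5, beta: float = 1.0, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical ADASYN-style oversampler returning synthetic minority samples.

    beta = 1.0 targets a fully balanced dataset; per-point rounding means the
    number of generated samples can differ slightly from the target G.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_count = len(y) - len(minority_idx)
    # G: total number of synthetic samples needed to reach the balance level beta.
    G = round((majority_count - len(minority_idx)) * beta)
    if G <= 0 or len(minority_idx) < 2:
        return [], []

    # r_i: fraction of majority-class points among each minority point's k neighbours.
    ratios = []
    for i in minority_idx:
        neigh = _nearest(X, i, range(len(X)), k)
        ratios.append(sum(1 for j in neigh if y[j] != minority) / len(neigh))
    total = sum(ratios)
    if total == 0:  # no minority point borders the majority class
        return [], []

    new_X: List[Point] = []
    for i, r in zip(minority_idx, ratios):
        g_i = round(r / total * G)  # harder-to-learn points get more samples
        minority_neigh = _nearest(X, i, minority_idx, k)
        for _ in range(g_i):
            z = rng.choice(minority_neigh)
            lam = rng.random()  # interpolation factor in [0, 1)
            new_X.append(tuple(a + lam * (b - a) for a, b in zip(X[i], X[z])))
    return new_X, [minority] * len(new_X)


# Toy data (hypothetical): 10 majority points on y = 0, 3 minority points on y = 1.
X = [(float(i), 0.0) for i in range(10)] + [(0.0, 1.0), (1.0, 1.0), (5.0, 1.0)]
y = [0] * 10 + [1] * 3
synthetic_X, synthetic_y = adasyn(X, y, k=3)
```

    Here the isolated minority point at (5, 1) has only majority-class neighbours, so it receives the largest share of the synthetic samples; the resampled set would then be passed on to dimensionality reduction and model training.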
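    The abstract evaluates the models with sensitivity, specificity, and balanced accuracy rather than plain accuracy. The short sketch below shows why this matters on imbalanced data; it assumes a binary task where label 1 marks the cholera-positive class, which is an illustrative assumption since the abstract does not define the labels. The function name `imbalance_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and balanced accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the negative class
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical labels: 5 positives out of 100, and a model that never predicts a positive.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(imbalance_metrics(y_true, y_pred))
```

    The trivial model scores 0.95 accuracy yet only 0.5 balanced accuracy and 0.0 sensitivity, which is exactly the failure mode that metrics suited to class imbalance are meant to expose.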
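    The abstract reports that the Wilcoxon signed-rank test was used when choosing among the models, but it does not say which paired quantities were compared. The sketch below assumes the common setup of paired scores for two models (for example, one balanced-accuracy value per cross-validation fold). It computes the statistic W with average ranks for tied magnitudes and a two-sided p-value from the normal approximation, without tie or continuity corrections, so it is only a rough guide for small samples; `scipy.stats.wilcoxon` is the usual library routine. The function name and the example scores are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (W, approximate two-sided p-value) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| ascending; tied magnitudes share the average of their 1-based ranks.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Normal approximation: under H0, W has mean n(n+1)/4 and var n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd  # z <= 0 because w is the smaller rank sum
    p = 1 + math.erf(z / math.sqrt(2))  # equals 2 * Phi(z)
    return w, min(p, 1.0)


# Hypothetical per-fold scores for two models (not results from the study).
model_a = [0.81, 0.84, 0.79, 0.88, 0.86, 0.83, 0.85, 0.87]
model_b = [0.78, 0.80, 0.80, 0.83, 0.81, 0.77, 0.79, 0.80]
print(wilcoxon_signed_rank(model_a, model_b))
```

    A small W (and p-value) indicates that one model consistently outperforms the other across the paired evaluations, rather than winning on average because of a few large differences.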
    URI
    https://doi.org/10.1155/2019/9397578
    https://dspace.nm-aist.ac.tz/handle/20.500.12479/976
    Collections
    • Research Articles [CoCSE]

    Nelson Mandela-AIST copyright © 2021  DuraSpace
    Theme by Atmire NV