Big data analytics framework for childhood infectious disease surveillance system using modified mapreduce algorithm: a case study of Tanzania
Abstract
Tanzania has been affected by emerging and re-emerging infectious diseases such
as diarrhea, acute respiratory infections, pneumonia, hepatitis, and measles. New
pandemic diseases, such as coronavirus disease (Covid-19) in 2020, are occurring
increasingly often, as are recurrences of older infectious diseases, such as the cholera epidemic
of 2015-2017 and the chikungunya and dengue fever outbreaks of 2010, 2012, 2014, 2018, and 2019,
which affected different regions of Tanzania. These diseases are by far the main causes of the
high mortality rate among women and children aged 0-5 years. The traditional disease
surveillance system, the foundation of public healthcare practice, has faced
challenges in collecting and analyzing data from health big data sources to prevent and control
infectious diseases. Health big data sources on infectious diseases have been recognized worldwide
as a potential supplement for evidence-based decision-making. Tanzania,
as a resource-limited country, has lagged behind because of challenges in
information technology infrastructure and public healthcare resources. The traditional disease
surveillance system is still paper-based, semi-automated, and limited in scope: it relies on
clinically oriented patient data sources and leaves out nontraditional and pre-diagnostic
unstructured big data sources. This research study aimed to improve the traditional infectious
disease surveillance system by employing big data analytics technology in healthcare data collection
and analysis to support better decision-making. A big data analytics framework for the childhood
infectious disease surveillance system was developed to guide healthcare professionals in
streamlining the collection and analysis of health big data for infectious disease surveillance. The
framework was then fairly compared with the existing framework in terms of
infrastructure, data size and transformation, and execution time. The
experimental results indicate that the proposed framework is more efficient,
running about 56% faster than the traditional system. It also performed
best when processing multiple data structures using additional processing units. In
particular, the proposed framework can be adopted to improve the prenatal and postnatal
healthcare system in Tanzania.