Data driven approach for predicting student dropout in secondary schools


dc.contributor.author Mduma, Neema
dc.date.accessioned 2020-09-14T07:46:45Z
dc.date.available 2020-09-14T07:46:45Z
dc.date.issued 2020-06
dc.identifier.uri https://dspace.nm-aist.ac.tz/handle/123456789/898
dc.description A Thesis Submitted in Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Information and Communication Science and Engineering of the Nelson Mandela African Institution of Science and Technology en_US
dc.description.abstract Student dropout is among the challenges facing most schools in developing countries, particularly in Africa. In Tanzania alone, student dropout in secondary schools is reported to be around 36%. Addressing the student dropout problem requires a thorough understanding of the fundamental factors that cause it. Several researchers have identified and proposed causes, methods and strategies to reduce or stop student dropout; however, most of the proposed solutions have not shown promising results, and the student dropout trend continues to increase over time. This study focused on developing a data-driven approach to identify and predict students who are at risk of dropping out of school, in order to facilitate an intervention program as an active measure for eliminating the problem of dropout in Tanzania. In doing so, (a) 122 research articles were examined, (b) 4 focus group discussions and 2 round table surveys with 38 respondents from 5 districts (Arusha, Mbeya, Kisarawe, Rufiji and Nzega) were conducted, and (c) 3 datasets from Tanzania and India were used in order to identify factors that contribute significantly to the student dropout problem, determine the best classifier among the commonly used classifiers (Logistic Regression, Random Forest, K-Nearest Neighbor and Multilayer Perceptron) and assess data balancing techniques for the predictive performance of the model. Results revealed that most of the respondents mentioned students’ gender, age, parents’ income, number of qualified teachers and remoteness as the main contributing factors to the student dropout problem in secondary schools. Furthermore, results from the examined articles indicated that most studies conducted in developing countries focused on the social aspects of student dropout, and only a few mentioned the use of other approaches such as machine learning.
Results from the development of the data-driven approach show that Logistic Regression and Multilayer Perceptron achieved the highest performance when an over-sampling technique was employed. Also, hyperparameter tuning improved algorithm performance compared with the baseline settings, and stacking of the classifiers improved the overall predictive performance of the developed approach. The study, therefore, recommends that relevant authorities consider the developed approach for identifying and predicting students at risk of dropping out, to support early intervention, planning and informed decision-making in addressing the student dropout problem. en_US
dc.language.iso en en_US
dc.publisher NM-AIST en_US
dc.rights Attribution-NonCommercial-ShareAlike 4.0 International *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/ *
dc.subject Research Subject Categories::TECHNOLOGY en_US
dc.title Data driven approach for predicting student dropout in secondary schools en_US
dc.type Thesis en_US


