Data Balancing Techniques for Predicting Student Dropout Using Machine Learning

dc.contributor.author: Mduma, Neema
dc.date.accessioned: 2023-02-28T07:09:02Z
dc.date.available: 2023-02-28T07:09:02Z
dc.date.issued: 2023-02-27
dc.description: This research article was published by MDPI, 2023 [en_US]
dc.description.abstract: Predicting student dropout is a challenging problem in the education sector. This is due to an imbalance in student dropout data, mainly because the number of registered students is always higher than the number of dropout students. Developing a model without taking the data imbalance issue into account may lead to a model that generalizes poorly. In this study, different data balancing techniques were applied to improve prediction accuracy in the minority class while maintaining a satisfactory overall classification performance. Random Over Sampling, Random Under Sampling, Synthetic Minority Over-sampling Technique (SMOTE), SMOTE with Edited Nearest Neighbor, and SMOTE with Tomek links were tested, along with three popular classification models: Logistic Regression, Random Forest, and Multi-Layer Perceptron. Publicly accessible datasets from Tanzania and India were used to evaluate the effectiveness of balancing techniques and prediction models. The results indicate that SMOTE with Edited Nearest Neighbor achieved the best classification performance on the 10-fold holdout sample. Furthermore, Logistic Regression correctly classified the largest number of dropout students (57,348 for the Uwezo dataset and 13,430 for the India dataset) using the confusion matrix as the evaluation metric. The application of these models allows for the precise prediction of at-risk students and the reduction of dropout rates. [en_US]
dc.identifier.uri: https://doi.org/10.3390/data8030049
dc.identifier.uri: https://dspace.nm-aist.ac.tz/handle/20.500.12479/1804
dc.language.iso: en [en_US]
dc.publisher: MDPI [en_US]
dc.subject: Student dropout [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Data sampling [en_US]
dc.subject: Imbalanced datasets [en_US]
dc.title: Data Balancing Techniques for Predicting Student Dropout Using Machine Learning [en_US]
dc.type: Article [en_US]
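
The abstract names random over-sampling and random under-sampling among the balancing techniques and uses the confusion matrix for evaluation. As a rough illustration only, the sketch below shows what those two resampling steps and a binary confusion-matrix count might look like in plain Python. It is not the article's implementation: the function names (random_over_sample, random_under_sample, confusion_counts) and the toy labels are hypothetical, and the SMOTE variants and the classifiers from the study are not reproduced here.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class
    (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem, treating `positive`
    as the class of interest (here: dropout)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


# Toy data (invented for illustration): 8 enrolled students (0), 2 dropouts (1).
toy_rows = [[i] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2

_, over_labels = random_over_sample(toy_rows, toy_labels)
_, under_labels = random_under_sample(toy_rows, toy_labels)
print(Counter(over_labels), Counter(under_labels))
```

In such a setup, resampling would normally be applied to the training split only, so that the held-out data keeps its original class distribution; the true-positive count from confusion_counts corresponds to the "correctly classified dropout students" figure the abstract reports.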

Files

Original bundle

Name: JA_CoCSE_2023.pdf
Size: 2 MB
Format: Adobe Portable Document Format
Description: Full text

License bundle

Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission
Description: