dc.description.abstract | Data scarcity is a significant challenge in the field of Machine Learning (ML), as data
collection can be expensive, time‐consuming, and difficult, particularly in developing countries.
This challenge is exacerbated by the need for datasets to predict livestock diseases for early
intervention and surveillance. To address this challenge, this paper presents a data synthesis
method that accurately generates new data samples from a small set of real‐world data. With
more data available to train the ML models, overfitting is mitigated. We present the use of
Generative Adversarial Networks, mainly the Conditional Tabular Generative Adversarial Network (CTGAN),
to synthesize categorical data for training machine learning models for prediction of the Peste des
Petits Ruminants (PPR) disease. The results showed a training score of 0.89 and a cross‐validation
score of 0.87 when the Random Forest algorithm was trained with the synthesized data. The
resulting dataset can be used to support the prediction and surveillance of the Peste des Petits
Ruminants (PPR) disease. The proposed method can also be applied to any domain with categorical
data, and has the potential to improve the performance of machine learning models with increased
data availability. | en_US |