Swahili questions and answers dataset for aflatoxin knowledge domain

Date
2025-03-20
Author
Chogo, Pamela
Mkoba, Elizabeth
Kassim, Neema
Abstract
Aflatoxin contamination is a challenge to food security, health, and trade in Tanzania and other parts of the world. It affects maize, groundnuts, and other crops, as well as animal products. Once contamination occurs, the affected crops and animal products become toxic, causing illness or death in the humans and animals that consume them. Lack of awareness and knowledge of the contamination is considered one of the reasons it continues to occur. Various awareness-creation and knowledge-sharing techniques have been used, but the situation remains unsatisfactory. This article therefore proposes using a Natural Language Processing (NLP) chatbot to share aflatoxin knowledge, because NLP chatbots have been used successfully for knowledge sharing in various contexts. This data article presents a Swahili text-based question-and-answer dataset on aflatoxin knowledge. Data were collected through 7 focus group discussion (FGD) sessions conducted in the Arusha, Dodoma, Mtwara, Tabora, Morogoro, and Iringa regions of Tanzania. Respondents were farmers, traders, and consumers of maize and groundnuts. The collected data were processed and analyzed using the R qualitative data analysis tool, which led to the identification of 6 themes, each with its own set of questions. The questions were then presented to experts in 9 interview sessions, and the experts provided the answers. The question-answer pairs were translated into Swahili using Google Translate, followed by manual verification. The result is a Swahili aflatoxin knowledge dataset containing 221 question-answer pairs organized into 6 knowledge areas. This dataset can be used to develop an NLP-based chatbot that operates in Swahili. Such a chatbot would benefit farmers, traders, consumers, researchers, and policymakers, who could use it to learn more about aflatoxin and make informed decisions.
Moreover, the dataset can be adapted to create NLP chatbots that share aflatoxin knowledge in languages other than Swahili. It also contributes to the availability of Swahili-language datasets.