An Experimental Analysis of Traditional Machine Learning Algorithms for Maize Yield Prediction

Authors

  • Souand P.G. Tahi, Laboratory of Biomathematics and Forest Estimations, Faculty of Agronomic Sciences, University of Abomey-Calavi, 04 BP 1525 Cotonou, Benin. https://orcid.org/0000-0003-1101-2624
  • Castro G. Hounmenou, Laboratory of Biomathematics and Forest Estimations, Faculty of Agronomic Sciences, University of Abomey-Calavi, 04 BP 1525 Cotonou, Benin. https://orcid.org/0000-0002-2306-6083
  • Vinasetan Ratheil Houndji, Laboratory of Biomathematics and Forest Estimations, Faculty of Agronomic Sciences, University of Abomey-Calavi, 04 BP 1525 Cotonou, Benin. https://orcid.org/0000-0002-5467-9448
  • Romain Glèlè Kakaï, Laboratory of Biomathematics and Forest Estimations, Faculty of Agronomic Sciences, University of Abomey-Calavi, 04 BP 1525 Cotonou, Benin. https://orcid.org/0000-0002-6965-4331

DOI:

https://doi.org/10.37256/cm.5420244481

Keywords:

classic machine learning, ensemble learning, maize, yield prediction, secondary data

Abstract

Maize plays a significant role in the African diet and is one of the main staple foods in many parts of the continent. Accurate yield estimates help ensure an adequate food supply, contributing to food security and reducing the risk of food shortages. They also enable market planning and price setting. Machine learning is well known as one of the most advanced statistical methods for predicting crop yields. This paper provides extensive experimental results for machine learning models applied to maize production. Thirteen basic supervised learning algorithms, classified into classic and ensemble learning, are compared using three datasets of different sizes and from various sources (Kaggle, Zenodo). These datasets have three main origins: experimentation, specifically covering crop data with 240 observations; predictions on crop yield from the FAO (Food and Agriculture Organization) and World Data Bank with 4,121 observations; and historical data from China with 975 observations. The metrics used to evaluate the models are the coefficient of determination, the mean absolute error, the root mean square error, and the explained variance score. Moreover, permutation importance is applied to the best models to identify the most relevant predictors in each dataset. The results show that extremely randomized trees (ERT) and extreme gradient boosting (XGBoost) are the most suitable for predicting maize yield, with coefficients of determination between 0.75 and 0.96 and between 0.73 and 0.96, respectively. With the other metrics, the ERT model shows low performance. Its training time varies between 2,547 and 7,814 seconds, measured on an HP computer with a Core i5 CPU @ 1.00 GHz, 1.9 GHz, and 8 GB RAM running Windows 10. ERT and XGBoost are best suited to these databases of varying dimensions, making them good choices for predicting maize yield and streamlining decision-making processes.
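
For readers unfamiliar with the four evaluation metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions in plain Python. It is not the authors' code: the function names and the small yield values at the bottom are hypothetical and chosen only for illustration.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = _mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute residuals |y - y_hat|."""
    return _mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared residual."""
    return math.sqrt(_mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def explained_variance_score(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true), using population variances."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    res_bar = _mean(residuals)
    y_bar = _mean(y_true)
    var_res = _mean([(r - res_bar) ** 2 for r in residuals])
    var_y = _mean([(t - y_bar) ** 2 for t in y_true])
    return 1.0 - var_res / var_y


# Hypothetical observed and predicted values (illustrative only, not from the paper).
observed = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]

print("R2  :", r2_score(observed, predicted))
print("MAE :", mean_absolute_error(observed, predicted))
print("RMSE:", root_mean_squared_error(observed, predicted))
print("EVS :", explained_variance_score(observed, predicted))
```

The explained variance score differs from the coefficient of determination only when the residuals have a non-zero mean, that is, when the model is systematically biased; reporting both can therefore reveal bias that a single score would hide.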

Published

2024-12-19

How to Cite

1. Tahi SP, Hounmenou CG, Houndji VR, Kakaï RG. An Experimental Analysis of Traditional Machine Learning Algorithms for Maize Yield Prediction. Contemp. Math. [Internet]. 2024 Dec. 19 [cited 2024 Dec. 21];5(4):6208-24. Available from: https://ojs.wiserpub.com/index.php/CM/article/view/4481