Abstract:
To improve the accuracy of production-area identification and category prediction for tobacco leaves by near-infrared (NIR) spectroscopy when the samples are numerous, highly similar, and divided into many classes, a total of 4 625 tobacco leaf samples were collected from eight small production regions in Yunnan Province, and the one-dimensional NIR spectral data were transformed into two-dimensional image data. A convolutional neural network (CNN) was used to build an identification model for tobacco leaves from these small regions, and its performance was compared with that of other machine-learning algorithms. The results showed that: 1) Conventional machine-learning approaches such as principal component analysis (PCA) and support vector machine (SVM) were of limited effectiveness in classifying tobacco leaves from multiple adjacent regions; the overall accuracies of the SVM model on the training and test sets were 78.86% and 69.08%, respectively. 2) The accuracies of the CNN model on the training and test sets reached 97.41% and 92.54%, respectively, which were 18.55 and 23.46 percentage points higher than those of the SVM model. Transforming the dimension of the NIR spectral data and combining it with a CNN allowed more sample characteristics to be extracted and effectively applied to the classification and identification of tobacco leaves from small regions.
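The abstract does not specify how the one-dimensional spectra were converted into two-dimensional images. As a minimal sketch of one possible approach, the Python snippet below min-max scales a spectrum, zero-pads it to the next perfect-square length, and reshapes it row by row into a square array. The function name `spectrum_to_image`, the scaling, the zero-padding, and the row-major layout are all assumptions made for illustration, not the authors' documented procedure.

```python
import math

import numpy as np


def spectrum_to_image(spectrum):
    """Reshape a 1-D spectrum into a square 2-D array (illustrative only).

    Assumed steps, not taken from the paper: min-max scale to [0, 1],
    zero-pad to the next perfect-square length, reshape row by row.
    """
    values = np.asarray(spectrum, dtype=np.float64).ravel()
    if values.size == 0:
        raise ValueError("spectrum must contain at least one point")

    # Min-max scaling; a constant spectrum maps to all zeros.
    lo, hi = values.min(), values.max()
    if hi > lo:
        scaled = (values - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(values)

    # Smallest side length whose square holds every spectral point.
    side = math.isqrt(values.size - 1) + 1

    # Zero-pad the tail, then fold into a side x side image.
    padded = np.zeros(side * side, dtype=np.float64)
    padded[: values.size] = scaled
    return padded.reshape(side, side)


if __name__ == "__main__":
    # Hypothetical spectrum of 1557 points; the real length is not given.
    demo = np.linspace(0.2, 1.4, 1557)
    print(spectrum_to_image(demo).shape)
```

An array produced this way could be fed to a 2-D CNN as a single-channel image. Other mappings between one and two dimensions are equally plausible, and the choice affects which spectral points end up as neighbours in the image.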