Abstract:
To accurately identify the blending ratios of cut strips and cut stems in tobacco blends, terahertz (THz) spectroscopy combined with machine learning was used to characterize the spectra of tobacco samples blended at different ratios, with the aim of refining classification and identification methods. The absorption spectra of the blends were analyzed over the 0-2.0 THz range. Four classification models, Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), were developed and tested, and their classification performance was validated on tobacco samples with different cut stem contents. The SVM model based on the absorption coefficient performed best overall. For samples with cut stem contents of 2% to 10%, it achieved an accuracy of 91.19% in internal validation and an identification rate of 80.56% in external validation. For samples with cut stem contents of 10% to 50%, it achieved an accuracy of 92.27% in internal validation and an identification rate of 86.25% in external validation. Machine learning based on absorption coefficients can therefore identify tobacco blends with different cut stem contents, providing a reference for assessing blending uniformity in tobacco production.
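The four-model comparison described above could be set up roughly as follows. This is a minimal sketch, assuming scikit-learn and NumPy, and it is not the authors' code. The spectra come from a hypothetical helper `make_synthetic_spectra` and are synthetic placeholders rather than measured data. The standardization step, the hyperparameters, 5-fold cross-validation as "internal validation", and a held-out split as a stand-in for "external validation" are all assumptions, because the abstract does not specify them.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_spectra(stem_levels, n_per_class=30, n_freq=100, seed=0):
    """Hypothetical absorption-coefficient spectra sampled over 0-2.0 THz.

    Each class (one cut stem fraction) gets a baseline that rises with
    frequency and scales with the stem fraction, plus Gaussian noise.
    Purely illustrative; not a physical model of tobacco absorption.
    """
    rng = np.random.default_rng(seed)
    freq = np.linspace(0.0, 2.0, n_freq)  # frequency axis in THz
    X, y = [], []
    for label, stem in enumerate(stem_levels):
        base = (1.0 + 5.0 * stem) * freq ** 2  # assumed trend, not measured
        X.append(base + rng.normal(0.0, 0.3, size=(n_per_class, n_freq)))
        y.extend([label] * n_per_class)
    return np.vstack(X), np.array(y)


# The four model families named in the abstract; hyperparameters are assumptions.
MODELS = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", C=10.0, gamma="scale"),
}


def compare_models(X, y, seed=0):
    """Return {model name: (internal CV accuracy, held-out accuracy)}."""
    # A stratified held-out split stands in for an independent external set.
    X_tr, X_ext, y_tr, y_ext = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    results = {}
    for name, clf in MODELS.items():
        # Standardize each frequency bin, then classify; clone keeps MODELS unfitted.
        pipe = make_pipeline(StandardScaler(), clone(clf))
        internal = cross_val_score(pipe, X_tr, y_tr, cv=5).mean()
        external = pipe.fit(X_tr, y_tr).score(X_ext, y_ext)
        results[name] = (internal, external)
    return results


if __name__ == "__main__":
    X, y = make_synthetic_spectra([0.02, 0.04, 0.06, 0.08, 0.10])
    for name, (internal, external) in compare_models(X, y).items():
        print(f"{name}: internal {internal:.3f}, external {external:.3f}")
```

Wrapping each classifier in a pipeline with the scaler ensures that standardization is refit inside every cross-validation fold, so the internal accuracy is not inflated by information leaking from the validation folds.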