Abstract:
To objectively evaluate the non-volatile components of domestic and foreign tobacco extracts, UPLC-Q-Orbitrap/HRMS was used to conduct a non-targeted analysis of these components in typical tobacco extract samples. Two supervised learning algorithms, orthogonal partial least squares discriminant analysis (OPLS-DA) and support vector machine (SVM), were used to build models that characterize the differences in non-volatile components between domestic and foreign tobacco extracts, and the ability of the two models to trace leaf origin was compared. The results showed that: 1) A total of 167 non-volatile components were identified in the tobacco extracts and classified into 14 compound categories. Flavonoids, organic acids, and terpenoids were the most numerous categories, while esters were relatively few. 2) Using the OPLS-DA model, 26 characteristic components were screened (VIP > 1 and P < 0.05). Of these, the contents of 11 components, including DL-proline and rutin, were significantly higher in domestic samples than in foreign samples, while the contents of 15 components, including D-(+)-tryptophan and 5-hydroxymethylfurfural, were significantly higher in foreign samples than in domestic samples. When the model was used to trace the origin of domestic and foreign tobacco extracts, the prediction accuracy was 100.0% for single extract samples and 67.5% for flavor base modules. 3) The SVM prediction model also reached 100.0% accuracy for the origin traceability of single extract samples, and its prediction accuracy for flavor base modules was 75.0%, which is 7.5 percentage points higher than that of the OPLS-DA model. This study provides a new analytical approach for characterizing the composition of tobacco extracts and can serve as a technical reference for identifying and tracing the origin of domestic and foreign tobacco extracts, as well as for their use in flavoring.
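The screening criterion stated in the abstract (VIP > 1 and P < 0.05) can be illustrated with a minimal sketch. It assumes that VIP scores and p-values have already been computed elsewhere, for example by an OPLS-DA fit and a univariate test. The `Component` record, the `screen_components` helper, the placeholder compound names, and all numeric values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical record for one non-volatile component."""
    name: str
    vip: float            # VIP score from an OPLS-DA model (assumed precomputed)
    p_value: float        # p-value from a univariate group comparison (assumed precomputed)
    mean_domestic: float  # mean content in domestic samples
    mean_foreign: float   # mean content in foreign samples


def screen_components(components, vip_cutoff=1.0, p_cutoff=0.05):
    """Keep components with VIP > cutoff and P < cutoff, split by direction."""
    higher_domestic, higher_foreign = [], []
    for c in components:
        if c.vip > vip_cutoff and c.p_value < p_cutoff:
            if c.mean_domestic > c.mean_foreign:
                higher_domestic.append(c.name)
            else:
                higher_foreign.append(c.name)
    return higher_domestic, higher_foreign


# Made-up illustrative values, NOT data from the study.
demo = [
    Component("compound_A", vip=1.8, p_value=0.003, mean_domestic=12.0, mean_foreign=4.5),
    Component("compound_B", vip=1.2, p_value=0.020, mean_domestic=2.1, mean_foreign=6.3),
    Component("compound_C", vip=0.7, p_value=0.001, mean_domestic=9.0, mean_foreign=3.0),  # fails VIP cutoff
    Component("compound_D", vip=1.5, p_value=0.300, mean_domestic=5.0, mean_foreign=5.2),  # fails P cutoff
]

higher_domestic, higher_foreign = screen_components(demo)
print("Higher in domestic:", higher_domestic)
print("Higher in foreign:", higher_foreign)
```

Both cutoffs must hold at the same time, so a component with a high VIP but a non-significant p-value (or the reverse) is excluded, which is how the 26 characteristic components in the abstract would split into the two directional groups.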