Abstract:
To investigate the relationships between cigarette design materials and the components of mainstream cigarette smoke, and to enable rapid prediction of tar release, this study proposed a method for evaluating the homogeneity of material parameter data and smoke chemistry data across different cigarette formulas and specifications. Traditional methods require separate material parameter combination experiments and separate modeling for each cigarette formula and specification, which results in a heavy workload, long analysis times, narrow model applicability and low data utilization efficiency. The new method employed distribution analysis, check analysis and cluster analysis to comprehensively evaluate data characteristics and to select appropriate datasets for integration. A nonlinear machine learning algorithm, random forest (RF), was used to establish prediction models for routine cigarette smoke components and puff count, with cigarette blend formula and material design parameters as independent variables. The results showed that: 1) The models established on integrated datasets for different cigarette formulas and specifications exhibited enhanced predictive performance, achieving a mean absolute percentage error of 2.7% on the test sets in five-fold cross-validation. 2) For a given set of cigarette samples with new formulas or specifications, only three or more sets of actual measurement data were required to adjust and optimize the model with this method, enabling rapid prediction of smoke component releases under different formulas and cigarette specifications with a mean absolute percentage error of around 10%.
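The evaluation metric reported above, the mean absolute percentage error (MAPE) averaged over the test folds of a five-fold cross-validation, can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names (`mape`, `k_fold_splits`, `cross_validated_mape`, `fit_mean`) are hypothetical, the shuffled fold assignment is an assumption, and a trivial mean predictor stands in for the random forest model purely to keep the example self-contained.

```python
import random
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error in percent (undefined if any actual value is 0)."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds covering all samples
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_mape(X, y, fit, k=5, seed=0):
    """Average test-fold MAPE; fit(X_train, y_train) must return a predict function."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(y), k, seed):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predicted = [predict(X[i]) for i in test_idx]
        scores.append(mape([y[i] for i in test_idx], predicted))
    return mean(scores)


def fit_mean(X_train, y_train):
    """Stand-in model (NOT a random forest): always predicts the training mean."""
    training_mean = mean(y_train)
    return lambda x: training_mean
```

In practice `fit_mean` would be replaced by a function that trains a random forest regressor on the blend formula and material design parameters and returns its prediction function; the cross-validation and MAPE logic would stay the same.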