Homogeneity analysis and random forest modeling of cigarette smoke component data across blend formulas and specifications

    Abstract: Conventional approaches to relating cigarette material design to mainstream smoke composition require separate material-parameter combination experiments and separate models for each blend formula and cigarette specification, which entails a heavy workload, long turnaround, narrow model applicability and low data utilization efficiency. To investigate the relationship between cigarette material design and smoke components and to enable rapid prediction of quantities such as tar release, this study proposes a data homogeneity evaluation method for material design parameters and smoke component data collected across different blend formulas and cigarette specifications. The method combines distribution analysis, statistical testing and cluster analysis to evaluate the characteristics of the data comprehensively and to select datasets suitable for integration. Random forest (RF), a nonlinear machine learning algorithm, was then used to build prediction models for routine smoke components and puff count, with blend formula and material design parameters as independent variables. The results showed that: 1) The cross-formula, cross-specification models built on the integrated datasets achieved improved predictive performance, with a mean absolute percentage error (MAPE) as low as 2.7% on the test sets of five-fold cross-validation. 2) For cigarettes with a new formula or specification, three or more sets of measured data were sufficient to adjust and optimize the model with this method, enabling rapid prediction of smoke component releases across formulas and specifications with an overall MAPE of about 10%.
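The evaluation metric quoted in the abstract, the mean absolute percentage error averaged over the test folds of five-fold cross-validation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the function names, the toy data, and in particular the mean-value baseline, which only stands in for the random forest. In practice a real RF regressor (for example scikit-learn's `RandomForestRegressor`) could be wrapped to match the hypothetical `fit` interface used here.

```python
import random
from statistics import mean


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mape(fit, X, y, k=5, seed=0):
    """Average test-fold MAPE; fit(X_train, y_train) must return a predict(x) callable."""
    fold_scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        fold_scores.append(mape([y[i] for i in test_idx], y_pred))
    return mean(fold_scores)


def fit_mean_baseline(X_train, y_train):
    """Placeholder model: always predicts the training mean (NOT a random forest)."""
    centre = mean(y_train)
    return lambda x: centre


# Hypothetical toy data: one design parameter per sample and a made-up response.
X = [[float(v)] for v in range(1, 21)]
y = [10.0 + 0.1 * row[0] for row in X]
score = cross_validated_mape(fit_mean_baseline, X, y)
```

Because each fold is scored only on samples held out from fitting, the averaged value reflects performance on unseen data, which is the sense in which the abstract reports its test-set error.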

     
