
Weight Analysis of Primary Processing Parameters Based on Random Forest Regression

Abstract: This study aimed to make the weighting of parameters in primary processing quality evaluation more scientific and objective. Steady-state data samples were selected from the full-batch primary processing data of one Class I cigarette specification of the brand "Yunyan". A Pearson correlation matrix was used to screen the explanatory variables of the outlet moisture content of each processing step, and random forest regression models were then built. Goodness of fit was used to verify how well the models fit the data. The normalized mean square error (NMSE) on the test sets of 5-fold cross-validation was used to verify their extrapolation prediction performance. Finally, the influence weights of the explanatory variables were measured, and key parameters were screened, according to the mean decrease in out-of-bag (OOB) mean square error. The results showed that: 1) Thirty-seven explanatory variables were screened out by combining the Pearson correlation matrix with equipment control principles. 2) For all five processing steps, the random forest regression models had a goodness of fit above 0.9 and an NMSE below 1 on the 5-fold cross-validation test sets, indicating good fitting and extrapolation prediction performance. 3) Eighteen key parameters were screened out on the basis of the measured influence weights. 4) The proposed method screens and weights key primary processing parameters using full-sample data. It can serve as a reference for the precise control of key quality characteristics in primary processing and for the evaluation of processing quality.
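The abstract's pipeline has four stages: Pearson screening, random forest regression, goodness of fit plus 5-fold cross-validated NMSE, and OOB-based importance weights. It can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The sketch assumes NumPy and scikit-learn. The function names (`pearson_screen`, `cv_nmse`, `oob_importance`), the correlation cut-off, the forest size, and `max_features=1/3` are all hypothetical. NMSE is assumed to be the test MSE divided by the test-set variance of the target, so NMSE < 1 means the model beats predicting the mean. Normalizing the positive importances into weights that sum to 1 is also an assumption. The screening step only checks correlation with the target. It omits the inter-variable correlation matrix and the equipment control principles the abstract also mentions. scikit-learn's `feature_importances_` is impurity-based rather than OOB-based. For that reason, the OOB permutation measure is computed with a hand-rolled bagging loop over `DecisionTreeRegressor`, and the result is left unscaled. The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor


def pearson_screen(X, y, threshold=0.1):
    """Indices of columns whose |Pearson r| with y is >= threshold (hypothetical cut-off)."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(r) >= threshold)


def cv_nmse(X, y, n_splits=5, seed=0):
    """Per-fold test-set NMSE, assuming NMSE = MSE / Var(y_test)."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in folds.split(X):
        rf = RandomForestRegressor(n_estimators=200, max_features=1 / 3, random_state=seed)
        rf.fit(X[train], y[train])
        scores.append(np.mean((y[test] - rf.predict(X[test])) ** 2) / np.var(y[test]))
    return np.array(scores)


def oob_importance(X, y, n_trees=200, max_features=1 / 3, seed=0):
    """Mean increase in per-tree OOB MSE when each column is permuted, plus weights summing to 1."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rows = []
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)             # bootstrap sample for this tree
        oob = np.setdiff1d(np.arange(n), boot)        # out-of-bag rows the tree never saw
        if oob.size < 2:
            continue
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=int(rng.integers(2**31 - 1)))
        tree.fit(X[boot], y[boot])
        base = np.mean((y[oob] - tree.predict(X[oob])) ** 2)
        inc = np.empty(p)
        for j in range(p):
            Xp = X[oob].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])      # break the link between x_j and y
            inc[j] = np.mean((y[oob] - tree.predict(Xp)) ** 2) - base
        rows.append(inc)
    mean_inc = np.mean(rows, axis=0)
    pos = np.clip(mean_inc, 0.0, None)                # negative importance gets weight 0
    return mean_inc, pos / pos.sum()


# Synthetic demo: only x0 and x1 drive the "moisture" target; x2..x5 are noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.3 * rng.normal(size=300)

keep = pearson_screen(X, y, threshold=0.3)
Xk = X[:, keep]
r2_train = (RandomForestRegressor(n_estimators=200, max_features=1 / 3, random_state=0)
            .fit(Xk, y).score(Xk, y))                # goodness of fit on the training data
nmse = cv_nmse(Xk, y)
mean_inc, weights = oob_importance(Xk, y)
print(keep, round(r2_train, 3), nmse.round(3), weights.round(3))
```

In this demo, the screen should keep only the two informative columns. The variable with the larger coefficient, x0, should receive the larger OOB-based weight. The abstract's thresholds (goodness of fit > 0.9, NMSE < 1) are the kind of checks `r2_train` and `nmse` support.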

     
