KOU Ranran, WANG Cong, YANG Song, SU Mingliang, WAN Ran, GUO Rong, ZHANG Jianping, LI Qingxiang, BI Yiming, GUO Junwei, WANG Hongbo, ZHAO Le, FU Yufeng, XU Heng, LIU Zechun, NIE Cong. Construction of an origin discrimination model for flue-cured tobacco from imbalanced multi-regions based on Support Vector Machine with hybrid kernel algorithm[J]. Tobacco Science & Technology, 2025, 58(8): 19-27, 65. DOI: 10.16135/j.issn1002-0861.2025.0236

Construction of an origin discrimination model for flue-cured tobacco from imbalanced multi-regions based on Support Vector Machine with hybrid kernel algorithm

    Abstract: To establish a robust and accurate model for discriminating the origin of flue-cured tobacco from imbalanced multi-region samples, tobacco strip samples aged for three years were selected from one company, covering 12 domestic and international growing regions. Seventy chemical indicators (the contents of 68 chemical components, pH, and dichloromethane extract) were obtained for the samples using a rapid near-infrared chemical composition analysis technique. The Particle Swarm Optimization (PSO) algorithm was used to optimize the parameters of each Support Vector Machine (SVM) kernel function, and an origin discrimination model for imbalanced multi-region flue-cured tobacco was constructed. This model was then compared with Backpropagation Neural Network (BPNN), Random Forest (RF), and Fisher Discriminant Analysis (FDA) models. The results showed that: 1) The origin discrimination model based on the SVM with hybrid kernel algorithm effectively learned key features and classified samples from every region with high accuracy. Its overall discrimination accuracies on the training set and test set were 99.69% and 99.59%, respectively. 2) Compared with the BPNN, RF, and FDA models, the SVM with hybrid kernel model improved overall test-set discrimination accuracy by 4.55, 6.20, and 6.61 percentage points, respectively. 3) When predicting samples from the 12 regions, whose sample counts were highly imbalanced, the macro recall, macro precision, and macro F1 score of the SVM with hybrid kernel model were 0.9951, 0.9985, and 0.9968, respectively. Compared with the BPNN, RF, and FDA models, the macro recall increased by 0.2991, 0.3264, and 0.4065; the macro precision increased by 0.3476, 0.2913, and 0.4124; and the macro F1 score increased by 0.3241, 0.3094, and 0.4095, respectively. The origin discrimination model based on the SVM with hybrid kernel algorithm outperformed the BPNN, RF, and FDA models, enabling rapid, robust, and accurate discrimination of tobacco samples from imbalanced multi-regions.
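
The gap between the accuracy gains (a few percentage points) and the macro-metric gains (roughly 0.3 to 0.4) reported above is typical of imbalanced data: overall accuracy is dominated by large classes, whereas macro averaging weights every region equally. The following is a minimal sketch of how macro recall, macro precision, and macro F1 can be computed. It assumes macro F1 is the mean of the per-class F1 scores, which the abstract does not specify; the function name `macro_scores` and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro recall, macro precision, macro F1).

    Each class present in y_true contributes equally, regardless of size.
    Assumption: macro F1 is the unweighted mean of per-class F1 scores.
    """
    classes = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correctly assigned to its own class
        else:
            fn[t] += 1          # missed for the true class
            fp[p] += 1          # wrongly claimed by the predicted class

    recalls, precisions, f1s = [], [], []
    for c in classes:
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        recalls.append(r)
        precisions.append(p)
        f1s.append(f)

    n = len(classes)
    return sum(recalls) / n, sum(precisions) / n, sum(f1s) / n


# Toy imbalanced example: 8 samples from region "A", 2 from region "B".
# One of the two "B" samples is misclassified as "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall, precision, f1 = macro_scores(y_true, y_pred)

print(f"accuracy        = {accuracy:.4f}")   # 0.9000
print(f"macro recall    = {recall:.4f}")     # 0.7500
print(f"macro precision = {precision:.4f}")  # 0.9444
print(f"macro F1        = {f1:.4f}")         # 0.8039
```

In this toy case a single error in the minority class lowers accuracy only to 0.90 but pulls macro recall down to 0.75, which illustrates why macro-averaged metrics are the more sensitive indicator when region sample counts differ widely.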

     
