Abstract:
To obtain feedback from smokers and gain insight into the factors affecting their cigarette consumption behavior, a large-scale multi-sentiment consumer evaluation dataset was constructed, and a sentiment analysis model for smoker evaluations, ECBHA, was proposed that integrates key multi-level features. Based on this model, an intelligent data mining system for typical smoker opinions was developed. The ECBHA model used a pre-trained model, Enhanced Representation through Knowledge Integration (ERNIE), to generate dynamic word vectors with contextual information, and extracted local and global features through a convolutional neural network (CNN) and a bi-directional long short-term memory network (BiLSTM), respectively. A hierarchical attention network (HAN) was then employed to extract, at the word and sentence levels, the key features for judging the sentiment expressed by smokers. Experimental results on the smoker evaluation dataset showed that the ECBHA model outperformed nine machine learning and deep learning baseline methods, including Support Vector Machine (SVM), Multinomial Logistic Regression (LG), and Text Convolutional Neural Network (TextCNN), on all major metrics. The overall classification accuracy of the ECBHA model was 85.29%, and the F1 scores for positive, neutral and negative sentiment classification were 90.51%, 67.96% and 86.21%, respectively; these four values were 3.28, 2.02, 5.32 and 4.77 percentage points higher, respectively, than those of ERNIE, the best-performing baseline. The intelligent mining system for typical smoker opinions, built on the ECBHA model, supports functions such as the generation of cigarette product portraits and the comparison of sentiment analysis results. It helps manufacturers quickly understand smokers' attitudes towards cigarette products and supports product research and development as well as precise marketing.
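The reported gains over ERNIE can be turned back into implied baseline scores with simple subtraction. The sketch below is only an arithmetic illustration: it assumes the four gains are listed in the same order as the four metrics (accuracy, then positive, neutral and negative F1), and the dictionary names are hypothetical, not taken from the paper.

```python
from decimal import Decimal

# ECBHA scores as reported in the abstract (percent).
ecbha = {
    "accuracy": Decimal("85.29"),
    "f1_positive": Decimal("90.51"),
    "f1_neutral": Decimal("67.96"),
    "f1_negative": Decimal("86.21"),
}

# Reported gains over ERNIE (percentage points).
# Assumption: the gains follow the same order as the metrics above.
gain_over_ernie = {
    "accuracy": Decimal("3.28"),
    "f1_positive": Decimal("2.02"),
    "f1_neutral": Decimal("5.32"),
    "f1_negative": Decimal("4.77"),
}

# Implied ERNIE baseline = ECBHA score minus the reported gain.
ernie_implied = {name: ecbha[name] - gain_over_ernie[name] for name in ecbha}

for name, value in ernie_implied.items():
    print(f"{name}: {value}%")
```

Under that ordering assumption, the implied ERNIE baseline is 82.01% accuracy, with F1 scores of 88.49% (positive), 62.64% (neutral) and 81.44% (negative). These are derived figures, not values stated in the abstract. The largest gain falls on the neutral class, which is also the class with the lowest absolute F1.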