Abstract:
To address the difficulty of searching, accessing, and using data on the raw materials of tobacco flavors, a dataset of flavor materials was built from multi-source heterogeneous data. Published data, including basic information, physicochemical properties, and sensory characteristics, were collected. Sensory evaluation and chemical analysis were performed on flavor material specimens to obtain testing data. Heterogeneous data from the different sources were processed through entry standardization, structure integration, and annotation. The resulting dataset covers over 1 000 flavor materials and comprises 10 data modules. In addition, the "Tobacco Flavor Material Central Database" platform was set up. The distribution of main flavor types, the distribution of olfactory aroma notes, and the correlations between aroma notes and cigarette flavoring effects were analyzed. The results showed that: 1) The dataset provided data for tobacco flavor blending from multiple dimensions and supported a range of data retrieval routes to suit diverse application scenarios. 2) Data analysis revealed the distribution features of tobacco flavor materials, and the tobacco flavoring rules derived with the aid of the dataset were largely consistent with practical experience. 3) The dataset was accessed more than 15 000 times per year. This research supports the digital transformation of tobacco flavor blending.