Design and implementation of DW2.0-based massive tobacco data analysis system
Abstract
To address the limitations of traditional data warehouse systems in data storage, processing and presentation, a massive tobacco data analysis system was designed around the practical requirements of China Tobacco Jiangsu Industrial Limited Corporation. Drawing on DW2.0 theory and big data application technology, an integrated and coordinated data warehouse was built by introducing a distributed processing architecture and fusing the traditional data warehouse with Hadoop. Data lifecycle management was used to achieve high system responsiveness. Unstructured data were processed with Hadoop HBase, and the Hadoop MapReduce parallel computing framework served as the communication layer, scheduling and coordinating computation and communication among the nodes of the cluster. Test results indicated that, compared with traditional methods, the response time of the new system improved by 30% and 80% when the data volume reached the order of 100 million and 1 billion, respectively. The system effectively raised the application level of the data warehouse.
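As an illustration of the data lifecycle management mentioned above, the following is a minimal sketch, not the system's actual implementation. It assumes the four DW2.0 sectors (Interactive, Integrated, Near Line, Archival) and routes records to a sector by age. The thresholds, the function names `assign_sector` and `partition_by_sector`, and the record layout are all hypothetical.

```python
from datetime import date

# Hypothetical age limits (in days) for the first three DW2.0 sectors.
# These values are illustrative assumptions, not taken from the paper.
SECTOR_LIMITS = [
    ("interactive", 30),
    ("integrated", 365),
    ("near_line", 5 * 365),
]


def assign_sector(record_date: date, today: date) -> str:
    """Return the DW2.0 sector for a record based on its age in days."""
    age_days = (today - record_date).days
    for sector, limit in SECTOR_LIMITS:
        if age_days <= limit:
            return sector
    # Anything older than the last limit falls into the archival sector.
    return "archival"


def partition_by_sector(records, today: date) -> dict:
    """Group records (dicts with a 'date' key) by their assigned sector."""
    sectors = {}
    for rec in records:
        sectors.setdefault(assign_sector(rec["date"], today), []).append(rec)
    return sectors
```

Under this assumption, recent and frequently queried data would stay in the fast sectors, while older data would move to cheaper storage such as the Hadoop cluster, which is one plausible way lifecycle management can keep response times low.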