Abstract: To address the low efficiency and high memory consumption of traditional frequent item set computation, this paper presents a new frequent item set mining algorithm based on the MapReduce parallel computing model. First, the data are divided into small pieces so that they can be compressed and transmitted. Second, the frequent item set computation is distributed across load-balanced data nodes, which greatly improves efficiency. Finally, the data sets generated by the individual nodes are merged. Theoretical analysis and experimental results show that the algorithm is effective and feasible for mining frequent item sets from data streams in parallel.
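The three steps named in the abstract (split, count on each node, merge) can be illustrated with a minimal single-process sketch. This is not the authors' algorithm: the function names, the round-robin split, the brute-force candidate enumeration, and the parameters `n_chunks`, `max_size`, and `min_support` are all assumptions made for illustration, and the compression and load-balancing details of the paper are not modeled.

```python
from collections import Counter
from itertools import combinations


def split_into_chunks(transactions, n_chunks):
    """Deal transactions round-robin so chunk sizes differ by at most one."""
    return [transactions[i::n_chunks] for i in range(n_chunks)]


def map_count(chunk, max_size):
    """Map step (assumed): count every itemset of up to max_size items in one chunk."""
    counts = Counter()
    for transaction in chunk:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


def reduce_merge(partial_counts, min_support):
    """Reduce step (assumed): sum per-chunk counts and keep the frequent itemsets."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def frequent_itemsets(transactions, n_chunks=2, max_size=2, min_support=2):
    """Run split, map, and reduce in sequence; a real job would run the map calls in parallel."""
    chunks = split_into_chunks(transactions, n_chunks)
    partials = [map_count(chunk, max_size) for chunk in chunks]
    return reduce_merge(partials, min_support)


if __name__ == "__main__":
    data = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
    print(frequent_itemsets(data))
```

Because the per-chunk counts are simply summed, the merged result does not depend on how the transactions were split, which is what lets the counting step run independently on each node.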
ZHU Fubao, BAI Qingchun, TANG Mengmeng, ZHU Haodong. An algorithm for mining frequent item sets from data streams based on MapReduce[J]. Journal of Central China Normal University (Natural Sciences), 2017, 51(4): 429-434.