An Algorithm for Mining Frequent Item Sets from Data Streams Based on MapReduce

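Abstract (translated from the Chinese): To address the low efficiency and high memory consumption of traditional frequent item set computation over data streams, this paper applies parallel computing to design a MapReduce-based algorithm for mining frequent item sets from data streams. First, the data are partitioned into blocks, compressed, and transmitted. Second, the computation of frequent items is distributed across load-balanced data nodes, which helps maintain execution efficiency. Finally, the frequent item sets produced by the individual nodes are merged in a single scheduling pass. Both theoretical analysis and experimental comparison indicate that the algorithm is effective and feasible for the parallel counting of frequent item sets in data streams.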

Keywords: MapReduce, frequent item sets, data streams, parallel computing, data mining

Abstract: To address the low efficiency and large memory consumption of traditional frequent item set computation, this paper presents a new frequent item set mining algorithm based on the MapReduce parallel computing model. First, the data are divided into small blocks so that they can be compressed and transmitted. Second, the computation of frequent items is distributed across load-balanced data nodes, which improves execution efficiency. Finally, the frequent item sets generated by each node are merged. Theoretical analysis and experimental results show that the algorithm is effective and feasible for processing the frequent item sets of data streams in parallel.

Key words: MapReduce, frequent item sets, data streams, parallel computation, data mining
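
The three stages named in the abstract (block partitioning, per-node counting, and a final merge) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names, the fixed block size, the `max_len` cap on item set size, and the absolute `min_count` threshold are all assumptions made for illustration, and block compression, transmission, and load balancing are omitted.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def split_into_blocks(transactions, block_size):
    """Partition a buffered window of the stream into fixed-size blocks."""
    return [transactions[i:i + block_size]
            for i in range(0, len(transactions), block_size)]


def map_block(block, max_len):
    """Map step: count every item set of size 1..max_len within one block."""
    counts = Counter()
    for transaction in block:
        items = sorted(set(transaction))  # canonical order, duplicates dropped
        for size in range(1, max_len + 1):
            counts.update(combinations(items, size))
    return counts


def merge_counts(left, right):
    """Reduce step: add the partial counts produced by two blocks."""
    return left + right


def frequent_item_sets(transactions, min_count, block_size=2, max_len=2):
    """Return {item_set: count} for item sets seen at least min_count times."""
    blocks = split_into_blocks(transactions, block_size)
    # Each block is independent, so in a real cluster this loop would run
    # as parallel map tasks; here it runs sequentially.
    partial = [map_block(block, max_len) for block in blocks]
    total = reduce(merge_counts, partial, Counter())
    return {itemset: n for itemset, n in total.items() if n >= min_count}


# Hypothetical window of five transactions taken from a stream.
window = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "d"],
]
result = frequent_item_sets(window, min_count=3)
```

With this sample window and `min_count=3`, `result` contains the item sets `("a",)`, `("b",)`, `("c",)`, and `("a", "b")`. Because the per-block counts are simply added, the merged totals do not depend on how the window was split, which is what makes the map step safe to parallelize.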

Received: 2017-07-07

Cite this article:

ZHU Fubao (朱付保), BAI Qingchun (白庆春), TANG Mengmeng (汤萌萌), ZHU Haodong (朱颢东). An algorithm for mining frequent item sets from data streams based on MapReduce [J]. Journal of Central China Normal University (Natural Sciences) [华中师范大学学报(自然科学版)], 2017, 51(4): 429-434.

Link to this article:

http://journal.ccnu.edu.cn/zk//CN/ or http://journal.ccnu.edu.cn/zk//CN/Y2017/V51/I4/429