Citation: Chen Ji-Meng, Chen Jia-Jun, Liu Jie, Huang Ya-Lou, Wang Yuan, Feng Xia. Clustering Algorithms for Large-scale Social Networks Based on Structural Similarity[J]. Journal of Electronics & Information Technology, 2015, 37(2): 449-454. doi: 10.11999/JEIT140512

Abstract: Social networks are both directed and large-scale. To cluster them, this paper proposes a Structural Clustering Algorithm for Directed Networks (DirSCAN) based on structural similarity, together with a corresponding distributed Parallel algorithm (PDirSCAN). DirSCAN takes the directed interactions between vertices into account, groups vertices whose behavioral structure is similar, and performs vertex function analysis. To cope with the size of real social networks, PDirSCAN is built on the MapReduce distributed parallel framework. It improves processing performance while remaining lossless, that is, it produces the same clustering result as DirSCAN. Extensive experiments on real-world network datasets show that DirSCAN improves F1 by up to 2.34% over the undirected structural clustering algorithm SCAN. They also show that PDirSCAN runs 1.67 times faster than DirSCAN, so it can cluster large-scale directed networks effectively.

Key words: Social networks / Directed network clustering / Parallel algorithm / MapReduce
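The abstract does not state DirSCAN's similarity formula. The sketch below illustrates SCAN-style structural clustering extended to directed edges, under stated assumptions. It covers structural similarity, ε-neighborhoods, core vertices with at least μ ε-neighbors, and cluster expansion from cores. The directed similarity used here is a hypothetical stand-in chosen for illustration, not the measure defined in the paper. It is the mean of the cosine similarities of the closed out-neighborhoods and the closed in-neighborhoods. The function names (`closed_neighborhoods`, `directed_similarity`, `dir_scan`) and the defaults `eps=0.5`, `mu=2` are likewise assumptions.

```python
from collections import deque
from math import sqrt


def closed_neighborhoods(edges):
    """Closed out- and in-neighborhoods: each vertex includes itself, as in SCAN."""
    out_nb, in_nb = {}, {}
    for u, v in edges:
        for x in (u, v):
            out_nb.setdefault(x, {x})
            in_nb.setdefault(x, {x})
        out_nb[u].add(v)
        in_nb[v].add(u)
    return out_nb, in_nb


def cosine(a, b):
    """SCAN-style structural similarity |A & B| / sqrt(|A| * |B|)."""
    return len(a & b) / sqrt(len(a) * len(b))


def directed_similarity(u, v, out_nb, in_nb):
    # HYPOTHETICAL measure: mean of out- and in-neighborhood similarity.
    # The paper's exact DirSCAN definition is not given in the abstract.
    return 0.5 * (cosine(out_nb[u], out_nb[v]) + cosine(in_nb[u], in_nb[v]))


def dir_scan(edges, eps=0.5, mu=2):
    """Return {vertex: cluster_id}; vertices left unlabeled are hubs or outliers."""
    out_nb, in_nb = closed_neighborhoods(edges)
    # Candidate neighbors: linked in either direction, plus the vertex itself.
    adj = {x: out_nb[x] | in_nb[x] for x in out_nb}
    eps_nb = {
        x: {y for y in adj[x] if directed_similarity(x, y, out_nb, in_nb) >= eps}
        for x in adj
    }
    is_core = {x: len(eps_nb[x]) >= mu for x in adj}

    labels, cluster_id = {}, 0
    for seed in sorted(adj):
        if seed in labels or not is_core[seed]:
            continue
        cluster_id += 1
        labels[seed] = cluster_id
        queue = deque([seed])
        while queue:
            x = queue.popleft()
            if not is_core[x]:
                continue  # border vertices join a cluster but do not expand it
            for y in eps_nb[x]:
                if y not in labels:
                    labels[y] = cluster_id
                    queue.append(y)
    return labels


# Two mutually linked triads joined by the edge 3 -> 4, plus a vertex 7 -> 1.
edges = [(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2),
         (4, 5), (5, 4), (4, 6), (6, 4), (5, 6), (6, 5),
         (3, 4), (7, 1)]
print(dir_scan(edges))
```

In this example the bridge 3 -> 4 scores about 0.289, which is below `eps`. As a result, vertices 1, 2, 3 and vertices 4, 5, 6 form two separate clusters, and vertex 7 stays unlabeled as an outlier.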
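The abstract describes PDirSCAN only as a lossless, MapReduce-based parallelization of DirSCAN. The expensive step of structural clustering is computing a similarity for every edge, and one common way to parallelize it is a single map/shuffle/reduce round. The sketch below simulates that round in a single process. Each mapper reads one vertex record and sends that vertex's closed neighborhoods to every incident edge key. Each reducer then receives the two endpoint records of one edge and computes its similarity. Several details are assumptions for illustration, not the paper's design: the job layout, the record format, the function names, and the similarity measure (the mean of the out- and in-neighborhood cosine similarities). A real deployment would express `map_phase` and `reduce_phase` as Hadoop `Mapper` and `Reducer` classes.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Structural similarity |A & B| / sqrt(|A| * |B|) of two closed neighborhoods."""
    return len(a & b) / sqrt(len(a) * len(b))


def vertex_records(edges):
    """One input record per vertex: (closed out-neighborhood, closed in-neighborhood)."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nb[u].add(v)
        in_nb[v].add(u)
    vertices = set(out_nb) | set(in_nb)
    return {x: (out_nb[x] | {x}, in_nb[x] | {x}) for x in vertices}


def map_phase(vertex, out_set, in_set):
    """Mapper: send this vertex's neighborhoods to the key of every incident edge."""
    for other in (out_set | in_set) - {vertex}:
        yield tuple(sorted((vertex, other))), (out_set, in_set)


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: both endpoint records of one edge have arrived; score the edge."""
    (out_a, in_a), (out_b, in_b) = values  # exactly two records per edge key
    # HYPOTHETICAL similarity measure, not the paper's definition.
    return 0.5 * (cosine(out_a, out_b) + cosine(in_a, in_b))


def parallel_similarities(edges):
    """Run the simulated round: map every vertex record, shuffle, reduce every edge."""
    records = vertex_records(edges)
    mapped = (pair for v, (o, i) in records.items() for pair in map_phase(v, o, i))
    return {key: reduce_phase(key, vals) for key, vals in shuffle(mapped).items()}


sims = parallel_similarities([(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2), (3, 4)])
print(sims)
```

Each edge's similarity depends only on the records of its two endpoints. The reducers can therefore run independently and still return exactly the values a sequential pass would compute. This is one way a parallel version can keep its clustering result unchanged.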

