Iterative text clustering method based on adaptive subspace learning

Abstract

The invention discloses an iterative text clustering method based on adaptive subspace learning, comprising the following steps. (1) Initialization: the text corpus is represented as a text vector space, affinity propagation clustering is applied to produce K initial clusters, and the cluster assignments of all texts are recorded as an initial cluster membership indicator matrix. (2) Iteration between subspace projection and clustering: with the initial cluster membership indicator matrix as prior knowledge, a subspace projection matrix is solved with the objective of maximizing the average neighborhood margin; the text vector space is projected into the subspace, and affinity propagation clustering is applied in the subspace to produce K clusters, which updates the cluster membership indicator matrix. A convergence function is computed from the subspace projection matrix and the cluster membership indicator matrix, and the iteration repeats until this function converges, at which point the text clustering is complete. The method places no restriction on the size or distribution of the text data; subspace solving and clustering are merged into a unified framework, and the iterative strategy yields a globally optimal clustering result.
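
The two steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The abstract does not give the neighborhood size, the affinity propagation preference, or the exact convergence function, so the following are all assumptions: neighborhood size `k`, the median-similarity preference, damping of 0.7, and the trace of the margin matrix in the subspace as the convergence function. All function names are hypothetical. The label vector plays the role of the cluster membership indicator matrix (row i has a 1 in column `labels[i]`).

```python
import numpy as np

def affinity_propagation(X, damping=0.7, n_iter=200):
    """Basic affinity propagation on negative squared Euclidean similarities."""
    n = len(X)
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Assumption: preference (diagonal of S) set to the median off-diagonal similarity.
    S[np.diag_indices(n)] = np.median(S[~np.eye(n, dtype=bool)])
    R, A, rows = np.zeros((n, n)), np.zeros((n, n)), np.arange(n)
    for _ in range(n_iter):
        # Responsibility update: compare each candidate with the best competitor.
        AS = A + S
        top = AS.argmax(1)
        first = AS[rows, top]
        AS[rows, top] = -np.inf
        second = AS.max(1)
        R_new = S - first[:, None]
        R_new[rows, top] = S[rows, top] - second
        R = damping * R + (1 - damping) * R_new
        # Availability update: accumulate positive responsibilities per column.
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero((A + R)[rows, rows] > 0)
    if exemplars.size == 0:  # no exemplar emerged: fall back to a single cluster
        return np.zeros(n, dtype=int)
    labels = S[:, exemplars].argmax(1)
    labels[exemplars] = np.arange(exemplars.size)
    return labels

def anmm_projection(X, labels, dim, k=5):
    """Projection maximizing the average neighborhood margin for given labels."""
    n, d = X.shape
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    M = np.zeros((d, d))  # scatterness minus compactness
    for i in range(n):
        same = labels == labels[i]
        other = ~same
        same[i] = False
        # k nearest points from other clusters add to M, from the same cluster subtract.
        for mask, sign in ((other, 1.0), (same, -1.0)):
            idx = np.flatnonzero(mask)
            idx = idx[np.argsort(D[i, idx])[:k]]
            if idx.size:
                diff = X[idx] - X[i]
                M += sign * diff.T @ diff / idx.size
    vals, vecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :dim]       # eigenvectors of the `dim` largest eigenvalues
    return W, float(np.trace(W.T @ M @ W))

def adaptive_subspace_clustering(X, dim=2, k=5, tol=1e-6, max_rounds=20):
    """Alternate subspace projection and clustering until the score stabilizes."""
    labels = affinity_propagation(X)  # step (1): initial clusters
    prev = None
    for _ in range(max_rounds):       # step (2): alternate projection and clustering
        W, score = anmm_projection(X, labels, dim, k)
        labels = affinity_propagation(X @ W)
        # Assumed convergence test: relative change of the margin score.
        if prev is not None and abs(score - prev) <= tol * max(1.0, abs(prev)):
            break
        prev = score
    return labels, W
```

In use, `X` would hold one row per document (for example TF-IDF weights), and `adaptive_subspace_clustering(X, dim=2)` returns the final labels together with the last projection matrix. The number of clusters is decided by affinity propagation and is not passed in.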

Patent Citations (2)

    Publication number | Publication date | Assignee | Title
    CN-102214181-A | October 12, 2011 | 无锡科利德斯科技有限公司 | Fuzzy evolution calculation-based text clustering method
    CN-102332012-A | January 25, 2012 | 南方报业传媒集团 | Chinese text sorting method based on correlation study between sorts

Non-Patent Citations (1)

    Title
    Tao Li et al.: "Document Clustering via Adaptive Subspace Iteration", SIGIR 2004, 29 July 2004 (2004-07-29)

Cited By (2)

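    Publication number | Publication date | Assignee | Title
    CN-103886072-B | August 24, 2016 | 河南理工大学 | System for clustering retrieval results in a coal mine search engine
    CN-104573710-A | April 29, 2015 | 北京交通大学 | Subspace clustering method based on latent-space smooth self-representation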