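  • A Guide to the Core Journals of China
  • Chinese Science and Technology Core Journals
  • Chinese Science Citation Database (CSCD)
  • China Scientific and Technical Papers and Citation Database (CSTPCD)
  • China Academic Journals Abstracts Database (CSAD)
  • China Academic Journals (Online Edition) (CNKI)
  • Chinese Science and Technology Journal Database
  • Wanfang Data Knowledge Service Platform
  • China Chaoxing Journal Domain Publishing Platform
  • National Open Platform for Science and Technology Academic Journals
  • Dutch Abstract and Citation Database (SCOPUS)
  • Japan Science and Technology Agency Database (JST)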

A latent semantic analysis classification technique based on optimized category information

  • Abstract: Latent semantic analysis (LSA), also known as latent semantic indexing, is a widely recognized matrix dimensionality-reduction technique and has been applied extensively to statistics-based machine learning tasks on text, such as keyword retrieval and text classification. Focusing on the classification of professional literature, this paper analyzes the characteristics of texts from the same category and from different categories under a strict classification system and, taking patent document classification as an example, proposes an LSA classification technique optimized with category information. Using the feature information of each category, the method decomposes the original documents into several pseudo-documents, which increases the occurrence frequency of the features exclusive to each category. The latent semantic space built in this way is better suited to the classification task, and the classification performance of the model improves. Experimental results show that applying the method to patent text classification effectively improves classification accuracy.
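The abstract does not specify how pseudo-documents are built, how "exclusive" features are defined, which term weighting is used, or which classifier operates in the latent space. The sketch below is therefore only one hypothetical reading, written to make the idea concrete, not the authors' method. It assumes: raw term frequencies; a term counts as exclusive to a category if it occurs only in that category's training documents; each training document is kept and paired with one extra pseudo-document holding only its category-exclusive terms (which raises their counts); a standard truncated SVD builds the latent space, with documents folded in as $\hat{q} = \Sigma_k^{-1} U_k^{T} q$; and a nearest-centroid cosine classifier is used. All function names, the toy tokens, and the IPC-style labels `H01M`/`F16H` are illustrative assumptions.

```python
import numpy as np
from collections import Counter


def exclusive_terms(docs, labels):
    """Hypothetical rule: a term is exclusive to class c if it occurs only in c's docs."""
    owners = {}
    for doc, y in zip(docs, labels):
        for t in set(doc):
            owners.setdefault(t, set()).add(y)
    excl = {}
    for t, ys in owners.items():
        if len(ys) == 1:
            excl.setdefault(next(iter(ys)), set()).add(t)
    return excl


def add_pseudo_documents(docs, labels, excl):
    """Keep each original doc and append one pseudo-document holding only its
    class-exclusive terms, which raises those terms' counts in the matrix."""
    out_docs, out_labels = [], []
    for doc, y in zip(docs, labels):
        out_docs.append(list(doc))
        out_labels.append(y)
        own = [t for t in doc if t in excl.get(y, set())]
        if own:
            out_docs.append(own)
            out_labels.append(y)
    return out_docs, out_labels


def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix A (terms x documents); unknown terms are ignored."""
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t, c in Counter(doc).items():
            if t in vocab:
                A[vocab[t], j] = c
    return A


def fold_in(A, Uk, sk):
    """Project each column q into the latent space: q_hat = Sigma_k^-1 U_k^T q."""
    return (Uk.T @ A) / sk[:, None]


def train(docs, labels, k):
    excl = exclusive_terms(docs, labels)
    pdocs, plabels = add_pseudo_documents(docs, labels, excl)
    vocab = {t: i for i, t in enumerate(sorted({t for d in pdocs for t in d}))}
    A = term_doc_matrix(pdocs, vocab)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))  # never keep (near-)zero singular values
    Uk, sk = U[:, :k], s[:k]
    Z = fold_in(A, Uk, sk)  # latent coordinates of every (pseudo-)document
    ys = np.array(plabels)
    centroids = {c: Z[:, ys == c].mean(axis=1) for c in sorted(set(plabels))}
    return vocab, Uk, sk, centroids


def predict(doc, model):
    vocab, Uk, sk, centroids = model
    z = fold_in(term_doc_matrix([doc], vocab), Uk, sk)[:, 0]

    def cosine(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))

    # Assign the class whose latent centroid is most similar to the query.
    return max(centroids, key=lambda c: cosine(z, centroids[c]))


if __name__ == "__main__":
    train_docs = [
        ["device", "battery", "electrode", "lithium", "charge"],
        ["device", "battery", "electrode", "anode", "charge"],
        ["device", "gear", "shaft", "bearing", "torque"],
        ["device", "gear", "shaft", "clutch", "torque"],
    ]
    train_labels = ["H01M", "H01M", "F16H", "F16H"]
    model = train(train_docs, train_labels, k=2)
    print(predict(["lithium", "anode", "battery"], model))  # → H01M
    print(predict(["clutch", "gear", "torque"], model))  # → F16H
```

In this toy data, the shared term `device` is not exclusive to either class, so it appears only in the original documents, while each class's own terms are counted twice (once in the original, once in the pseudo-document). How the paper actually splits documents and weights terms may differ substantially.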

     
