
A novel combined recommendation method for solving the data sparsity and cold-start problems

  • Abstract: Traditional recommendation algorithms struggle with the cold-start and data sparsity problems. In addition, most existing association rule mining (ARM) algorithms were designed for basket-oriented analysis of customer behavior; they are inefficient for recommending to a specific user because they mine many rules that are irrelevant to that user. To address these issues, this paper proposes a similarity-based association recommendation scheme that combines association rule recommendation with collaborative filtering (CF). The method mines association rules with a specified consequent, so that rules are mined directly for one target user or target item at a time, and it uses an interest factor to balance the weights of active and inactive users (or items), so that the optimal rules are recommended by weighting. A similarity measure is also used to filter out low-similarity items, so that the set of items recommended to a user has both high ratings and high similarity. Finally, the rule-based and CF recommendations are combined to form the final recommendation, realizing user-based (or item-based) collaborative filtering. Experiments on the MovieLens data set show that, compared with previously published results, the proposed method handles sparse data and the cold-start problem better and noticeably improves recommendation quality.
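The two building blocks named in the abstract, rule mining restricted to a specified consequent and similarity-based filtering, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, thresholds, and toy data below are assumptions made for illustration, cosine similarity stands in for whatever similarity measure the paper actually uses, and the interest-factor weighting is omitted because the abstract does not define it.

```python
from itertools import combinations
from math import sqrt


def rules_for_target(transactions, target, min_support=0.3,
                     min_confidence=0.6, max_len=2):
    """Mine rules of the form antecedent -> {target} only.

    Fixing the consequent avoids enumerating rules about other items,
    which is the efficiency argument made in the abstract.
    Thresholds and max_len are illustrative assumptions.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t} - {target})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            a = set(antecedent)
            count_a = sum(1 for t in transactions if a <= t)
            if count_a == 0:
                continue
            count_both = sum(1 for t in transactions
                             if a <= t and target in t)
            support = count_both / n
            confidence = count_both / count_a
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    # Strongest rules first: by confidence, then support.
    rules.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rules


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_by_similarity(candidates, reference, min_sim=0.5):
    """Keep candidate items whose rating vector is similar to `reference`."""
    return [name for name, vec in sorted(candidates.items())
            if cosine(vec, reference) >= min_sim]


# Toy data (hypothetical): each set holds the items one user rated highly.
transactions = [{"A", "B", "T"}, {"A", "T"}, {"A", "B"},
                {"B", "C"}, {"A", "C", "T"}]
rules = rules_for_target(transactions, target="T")
```

With this toy data, only the antecedent `("A",)` passes both thresholds for the target item `"T"`. In a combined scheme such as the one the abstract describes, rules like this would supply candidates even when a user has too few ratings for CF alone, and `filter_by_similarity` would then discard candidates that are dissimilar to items the user already rated highly.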

     
