Research on a Real-Time Recommendation Model Based on Web Logs

2020-02-20 22:57

Master of Engineering Thesis

A Real-Time Recommendation Model Based on Web Log

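Abstract (translated from the Chinese original)

With the widespread use of the WWW, large volumes of Web logs have accumulated on WWW servers. These data are crucial in many areas, such as the system design of Web sites, business marketing strategy, and Web site personalization. Real-time recommendation is one application of data mining to Web log data. Its purpose is to make a site easier for users to visit, to predict users' preferences, and to provide a basis for e-commerce decision making.

After an overview of the development, techniques, and theoretical foundations of Web mining and Web log mining, this thesis studies the preprocessing techniques of Web log mining in detail. It uses a session identification algorithm based on a page access time threshold and session reconstruction, and validates the algorithm on real Web log data. On top of the preprocessed data, it applies fuzzy clustering: fuzzy sets of Web users and of Web pages are built from the users' browsing of Web pages, a fuzzy similarity matrix is then constructed with the max-min fuzzy similarity measure, and a fuzzy dynamic clustering algorithm is derived from that matrix. Building on this work, the thesis designs RTRS (real-time recommendation system), a real-time recommendation model system based on Web log mining. RTRS has two parts, an offline component and an online component. The offline component performs data collection, preprocessing, and the fuzzy clustering of users and pages. The online component generates a recommendation set from the user's current access behavior. Its recommendation algorithm builds a BP tree (Brows Pattern Tree) from users' access records, produces the frequent access sets, and from them generates the recommendation set. The algorithm needs to scan the database only once, and the frequent sequential patterns it obtains satisfy the speed requirements of real-time recommendation.

This thesis contains 10 figures, 7 tables, and 52 references.

Keywords: data mining; data preprocessing; session identification; fuzzy clustering; real-time recommendation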
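
The session identification step named in the abstract can be illustrated with a minimal Python sketch. It covers only the time-threshold part of the idea: a user's requests are split into a new session whenever the gap between two consecutive requests exceeds a threshold. The session reconstruction step is not shown, because the abstract does not describe how it works. The record layout `(user_id, timestamp_seconds, url)`, the function name `identify_sessions`, and the 1800-second default are assumptions made for illustration, not values taken from the thesis.

```python
from collections import defaultdict


def identify_sessions(records, max_gap_seconds=1800):
    """Split each user's requests into sessions by a time threshold.

    records: iterable of (user_id, timestamp_seconds, url) tuples
             (a hypothetical, already-cleaned log format).
    max_gap_seconds: assumed threshold; a gap larger than this between
             two consecutive requests of one user starts a new session.
    Returns a list of (user_id, [url, ...]) pairs.
    """
    # Group requests per user.
    by_user = defaultdict(list)
    for user_id, ts, url in records:
        by_user[user_id].append((ts, url))

    sessions = []
    for user_id, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > max_gap_seconds:
                # Gap too large: close the current session.
                sessions.append((user_id, current))
                current = []
            current.append(url)
        sessions.append((user_id, current))
    return sessions


log = [
    ("u1", 0, "/a"),
    ("u1", 100, "/b"),
    ("u1", 5000, "/c"),  # 4900 s after the previous request
    ("u2", 50, "/a"),
]
for user, pages in identify_sessions(log):
    print(user, pages)
```

With the sample log, user `u1` ends up with two sessions because the third request arrives 4900 seconds after the second, which is above the assumed threshold.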

Abstract

With the widespread use of the WWW, WWW servers have accumulated large volumes of Web logs. These data are crucial in many fields, such as Web site system design, business marketing strategy, and Web site personalization. Real-time recommendation is an application of data mining to Web log data. It aims to make sites easier for users to access, to predict users' preferences, and to provide a basis for e-commerce decisions.

After surveying the development, techniques, and theoretical foundations of Web mining and Web log mining, this thesis makes a detailed study of preprocessing techniques for Web log mining. It uses a session identification algorithm based on a page access time threshold and session reconstruction, and verifies it on real Web log data. On the basis of the preprocessed data, it applies fuzzy clustering: fuzzy sets of Web users and of Web pages are established from users' browsing of Web pages, a fuzzy similarity matrix is constructed with the max-min fuzzy similarity measure, and a fuzzy dynamic clustering algorithm is built on that matrix. Based on this work, the thesis designs RTRS (real-time recommendation system), a real-time recommendation model system based on Web log mining. RTRS is divided into an offline part and an online part. The offline part handles data collection, preprocessing, and the fuzzy clustering of users and pages. The online part generates a recommendation set from the user's current access behavior. Its recommendation algorithm constructs a BP tree (Brows Pattern Tree) from users' access records, produces the frequently accessed sets, and from them generates the recommendation set. The algorithm scans the database only once, and the frequent sequential patterns it obtains meet the speed requirements of real-time recommendation.

This thesis contains 10 figures, 7 tables, and 52 references.

Keywords: data mining; data preprocessing; session identification; fuzzy clustering; real-time recommendation
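
The max-min similarity measure and the clustering built on it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. Each user is assumed to be a vector of page membership degrees in [0, 1]. The similarity of two users is taken as the sum of element-wise minima divided by the sum of element-wise maxima, which is the usual form of the max-min method. The similarity matrix is then closed under max-min composition, and clusters are read off with a threshold `lam`, which is one common way to obtain a dynamic clustering from a fuzzy similarity matrix. All function names and sample values are hypothetical.

```python
def max_min_similarity(x, y):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    denom = sum(max(a, b) for a, b in zip(x, y))
    if denom == 0:
        return 1.0  # assumption: two all-zero vectors count as identical
    return sum(min(a, b) for a, b in zip(x, y)) / denom


def similarity_matrix(rows):
    """Fuzzy similarity matrix (reflexive and symmetric) for all row pairs."""
    return [[max_min_similarity(r, s) for s in rows] for r in rows]


def transitive_closure(r):
    """Square the matrix under max-min composition until it stops changing."""
    n = len(r)
    while True:
        squared = [
            [max(min(r[i][k], r[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        if squared == r:
            return r
        r = squared


def lambda_cut_clusters(closure, lam):
    """Group indices whose closure entry is at least lam."""
    clusters, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        clusters.append(group)
    return clusters


# Hypothetical membership degrees of three users over three pages.
users = [
    [1.0, 0.8, 0.0],
    [0.9, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
closure = transitive_closure(similarity_matrix(users))
print(lambda_cut_clusters(closure, 0.5))  # → [[0, 1], [2]]
```

Lowering `lam` merges clusters and raising it splits them, which is what makes the clustering dynamic: the same closure matrix yields a whole hierarchy of partitions. Clustering pages instead of users would use the transposed data under the same assumptions.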

Contents

Abstract
Contents
List of Figures
List of Tables
List of Variables
1 Introduction
1.1 Research Background and Significance
1.2 Current State of Research
1.3 Main Research Content of the Thesis
1.4 Organization of the Thesis
2 Background Knowledge
2.1 Introduction to Data Mining
2.2 Introduction to Web Mining
2.3 The Web Mining Process
2.4 Summary
3 Data Preprocessing
3.1 Formation of Web Logs
3.2 Basic Concepts
3.3 The Web Log Data Preprocessing Process
3.4 Experimental Data
3.5 Summary
4 Dynamic Clustering of Web Logs Based on Fuzzy Matrices
4.1 Comparison and Analysis of Common Clustering Algorithms
4.2 Fuzzy Clustering
4.3 An Example Using the Max-Min Similarity Measure
4.4 Summary
5 A Real-Time Recommendation Model Based on Web Logs
5.1 Problem Statement
5.2 The RTRS System Model
5.3 Real-Time Recommendation in the Online Module
5.4 Experimental Results and Analysis
5.5 Summary
6 Conclusions and Outlook
6.1 Conclusions
6.2 Future Work
References
Author's Biography
Declaration of Originality
Thesis Data Set
