Contents
Abstract ...... I
Contents ...... V
List of Figures ...... VII
List of Tables ...... VIII
List of Variables ...... IX
1 Introduction ...... 1
  1.1 Research Background and Significance ...... 1
  1.2 Current State of Research ...... 3
  1.3 Main Contents of the Paper ...... 6
  1.4 Structure of the Paper ...... 6
2 Background Knowledge ...... 8
  2.1 Brief Introduction to Data Mining ...... 8
  2.2 Brief Introduction to Web Data Mining ...... 8
  2.3 The Web Mining Process ...... 12
  2.4 Summary ...... 16
3 Data Preprocessing ...... 18
  3.1 Web Log Formation ...... 18
  3.2 Basic Concepts ...... 20
  3.3 Web Log Data Preprocessing ...... 21
  3.4 Test Data ...... 26
  3.5 Summary ...... 27
4 Dynamic Clustering of Web Logs Based on the Fuzzy Similarity Matrix ...... 29
  4.1 Comparison and Analysis of Commonly Used Clustering Algorithms ...... 29
  4.2 Fuzzy Clustering ...... 32
  4.3 Worked Example Using the Max-Min Fuzzy Similarity Measure ...... 35
  4.4 Summary ...... 36
5 Real-Time Recommendation System Model Based on Web Logs ...... 37
  5.1 Problem Setup ...... 37
  5.2 RTRS System Model ...... 38
  5.3 Real-Time Online Recommendation Model ...... 39
  5.4 Experimental Results and Analysis ...... 42
  5.5 Summary ...... 43
6 Conclusions and Prospects ...... 44
  6.1 Conclusion ...... 44
  6.2 Prospects for Future Work ...... 44
References ...... 46
Author's Resume ...... 49
Declaration of Thesis Originality ...... 50
Thesis Data Collection ...... 51
List of Figures
Figure No.    Figure Title    Page
Figure 2-1    Categories of Web data mining    10
Figure 2-2    Process of Web usage mining    12
Figure 3-1    Raw log    20
Figure 3-2    The data preprocessing process    21
Figure 3-3    Proportion of each part in data cleaning    22
Figure 4-1    Web fuzzy clustering process model    35
Figure 4-2    Website topology diagram    35
Figure 5-1    RTRS system model    39
Figure 5-2    Construction of the BP tree    42
Figure 5-3    Example of BP mining    42
List of Tables
Table No.    Table Title    Page
Table 2-1    Comparison of three Web mining methods    12
Table 3-1    Main information in Web log records    19
Table 3-2    Common user identification methods    24
Table 3-3    Raw log records    27
Table 3-4    User session records after data preprocessing    27
Table 4-1    Classification of clustering algorithms    29
Table 5-1    User sessions    42
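List of Variables
Symbol    Description
L    Set of user access logs
Log    Log file
rij    Similarity coefficient
R    Log record
R′    Fuzzy relation matrix
R\    Fuzzy similarity matrix
S    Session set
U    User set
δ    Session time delay
θ    Session time threshold
λ    Fuzzy similarity threshold
ξ    Minimum support threshold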