Research and Implementation of Feature Extraction Methods for Chinese Text Classification (4)

2020-12-24 16:57

ABSTRACT

With the development of society, and especially the rapid development of network technology, the volume of information of all kinds is growing exponentially. Text classification can manage huge and heterogeneous data effectively. Information retrieval and filtering, which are based on text classification, help people find the information they need in this mass of data and work more efficiently. Text classification has therefore become a popular and significant research topic.

This thesis first studies and analyzes the key techniques of text classification (TC) in detail, then focuses on feature selection and proposes a new feature selection method. Finally, we design and implement a TC system based on the new method.

① We analyze the process and key techniques of TC and study text feature selection methods. By comparing several common methods based on the filter model, we find that negative features and weakly correlated features degrade the quality of the selected features. This thesis therefore proposes a new feature selection approach for TC, named SP, which is based on strong class correlation and positive class correlation. By selecting positive and strong features, SP effectively eliminates the influence of negative and weakly correlated features and thus yields high-quality features.
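To make the notions of negative and weakly correlated features concrete, the following is a minimal Python sketch of two common filter-model scores, document frequency (DF) and the chi-square statistic (CHI), on a toy corpus. It is not the SP method, whose definition is not given in this abstract. The function names, the toy documents, and the use of the sign of AD - CB to separate positive from negative class correlation are illustrative assumptions.

```python
def contingency(docs, term, cls):
    """Return (A, B, C, D) for a term/class pair.

    A: documents in cls that contain term
    B: documents outside cls that contain term
    C: documents in cls that lack term
    D: documents outside cls that lack term
    """
    a = b = c = d = 0
    for terms, label in docs:
        has_term = term in terms
        if label == cls:
            if has_term:
                a += 1
            else:
                c += 1
        else:
            if has_term:
                b += 1
            else:
                d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 term/class contingency table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def correlation_sign(a, b, c, d):
    """+1 if the term is over-represented in the class, -1 if
    under-represented, 0 if independent (assumed sign convention)."""
    diff = a * d - c * b
    return (diff > 0) - (diff < 0)


def document_frequency(docs, term):
    """Number of documents that contain the term."""
    return sum(1 for terms, _ in docs if term in terms)


# Hypothetical toy corpus: (set of terms, class label).
DOCS = [
    ({"football", "match"}, "sports"),
    ({"football", "goal"}, "sports"),
    ({"stock", "market"}, "finance"),
    ({"stock", "match"}, "finance"),
]

for term in ("football", "stock", "match"):
    table = contingency(DOCS, term, "sports")
    print(term,
          "DF =", document_frequency(DOCS, term),
          "CHI =", chi_square(*table),
          "sign =", correlation_sign(*table))
```

In this toy corpus, "football" and "stock" receive the same CHI score for the class "sports", although only "football" indicates membership of that class. "match" appears in as many documents as the other two terms but has a CHI of zero. A plain CHI or DF ranking cannot tell such features apart.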

② SP is applied in the design and implementation of a Chinese text classification system (CTCS). We carry out the overall design of CTCS and the detailed design of its modules. This thesis studies the Chinese lexical analysis toolkit ICTCLAS and the full-text search package Lucene, combines the two into a solution for building CTCS, and finally implements the system.

③ We run a series of comparison experiments between the new feature selection method SP and common methods such as DF (document frequency) and CHI (the chi-square statistic). The classification results are evaluated with several classification performance measures. The experimental results indicate that SP selects high-quality features, constructs low-dimensional feature vectors, and reduces the dimensionality of the feature space. SP performs well on feature selection for Chinese text classification and reflects the degree of difference among classes.
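The abstract does not name the evaluation measures used. As one common possibility, the sketch below computes per-class precision, recall, and F1, plus the macro-averaged F1. The choice of these measures, the function names, and the toy labels are assumptions for illustration, not the evaluation protocol of the thesis.

```python
def precision_recall_f1(y_true, y_pred, cls):
    """Precision, recall, and F1 for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes in y_true."""
    classes = sorted(set(y_true))
    scores = [precision_recall_f1(y_true, y_pred, c)[2] for c in classes]
    return sum(scores) / len(scores)


# Hypothetical gold labels and classifier predictions.
Y_TRUE = ["sports", "sports", "finance", "finance"]
Y_PRED = ["sports", "finance", "finance", "finance"]

for cls in sorted(set(Y_TRUE)):
    print(cls, precision_recall_f1(Y_TRUE, Y_PRED, cls))
print("macro-F1:", macro_f1(Y_TRUE, Y_PRED))
```

Macro-averaging gives every class the same weight regardless of its size, which is one way to compare feature selection methods on corpora with unevenly sized classes.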

Keywords: Text Classification, Feature Dimensionality Reduction, Feature Selection, Class Positive Correlation, Class Strong Correlation

