
 2021-12-23 08:12


Abstract







As an important part of the Internet, the campus network has given rise to a new form of campus, the information campus, and has provided a new technological environment for the development of campus culture. The network has become an important platform for campus cultural exchange. Teachers and students state their views and express their emotions on the campus network, forming a campus public opinion with distinctive characteristics that has a profound impact on the construction of campus culture. Mining opinions on hot campus topics and tracking public opinion trends in real time therefore play a significant role in guiding campus values and building a harmonious campus cultural environment.

To provide a basis for the group's research, this paper uses the campus BBS as its research carrier. Taking into account how college students write and express emotion in their posts, it compares the applicability of different text feature selection algorithms and machine learning methods to emotional analysis of campus public opinion. In the text preprocessing module, we consider the part of speech of emotional words and construct a user dictionary and a stop-word dictionary, which yields cleaner and more comprehensive Chinese word segmentation. In the text modeling module, we compare the effect of four feature selection methods on text tendency analysis: mutual information, information gain, the chi-square statistic, and the weighted log-likelihood ratio. The impact of the feature vector's dimensionality is also considered. In the classifier training module, we compare the classification results of four machine learning methods: Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree. Finally, we evaluate the emotional tendency analysis algorithm on three measures: accuracy, recall, and F-measure.

KEY WORDS: campus public opinion, emotional tendency analysis, feature selection, machine learning, text classification
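To make the feature selection and evaluation steps named in the abstract concrete, the following is a minimal sketch of chi-square term scoring and of per-class evaluation. It is not the thesis's implementation: the function names, the toy corpus, and the labels are hypothetical, and it assumes that "accuracy" in the abstract refers to per-class precision, as is common when it is reported alongside recall and F-measure.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, target, k):
    """Return the k terms with the highest chi-square score for `target`.

    docs is a list of token lists (already segmented); ties are broken
    alphabetically so the result is deterministic.
    """
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_total, df_pos = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # document frequency, not term frequency
            df_total[term] += 1
            if y == target:
                df_pos[term] += 1
    scores = {}
    for term, df in df_total.items():
        n11 = df_pos[term]
        n10 = df - n11
        n01 = n_pos - n11
        n00 = n_docs - n_pos - n10
        scores[term] = chi_square(n11, n10, n01, n00)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def precision_recall_f1(y_true, y_pred, target):
    """Per-class precision, recall and F-measure (assumed evaluation metrics)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == target and p == target)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != target and p == target)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy corpus of segmented posts with sentiment labels.
docs = [["good", "class"], ["good", "food"], ["bad", "class"], ["bad", "food"]]
labels = ["pos", "pos", "neg", "neg"]
top_terms = select_features(docs, labels, target="pos", k=2)
```

In this toy corpus the sentiment-bearing terms separate the two classes perfectly, while the topic words appear equally in both and score zero, which is the behaviour a feature selection step is meant to exploit before classifier training.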

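Contents

Abstract (Chinese)

Abstract

Chapter 1 Introduction

1.1 Research Background

1.2 Current State of Research on Text Tendency Analysis

1.3 Research Content and Objectives

1.4 Organization and Structure of the Thesis

Chapter 2 Related Theory and Key Technologies

2.1 The Concept of Text Tendency Analysis

2.2 Main Methods of Text Tendency Analysis

2.3 Text Modeling

2.3.1 Text Representation Models

2.3.2 Text Feature Selection Methods

2.4 Text Tendency Analysis Algorithms

2.4.1 Naive Bayes

2.4.2 Support Vector Machine

2.4.3 K-Nearest Neighbor

2.4.4 Decision Tree

2.5 Chapter Summary

Chapter 3 Text Tendency Analysis Based on Machine Learning

3.1 Text Crawling

3.1.1 Defining the Item

3.1.2 Writing the Spider

3.1.3 Configuring the Item Pipeline

3.2 Text Preprocessing

3.2.1 Sentence Segmentation

3.2.2 Dictionary Construction

3.2.3 Part-of-Speech Tagging

3.2.4 Chinese Word Segmentation

3.3 Text Feature Selection

3.4 Emotional Tendency Analysis

3.5 Chapter Summary

Chapter 4 Experimental Results and Analysis

4.1 Experimental Environment

4.2 Experimental Data

4.3 Experimental Tools

4.4 Performance Evaluation Metrics

4.5 Experimental Design and Analysis of Results

4.5.1 Effect of Feature Dimension on Emotional Tendency Analysis

4.5.2 Effect of Different Feature Selection Methods and Machine Learning Methods on Emotional Tendency Analysis

4.6 Chapter Summary

Chapter 5 Conclusion and Outlook

5.1 Summary of Work

5.2 Future Work

Acknowledgements

References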

  1. Introduction

1.1 Research Background



