Research on the Improvement and Application of Decision Tree Algorithms

 2022-04-15 07:04

Total thesis length: 31,043 characters

ABSTRACT

Customer value analysis has become central to enterprise customer relationship management, and the ID3 and Fuzzy ID3 decision tree algorithms are among its important tools. Both algorithms, however, suffer from high computational complexity and long modeling times. This thesis therefore studies improvements to these decision tree algorithms and applies them to customer value analysis, in order to identify an enterprise's potential customer groups and provide sound decision support for customer relationship management.

Building on a systematic review of research in China and abroad, the thesis compares candidate algorithms and finds that ID3 and Fuzzy ID3 are easy to understand, computationally compact, and convenient for deriving classification rules; it adopts them as its base algorithms. It then introduces the decision coordination degree from rough set theory to replace information gain as the criterion for choosing the splitting attribute, which reduces modeling time and computational complexity for both algorithms. Finally, the improved algorithms are applied to customer value analysis for Company A. Four algorithms in total (ID3 and Fuzzy ID3, each before and after improvement) are used to classify the preprocessed data and build customer potential models. The application shows that the improved algorithms retain the advantages of the originals and also increase test accuracy. In addition, because the data handled in this thesis are not continuous, a data fuzzification step was added to preprocessing. Comparing the decision tree models built before and after fuzzification shows that fuzzification increases model accuracy, which confirms its importance for this kind of data.

KEY WORDS: data mining, customer value analysis, decision tree, potential customers
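
The substitution described in the abstract, choosing the splitting attribute by decision coordination degree instead of information gain, can be illustrated with a small sketch. The thesis's own definition is not reproduced in this excerpt, so the code assumes one formulation that appears in the rough-set literature: the number of object pairs indiscernible on both the condition attribute and the decision attribute, divided by the number of pairs indiscernible on the condition attribute alone. The toy records and the attribute names `spend`, `region`, and `potential` are hypothetical and are not taken from the Company A data.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """ID3 criterion: target entropy minus weighted entropy after splitting on attr."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def indiscernible_pairs(rows, attrs):
    """Count ordered pairs (x, y) of rows that agree on every attribute in attrs."""
    blocks = Counter(tuple(row[a] for a in attrs) for row in rows)
    return sum(n * n for n in blocks.values())


def coordination_degree(rows, attr, target):
    """ASSUMED definition: |IND({attr, target})| / |IND({attr})|, a value in (0, 1]."""
    return indiscernible_pairs(rows, [attr, target]) / indiscernible_pairs(rows, [attr])


# Hypothetical toy data, for illustration only.
rows = [
    {"spend": "high", "region": "east", "potential": "yes"},
    {"spend": "high", "region": "west", "potential": "yes"},
    {"spend": "low", "region": "east", "potential": "no"},
    {"spend": "low", "region": "west", "potential": "no"},
]

for attr in ("spend", "region"):
    gain = information_gain(rows, attr, "potential")
    degree = coordination_degree(rows, attr, "potential")
    print(f"{attr}: gain={gain:.3f}, coordination degree={degree:.3f}")
```

On this toy data both criteria rank `spend` above `region`, so either would choose `spend` for the root split. Under the assumed definition the coordination degree is computed from block counts alone, with no logarithms, which is at least consistent with the abstract's claim of lower computational cost; whether the thesis measures cost in this way is not stated in the excerpt.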

CONTENTS

Chapter 1 Introduction

1.1 Research Background and Significance

1.2 Research Approach and Methods

1.3 Research Framework

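Chapter 2 State of Research in China and Abroad

2.1 State of Research on Data Mining

2.2 State of Research on Decision Tree Algorithms

2.3 State of Research on Applications of Decision Tree Algorithms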

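Chapter 3 Algorithm Improvements Based on Rough Set Theory

3.1 Introduction to the ID3 Algorithm

3.1.1 Information Theory

3.1.1.1 Information Entropy

3.1.1.2 Information Gain

3.1.1.3 Information Gain Ratio

3.1.2 The ID3 Algorithm

3.2 Introduction to the Fuzzy ID3 Algorithm

3.2.1 Fuzzy Sets

3.2.2 The Fuzzy ID3 Algorithm

3.3 Comparison of the ID3 and Fuzzy ID3 Algorithms

3.4 Improving the ID3 Algorithm

3.4.1 Analysis and Demonstration of the ID3 Algorithm

3.4.2 Improving the ID3 Algorithm

3.4.2.1 Decision Systems

3.4.2.2 Decision Coordination Degree

3.4.2.3 Tree-Building Process of the Improved ID3 Algorithm

3.4.2.4 Worked Example

3.4.3 Comparative Analysis

3.5 Improving the Fuzzy ID3 Algorithm

3.5.1 Analysis and Demonstration of the Fuzzy ID3 Algorithm

3.5.1.1 Data Fuzzification

3.5.1.2 Building a Tree with the Fuzzy ID3 Algorithm

3.5.2 Tree-Building Process of the Improved Fuzzy ID3 Algorithm

3.5.3 Comparative Analysis

3.6 Chapter Summary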

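Chapter 4 Application of the Improved Algorithms: A Case Study of Enterprise Customer Value Analysis

4.1 Data Preprocessing

4.1.1 Data Preparation

4.1.2 Data Integration

4.1.3 Attribute Selection

4.1.4 Data Cleaning

4.1.5 Data Transformation

4.2 Building Trees with Crisp (Classical) Decision Tree Algorithms

4.2.1 Building a Tree with the ID3 Algorithm

4.2.2 Building a Tree with the Improved ID3 Algorithm

4.3 Building Trees with Fuzzy Decision Tree Algorithms

4.3.1 Data Fuzzification

4.3.2 Building a Tree with the Fuzzy ID3 Algorithm

4.3.3 Building a Tree with the Improved Fuzzy ID3 Algorithm

4.4 Model Evaluation

4.5 Classification Rules

4.6 Ranking Potential Customers

4.7 Model Interpretation

4.8 Chapter Summary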

Chapter 5 Conclusions and Outlook

5.1 Conclusions

5.2 Outlook

References

Acknowledgements
