
 2022-05-01 09:05


Abstract

Emotion recognition is an important part of affective computing. It enables computers to recognize human emotion from the external emotional cues they can perceive, so that they can better understand and express human emotion and support effective, harmonious human-computer interaction. In recent years, multi-modal emotion recognition has become a research focus. Compared with single-modal emotion recognition, it integrates more emotional information and lets the different kinds of information complement one another, which yields a better representation of emotional state than any single modality provides. Facial expressions and audio features are the most apparent emotional cues and the easiest to analyze, so this paper focuses on emotion recognition from facial expressions and audio features. The main contents are as follows:

  1. Audio and visual feature extraction

For audio, we use the openSMILE toolbox to extract the INTERSPEECH 2010 Paralinguistic Challenge feature set, which has 1582 dimensions. For the visual modality, we extract dynamic 3-dimensional geometric features. Notably, we use the slow feature analysis algorithm to locate the emotional peak frame and then derive dynamic features of fixed length.
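As a rough illustration of the fixed-length step, the sketch below picks the frame with the highest score and takes a window of fixed length around it. It is only a sketch under stated assumptions: `frame_scores` is a hypothetical per-frame expressiveness score standing in for whatever the slow feature analysis step produces, and centring the window on the peak (shifting it at sequence boundaries) is an assumed choice, not necessarily the procedure used in this work.

```python
def fixed_length_window(frame_scores, length):
    """Return (peak_index, frame_indices) for a window of `length` frames.

    Assumption: `frame_scores` holds one score per video frame, higher
    meaning more expressive. How the scores are computed is out of scope.
    """
    n = len(frame_scores)
    if length < 1 or length > n:
        raise ValueError("window length must be between 1 and the number of frames")

    # The peak frame is the frame with the highest score.
    peak = max(range(n), key=frame_scores.__getitem__)

    # Centre the window on the peak, then shift it so it stays inside the clip.
    start = peak - length // 2
    start = max(0, min(start, n - length))
    return peak, list(range(start, start + length))


if __name__ == "__main__":
    peak, window = fixed_length_window([0.1, 0.2, 0.9, 0.3, 0.1], 3)
    print(peak, window)
```

Every clip then contributes the same number of frames, so the geometric features computed over the window have a fixed dimension regardless of clip duration.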

  2. Classifiers for emotion recognition

We discuss separate learning and joint learning. In separate learning, we first use auto-encoders for feature learning and dimensionality reduction, and then fine-tune the neural network by supervised learning. In joint learning, the reconstruction error and the classification error are both taken into account while training the network.
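The joint objective can be pictured as a single loss that mixes the two error terms. The sketch below is illustrative only: the mean squared reconstruction error, the cross-entropy classification error, and the mixing weight `alpha` are assumptions chosen for the example, since the abstract does not state the exact form of either term or how they are balanced.

```python
import math


def reconstruction_error(x, x_hat):
    """Mean squared error between an input and its auto-encoder reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def classification_error(probs, target):
    """Cross-entropy of the predicted class probabilities for the true class."""
    return -math.log(max(probs[target], 1e-12))  # clamp to avoid log(0)


def joint_loss(x, x_hat, probs, target, alpha=0.5):
    """Weighted sum of both errors; `alpha` is a hypothetical trade-off weight."""
    return (alpha * reconstruction_error(x, x_hat)
            + (1.0 - alpha) * classification_error(probs, target))


if __name__ == "__main__":
    print(joint_loss([1.0, 0.0], [0.0, 0.0], [0.5, 0.5], target=1))
```

In separate learning the two terms would be minimized one after the other (reconstruction first, then classification), whereas a combined loss of this kind lets a single training pass trade them off.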

  3. Bimodal fusion methods

First, we study single-modal emotion recognition on either audio or visual features. Then, we apply an adaptive weighted method for decision-level fusion to obtain the fused results.
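Decision-level fusion of this kind combines the class probabilities produced by the two single-modal classifiers. The sketch below shows one simple possibility, under a loudly stated assumption: each modality's weight is its validation accuracy, normalized so the weights sum to 1. The abstract does not say how the adaptive weights are actually derived, so `acc_audio` and `acc_visual` are hypothetical inputs.

```python
def fuse_decisions(p_audio, p_visual, acc_audio, acc_visual):
    """Fuse two class-probability vectors with accuracy-based weights.

    Assumption: the weight of each modality is proportional to its
    validation accuracy. This is an illustrative weighting rule only.
    """
    if len(p_audio) != len(p_visual):
        raise ValueError("both modalities must score the same set of classes")
    total = acc_audio + acc_visual
    if total <= 0:
        raise ValueError("accuracies must sum to a positive value")

    w_audio = acc_audio / total
    w_visual = acc_visual / total

    # Weighted sum of the per-class probabilities from each classifier.
    fused = [w_audio * a + w_visual * v for a, v in zip(p_audio, p_visual)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label


if __name__ == "__main__":
    fused, label = fuse_decisions([0.6, 0.3, 0.1], [0.2, 0.7, 0.1], 0.5, 0.5)
    print(fused, label)
```

With equal weights the visual classifier's stronger vote decides the label here; raising `acc_audio` relative to `acc_visual` shifts the decision toward the audio classifier.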

  4. Experiments on IEMOCAP

We conduct experiments on IEMOCAP. We select reliable samples from the improvised data, extract audio and visual emotional features, and classify them. The single-modal results show that speech emotion recognition performs well on negative emotions, while facial expression recognition performs well on positive emotions. The multi-modal results show that the recognition accuracy of bimodal emotion recognition is 10% higher than that of single-modal emotion recognition.

KEY WORDS: Bimodal emotion recognition, speech emotion recognition, facial expression recognition, 3-dimensional dynamic geometric feature extraction, auto-encoder, decision-level fusion

Contents

Abstract (Chinese)

Abstract (English)

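Chapter 1 Introduction

1.1 Background and significance of the topic

1.2 Research status at home and abroad

1.2.1 Research status of speech emotion recognition

1.2.2 Research status of facial expression recognition

1.2.3 Research status of multi-modal emotion recognition

1.3 Research content and organization of this thesis

1.3.1 Research content

1.3.2 Organization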

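Chapter 2 Overview of bimodal emotion recognition

2.1 Framework of a bimodal emotion recognition system

2.2 Definition and classification of emotion

2.3 Bimodal emotion databases

2.4 Feature extraction methods

2.4.1 Speech feature extraction

2.4.2 Facial expression feature extraction

2.5 Classification methods for emotion recognition

2.6 Bimodal fusion methods

2.7 Chapter summary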

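Chapter 3 Feature extraction for bimodal emotion recognition based on speech and images

3.1 Speech feature extraction

3.1.1 Speech preprocessing

3.1.2 Common speech features and speech feature sets

3.1.3 Extracting speech features with openSMILE

3.2 Facial expression feature extraction

3.2.1 Facial expression dataset

3.2.2 Preprocessing of facial expression data

3.2.3 Automatic detection of the peak expression

3.2.4 Dynamic feature extraction

3.3 Chapter summary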

Chapter 4 Deep learning based on auto-encoders

4.1 BP neural network

4.2 Auto-encoder and Softmax classifier

4.2.1 Auto-encoder

4.2.2 Softmax classification

4.3 Two-stage separate learning based on auto-encoders


