Research on Binaural Speech Separation Based on Deep Learning

 2022-05-05 08:05

Thesis length: 34,455 characters

Abstract

Many traditional speech separation algorithms perform poorly at low signal-to-noise ratios (SNRs) and in reverberant environments. Drawing on Computational Auditory Scene Analysis (CASA) and the auditory perception characteristics of the human ear, this thesis proposes a binaural speech separation algorithm based on deep learning, using a Convolutional Neural Network (CNN) as the training model.

The algorithm is divided into a training stage and a testing stage. During training, a Gammatone filter bank simulating the frequency selectivity of the human ear first decomposes the speech signal into subbands; these are then framed and windowed to obtain time-frequency (T-F) units. From each T-F unit, three binaural spatial features are extracted as the input to the CNN: the cross-correlation function (CCF), the interaural time difference (ITD), and the interaural level difference (ILD). The Ideal Binary Mask (IBM) serves as the training target. For testing, speech signals from the TIMIT corpus and signals actually recorded with a KEMAR artificial head in the laboratory anechoic chamber are used. SAR, SDR, SIR, and PESQ are adopted as evaluation metrics. Experimental results show that the proposed algorithm outperforms the traditional algorithm based on the Degenerate Unmixing Estimation Technique (DUET), with clear improvements in the evaluation metrics across SNR levels, reverberation conditions, and the real recorded environment.

Keywords: Convolutional Neural Network, Speech Separation, Computational Auditory Scene Analysis
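The binaural cues (CCF, ITD, ILD) and the IBM target summarized in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the 16 kHz sampling rate, the ±1 ms lag range, the circular-shift correlation, and the 0 dB local criterion are all illustrative assumptions.

```python
import numpy as np

def binaural_cues(left, right, fs=16000, max_lag_ms=1.0):
    """Return (CCF over +/-1 ms lags, ITD in seconds, ILD in dB) for one
    pair of left/right subband frames (one time-frequency unit)."""
    max_lag = int(max_lag_ms * 1e-3 * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    l = left - left.mean()
    r = right - right.mean()
    denom = np.sqrt(np.sum(l**2) * np.sum(r**2)) + 1e-12
    # Normalized cross-correlation; circular shift is used for simplicity.
    ccf = np.array([np.sum(l * np.roll(r, k)) for k in lags]) / denom
    # ITD: lag of the CCF peak (sign depends on which ear leads).
    itd = lags[np.argmax(ccf)] / fs
    # ILD: energy ratio between the two ears in dB.
    ild = 10 * np.log10((np.sum(left**2) + 1e-12)
                        / (np.sum(right**2) + 1e-12))
    return ccf, itd, ild

def ideal_binary_mask(target_energy, interference_energy, lc_db=0.0):
    """IBM: 1 where the local target-to-interference ratio of a T-F unit
    exceeds the local criterion lc_db, else 0."""
    snr_db = 10 * np.log10((target_energy + 1e-12)
                           / (interference_energy + 1e-12))
    return (snr_db > lc_db).astype(np.float32)
```

In the full pipeline, these per-unit cues would be stacked across subbands and frames to form the CNN input, and the estimated mask would be applied to the mixture's T-F units before resynthesis.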

Contents

Abstract

Contents

Chapter 1 Introduction

1.1 Research Background and Significance

1.2 Research History and Current Status

1.3 Main Research Content of This Thesis

1.3.1 CNN-Based Binaural Speech Separation

1.4 Thesis Organization

Chapter 2 Overview of Binaural Speech Separation Methods

2.1 Physiological Structure and Auditory Characteristics of the Human Ear

2.2 Auditory Feature Cues

2.2.1 Gammatone Filter Bank

2.2.2 Binaural Spatial Features

2.3 Masking Targets

2.3.1 Ideal Binary Mask (IBM)

2.3.2 Ideal Ratio Mask (IRM)

2.4 Traditional Speech Separation Algorithms

2.4.1 DUET-Based Speech Separation Algorithm

2.5 Evaluation Metrics for Speech Separation Algorithms

2.6 Chapter Summary

Chapter 3 Overview of Deep Learning Theory

3.1 Perceptron and Multilayer Perceptron

3.2 Deep Neural Networks (DNN)

3.2.1 DNN Structure and Basic Principles

3.2.2 Modeling Techniques

3.3 Convolutional Neural Networks (CNN)

3.3.1 CNN Structure and Basic Principles

3.3.2 Characteristics of CNN

3.4 Chapter Summary

Chapter 4 CNN-Based Binaural Speech Separation

4.1 Algorithm Pipeline

4.2 Binaural Spatial Feature Extraction

4.2.1 Preprocessing

4.2.2 Binaural Subband Spatial Feature Extraction

4.3 Convolutional Neural Network (CNN)

4.3.1 CNN Structure

4.3.2 CNN Training Algorithm

4.3.3 Optimization Methods

4.3.4 Regularization Methods

4.4 Masking Objective Function

4.5 Speech Reconstruction

4.6 Binaural Speech Separation Experiments

4.6.1 Experimental Data

4.6.2 Experimental Results and Analysis

4.7 Chapter Summary

Chapter 5 Conclusion and Outlook

5.1 Conclusion

5.2 Outlook

References

Acknowledgements

Chapter 1 Introduction
