Deep Neural Network Training and Circuit Design for Voice Wake-Up

 2022-05-14 07:05

Total thesis length: 30,906 characters

Abstract (Chinese)

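As voice interaction becomes ever more important in the age of intelligent devices, voice wake-up, as its cornerstone, is gaining importance as well. Voice wake-up requires the system to run without interruption and to recognize keywords quickly and accurately; however, because current smart devices are limited by power supply and battery life, low-power voice wake-up has become a research hotspot. In this direction, this thesis designs an extremely low-power voice wake-up system based on a deep neural network.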

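This thesis divides the system into two major modules, which implement speech feature extraction and speech feature recognition respectively. The speech feature extraction module builds on Mel-frequency cepstral coefficient (MFCC) analysis, uses a pipelined Fourier transform algorithm (R22SDF), and applies stage-wise precision quantization to extract the features. The speech feature recognition module is based on a depthwise separable convolutional neural network (DS-CNN) model; it uses optimizations such as binarization and ternarization to shrink the storage needed for the weight parameters, modifies the network structure, and is finally trained into a lightweight DS-CNN model that performs a binary classification task.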
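The first stages of an MFCC front end (pre-emphasis, framing with a window, and the Mel frequency mapping) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the pre-emphasis coefficient `alpha=0.97`, the Hamming window, the frame length and hop, and the `2595 * log10(1 + f / 700)` Mel formula are common textbook choices assumed here, and all function names are hypothetical.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is an assumed, commonly used value.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def hamming_window(length):
    """Symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_and_window(samples, frame_len, hop):
    """Split the signal into overlapping frames and apply the window to each."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + i] * window[i] for i in range(frame_len)])
    return frames


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to the Mel scale (assumed common formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, low_hz, high_hz):
    """Center frequencies (Hz) of filters spaced evenly on the Mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]
```

Each windowed frame would then go through the FFT, the Mel filter bank, a logarithm, and a DCT to give the cepstral coefficients; those stages are omitted here to keep the sketch short.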

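Taking the optimized model and algorithms as the basis, the circuit is designed module by module and ultimately implements recognition for the binary classification task. The accuracy reaches 95.6%, and the overall system power consumption in simulation with the TSMC 28 nm process is only 0.4 μW, which meets the extremely low-power requirement.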

Keywords: voice wake-up, feature extraction, R22SDF algorithm, depthwise separable convolutional neural network

Abstract

As voice interaction grows in importance in the era of intelligent devices, voice wake-up, the cornerstone of voice interaction, has become correspondingly more important. Voice wake-up requires the system to run continuously and to recognize keywords quickly and accurately. Because today's smart devices are constrained by battery life, low-power voice wake-up has become an active research topic. Motivated by this, this thesis designs an ultra-low-power voice wake-up system based on a deep neural network.

The system is divided into two modules, which implement speech feature extraction and keyword recognition respectively. The feature extraction module is based on Mel-frequency cepstral coefficient (MFCC) analysis; it uses a pipelined Fourier transform algorithm (R22SDF) and stage-wise precision quantization to extract the features. The keyword recognition module is based on a depthwise separable convolutional neural network (DS-CNN). It uses optimizations such as weight binarization and ternarization to reduce the storage required for the weight parameters, and it modifies the network structure; training ultimately yields a lightweight DS-CNN model for a binary classification task.
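To make the storage argument concrete, the sketch below shows one way weights can be binarized or ternarized, together with the parameter count of a depthwise separable layer versus a standard convolution. It rests on assumptions not stated in the abstract: the ternary threshold of 0.7 times the mean absolute weight is a common heuristic from the ternary-weight literature, the 2-bit ternary encoding is one simple choice, the 3×3 kernel with 64 channels is purely illustrative, and all function names are hypothetical.

```python
def binarize(weights):
    """Map each weight to +1 or -1, sharing one scale (mean absolute value)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def ternarize(weights, threshold_ratio=0.7):
    """Map each weight to -1, 0, or +1.

    Weights whose magnitude is at most threshold_ratio * mean(|w|) become 0.
    The 0.7 ratio is an assumed heuristic, not a value taken from the thesis.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    # Shared scale: mean magnitude of the weights that were not zeroed.
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def conv_params(kernel, c_in, c_out):
    """Weight count of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def ds_conv_params(kernel, c_in, c_out):
    """Weight count of a depthwise (kernel x kernel) plus pointwise (1x1) pair."""
    return kernel * kernel * c_in + c_in * c_out


def storage_bits(num_weights, bits_per_weight):
    """Raw weight storage in bits, ignoring the shared scale factors."""
    return num_weights * bits_per_weight


if __name__ == "__main__":
    codes, scale = ternarize([0.9, -0.05, 0.4, -0.8, 0.02, -0.3])
    print(codes, scale)
    n_std = conv_params(3, 64, 64)      # illustrative layer shape
    n_ds = ds_conv_params(3, 64, 64)
    print(n_std, n_ds)
    print(storage_bits(n_ds, 32), storage_bits(n_ds, 2), storage_bits(n_ds, 1))
```

Under these assumptions the depthwise separable layer holds 4,672 weights where the standard layer holds 36,864, and storing each weight in 2 bits (ternary) or 1 bit (binary) instead of 32 reduces the weight memory by a further factor of 16 or 32.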

Based on the optimized model, the circuit is designed module by module and implements the binary classification task. The recognition accuracy reaches 95.6%, and in simulation with the TSMC 28 nm process the overall system power consumption is only 0.4 μW, meeting the ultra-low-power requirement.

KEY WORDS: voice wake-up, feature extraction, R22SDF algorithm, depthwise separable convolutional neural network

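Contents

Abstract (Chinese)

Abstract

Chapter 1 Introduction

1.1 Research Background

1.2 Research Status at Home and Abroad

1.2.1 Speech Feature Extraction

1.2.2 Speech Feature Recognition

1.3 Research Content of the Thesis

Chapter 2 System Construction and Evaluation Methods

2.1 Voice Wake-Up Deep Neural Network System

2.1.1 Experimental Platform

2.1.2 Experimental Dataset

2.2 MFCC Feature Extraction

2.2.1 Pre-emphasis

2.2.2 Framing and Windowing

2.2.3 FFT

2.2.4 Mel Filtering

2.2.5 DCT and Dynamic Feature Extraction

2.3 Neural-Network Speech Feature Recognition

2.3.1 Building the Depthwise Separable Convolution Model

2.3.2 Optimizing the Depthwise Separable Convolution Model

2.3.3 Training the Depthwise Separable Convolution Model

2.3.4 Model Evaluation Metrics

2.4 Chapter Summary

Chapter 3 Software Simulation Results

3.1 Feature Extraction Simulation Tests

3.2 Neural Network Simulation Tests

3.2.1 Optimization: Parameter Quantization

3.2.2 Optimization: Adjusting the DS-CNN Structure

3.2.3 Optimization: Input Quantization

3.3 Chapter Summary

Chapter 4 Hardware Circuit Design

4.1 Feature Extraction Circuit Design

4.1.1 Pre-emphasis Module

4.1.2 Framing and Windowing Module

4.1.3 FFT Module

4.1.4 Mel Filtering Module

4.1.5 Logarithm and DCT Module

4.2 Neural Network Circuit Design

4.2.1 Basic Architecture of the Neural Network Circuit

4.2.2 PE Unit Design

4.3 Hardware Simulation

4.4 Chapter Summary

Chapter 5 Summary and Outlook

Acknowledgements

References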

Chapter 1 Introduction
