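Classification No.:    Student ID: M200770091    University Code: 10487    Confidentiality Level:
Master's Thesis
A Study of Properties Related to Speech Signal Processing in the Auditory System
Candidate: Luo HaiFeng (罗海风)
Major: Theoretical Physics
Supervisor: Professor Long Zhangcai (龙长才)
Defense Date: January 2, 2011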
A Thesis Submitted in Partial Fulfillment of the Requirements
for the Degree of Master of Science
Auditory System Property about Speech
Signal Process
Candidate: Luo HaiFeng
Major: Theoretical Physics
Supervisor: Professor Long Zhangcai
Huazhong University of Science & Technology
Wuhan 430074, P.R. China
Nov, 2010
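Declaration of Originality
I declare that the thesis submitted here is the research work I carried out, and the research results I obtained, under the guidance of my supervisor. To the best of my knowledge, except for content that is explicitly cited in the text, this thesis contains no research results that have been published or written by any other individual or group. Individuals and groups who contributed to the research in this thesis are clearly identified in the text. I am fully aware that I bear the legal consequences of this declaration.
Signature of thesis author:          Date: ____ year ____ month ____ day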
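Authorization for Use of Thesis Copyright
The author of this thesis fully understands the university's regulations on retaining and using theses, namely: the university has the right to retain copies and electronic versions of the thesis and to submit them to the relevant national departments or institutions, and to allow the thesis to be consulted and borrowed. I authorize Huazhong University of Science and Technology to include all or part of this thesis in relevant databases for retrieval, and to preserve and compile this thesis by photocopying, reduced-size printing, scanning, or other means of reproduction.
This thesis is:
Confidential □; this authorization applies after declassification in the year _____.
Not confidential □.
(Please mark the appropriate box above with "√".)
Signature of thesis author:          Date: ____ year ____ month ____ day
Signature of supervisor:          Date: ____ year ____ month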
Huazhong University of Science & Technology, Master's Thesis
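Abstract (Chinese version, translated)
With the rapid development of computer science and technology, the ability of computers to process and evaluate information has advanced quickly, whereas the exchange of information between computers and humans has advanced comparatively little, so the information bottleneck has become increasingly severe. Speech technology, one of the key technologies for the human-machine interface, has developed for more than half a century and has produced many major technical breakthroughs. In speech recognition, speaker recognition, speech synthesis, and many other fields, a large number of commercial products have reached the level of practical application and have brought great changes to society. However, existing speech systems still face certain limitations in real application environments. Research on the properties of the human auditory system is therefore of great significance and value.
Existing research on the noise robustness of the auditory system can be divided into two broad categories according to whether the analysis process is bottom-up or top-down. Bottom-up analysis is the basic function of the signal processing system: it analyzes and classifies the basic physical features of the sound signal. Top-down analysis guides the processing of the sound signal according to human consciousness, attention, and experience. The latter is the most notable difference between the human auditory system and artificial systems, and it is a current focus of auditory research.
This study reveals certain properties of these two processes of the human auditory system from three different angles. The first study, through psychoacoustic experiments, found that during speaker recognition the human auditory system follows an information coding scheme similar to that of information theory and is influenced by training from the language environment. The second study, using the same approach, shows that during speaker recognition humans, unlike existing technical models, make use of continuous information at the word level and at longer time scales. The third study shows that, in analyzing speech signals, in addition to the analysis processes described by existing theory, there may be an auxiliary use of high-level features such as the structure of the spectral envelope, which further strengthens recognition ability in certain special conditions (for example, whispered speech).
Keywords: speaker recognition; auditory system; noise robustness; information theory; spectral envelope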
Abstract
With the rapid development of computer science, computers have improved greatly in information processing and judgment, but information exchange between computers and humans has progressed far less, which makes the information bottleneck increasingly serious. Speech technology, one of the most important technologies for the man-machine interface, has undergone more than half a century of development and produced many major breakthroughs. In speech recognition, speaker recognition, and speech synthesis, many commercial products have entered practical use and brought great changes to society. However, existing speech systems still have limitations in practical application environments. Research on the properties of the human auditory system is therefore of great significance and value.
Research on the robustness of the auditory system can be classified into two basic topics: the bottom-up process and the top-down process. The bottom-up process is the basic functional part of the human auditory system; it analyzes the physical characteristics of sounds and groups them into different sound streams. In the top-down process, the human auditory system reconstructs the sound scene based on consciousness, attention, and experience. The latter is the most remarkable difference between artificial systems and the human auditory system, and it is a focus of current auditory research.
In this research we studied properties of these two processes in the human auditory system from three different aspects. The first experiment and analysis show that, in speaker recognition, the human auditory system uses an encoding method similar to that of information theory and is also influenced by the native language environment. The second study shows that, in speaker recognition, people use word-length or longer information, unlike modern models such as the Gaussian mixture model (GMM). The third study shows that the auditory system may use not only the sound cues described by existing theory but also high-level characteristics such as the kurtosis of the spectral envelope, which could improve recognition ability in some special conditions, e.g. whispered speech.
Key words: speaker recognition; auditory system; robustness; information theory; spectral envelope