Jonker, C.J. (2011) Towards using Echo State Neural Networks to classify Phonemes from Formant Frequencies. Bachelor's Thesis, Artificial Intelligence.
Abstract
Although Automatic Speech Recognition (ASR) has improved over the past decade, current ASR system performance remains inferior to human speech perception. ASR systems mainly use Hidden Markov Models as acoustic models. These models tend to work only in the exact situation in which they were trained and show limited robustness to noise, changes in environment, or changes of speaker. One alternative approach is to use an Echo State Network as the acoustic model. The use of Mel-Frequency Cepstral Coefficients (MFCCs) as input vectors further limits noise robustness in ASR. MFCCs treat noise and speech equally and can therefore be strongly influenced by low-energy noise contributions. Formant Frequencies are much less sensitive to noise and may be an alternative to MFCCs as input for ASR systems. This bachelor project explores an alternative approach that uses Formant Frequencies as input to a phoneme-recognizing Echo State Network. Although a large amount of training data was provided and the network was trained for several days, no generalization over the dataset occurred. Formant Frequencies seem to contain insufficient information to perform phoneme classification in this way.
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 15 Feb 2018 07:45 |
Last Modified: | 15 Feb 2018 07:45 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/9560 |