The sounds of spoken language can be divided into two groups: vowels and consonants. A computer can make this distinction automatically, but several intermediate steps are necessary.
A suitable database of speech samples is used to calculate features. The voice recordings are first divided into time frames. For each frame, nearly 100 different values are calculated that help to differentiate the sounds of speech. All of the algorithms used in this thesis are commonly used in audio and signal processing.
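The framing step can be illustrated with a minimal sketch. The function name `split_into_frames` and the parameters `frame_length` and `hop_length` are hypothetical and chosen only for illustration; the actual frame size and overlap used in the thesis are not stated here.

```python
def split_into_frames(samples, frame_length, hop_length):
    """Split a sequence of samples into fixed-length, possibly overlapping frames.

    frame_length and hop_length are given in samples (illustrative assumption).
    A trailing remainder shorter than frame_length is dropped.
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    frames = []
    start = 0
    while start + frame_length <= len(samples):
        frames.append(list(samples[start:start + frame_length]))
        start += hop_length
    return frames


# Toy signal of 10 samples, frames of 4 samples with 50 % overlap
signal = list(range(10))
frames = split_into_frames(signal, frame_length=4, hop_length=2)
```

Each resulting frame would then be passed to the feature calculation, yielding one feature vector per frame.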
For machine learning, it is advantageous to have a small number of meaningful features. The number of features is therefore reduced using algorithms such as principal component analysis.
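As a rough sketch of how principal component analysis reduces the number of features, the following example projects frame-wise feature vectors onto their leading principal components using NumPy. The helper `pca_reduce`, the random placeholder data, and the choice of 100 input features and 10 components are assumptions made for illustration, not values taken from the thesis.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project feature vectors (one row per frame) onto the leading principal components."""
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions,
    # ordered by decreasing singular value (and therefore decreasing variance).
    _, singular_values, vt = np.linalg.svd(X_centered, full_matrices=False)
    components = vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (X.shape[0] - 1)
    return X_centered @ components.T, explained_variance


# Placeholder data: 200 frames with 100 features each (assumed sizes)
rng = np.random.default_rng(0)
frame_features = rng.normal(size=(200, 100))
reduced, variance = pca_reduce(frame_features, n_components=10)
```

The reduced matrix keeps one row per frame but only `n_components` columns, which is the form a classifier would be trained on.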
The data are used to train an artificial intelligence model. Based on the collected information, the model is able to distinguish between vowels and consonants with high accuracy. Post-processing improves this result by correcting outliers.
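One plausible form of such post-processing is a sliding majority vote over the frame-wise labels, sketched below. This is an assumed technique for illustration only; the abstract does not specify how outliers are corrected. The function `smooth_labels`, the labels `"V"` and `"C"`, and the window size are hypothetical.

```python
def smooth_labels(labels, window=3):
    """Replace each frame label by the majority label in a centered window.

    labels is a sequence of "V" (vowel) and "C" (consonant); window must be odd.
    At the edges the window is truncated; ties keep the original label.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i, label in enumerate(labels):
        neighbourhood = labels[max(0, i - half):i + half + 1]
        vowel_votes = sum(1 for item in neighbourhood if item == "V")
        if 2 * vowel_votes > len(neighbourhood):
            smoothed.append("V")
        elif 2 * vowel_votes < len(neighbourhood):
            smoothed.append("C")
        else:
            smoothed.append(label)
    return smoothed


# Isolated "C" at index 3 and isolated "V" at index 9 are treated as outliers
predicted = list("VVVCVVCCCVCC")
smoothed = smooth_labels(predicted, window=3)
```

In this toy sequence, the two isolated labels are flipped to match their neighbours, while the genuine transition from vowel to consonant frames is preserved.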
The thesis first covers the basics needed to understand the algorithms and the database. All relevant language processing algorithms and the databases used are then described in greater detail. Since this thesis focuses on signal processing, machine learning is treated in less detail.
Keywords: vowel recognition, AI, formant
August 18th, 2020