Abstract [eng] |
People understand what they see and what others say with little effort; a single glance or phrase is enough for them to recognize one another. In technology, however, even the simplest automatic object recognition is a hard task. A great deal of research in biometrics aims at faster, more reliable, and more accurate human recognition. HTK, Kaldi, and MSR Identity Toolbox are widely used for various voice technology tasks. These open-source packages can be applied to audio signal analysis and to phonetic alignment, i.e., the segmentation of phoneme boundaries for a given transcription. This work describes research on acoustic models of the speech signal for speaker recognition. The aim of the thesis is to investigate acoustic speech signal models suitable for speaker recognition. In the analytical-practical part, voice recordings were investigated, MFCC features were extracted, statistics of the acoustic model components were determined, GMM and k-means acoustic speech signal models were trained for English, Spanish, Italian, French, Russian, and German, and the resulting acoustic models were investigated. The results showed that components are distributed differently across recordings. The six most common acoustic model components were selected. Moreover, the most common voice components turned out to differ from the most common background components. Statistical analysis showed that log-likelihoods do not differ statistically significantly between recordings in different languages when acoustic models of the same type and the same language were applied. Likewise, log-likelihoods do not differ statistically significantly between recordings in different languages when English acoustic models were used. Finally, log-likelihoods differ most between Spanish and English recordings. When the number of English and Spanish recordings is increased, the log-likelihoods differ statistically significantly when English acoustic models are applied. |