Faculty of Technical Sciences

Subject: (17.EAI550)

General information:
 
Category: Scientific-professional
Scientific or art field: Telecommunications and Signal Processing
ECTS: 6

Based on artificial intelligence and machine learning, speech technologies enable the development of new interfaces between humans and smart environments: phones, computers, smart-home devices, etc. Building on the knowledge acquired in several undergraduate academic courses, the course aims to broaden students' multidisciplinary knowledge in the area of human-machine speech communication. To understand the algorithms for automatic speech recognition and synthesis, speaker recognition and emotional speech recognition, students need to become more familiar with the features of human speech and with its acoustic and linguistic models. Beyond understanding the algorithms, the course aims for students to become familiar with software tools for speech signal processing and to learn about speech technology applications.

Students become familiar with basic machine learning algorithms used in automatic speech recognition (ASR) and text-to-speech synthesis (TTS), and thereby acquire the fundamental knowledge needed for ASR and TTS development and application. They acquire the knowledge necessary for recording and processing speech signal databases and for understanding the algorithms for automatic speech recognition and synthesis, as well as for speaker and emotion recognition, language modules and dialogue systems. By the end of the course, students are familiar with the possibilities of speech technologies and with the tools for developing applications based on them, and are ready to contribute professionally to this scientific and technical field.

• Introduction to ASR and TTS: history, terminology, perspectives
• Speech: production and perception, nature and characteristics (time-frequency display + labelling (AlfaNum))
• Speech signal: analysis and types of display on a computer (LPC, MFCC, PLP + visualisation (Matlab))
• Natural language processing: language modelling (n-grams) + HMM (HTK)
• Approaches to ASR (DTW, HMM, DNN); acoustic, lexical and linguistic models
• Procedures of ASR training: GMM, k-means, VQ, Baum-Welch, ML, MMI, MWE, MPE (HTK)
• Algorithms for ASR decoding: Viterbi, token passing, N-best (HTK)
• Robust ASR methods: VTN, CMN, noise suppression
• Text-to-speech synthesis (TTS): language processing, synthesis (concatenative, HMM and DNN)
• Recognition of speakers (automatic and forensic)
• Recognition of emotions in speech
• Dialogue modelling, spoken language understanding (SLU), dialogue systems
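As an illustration of one of the ASR approaches named above, dynamic time warping (DTW) can be sketched in a few lines. This is only a minimal sketch and not part of the course materials: the function name `dtw_distance`, the absolute-difference local cost and the toy one-dimensional sequences are assumptions made for the example. In practice, sequences of feature vectors such as MFCCs would be compared instead of single numbers.

```python
import math


def dtw_distance(a, b):
    """Return the cumulative DTW alignment cost between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy data (hypothetical): a template, a slowed-down copy, and a different pattern.
template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
other = [3.0, 3.0, 1.0, 1.0, 3.0]

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The slowed-down copy aligns with the template at zero cost because DTW allows one frame to match several frames of the other sequence, while the different pattern gives a positive cost. This tolerance to speaking rate is why DTW was used in early template-based recognisers.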

Lectures are delivered using PowerPoint presentations accompanied by numerous audio and video attachments and animations. They are followed by practical exercises in the Laboratory of Acoustics and Speech Technologies and in a sound studio at FTS. Visits to companies are arranged, where students can learn more about speech technologies. The exam prerequisites are a seminar paper and a project; at least 25 of the 50 available points are required to take the exam. Seminar papers are written individually and can serve as a basis for the master's thesis. Independent student work on the project task is supported through the web portal of the Chair of Communications and Signal Processing, www.telekom.ftn.uns.ac.rs.

Authors | Title | Year | Publisher | Language
Uday Kamath, John Liu, James Whitaker | Deep Learning for NLP and Speech Recognition | 2019 | Springer | English
Paul Taylor | Text-to-Speech Synthesis | 2009 | Cambridge University Press | English
Dong Yu and Li Deng | Automatic Speech Recognition: A Deep Learning Approach | 2015 | Springer-Verlag London | English

Course activity | Pre-examination | Obligatory | Number of points
Term paper | Yes | Yes | 20.00
Project | Yes | Yes | 30.00
Theoretical part of the exam | No | Yes | 50.00

Prof. Delić Vlado

Full Professor

Lectures

Prof. Sečujski Milan

Full Professor

Lectures

Assistant with PhD Dr. Simić Nikola

Assistant with PhD

Laboratory classes

Senior Science Associate Popović Branislav

Senior Research Associate

Laboratory classes

© 2024. Faculty of Technical Sciences.

Contact:

Address: Trg Dositeja Obradovića 6, 21102 Novi Sad

Phone: (+381) 21 450 810
(+381) 21 6350 413

Fax: (+381) 21 458 133
Email: ftndean@uns.ac.rs
