Courses – Faculty of Technical Sciences in Novi Sad (Fakultet tehničkih nauka u Novom Sadu)

Subject: Human-Machine Speech Communication (17.DE512)

Native organizational unit: Department of Power, Electronic and Telecommunication Engineering

Type of studies	Study programme
Doctoral Academic Studies	Power, Electronic and Telecommunication Engineering (Year: 2, Semester: Winter)
Doctoral Academic Studies	Biomedical Engineering (Year: 2, Semester: Winter)

General information:

Category	Scientific-professional
Scientific or artistic field	Telecommunications and Signal Processing
ECTS	10

Course aims:

The aim of the course is to expand and deepen PhD students' multidisciplinary knowledge of human-machine speech communication. Understanding machine learning algorithms for speech signal processing requires familiarity with the features of the speech signal and with its acoustic and linguistic models. The course therefore first has students master software tools for working with audio (speech) signals, and then introduces the algorithms used in speech signal processing, in particular adaptive algorithms and deep learning techniques for automatic speech recognition and for synthesizing speech from text. It further extends students' knowledge of speaker identification and verification and of emotion recognition, and introduces the basics of natural language processing, dialogue management, and dialogue systems.

Learning outcomes:

During the course, PhD students theoretically explore the machine learning algorithms used in automatic speech recognition (ASR), in speaker identification and verification, and in text-to-speech synthesis (TTS). They also gain hands-on command of most of the software tools and techniques for processing speech signals. This gives them the background needed to understand ASR and TTS algorithms, as well as the knowledge required to record and process speech databases and to work on the development of multimodal systems in which ASR and TTS are applicable. Students also learn the basic elements of natural language processing and dialogue management. By the end of the course, they are familiar with the capabilities of automatic speech recognition and synthesis and with the tools for developing applications and dialogue systems based on these speech technologies, and they are ready to make technical and scientific contributions in this field.

Course content:

• Physiological acoustics and acoustic modelling of speech.
• Psychoacoustics and the perception of sound.
• Articulation and acoustic phonetics.
• Fundamentals of formal language theory.
• Linguistic modelling of speech.
• Pre-processing of the speech signal and extraction of relevant features.
• Recording and processing of speech databases for ASR and TTS.
• Theory of finite automata and statistical models; hidden Markov models (HMM).
• Viterbi algorithm, vector quantization, clustering, parsing techniques.
• Algorithms based on template matching and dynamic programming (dynamic time warping, DTW).
• Statistical approach based on HMMs.
• Expert systems for automatic speech recognition.
• Deep neural networks (DNN) and hybrid DNN-HMM systems.
• Algorithms for speaker identification and verification.
• Morphological and syntactic analysis of text.
• Concatenative approach to text-to-speech synthesis.
• Speech synthesis in the time domain.
• Parametric speech synthesis based on HMMs or DNNs.
• Natural language processing (NLP) and dialogue management (DM).
• Telephone and Internet voice portals (CTI, IVR).
• Call centre automation.
• Applications in the home, in industry, and in cars.
• Humanitarian applications of speech technology.
• Learning Serbian as a foreign language using a speech-enabled machine.
• Using standard software tools for working with audio (Sound Forge, Praat).
• Implementation of speech signal processing algorithms (Matlab, DSP, HTK, Kaldi).
• Tools for developing speech technology applications (SAPI, VoiceXML, Merlin, TensorFlow, etc.).

Teaching methods:

Classes combine lectures and tutorial work. Study and research work includes actively following primary scientific sources, organizing and performing experiments, statistically processing the data, running numerical simulations, and writing a paper in the narrow scientific field to which the doctoral dissertation belongs. On the web portal of the Chair of Telecommunications and Signal Processing, students can find PowerPoint presentations of the lectures with numerous audio and video attachments and animations, as well as online practice material intended for individual work. Part of the course consists of practical work in the Acoustics and Speech Technologies Laboratory at FTS and visits to companies, where doctoral students become further acquainted with speech technologies. Completion of a practical project is a course requirement. The final exam assesses all the knowledge acquired in the course.

Literature:

Authors	Title	Year	Publisher	Language
T. Quatieri	Discrete-Time Speech Signal Processing: Principles and Practice	2002	Prentice Hall	English
L. Rabiner and B.-H. Juang	Fundamentals of Speech Recognition	1993	Prentice Hall	English
B. Gold and N. Morgan	Speech and Audio Signal Processing: Processing and Perception of Speech and Music	2000	John Wiley & Sons	English
T. Dutoit	An Introduction to Text-to-Speech Synthesis	1997	Kluwer	English