Faculty of Technical Sciences

Subject: Human-Machine Speech Communication (17.DE512)

Home organizational unit: Department of Power, Electronic and Telecommunication Engineering
General information:

Category: Scientific-professional
Scientific or art field: Telecommunications and Signal Processing
Interdisciplinary: No
ECTS: 10
Educational goal:

The aim is to broaden and deepen PhD students' multidisciplinary knowledge of human-machine speech communication. To understand machine learning algorithms for speech signal processing, students must first become familiar with the features of the speech signal and with its acoustic and linguistic models. The first goal is to master the software tools used for working with audio (speech) signals. The second is to understand the algorithms used in speech signal processing, in particular adaptive algorithms and deep learning techniques for automatic speech recognition and for text-to-speech synthesis. The course also extends students' knowledge of speaker identification and verification and of emotion recognition, and introduces the basics of natural language processing, dialogue management, and dialogue systems.

Educational outcome:

During the course, PhD students study the theory of machine learning algorithms used in automatic speech recognition (ASR), speaker identification and verification, and text-to-speech synthesis (TTS). They also gain practical command of most of the software tools and techniques for processing speech signals. In this way they acquire the background needed to understand ASR and TTS algorithms, to record and process speech databases, and to work on the development of multimodal systems in which ASR and TTS are applicable. They learn the basic elements of natural language processing and dialogue management. By the end of the course, students are familiar with the capabilities of automatic speech recognition and synthesis and with the tools for developing applications and dialogue systems based on these speech technologies, and are ready to make technical and scientific contributions in this field.

Course content:

• Physiological acoustics and acoustic modelling of speech.
• Psychoacoustics and perception of sound.
• Articulation and acoustic phonetics.
• The fundamentals of formal language theory.
• Linguistic modelling of speech.
• Pre-processing of the speech signal and extraction of relevant features.
• Recording and processing of speech databases for ASR and TTS.
• The theory of finite automata and statistical models, hidden Markov models (HMM).
• Viterbi algorithm, vector quantization, clustering, parsing techniques.
• Algorithms based on pattern comparison and dynamic programming (dynamic time warping, DTW).
• Statistical approach based on HMM.
• Expert systems for automatic speech recognition.
• Deep neural networks (DNN) and hybrid systems (DNN-HMM).
• Algorithms for speaker identification and verification.
• Morphological and syntactic analysis of text.
• Concatenative approach to text-to-speech synthesis.
• Speech synthesis in the time domain.
• Parametric synthesis of speech based on HMM or DNN.
• Natural language processing (NLP) and dialogue management (DM).
• Telephone and Internet voice portals (CTI, IVR).
• Call centre automation.
• Applications in households, industry, and cars.
• Humane applications of speech technology.
• Learning Serbian as a foreign language using a voice machine.
• Using standard software tools for working with audio (Sound Forge, Praat).
• Implementation of algorithms for processing speech signals (Matlab, DSP, HTK, Kaldi).
• Tools for developing applications with speech technologies (SAPI, VoiceXML, Merlin, TensorFlow, etc.).

Teaching methods:

Classes combine lectures and tutorial work. Study research work includes active monitoring of primary scientific sources, organizing and performing experiments, statistical processing of data, numerical simulations, and writing a paper in the narrow scientific field to which the doctoral dissertation belongs. On the web portal of the Chair of Telecommunications and Signal Processing, students can find PowerPoint presentations of the lectures with numerous audio and video attachments and animations, as well as online practice material intended for individual work. Part of the course consists of practical work at the Acoustics and Speech Technologies Laboratory at FTS and visits to companies, where the doctoral students become further acquainted with speech technologies. Completion of a practical project is a pre-examination obligation. The final exam evaluates all the knowledge acquired in the course.

Literature:
Authors | Title | Year | Publisher | Language
B. Gold and N. Morgan | Speech and Audio Signal Processing - Processing and Perception of Speech and Music | 2000 | JW&S | English
Vlado Delić et al. | Audio-izdanje udžbenika i prezentacija u okviru CABUNS-a (Audio edition of textbooks and presentations within CABUNS) | 2019 | University of Novi Sad | Serbian
T. Quatieri | Discrete-Time Speech Signal Processing - Principles and Practice | 2002 | Prentice Hall | English
L. Rabiner and B-H. Juang | Fundamentals of Speech Recognition | 1993 | Prentice Hall | English
T. Dutoit | An Introduction to Text-to-Speech Synthesis | 1997 | Kluwer | English
Knowledge evaluation:
Course activity | Pre-examination | Obligatory | Number of points
Project | Yes | Yes | 50.00
Oral part of the exam | No | Yes | 50.00
Lecturers:

Popović Branislav

Senior Research Associate

Lectures

doc. Suzić Siniša

Assistant Professor

Lectures

prof. dr Delić Vlado

Full Professor

Lectures

© 2024. Faculty of Technical Sciences.

Contact:

Address: Trg Dositeja Obradovića 6, 21102 Novi Sad

Phone: (+381) 21 450 810
(+381) 21 6350 413

Fax: (+381) 21 458 133
Email: ftndean@uns.ac.rs
