19D031GT - Speech Technologies

Course specification
Course title		Speech Technologies
Acronym		19D031GT
Study programme		Electrical Engineering and Computing
Module		Telecommunications
Type of study		doctoral studies
Lecturer (for classes)		professor PhD Dragana Šumarac Pavlović
Lecturer/Associate (for practice)
Lecturer/Associate (for OTC)
ESPB		9.0	Status	elective
Condition		Passed the examination - Fundamentals of speech communications
The goal		The goal is for students to master end-to-end (E2E) architectures, speech synthesis, and self-supervised learning (SSL) methods. Research work focuses on critical analysis and on developing solutions for robustness, speech biometrics, and the extraction of paralinguistic information. Students develop the capacity to design innovative speech systems and to produce high-quality scientific publications.
The outcome		Students will be able to design E2E and diffusion models for speech recognition and synthesis. They will master SSL techniques for feature extraction and for paralinguistic tasks in low-resource scenarios. They will be able to create original solutions in speech biometrics and to prepare scientific papers that meet top international standards.
Contents
Contents of lectures		Physiology and acoustics of speech, elements of linguistics, psychoacoustics, speech perception, and psycholinguistics. Theories and systems in speech synthesis and recognition. Methods of language and speaker recognition (biometric and forensic applications). Strategies in the design of human-computer dialogue. Specific applications of these technologies in multimodal communication.
Contents of exercises		Application of various software tools in speech signal processing, and consolidation of the acquired theoretical and practical knowledge through seminars and/or projects.
Literature
Jurafsky, D., & Martin, J. H. (2024). Speech and Language Processing (3rd Edition Draft).
Tan, X. (2022). Neural Speech Synthesis. Springer Nature.
Tan, X., Qin, T., Soong, F., & Liu, T. Y. (2021). A Survey on Neural Speech Synthesis. Microsoft Research Asia. (Published in IEEE Access/arXiv).
Li, J. (2020). Recent Advances in End-to-End Automatic Speech Recognition. Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SIP), Cambridge University Press.
Mohamed, A., Lee, H. Y., Borgholt, L., et al. (2022). Self-Supervised Speech Representation Learning: A Review. IEEE Journal of Selected Topics in Signal Processing.
Number of hours per week during the semester/trimester/year
Lectures	Exercises	OTC	Study and Research	Other classes
8
Methods of teaching		Consultations, seminar work and/or participation in projects.
Knowledge score (maximum points 100)
Pre-exam obligations		Points	Final exam	Points
Activities during lectures		20	Test paper	0
Practical lessons		0	Oral examination	30
Projects
Colloquia		0
Seminars		50