26D111OPG - Selected Topics in Program Code Generation

Course specification
Course title		Selected Topics in Program Code Generation
Acronym		26D111OPG
Study programme		Electrical Engineering and Computing
Module		Software Engineering
Type of study		doctoral studies
Lecturer (for classes)		Professor Dragan Bojić, PhD
Lecturer/Associate (for practice)
Lecturer/Associate (for OTC)
ECTS		9.0	Status	elective
Prerequisites
The goal		The goal of the course is to enable students to understand and apply methods for program synthesis using large language models. The course covers the construction of code generation systems (including fine-tuning, inference, and evaluation) and explores current research directions in code generation, such as interaction with programmers, model reliability, adaptability, and applications.
The outcome		Upon completion of the course, students will be able to: understand the key algorithmic and architectural foundations of large language models for code generation; apply techniques for fine-tuning, inference, and evaluation of models; analyze and critically evaluate research papers in the area of code generation; and present their own ideas for improvements in the field.
Contents
Contents of lectures		• Introduction to code generation: motivation, history, basic concepts of large language models for code. • Fundamentals: training (pre-training and fine-tuning), data (datasets, synthetic data), inference, evaluation (methodologies and benchmarks). • Interaction with people (developers and models), adaptability (long context, retrieval-augmented generation (RAG), self-correcting code), applications.
Contents of exercises		Writing a seminar paper: studying a collection of existing papers, summarizing the content, discussing the advantages, disadvantages and future directions of research, reproducibility of results. Alternatively: Implementing a practical research project, formulating the problem, conducting an experimental evaluation and presenting the results.
Literature
M. Chen et al., Evaluating Large Language Models Trained on Code, https://arxiv.org/abs/2107.03374
D. Fried et al., InCoder: A Generative Model for Code Infilling and Synthesis, https://arxiv.org/abs/2204.05999
N. Muennighoff et al., OctoPack: Instruction Tuning Code Large Language Models, https://arxiv.org/abs/2308.07124
J. Liu et al., Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation, https://arxiv.org/abs/2305.01210
Selected research papers
Number of hours per week during the semester/trimester/year
Lectures	Exercises	OTC	Study and Research	Other classes
8
Methods of teaching		Tutoring, individual project
Knowledge score (maximum points 100)
Pre-exam obligations		Points	Final exam	Points
Activities during lectures			Test paper
Practical lessons			Oral examination	30
Projects		70
Colloquia
Seminars