26D111OPG - Selected Topics in Program Code Generation

Course specification
Course title Selected Topics in Program Code Generation
Acronym 26D111OPG
Study programme Electrical Engineering and Computing
Module Software Engineering
Type of study doctoral studies
Lecturer (for classes)
Lecturer/Associate (for practice)
    Lecturer/Associate (for OTC)
      ECTS credits (ESPB) 9.0 Status elective
      Prerequisites
      The goal The goal of the course is to enable students to understand and apply methods for program synthesis using large language models. The course covers the construction of code generation systems (including fine-tuning, inference, and evaluation) and explores current research directions in code generation, such as interaction with programmers, model reliability, adaptability, and applications.
      The outcome Upon completion of the course, students will be able to: understand the key algorithmic and architectural foundations of large language models for code generation; apply techniques for fine-tuning, inference, and evaluation of models; analyze and critically evaluate research papers in the area of code generation; and present their own ideas for improvement in the field.
      Contents
      Contents of lectures
      • Introduction to code generation: motivation, history, and basic concepts of large language models for code.
      • Fundamentals: training (pre-training and fine-tuning), data (datasets, synthetic data), inference, and evaluation (methodologies and benchmarks).
      • Interaction with people (developers + models); adaptability: long context, retrieval-augmented generation (RAG), self-correcting code; applications.
      Contents of exercises Writing a seminar paper: studying a collection of existing papers, summarizing their content, and discussing the advantages, disadvantages, and future directions of the research, as well as the reproducibility of results. Alternatively: implementing a practical research project, which includes formulating the problem, conducting an experimental evaluation, and presenting the results.
      Literature
      1. M. Chen et al., Evaluating Large Language Models Trained on Code, https://arxiv.org/abs/2107.03374
      2. D. Fried et al., InCoder: A Generative Model for Code Infilling and Synthesis, https://arxiv.org/abs/2204.05999
      3. N. Muennighoff et al., OctoPack: Instruction Tuning Code Large Language Models, https://arxiv.org/abs/2308.07124
      4. J. Liu et al., Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation, https://arxiv.org/abs/2305.01210
      5. Selected research papers
      Number of hours per week during the semester/trimester/year
      Lectures Exercises OTC Study and Research Other classes
      8
      Methods of teaching Tutoring, individual project
      Knowledge score (maximum points 100)
      Pre-exam obligations | Points | Final exam | Points
      Activities during lectures | | Test paper |
      Practical lessons | | Oral examination | 30
      Projects | 70 | |
      Colloquia | | |
      Seminars | | |