70028
Reinforcement Learning
Module aims
This module provides comprehensive knowledge of reinforcement learning through an integrated approach that traces core algorithmic concepts from their theoretical foundations to state-of-the-art implementations. The module covers tabular and deep reinforcement learning, coupling classical and modern approaches to give students a unified understanding of how RL algorithms have evolved and how they are applied today.
Students will develop three core skills: theory, implementation, and evaluation. The module follows a "roots-to-modern" structure in which each major algorithmic family (model-free, model-based, etc.) is explored from its tabular origins through to contemporary deep learning implementations. By mid-course, students will be equipped to understand and critically evaluate modern RL research papers.
Specifically, you will:
* Master the theoretical foundations of reinforcement learning (MDPs, dynamic programming, and convergence guarantees)
* Trace the evolution of model-free methods from temporal difference learning to Deep Q-Networks and beyond
* Understand policy-based approaches from REINFORCE to modern actor-critic methods like SAC and PPO
* Explore model-based RL from classical planning to modern neural approaches
* Implement and experiment with both classical and deep RL algorithms using Python and PyTorch
* Develop skills in performance evaluation, hyperparameter tuning, and algorithmic comparison
* Critically analyze current RL research and identify future directions
Learning outcomes
Upon completion of this module, you should be able to:
1. Analyse the mathematical foundations of reinforcement learning and explain how theoretical principles manifest in practical algorithms
2. Trace the evolution of major RL algorithm families from tabular to deep learning implementations
3. Compare classical and modern RL approaches, identifying their respective strengths, limitations, and appropriate use cases
4. Design RL solutions for complex decision-making problems by selecting and adapting appropriate algorithmic approaches
5. Implement both tabular and deep RL algorithms in Python and PyTorch, demonstrating understanding of key implementation details
6. Evaluate RL algorithm performance using appropriate metrics and statistical methods, and propose theoretically-grounded improvements
7. Synthesise insights from multiple sources to critically assess current RL research and identify promising future directions
Module syllabus
The module follows an integrated "algorithm family" structure:
Foundation:
* Reinforcement learning paradigm and applications
* Markov Decision Process framework and mathematical foundations
* Dynamic programming and the principle of optimality
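As a concrete taste of the dynamic programming strand, the Bellman optimality backup behind value iteration can be sketched in a few lines. This is a minimal sketch only: the two-state MDP below (its transitions and rewards) is invented purely for illustration.

```python
GAMMA = 0.9

# Illustrative MDP: P[s][a] = list of (probability, next_state, reward).
# Two states, two actions; action 1 in state 1 earns reward 2 forever.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}

def value_iteration(P, gamma=GAMMA, theta=1e-8):
    """Iterate the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # V(s) <- max_a sum_{s'} p(s'|s,a) * (r + gamma * V(s'))
            v_new = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V
```

For this toy MDP the fixed point is V(1) = 2 / (1 - 0.9) = 20 and V(0) = 1 + 0.9 x 20 = 19, which the loop recovers to within the tolerance.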
Model-Free Learning: Tabular to Deep:
* Temporal difference learning and Q-learning (tabular)
* Function approximation motivation and neural network foundations
* Deep Q-Networks (DQN): experience replay, target networks, and variants
* Modern developments: Double DQN, Dueling DQN, Rainbow
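The tabular Q-learning update that opens this strand can be sketched end-to-end on a toy problem. Everything here is an illustrative assumption, not course material: the four-state chain environment, the hyperparameters, and the helper names (`step`, `greedy`) are invented for the example.

```python
import random

random.seed(0)

N_STATES, GOAL = 4, 3              # tiny deterministic chain (illustrative)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(s, a):
    """Action 1 moves right, action 0 moves left; reward 1 on reaching GOAL."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == GOAL), s2 == GOAL   # next state, reward, done

def greedy(qs):
    """Greedy action with random tie-breaking."""
    best = max(qs)
    return random.choice([a for a, q in enumerate(qs) if q == best])

Q = [[0.0, 0.0] for _ in range(N_STATES)]      # Q[state][action]

for _ in range(200):                           # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        a = random.randrange(2) if random.random() < EPS else greedy(Q[s])
        s2, r, done = step(s, a)
        # Q-learning (off-policy TD): bootstrap from the best next-state value
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
```

After training, the greedy policy moves right in every non-goal state; DQN replaces the table `Q` with a neural network and stabilises the same update with experience replay and a target network.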
Policy-Based Methods: Classical to Contemporary:
* Policy gradients and the REINFORCE algorithm
* Actor-critic methods and baseline techniques
* Deep policy gradients: A2C, A3C, and PPO
* Advanced actor-critic: DDPG, TD3, and Soft Actor-Critic (SAC)
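One building block that recurs throughout this strand is the discounted return-to-go, which in REINFORCE weights each log-probability gradient. A minimal sketch (the function name is an invented placeholder):

```python
def discounted_returns(rewards, gamma=0.99):
    """Returns-to-go G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    In REINFORCE, each G_t (optionally minus a baseline to reduce variance)
    multiplies grad log pi(a_t | s_t)."""
    G, out = 0.0, []
    for r in reversed(rewards):        # accumulate backwards in O(T)
        G = r + gamma * G
        out.append(G)
    return out[::-1]                   # restore time order
```

For example, `discounted_returns([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`.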
Model-Based Approaches: Planning to Learning:
* Classical planning and model-based RL
* Modern model-based methods: Dyna-Q to neural model learning
* Model-based deep RL: PETS, Dreamer, and MuZero concepts
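The Dyna-Q idea that bridges planning and learning can be sketched as a planning loop over a learned model. This is a sketch under stated assumptions: the deterministic tabular model format `model[(s, a)] = (next_state, reward)` and the function name are invented for illustration.

```python
import random

random.seed(1)

def dyna_q_planning(Q, model, n_planning, alpha=0.1, gamma=0.95, actions=(0, 1)):
    """Dyna-Q planning: replay transitions from a learned (here deterministic,
    tabular) model to refine Q without further environment interaction."""
    for _ in range(n_planning):
        s, a = random.choice(list(model))        # previously observed (s, a)
        s2, r = model[(s, a)]                    # simulated experience
        # Same Q-learning update as the model-free case, on simulated data
        best_next = max(Q.get((s2, b), 0.0) for b in actions)
        q_sa = Q.get((s, a), 0.0)
        Q[(s, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)
```

Given a model containing the two transitions `(0, 1) -> (1, 0.0)` and `(1, 1) -> (2, 1.0)`, planning alone propagates the reward backwards: `Q[(1, 1)]` approaches 1.0 and `Q[(0, 1)]` approaches 0.95. Modern methods such as PETS and Dreamer replace the tabular model with learned neural dynamics, but reuse this plan-from-the-model structure.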
Teaching methods
Teaching consists of integrated lectures and hands-on laboratory sessions, supported by an active discussion forum.
* Lectures introduce theoretical concepts and trace algorithmic evolution, using concrete examples to show how tabular methods scale to deep learning implementations. Each algorithm family is presented as a coherent narrative from inception to current state-of-the-art.
* Laboratory sessions provide immediate reinforcement through progressive implementation exercises. Students begin with simple tabular implementations and gradually incorporate neural networks and advanced techniques, building a comprehensive codebase throughout the course.
* Progressive complexity: By mid-course, students can read and understand modern model-free RL papers. By course completion, they can critically evaluate research across all major RL paradigms and identify connections between seemingly disparate approaches.
An online service will be used as a discussion forum for the module.
Assessments
The coursework and exam are structured to cover three core skills: theory, implementation, and evaluation. A single coursework assesses both fundamental theory and mathematical solutions and practical application through implementation and evaluation. The exam covers theory and evaluation.
Reinforcement learning has a strong practical element and is best appreciated through implementation and evaluation. The theoretical concepts are only fully understood through hands-on experience and by observing the effects of different design choices. The coursework is therefore substantial and contributes 30% towards the overall grade.
Students will work on the coursework independently and submit individually. Coursework will involve a set of theory questions and practical code.
Reading list
* Sutton, R. S. and Barto, A. G., Reinforcement Learning: An Introduction, second edition, The MIT Press
* Goodfellow, I., Bengio, Y. and Courville, A., Deep Learning, The MIT Press