Electrical Engineering Seminar and Special Problems
Machine Teaching
Prerequisites
Basic programming and a graduate-level machine learning course.
Course Content
The main theme of this course is to expose students to current state-of-the-art approaches for teaching AI models from human feedback, starting from the earliest ideas dating back to the 2000s and continuing to the recent ones behind the success of GPT. The topics I plan to cover are the following:
1. Behavioral cloning and its suboptimality bounds.
Concepts from well-known methods: the DAgger paper (A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning) and Behavioral Cloning from Observation.
Assignment 1: Implement the behavioral cloning algorithm on an OpenAI Gym example such as Hopper or Mountain Car. The demonstration traces can come from a pretrained neural network that acts as the expert. A minimal sketch follows below.
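A minimal sketch of what this could look like, assuming PyTorch and Gymnasium (with the MuJoCo environments) are installed; `expert_policy` is a hypothetical stand-in for the pretrained expert, and the network size, learning rate, and episode/epoch counts are placeholders.

```python
# Minimal behavioral cloning sketch: roll out an expert to collect traces,
# then fit a policy to them by supervised regression.
import numpy as np
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("Hopper-v4")
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.shape[0]

def collect_demos(expert_policy, episodes=50):
    # Record (state, action) pairs from expert rollouts as demonstration traces.
    obs_buf, act_buf = [], []
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            act = expert_policy(obs)  # hypothetical pretrained expert
            obs_buf.append(obs)
            act_buf.append(act)
            obs, _, terminated, truncated, _ = env.step(act)
            done = terminated or truncated
    return (torch.as_tensor(np.array(obs_buf), dtype=torch.float32),
            torch.as_tensor(np.array(act_buf), dtype=torch.float32))

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def behavioral_cloning(obs, acts, epochs=100):
    # Supervised regression of the policy onto the expert's actions.
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(obs), acts)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```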
2. Reinforcement learning, using openly available resources for implementation. Spinning Up in Deep RL is a great resource, and the Sutton and Barto book covers the fundamentals. Optional material: the Proximal Policy Optimization paper. A minimal policy-gradient sketch follows below.
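To make the fundamentals concrete, here is a minimal REINFORCE sketch (the vanilla policy gradient from Sutton and Barto) on CartPole; PPO can be viewed as a refinement of this basic update. PyTorch and Gymnasium are assumed, and all hyperparameters are placeholders.

```python
# Minimal REINFORCE sketch on CartPole.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated
    # Policy gradient: grad J = E[ sum_t G_t * grad log pi(a_t | s_t) ].
    returns = torch.tensor([sum(rewards[t:]) for t in range(len(rewards))])
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize as a baseline
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```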
3. Reward Specification: This covers ideas around reward hacking. The main papers I plan to cover are The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications, an article on the reward hacking problem, and the paper on characterizing reward hacking.
4. Interactive Reinforcement Learning: The TAMER Framework and Interactive Learning from Policy-Dependent Human Feedback. A simplified TAMER-style sketch follows below.
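A simplified sketch of the TAMER idea: rather than maximizing a discounted return, the agent fits a model H(s, a) of the human's scalar feedback by supervised learning and acts greedily with respect to it. This is a tabular toy version; the real framework also handles credit assignment over recent state-action pairs, and `human_feedback` stands in for the actual feedback interface.

```python
# Tabular TAMER-style sketch: learn the human reward model H(s, a) online.
import numpy as np

n_states, n_actions = 25, 4
H = np.zeros((n_states, n_actions))  # learned model of the human's feedback
alpha = 0.1                          # supervised learning rate (placeholder)

def act(state, eps=0.1):
    # Act greedily w.r.t. predicted human reward, with a little exploration.
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(H[state]))

def tamer_update(state, action, human_feedback):
    # Regress H toward the human's signal: the signal is treated as a direct
    # label for the last action, not as a return to be discounted.
    H[state, action] += alpha * (human_feedback - H[state, action])
```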
5. Inverse Reinforcement Learning: There are two classic papers here: Apprenticeship Learning via Inverse Reinforcement Learning and Maximum Entropy Inverse Reinforcement Learning.
Assignment 2: Implement maximum entropy inverse reinforcement learning in a simple gridworld-like setting. A minimal sketch of the algorithm follows below.
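A minimal sketch of MaxEnt IRL in that setting, assuming a deterministic gridworld with one-hot state features so that feature expectations reduce to state-visitation counts; the backward (soft value iteration) and forward (visitation propagation) passes follow Ziebart et al., and the horizon, learning rate, and iteration counts are placeholders.

```python
# Minimal MaxEnt IRL sketch on a deterministic 5x5 gridworld.
import numpy as np

side, A, T = 5, 4, 20
N = side * side
# Deterministic transitions P[s, a] -> next state (up/down/left/right, walls clip).
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
P = np.zeros((N, A), dtype=int)
for s in range(N):
    row, col = divmod(s, side)
    for a, (dr, dc) in enumerate(moves):
        nr = min(max(row + dr, 0), side - 1)
        nc = min(max(col + dc, 0), side - 1)
        P[s, a] = nr * side + nc

def soft_policy(r):
    # Backward pass: finite-horizon soft value iteration (log-sum-exp backup).
    V = np.zeros(N)
    for _ in range(T):
        Q = r[:, None] + V[P]  # Q[s, a] = r(s) + V(next state)
        m = Q.max(axis=1, keepdims=True)
        V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
    return np.exp(Q - V[:, None])  # pi(a|s) proportional to exp(Q[s, a])

def expected_visits(pi, start):
    # Forward pass: propagate state-visitation frequencies through the dynamics.
    d = np.zeros(N); d[start] = 1.0
    visits = d.copy()
    for _ in range(T - 1):
        d_next = np.zeros(N)
        for s in range(N):
            for a in range(A):
                d_next[P[s, a]] += d[s] * pi[s, a]
        d, visits = d_next, visits + d_next
    return visits

def maxent_irl(expert_visits, start, lr=0.1, iters=200):
    # With one-hot features, the gradient of the demonstration log-likelihood
    # is expert visitation counts minus the learner's expected counts.
    w = np.zeros(N)
    for _ in range(iters):
        grad = expert_visits - expected_visits(soft_policy(w), start)
        w += lr * grad
    return w  # recovered per-state rewards
```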
6. Adversarial Imitation Learning: The relevant material here is Generative Adversarial Imitation Learning and Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. A sketch of the discriminator step follows below.
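A sketch of the discriminator update at the core of GAIL: the discriminator is trained to separate expert (state, action) pairs from the policy's, and its output is turned into a surrogate reward for an ordinary RL step (omitted here). The dimensions, network, and the exact reward convention are placeholder choices.

```python
# GAIL discriminator sketch (PyTorch); the policy update (e.g. TRPO/PPO on
# the surrogate reward) is not shown.
import torch
import torch.nn as nn

obs_dim, act_dim = 11, 3
disc = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_sa, policy_sa):
    # Expert pairs labeled 1, policy pairs labeled 0.
    logits_e = disc(expert_sa)
    logits_p = disc(policy_sa)
    loss = (bce(logits_e, torch.ones_like(logits_e)) +
            bce(logits_p, torch.zeros_like(logits_p)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def surrogate_reward(sa):
    # One common convention: reward is high when the discriminator
    # believes the (state, action) pair is expert-like.
    with torch.no_grad():
        return -torch.log(1 - torch.sigmoid(disc(sa)) + 1e-8)
```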
7. Preference Learning and RL from Human Feedback: The two main papers here are Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations and the RLHF paper. A sketch of the shared preference loss follows below.
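Both papers train a reward model from pairwise comparisons with a Bradley-Terry style loss. A minimal sketch, assuming trajectory segments arrive as tensors of per-step observations; the reward network is a placeholder.

```python
# Bradley-Terry reward learning from pairwise trajectory preferences.
import torch
import torch.nn as nn

obs_dim = 11
reward_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

def preference_loss(seg_preferred, seg_other):
    # Segment return = sum of per-step predicted rewards.
    r_pref = reward_net(seg_preferred).sum()
    r_other = reward_net(seg_other).sum()
    # Bradley-Terry: P(preferred > other) = exp(r_pref) / (exp(r_pref) + exp(r_other)),
    # so the negative log-likelihood is -log sigmoid(r_pref - r_other).
    return -nn.functional.logsigmoid(r_pref - r_other)

def train_step(seg_preferred, seg_other):
    loss = preference_loss(seg_preferred, seg_other)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```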
8. Preference Learning and RLHF, continued: the preference optimization paper and Contrastive Preference Learning.
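Assuming the preference optimization paper here refers to Direct Preference Optimization (DPO), its loss can be written directly on policy and reference log-probabilities, with no explicit reward model; a minimal sketch, where beta is a placeholder hyperparameter.

```python
# DPO loss sketch: inputs are summed token log-probabilities of the chosen and
# rejected responses under the current policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit reward of each response: beta * (log pi_theta - log pi_ref).
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Bradley-Terry objective on the implicit rewards.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```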
Assignment 3: This would be an RLHF assignment.
Final Project Proposals.
9. Active Reward Learning and Optimal Teaching: This would include the papers Active Reward Learning from Critiques and Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications.
10. Miscellaneous Topics: Reward Design with Language Models and Reinforcement Learning from AI Feedback (RLAIF).
11. Final Project Presentations.
Related Courses at UBC
1. There are some good courses on the basics of machine learning from both the CS and ECE departments. For instance, EECE 571F by Prof. Renjie Liao covers some commonly used deep learning models, probabilistic models, and reinforcement learning algorithms; the same goes for CPEN 455, which covers general deep learning paradigms.
From the CS department, there are courses on more theoretical aspects, such as statistical machine learning (CPSC 532D), and another course on the basics (CPSC 440/550); both are taught by Danica Sutherland.
2. An introductory ML course from the CS department: CPSC 532M/340. There might also be a version of this course by Jeff Clune in more recent offerings.
3. Then there are some targeted courses from the CS department on specific application areas. For NLP, there is a course by Vered Shwartz (CPSC 532V); on multimodal learning and representation learning, one from Leonid Sigal (CPSC 532). There is also a course called Never-Ending RL (CPSC 532J), which seems to be a seminar-style course that touches on a range of recent topics.
4. Then there are some older courses that might be offered less frequently. CPSC 502 is a course on symbolic AI, based on the book by Poole and Mackworth. CPSC 515 from Ian Mitchell covers robotics/control-centered topics; its content has included some F1TENTH examples.