IFDS/MADLab Working Group on
Multi-Armed Bandits and Reinforcement Learning
Organizer: Robert Nowak
Group Meetings: Fridays 1-3pm, 2534 Engineering Hall
Focus: The mathematical foundations of Reinforcement Learning (RL), beginning with Multi-Armed Bandits (MAB).
Tentative Schedule:
Week 1: Introduction to RL, concentration inequalities, basics of MABs
Week 2: Online Learning, [Bubeck] Chapters 1, 2.1-2.3 (presenter Blake Mason)
Week 3: Stochastic MABs: Regret Minimization, [BubeckCesaBianchi] Chapter 2 (presenter Ardhendu Tripathy)
Week 4: Stochastic MABs: Ranking, [JamiesonNowak] (presenter Moayad Alnammi, special 1:20pm start)
Week 5: Lower Bounds, [SzepesvariLattimore] Chapter 13
Week 6: Non-stochastic MABs: Regret Minimization, [SzepesvariLattimore] Adversarial Bandits pp. 140-165
Week 7: Non-stochastic MABs: Ranking, [LiEtAl]
Week 8: Contextual Bandits, [SzepesvariLattimore] Contextual Bandits pp. 213-223
Week 9: Linear Bandits, [SzepesvariLattimore] Adversarial Bandits pp. 227-235
Weeks 10-14: Reinforcement Learning and Optimal Control, [Bertsekas], [Munos], [Szepesvari], [AbbasiyadkoriSzepesvari], [DeanEtAl], [FazelGKM]