IFDS/MADLab Working Group on Multi-Armed Bandits/Reinforcement Learning

Organizer: Robert Nowak
Group Meetings: Fridays, 1-3pm, 2534 Engineering Hall

Focus: The mathematical foundations of Reinforcement Learning (RL), beginning with Multi-Armed Bandits (MAB).

Tentative Schedule:

Week 1: Introduction to RL, concentration inequalities, basics of MABs

Week 2: Online Learning, [Bubeck] Chapters 1, 2.1-2.3 (presenter Blake Mason)

Week 3: Stochastic MABs: Regret Minimization, [BubeckCesaBianchi] Chapter 2 (presenter Ardhendu Tripathy)

Week 4: Stochastic MABs: Ranking, [JamiesonNowak] (presenter Moayad Alnammi, special 1:20pm start)

Week 5: Lower Bounds, [SzepesvariLattimore] Chapter 13

Week 6: Non-stochastic MABs: Regret Minimization, [SzepesvariLattimore] Adversarial Bandits pp. 140-165

Week 7: Non-stochastic MABs: Ranking, [LiEtAl]

Week 8: Contextual Bandits, [SzepesvariLattimore] Contextual Bandits pp. 213-223

Week 9: Linear Bandits, [SzepesvariLattimore] Linear Bandits pp. 227-235

Weeks 10-14: Reinforcement Learning and Optimal Control
[Bertsekas], [Munos], [Szepesvari], [AbbasiyadkoriSzepesvari], [DeanEtAl], [FazelGKM]