This course introduces reinforcement learning (RL) from a control-theoretic and optimization perspective, with emphasis on online decision-making and theoretical guarantees. Core topics center on approximate dynamic programming (ADP) for both finite- and infinite-horizon problems: value function approximation, rollout methods, value and policy iteration, and approximation in policy space. The course covers both model-based and selected model-free approaches under a unifying approximation framework. Additional topics include state aggregation and model learning, with attention to their roles in improving prediction, control, and decision-making. Applications focus on real-time decision-making and control in mobility systems. For the final project, students design and implement learning-based controllers in online or sequential decision environments.
Objectives: