Description
This course examines the theoretical foundations of sequential decision-making in artificial intelligence (AI), focusing on the mathematical frameworks behind online learning, multi-armed bandits, and Markov Decision Processes (MDPs). It begins with online learning, studied through regret minimization in adversarial and stochastic settings, including the analysis of follow-the-leader, follow-the-regularized-leader, and mirror descent methods. It then turns to multi-armed bandits, where students analyze the trade-off between exploration and exploitation and derive guarantees for algorithms such as UCB, Thompson sampling, and EXP3. The final part covers MDPs, emphasizing dynamic programming, value iteration, and policy gradient methods, with particular attention to the theoretical guarantees of these approaches.
Relevant degree program(s)
Grading format
Numerical, out of 20
Letter grade / reduced scale
For students in the International Exchange Programs degree
Assessment methods:
Final written exam, no calculator.
For students in the Titre d’Ingénieur diplômé de l’École polytechnique degree
Assessment methods:
Final written exam, no calculator.
A resit is permitted (the resit grade is retained).
- ECTS credits earned: 5 ECTS
The grade counts toward your GPA.
The grade counts toward your class ranking.