Description
Course description: Our information society produces an ever-increasing flow of unstructured data of various types (text, audio, images, video, etc.) that needs to be handled quickly and effectively. In the face of such polymorphic data, probabilistic models have emerged thanks to their ability to digest the variability of information into effective and sound statistical models. Over the last decades, these models have become indispensable tools for information management and decision-making. The course is divided into two main parts. The first part covers the basic concepts and their computational manipulation: directed and undirected graphical models, and the associated algorithms. In the second part, we focus more specifically on (a) the estimation of latent variable models and (b) approximate inference. We will illustrate these methods with applications from the text mining literature (text classification and clustering, question answering, sentiment analysis, etc.).
Main themes:
- Directed graphical models and probabilistic reasoning
- Undirected graphical models
- Exact inference in graphical models
- EM and latent variable models
- Approximate inference: variational techniques
- Approximate inference: sampling techniques
Language: English
Numerus Clausus: 24
Recommended readings:
- Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman. MIT Press.
- Pattern Recognition and Machine Learning by Chris Bishop.
- Machine Learning: A Probabilistic Perspective by Kevin P. Murphy. MIT Press.
- Modeling and Reasoning with Bayesian Networks by Adnan Darwiche.
- Information Theory, Inference, and Learning Algorithms by David J. C. MacKay. [Available online.]
- Graphical Models, Exponential Families, and Variational Inference by Martin J. Wainwright and Michael I. Jordan. [Available online.]
Prerequisites: Basic statistics and optimization
Grading: Final exam
Grading format
Numerical grade out of 20 / letter grade (reduced scale) for students of the M2 Data Sciences degree.
A resit exam is allowed (the higher of the two grades is kept). ECTS credits earned: 3 ECTS.
Detailed program
1. Introduction
Document classification
Directed graphical models
2. Topic models
Mixture of multinomials
EM algorithm
PLSA model
LDA model
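To make the mixture-of-multinomials and EM topics above concrete, here is a minimal illustrative sketch of EM clustering for bag-of-words documents (toy data, NumPy only; corpus, vocabulary size, and number of iterations are all made-up values, not material from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: 6 documents over a 4-word vocabulary (bag-of-words counts).
X = np.array([[5, 4, 0, 1],
              [6, 3, 1, 0],
              [4, 5, 0, 0],
              [0, 1, 5, 4],
              [1, 0, 6, 3],
              [0, 0, 4, 5]], dtype=float)
K = 2  # number of mixture components (clusters)

# Initialise mixing weights pi and per-component word distributions theta.
pi = np.full(K, 1.0 / K)
theta = rng.dirichlet(np.ones(X.shape[1]), size=K)

for _ in range(50):
    # E-step: posterior responsibility of each component for each document,
    # computed in log space for numerical stability.
    log_p = np.log(pi) + X @ np.log(theta).T          # shape (n_docs, K)
    log_p -= log_p.max(axis=1, keepdims=True)
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate pi and theta from the expected counts.
    pi = resp.mean(axis=0)
    counts = resp.T @ X                               # shape (K, vocab)
    theta = counts / counts.sum(axis=1, keepdims=True)

labels = resp.argmax(axis=1)
print(labels)  # documents 0-2 and 3-5 land in different clusters
```

The log-space E-step is the standard trick for avoiding underflow when documents are long; everything else follows the usual EM updates for a multinomial mixture.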
3. Structured models
Linguistic dependencies, structure in graphical models
HMMs revisited
IBM1 and IBM2 alignment models
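As a small illustration of the HMM material above, a sketch of the forward algorithm for computing the likelihood of an observation sequence (all probabilities are toy values chosen for this example):

```python
import numpy as np

# Forward algorithm for a 2-state HMM over a 2-symbol alphabet.
pi = np.array([0.5, 0.5])            # initial state distribution
A = np.array([[0.8, 0.2],            # transition matrix A[i, j] = P(s_j | s_i)
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],            # emission probs B[i, o] = P(o | s_i)
              [0.2, 0.8]])

obs = [0, 0, 1]                      # observed symbol sequence
alpha = pi * B[:, obs[0]]            # alpha_1(i) = pi(i) * P(o_1 | s_i)
for o in obs[1:]:
    # alpha_t(j) = sum_i alpha_{t-1}(i) A[i, j] * P(o_t | s_j)
    alpha = (alpha @ A) * B[:, o]
likelihood = alpha.sum()             # P(o_1, ..., o_T)
print(likelihood)                    # ≈ 0.10312
```

The recursion sums over all state paths in O(T·S²) time instead of enumerating the S^T paths explicitly.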
4. Conditional models
Logistic regression and maximum entropy
Conditional Random Fields (CRFs)
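A minimal sketch of the logistic-regression topic above: gradient descent on the negative log-likelihood for binary document classification (toy bag-of-words data and learning rate are illustrative choices, not course material):

```python
import numpy as np

# Toy binary classification: 6 documents, 4 bag-of-words features.
X = np.array([[3, 0, 1, 0],
              [2, 1, 0, 0],
              [4, 0, 0, 1],
              [0, 2, 0, 3],
              [1, 3, 0, 2],
              [0, 2, 1, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

w = np.zeros(X.shape[1])
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # P(y=1 | x) under current weights
    grad = X.T @ (p - y) / len(y)        # gradient of the average neg. log-likelihood
    w -= lr * grad

pred = (1.0 / (1.0 + np.exp(-(X @ w))) > 0.5).astype(int)
print(pred)
```

The same objective can be derived from the maximum-entropy principle, which is the connection the section title points to.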
5. Exact inference
Variable elimination
Message passing
Junction tree algorithm
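As a tiny illustration of variable elimination, the marginal P(C) in a chain Bayesian network A → B → C over binary variables (all conditional probability tables are made-up toy values):

```python
import numpy as np

# Chain Bayesian network A -> B -> C over binary variables.
p_a = np.array([0.6, 0.4])            # P(A)
p_b_given_a = np.array([[0.7, 0.3],   # P(B | A=0)
                        [0.2, 0.8]])  # P(B | A=1)
p_c_given_b = np.array([[0.9, 0.1],   # P(C | B=0)
                        [0.4, 0.6]])  # P(C | B=1)

# Variable elimination: sum out A first, then B.
msg_b = p_a @ p_b_given_a             # sum_a P(a) P(b|a), a message over B
p_c = msg_b @ p_c_given_b             # sum_b msg(b) P(c|b)
print(p_c)                            # P(C) = [0.65, 0.35]
```

On a chain, eliminating variables in order is exactly the message passing that the next topic generalizes to trees and, via the junction tree, to arbitrary graphs.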
6. Approximate inference: variational methods
Belief propagation in cyclic graphs (loopy BP)
Principles of variational inference
Application to LDA
7. Approximate inference: sampling
Principles of sampling methods
Application to the mixture of multinomials
Application to LDA
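A sketch of the sampling topics above applied to the mixture of multinomials: a Gibbs sampler that alternates between sampling the parameters from their Dirichlet posteriors and resampling the cluster assignments (toy corpus, priors, and chain length are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy corpus: 6 documents over a 4-word vocabulary, K = 2 components.
X = np.array([[5, 4, 0, 1],
              [6, 3, 1, 0],
              [4, 5, 0, 0],
              [0, 1, 5, 4],
              [1, 0, 6, 3],
              [0, 0, 4, 5]], dtype=float)
n, V = X.shape
K = 2
alpha, beta = 1.0, 1.0                   # symmetric Dirichlet priors

z = rng.integers(K, size=n)              # random initial cluster assignments
samples = []

for it in range(200):
    # Sample parameters from their Dirichlet posteriors given assignments.
    pi = rng.dirichlet(alpha + np.bincount(z, minlength=K))
    theta = np.vstack([
        rng.dirichlet(beta + X[z == k].sum(axis=0)) for k in range(K)
    ])
    # Resample each assignment from its conditional given the parameters.
    log_p = np.log(pi) + X @ np.log(theta).T
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(K, p=p[i]) for i in range(n)])
    if it >= 100:                        # keep post-burn-in samples
        samples.append(z.copy())

# Posterior probability that documents 0 and 3 share a cluster (low here,
# since they use disjoint halves of the vocabulary).
same = np.mean([s[0] == s[3] for s in samples])
print(same)
```

The collapsed Gibbs sampler used for LDA integrates the parameters out analytically instead of sampling them; this explicit version is shown because it maps one-to-one onto the model's conditional distributions.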