
PA - C8 - DS-UPSUD-2 : Graphical models for large scale content access

Description

Course description: Our information society produces an ever-increasing flow of unstructured data of various types (text, audio, image, video, etc.) that needs to be dealt with quickly and effectively. In the face of such polymorphic data, probabilistic models have come to the fore thanks to their ability to capture the variability of information in effective and sound statistical models. Over the last few decades, these models have become indispensable tools for information management and decision-making. The course is divided into two main parts. The first part deals with the basic concepts and their computational manipulation: directed and undirected graphical models, and the associated algorithms. In the second part, we focus more specifically on (a) the estimation of latent variable models and (b) approximate inference. We will illustrate these methods with applications from the text mining literature (text classification and clustering, question answering, sentiment analysis, etc.).

               

Main themes:

  • Directed graphical models and probabilistic reasoning
  • Undirected graphical models
  • Exact inference in graphical models
  • EM and latent variable models
  • Approximate inference: variational techniques
  • Approximate inference: sampling techniques

 

Language: English

 

Numerus Clausus: 24

 

Recommended readings:

  • Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman. MIT Press.
  • Pattern Recognition and Machine Learning by Chris Bishop.
  • Machine Learning: A Probabilistic Perspective by Kevin P. Murphy. MIT Press.
  • Modeling and Reasoning with Bayesian Networks by Adnan Darwiche.
  • Information Theory, Inference, and Learning Algorithms by David J. C. MacKay. [Available online.]
  • Graphical Models, Exponential Families, and Variational Inference by Martin J. Wainwright and Michael I. Jordan. [Available online.]

 

Prerequisites: Basic statistics and optimization

 

Grading: Final exam

Minimum / maximum enrollment:

/24

Relevant degree program(s)

Grade format

Numerical, out of 20

Letter grade / reduced grade scale

For students in the Data Sciences degree program

A resit is allowed (the higher of the two grades is kept)
    The course unit is passed if the final grade >= 10
    • ECTS credits earned: 3 ECTS

    Detailed syllabus

    1. Introduction
    Document classification
    Directed graphical models
    2. Topic models
    Mixture of multinomial distributions
    EM algorithm
    PLSA model
    LDA model
    3. Structured models
    Linguistic dependencies, structures in graphical models
    HMMs revisited
    IBM1 and IBM2 alignment models
    4. Conditional models
    Logistic regression and maximum entropy
    Conditional random fields (CRFs)
    5. Exact inference
    Variable elimination
    Message passing
    Junction tree algorithm
    6. Approximate inference: variational methods
    Belief propagation in cyclic graphs
    Principles of variational inference
    Application to LDA
    7. Approximate inference: sampling
    Principles of sampling methods
    Application to the mixture of multinomial distributions
    Application to LDA
