Course unit DS-ENSAE-2 | Catalogue 2021-2022

Description

Course description: This course develops tools to analyze statistical problems in high-dimensional settings where the number of variables may be greater than the sample size. This contrasts with classical statistical theory, which focuses on the asymptotic behavior of estimators as the sample size grows while the number of variables stays fixed. We will show that, in high-dimensional problems, powerful statistical methods can be constructed under structural properties such as sparsity or low rank. The emphasis will be on the non-asymptotic theory underlying these developments.

Topics covered:

-- Sparsity and thresholding in the Gaussian sequence model.

-- High-dimensional linear regression: Lasso, BIC, Dantzig selector, Square-Root Lasso. Oracle inequalities and variable selection properties.

-- Estimation of high-dimensional low rank matrices. Matrix completion.

-- Inhomogeneous random graph model. Community detection and estimation in the stochastic block model.

Prerequisites: Solid knowledge of probability theory, mathematical statistics, and linear algebra. Notions of convex optimization.

Resources:

Alexandre Tsybakov. High-dimensional Statistics. Lecture Notes.

Grading: The grade is determined by a final exam. Extra points can be earned through optional homework.

Minimum / maximum enrollment:

/15

Degree program(s) concerned

Data Sciences

Grade format

Numerical, out of 20

Letter grade / reduced grade scale

For students in the Data Sciences degree program

Retakes are allowed (the higher of the two grades is kept)

The course unit is passed if the final grade is >= 10

ECTS credits earned: 3 ECTS

PA - C8 - DS-ENSAE-2: High-Dimensional Statistics