Description
The goal of this course is to give an introduction to machine learning methods in econometrics. We will analyze methods for prediction and methods for causal inference in economic settings where one wants to learn the effects of counterfactual policies (policy evaluation).
The machine learning methods presented in the course are well suited to datasets with many observations and/or many covariates. We will show that statistical methods based on machine learning do not, by themselves, answer the causal questions of interest in economics, so specific econometric methods built on machine learning have to be used.
Throughout the course, the different models will be illustrated through applications and case studies. The applications will be developed in the R statistical software.
Learning objectives
- Understanding the challenges related to: big data sets, modeling of nonlinearities, endogeneity in econometrics, and causal inference in the presence of large datasets and possibly unknown nonlinearities
- Understanding the main methodologies based on machine learning algorithms for causal inference
- Implementing the methodologies analyzed in R and applying them to real datasets.
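As a small illustration of the endogeneity challenge named above, the following sketch simulates a regressor that is correlated with an unobserved confounder and compares the least squares slope with a simple instrumental variables (Wald) estimate. The data-generating process, the sample size, the true coefficient of 2.0, and the helper names `cov` and `simulate` are all assumptions made for this example; it is written in plain Python so that it is self-contained, whereas the course applications use R.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n, beta, seed=0):
    """Hypothetical design: x depends on an instrument z and a confounder u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                    # instrument: moves x, not y directly
        ui = rng.gauss(0, 1)                    # unobserved confounder
        xi = zi + ui + rng.gauss(0, 1)          # endogenous regressor
        yi = beta * xi + ui + rng.gauss(0, 1)   # outcome; u enters the error term
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate(n=20000, beta=2.0)

# Least squares slope: biased upward because x is correlated with u.
ols_slope = cov(x, y) / cov(x, x)

# IV slope with a single instrument: cov(z, y) / cov(z, x).
iv_slope = cov(z, y) / cov(z, x)

print("OLS slope:", round(ols_slope, 3))
print("IV slope: ", round(iv_slope, 3))
```

In this design the population least squares slope is 2 + 1/3, since the confounder contributes one third of the variance of x, while the IV ratio recovers the structural coefficient because z is independent of the error term.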
Minimum / maximum enrollment: /25
Degree program(s) concerned
Associated track
For students in the international exchange programs (Programmes d'échange internationaux)
Basic knowledge of econometrics and statistics
For students in the Titre d’Ingénieur diplômé de l’École polytechnique program
Basic knowledge of econometrics and statistics
Detailed syllabus
- Where does “Big Data” come from in economics? Introduction to statistical learning, review of linear regression and least squares. Review of the concept of endogeneity and of instrumental variables. Introduction to R.
- Linear Regression Model with many covariates: subset selection, shrinkage methods (Ridge regression, Lasso). Inference: Post-Lasso and debiased Lasso.
- Extension of the linear model: polynomial regression, regression splines, smoothing splines, local regression, generalized additive models.
- High-Dimensional Instrumental Variables for causal inference, inference for the Average Treatment Effect.
- Tree-Based methods for regression, treatment effects and classification: bagging, random forest, boosting. Causal random forest.
- Analyses of real data sets for causal inference and treatment effect estimation.
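To make the shrinkage methods in the syllabus concrete, here is a minimal sketch of the Lasso fitted by cyclic coordinate descent with soft-thresholding, for the objective (1/(2n))·||y − Xb||² + λ·||b||₁. It assumes roughly centered data and no intercept. The simulated design, the penalty λ = 0.5, and the function names `soft_threshold` and `lasso_coordinate_descent` are assumptions made for illustration, not the course's own material; it is written in plain Python so that it is self-contained, whereas the course applications use R.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Average product of column j with the residual that leaves out
            # coordinate j's current contribution.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical sparse design: only the first two of ten covariates matter.
rng = random.Random(0)
n, p = 200, 10
true_beta = [3.0, -2.0] + [0.0] * (p - 2)
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [
    sum(xij * bj for xij, bj in zip(row, true_beta)) + rng.gauss(0, 1)
    for row in X
]

beta_hat = lasso_coordinate_descent(X, y, lam=0.5)
print([round(b, 2) for b in beta_hat])
```

The nonzero estimates are shrunk toward zero relative to the true coefficients, which is one reason the syllabus pairs the Lasso with post-Lasso and debiased Lasso when the goal is inference rather than prediction.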