Big Data and Machine Learning in Econometrics
Instructor: Anna Simoni
The goal of this course is to give an introduction to high-dimensional models and machine learning in econometrics. We will analyze methods for both prediction and causal inference in economic settings where the effects of counterfactual policies are of interest, such as the effects of introducing a new product, of advertising, or of implementing a government policy. The methods presented in the course are well suited to datasets with many observations and/or many covariates, and they are based on supervised and unsupervised machine learning approaches to model selection and prediction. We will also see how standard machine learning methods must be modified and extended to make them suitable for causal inference, and how to provide statistical theory for hypothesis testing.
Throughout the course, the different models will be illustrated through applications and case studies. The applications will be developed using either R or Matlab as the statistical software package.
- Introduction to statistical learning; review of linear regression and least squares. Introduction to R (and possibly Matlab).
- Linear regression model with many covariates: subset selection, shrinkage methods (ridge regression, Lasso), dimension reduction methods (PCA and sparse PCA). Inference: post-Lasso and debiased Lasso.
- Extensions of the linear model: polynomial regression, regression splines, smoothing splines, local regression, generalized additive models.
- High-dimensional instrumental variables for causal inference.
- Tree-based methods for regression, treatment effects and classification: bagging, random forests, boosting.
- Support vector machines.
- Case studies based on research articles, for example: the impact of the internet and social media on news, search advertising, and the estimation of treatment effects.
- Estimation of large covariance and precision matrices, with applications to portfolio management and risk assessment.
G. James, D. Witten, T. Hastie and R. Tibshirani, "An Introduction to Statistical Learning: with Applications in R", Springer, 2013.
Other references to journal articles will be provided during the course.