
# PA - C8 - DS-télécom-1 : Optimization for Data Science

## Descriptif

Syllabus
Modern machine learning relies heavily on optimization tools, typically to minimize so-called loss functions on training sets. The objective of this course is to cover the necessary theoretical results of convex optimization as well as the computational aspects. The course contains a fair amount of programming, as all algorithms presented will be implemented and tested on real data. At the end of the course, students shall be able to decide which algorithm is best adapted to a given machine learning problem, depending on the size of the data (number of samples, sparsity, dimension of each observation).

Evaluation

Project. Implementation of solvers for a machine learning model. 30% of final grade.

Project 2017-2018: Implement OCSVM with the cvxopt package, derive the dual and implement the dual solution (without intercept) with proximal gradient and coordinate descent (SDCA).

Project 2018-2019: Implement the Proportional Odds model with L-BFGS and proximal gradient (derive the convex formulation and justify convexity).

Exam. 3h Exam (40% of the final grade).

To sum up:

| Eval type | % final | Notes |
| --- | --- | --- |
| Lab | 30% | Students have 1 week to upload solutions on Moodle |
| Project | 30% | To upload on Moodle by Jan 20 2020 |
| Exam | 40% | Final course |

Detailed Program

16 / 09 [2.00 hours lecture + 1.50 hours exercises, Robert]

Foundations of convex optimization: gradient, sub-differential, strong convexity, conditioning. Examples with linear regression and logistic regression, for which we compute gradients, Hessians, Lipschitz constants, etc.
- Exercise list: Convexity and smoothness, Ridge regression and gradient descent
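
The kind of computation this first session leads to can be sketched as follows (my own illustrative example, not the official lab): gradient descent with constant step 1/L on the ridge objective f(w) = (1/2n)||Xw - y||^2 + (lam/2)||w||^2, whose gradient is Lipschitz with constant L = ||X||_2^2 / n + lam.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
lam = 0.1

def grad(w):
    # gradient of (1/2n)||Xw - y||^2 + (lam/2)||w||^2
    return X.T @ (X @ w - y) / n + lam * w

L = np.linalg.norm(X, 2) ** 2 / n + lam   # smoothness (Lipschitz) constant
w = np.zeros(d)
for _ in range(500):
    w -= grad(w) / L                      # constant step size 1/L

# closed-form ridge solution for comparison
w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
```

Since the objective is lam-strongly convex, the iterates converge linearly to the closed-form solution.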

23 / 09 [2.00 hours lecture + 1.50 hours exercises, Robert] First-order algorithms: gradient descent, proximal operator, proximal gradient descent, convergence proofs in the smooth and smooth + strongly convex cases. Exercises given.
- Exercise list: Proximal operator
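
As an illustrative sketch of the proximal operator covered here (my example, not the course's): the prox of t * ||.||_1 is entrywise soft-thresholding, prox(x)_j = sign(x_j) * max(|x_j| - t, 0), which is the building block of proximal gradient descent (ISTA) for the lasso.

```python
import numpy as np

def prox_l1(x, t):
    # proximal operator of t * ||.||_1: entrywise soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_step(w, X, y, lam, step):
    """One proximal gradient step for (1/2n)||Xw - y||^2 + lam ||w||_1."""
    g = X.T @ (X @ w - y) / len(y)        # gradient of the smooth part
    return prox_l1(w - step * g, step * lam)
```

With lam = 0 the prox is the identity and `ista_step` reduces to a plain gradient step.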

30 / 09 [3.50 hours lab, Robert + Alex + TAs] Lab (GD + prox GD) in Jupyter notebooks. Lab to be graded.
- Lab1.ipynb

07 / 10 [2.00 hours lecture + 1.50 hours exercises, Robert] Stochastic algorithms: SGD, with and without momentum, in the convex and strongly convex cases, with proofs.
- Exercise list: SGD and CD for ridge regression
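
A hedged sketch of what the SGD exercise might look like (not the course's reference implementation): plain SGD on the ridge objective (1/2n)||Xw - y||^2 + (lam/2)||w||^2, with the classical 1/(mu * k) step-size schedule for strongly convex problems, shifted by k0 to keep the early steps stable.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.standard_normal((n, d))
y = X @ np.ones(d)
lam = 0.5
mu = lam          # lam is a simple lower bound on the strong convexity
k0 = 100          # shift so the first step sizes are not too large

w = np.zeros(d)
for k in range(20000):
    i = rng.integers(n)                        # sample one data point
    g = (X[i] @ w - y[i]) * X[i] + lam * w     # stochastic gradient
    w -= g / (mu * (k + k0))                   # decreasing step size

# closed-form ridge solution for comparison
w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
```

With this schedule the expected squared error decays at the O(1/k) rate, so after many iterations the iterate is close to, but not exactly at, the minimizer.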

14 / 10 [1.50 hours lecture + 2.00 hours lab, Robert] Stochastic variance-reduced methods (SAGA, SVRG) with numerical tricks: lazy updating, sparse tricks.
- Lab SGD + variants (not graded)
- Exercise list: Variance reduction
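
A minimal SVRG sketch (assumed illustration, not from the course notes) for ridge regression: each epoch recomputes the full gradient at a snapshot w_tilde, then the inner loop uses the variance-reduced direction v = g_i(w) - g_i(w_tilde) + full_grad(w_tilde), whose variance vanishes as w and w_tilde approach the optimum, restoring linear convergence.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 4
X = rng.standard_normal((n, d))
y = X @ np.ones(d)
lam = 0.1

def g_i(w, i):
    # gradient of the i-th summand of the finite sum
    return (X[i] @ w - y[i]) * X[i] + lam * w

def full_grad(w):
    return X.T @ (X @ w - y) / n + lam * w

L_max = np.max((X ** 2).sum(axis=1)) + lam   # worst per-sample smoothness
step = 1.0 / (3.0 * L_max)

w = np.zeros(d)
for epoch in range(100):
    w_tilde = w.copy()
    mu_tilde = full_grad(w_tilde)            # snapshot full gradient
    for _ in range(2 * n):                   # inner loop of length 2n
        i = rng.integers(n)
        v = g_i(w, i) - g_i(w_tilde, i) + mu_tilde
        w -= step * v

# closed-form ridge solution for comparison
w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
```

Unlike plain SGD, no decreasing step size is needed: the control variate kills the gradient noise at the optimum.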

21 / 10 [3.50 hours lab, Robert + Alex + TAs] Lab on all methods seen so far (GD, prox GD, fast prox GD, SGD, SAG, SVRG, SDCA). Lab to be graded.
- Lab2.ipynb

18 / 11 [2.00 hours lecture + 1.50 hours lab, Robert] Online variance reduction and scale-invariant methods (natural gradient descent, AdaGrad, Adam, randomized Newton)

25 / 11 [3.50 hours, Alex] Part I: Coordinate descent algorithms

02 / 12 [1.00 hour lecture + 2.50 hours lab, Alex] Part II: Coordinate descent algorithms + lab: coordinate descent implementation on logistic and ridge regression + project introduction
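
The ridge part of this lab might be sketched as follows (my own example, under my assumptions): cyclic coordinate descent on (1/2n)||Xw - y||^2 + (lam/2)||w||^2, where each coordinate update has a closed form, and keeping the residual r = Xw - y in sync makes one full pass cost O(nd), like one gradient step.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 150, 6
X = rng.standard_normal((n, d))
y = X @ np.ones(d)
lam = 0.2

col_sq = (X ** 2).sum(axis=0) / n      # per-coordinate curvature
w = np.zeros(d)
r = X @ w - y                          # residual, kept in sync with w
for _ in range(100):                   # 100 passes over the coordinates
    for j in range(d):
        gj = X[:, j] @ r / n + lam * w[j]     # partial derivative
        delta = -gj / (col_sq[j] + lam)       # exact coordinate minimizer
        w[j] += delta
        r += delta * X[:, j]                  # cheap residual update

# closed-form ridge solution for comparison
w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
```

Because each update exactly minimizes along its coordinate, the method converges linearly on this strongly convex quadratic.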

09 / 12 [2.50 hours lecture + 1.00 hour lab, Alex] Solvers for quadratic functions (SVD, Woodbury matrix inversion lemma), conjugate gradient (dense vs sparse data). Line-search methods.
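
A sketch of the conjugate gradient method (assumed illustration) for solving A w = b with A symmetric positive definite; in exact arithmetic it terminates in at most d iterations, and it only touches A through matrix-vector products, which is what makes it attractive for sparse data.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    w = np.zeros_like(b)
    r = b - A @ w                # residual
    p = r.copy()                 # search direction
    rs = r @ r
    for _ in range(len(b)):      # at most d iterations
        Ap = A @ p
        alpha = rs / (p @ Ap)    # exact line search along p
        w += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p   # next A-conjugate direction
        rs = rs_new
    return w

rng = np.random.default_rng(4)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)          # symmetric positive definite matrix
b = rng.standard_normal(8)
w = conjugate_gradient(A, b)
```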

16 / 12 [2.00 hours lecture + 1.50 hours lab, Alex] Second-order methods: Newton, quasi-Newton (BFGS, L-BFGS). Illustrate the convergence problems of Newton's method, initialization, construction of BFGS and L-BFGS.
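
As a hedged example of using L-BFGS in practice (SciPy's L-BFGS-B solver standing in for the construction covered in the lecture; the logistic-regression setup below is my own): the solver only needs a function returning the loss and its gradient.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, d = 100, 4
X = rng.standard_normal((n, d))
y = np.sign(X @ np.ones(d) + 0.5 * rng.standard_normal(n))
lam = 0.1

def f_and_grad(w):
    # L2-regularized logistic loss and its gradient
    z = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -z)) + 0.5 * lam * w @ w
    grad = -X.T @ (y / (1.0 + np.exp(z))) / n + lam * w
    return loss, grad

res = minimize(f_and_grad, np.zeros(d), jac=True, method="L-BFGS-B")
```

`jac=True` tells SciPy that the objective returns the (loss, gradient) pair, so no finite-difference gradients are needed.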

06 / 01 [3.50 hours, Alex] Non-convex problems and beyond convexity (adaptive Lasso / SCAD), Frank-Wolfe.
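
A sketch of the Frank-Wolfe algorithm (assumed illustration, not from the course notes) for least squares over the L1 ball {w : ||w||_1 <= tau}: the linear minimization oracle returns a signed vertex of the ball, so the iterates stay feasible and sparse without any projection.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 80, 10
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[0], w_true[1] = 1.0, -1.0
y = X @ w_true
tau = 2.0                                  # ||w_true||_1 = 2, so feasible

w = np.zeros(d)
for k in range(1000):
    g = X.T @ (X @ w - y) / n              # gradient of (1/2n)||Xw - y||^2
    j = np.argmax(np.abs(g))               # LMO: best vertex of the L1 ball
    s = np.zeros(d)
    s[j] = -tau * np.sign(g[j])
    gamma = 2.0 / (k + 2)                  # standard Frank-Wolfe step size
    w = (1 - gamma) * w + gamma * s        # convex combination: stays feasible

obj = 0.5 * np.mean((X @ w - y) ** 2)
```

The O(1/k) rate is slower than projected gradient, but each iterate is a convex combination of at most k vertices, hence sparse.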

27 / 01 Exam (3h?)

Book 1. Boyd & Vandenberghe: Convex Optimization. Chapters 2, 3 and 4 for a revision of convexity and Chapter 9 for a revision of unconstrained optimization.

freely available here:
http://web.stanford.edu/~boyd/cvxbook/

Book 2. Shalev-Shwartz & Ben-David: Understanding Machine Learning, from Theory to Algorithms. Chapters 1 and 2 for a frequentist introduction to Machine Learning.

freely available here:
https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/

Book 3. Bubeck: Convex Optimization: Algorithms and Complexity. Chapter 6 for additional proofs for stochastic gradient methods including SGD and SVRG.

freely available here: http://sbubeck.com/book.html

Paper 1. Amir Beck and Marc Teboulle (2009), SIAM J. Imaging Sciences, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems.

freely available here:

https://people.rennes.inria.fr/Cedric.Herzet/Cedric.Herzet/Sparse_Seminar/Entrees/2012/11/12_A_Fast_Iterative_Shrinkage-Thresholding_Algorithmfor_Linear_Inverse_Problems_(A._Beck,_M._Teboulle)_files/Breck_2009.pdf

Paper 2. R. M. Gower et al. (2019), Proceedings of Machine Learning Research, Volume 97, SGD: General Analysis and Improved Rates

freely available here:
http://proceedings.mlr.press/v97/qian19b/qian19b.pdf

Numeric grade out of 20