Description
We have entered the Big Data era. The explosion of available data across a wide range of application domains raises new challenges and opportunities in many disciplines, ranging from science and engineering to business and society in general. A major challenge is how to take advantage of the unprecedented scale of data in order to acquire further insights and knowledge for improving the quality of the offered services. This is where Data Science comes in, capitalizing on techniques and methodologies from data engineering (acquisition, storage, indexing, retrieval, pre-processing, quality assurance, validation), exploration (statistical profiling, visualization) and machine learning (identifying patterns, correlations, groupings, modeling, etc.). This lifecycle is universal and spans all application domains.
The Data Science and Mining - Introduction to Machine Learning class will cover the following aspects:
- The Machine Learning Pipeline
- Data Preprocessing and Exploration
- Feature Selection/Engineering & Dimensionality reduction
- Supervised Learning
- Unsupervised Learning
- Web Mining: recommendations, collaborative filtering, opinion/sentiment analysis, web advertising & algorithms.
- Learning from graphs: ranking in graphs, ranked list comparison, learning to rank, community detection and graph clustering, applications
Logistics
1. The course will take place on Monday afternoons from 19/09 for 9 weeks until 28/11, and will be divided into nine 4-hour sessions (+ final exam). Each session consists of a lecture (14:00 - 16:00) in amphi Faurre, followed by a lab session (16:15 - 18:15) split between three lab rooms (Amphi Grégory, Amphi Painlevé, Amphi Sauvy). The split will be based on the initial letter of each student's surname.
2. Due to the high number of enrolled students, labs will not take place in rooms equipped with workstations. Students are therefore expected to come to class with their laptops (preferably with a Unix environment such as Linux or Mac OS X, for compatibility reasons). As for software, we will be using Python, among others (to be installed locally on the laptops).
We will be using the e-learning platform Moodle to share the course materials (slides and lab statements) and to upload assignments. It is therefore imperative that students enroll; enrollment is available once logged in to Moodle with their @polytechnique.edu account, using the enrollment key specified in the welcome email they received. Additionally, the forum should be used to communicate with the staff, following the guidelines.
The Data Science and Machine Learning team 2016.
Detailed syllabus of the course*
*Minor changes may apply as the course evolves.
Machine Learning Pipeline 
- Task and Metrics
- Models and parameter estimation (ML, Optimization, Penalization)
- Case study: logistic regression, penalization
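As an illustration of the case study above, here is a minimal sketch of logistic regression with an L2 (ridge) penalty, fitted by plain batch gradient descent. It is not taken from the course materials: the function names, the toy data, the learning rate and the penalty values are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_l2(X, y, lam=0.1, lr=0.5, n_iter=2000):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    X is a list of feature lists, y a list of 0/1 labels.
    The bias b is left unpenalized (a common convention, assumed here).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The penalty adds lam * w to the gradient of the weights.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    # Classify as 1 when the estimated probability is at least 0.5.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Toy, made-up data: one feature, classes separated around x = 0.
X_toy = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

w_weak, b_weak = fit_logistic_l2(X_toy, y_toy, lam=0.01)    # light penalty
w_strong, b_strong = fit_logistic_l2(X_toy, y_toy, lam=1.0)  # heavy penalty
```

On this toy set, both fits classify the training points correctly, while the heavier penalty yields a smaller weight. This is the shrinkage effect that penalization is meant to produce.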
Data Preprocessing and Exploration
- Distance/similarity measures
- Data normalization / standardization / cleaning / missing values
- Dimensionality reduction (SVD, PCA, MDS)
Supervised Learning
- Generative vs. non-generative
- Naive Bayes vs. k-NN, perceptron
- Logistic regression
- Trees (Decision, Extra trees)
Feature Selection/Engineering & Dimensionality reduction
- Feature Selection, feature engineering, wrapper methods
- Dimensionality reduction (Spectral methods - PCA, MDS, and applications)
- Linear Discriminant Analysis
- Non-linear dimensionality reduction
Overfitting & Regularization
- Model validation, Resampling methods
- Overfitting, penalization, bias-variance tradeoff, Cross Validation (CV), Regularization (e.g., Lasso, Ridge, SVM)
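To make the link between cross-validation and regularization concrete, the sketch below uses k-fold cross-validation to compare several ridge penalty values on a one-dimensional regression without intercept. It is an illustrative assumption, not course material: the helper names, the made-up data and the penalty grid are hypothetical, and the folds are contiguous for simplicity (in practice the data would usually be shuffled first).

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for y ~ w * x: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, k=5):
    """Average held-out mean squared error over k folds."""
    folds = k_fold_indices(len(xs), k)
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        # Fit on everything except the held-out fold.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        w = ridge_1d(train_x, train_y, lam)
        # Score on the held-out fold only.
        total += sum((ys[i] - w * xs[i]) ** 2 for i in held_out) / len(held_out)
    return total / len(folds)

# Made-up data: y = 2x plus a small fixed perturbation.
xs = [0.5 * i for i in range(1, 11)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1]
ys = [2.0 * x + e for x, e in zip(xs, noise)]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]  # hypothetical penalty grid
scores = {lam: cv_mse(xs, ys, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
```

The penalty with the lowest held-out error is the one selected. A very large penalty shrinks the slope far below its true value and is heavily penalized by the held-out error, which is the bias side of the bias-variance tradeoff.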
Supervised Learning
- Classification and regression trees
- Bagging, Boosting
- Ensemble methods (AdaBoost, Random Forest)
Unsupervised Learning
- Principles of Clustering, K-means, Hierarchical clustering, SOMs, Association Rules
Bayesian Learning
- Introduction to Bayesian learning (EM)
Relevant degree program(s)
- Innovation Technologique : ingénierie et entrepreneuriat
- M1 Informatique - Voie Jacques Herbrand - X
- Cybersecurity : Threats & Defenses
- Internet of Things : Innovation and Management Program (IoT)
- Artificial Intelligence and Advanced Visual Computing
- M1 Innovation, Entreprise, et Société - Voie Innovation technologique
- Diplôme d'ingénieur de l'Ecole polytechnique
Associated track
Grading format
Numeric, out of 20
Letter grade / reduced grade
For students of the Artificial Intelligence and Advanced Visual Computing program
Retakes are allowed (the retake grade is kept)
- ECTS credits earned: 4 ECTS
The grade obtained counts toward your GPA.
For students of the Innovation Technologique : ingénierie et entrepreneuriat program
Retakes are allowed (the retake grade is kept)
- ECTS credits earned: 4 ECTS
The grade obtained counts toward your GPA.
For students of the M1 Innovation, Entreprise, et Société - Voie Innovation technologique program
Retakes are allowed (the retake grade is kept)
- ECTS credits earned: 4 ECTS
The grade obtained counts toward your GPA.
For students of the Diplôme d'ingénieur de l'Ecole polytechnique program
Retakes are allowed (the retake grade is kept)
- ECTS credits earned: 5 ECTS
The grade obtained counts toward your GPA.
For students of the M1 Informatique - Voie Jacques Herbrand - X program
Retakes are allowed (the retake grade is kept)
- ECTS credits earned: 4 ECTS
For students of the Cybersecurity : Threats & Defenses program
Retakes are allowed (the retake grade is kept)
- ECTS credits earned: 4 ECTS
For students of the Internet of Things : Innovation and Management Program (IoT) program
Retakes are allowed (the retake grade is kept)
- ECTS credits earned: 4 ECTS
