
PA - C1 - INF666C : Big Data / AMS607.6

Domain > Computer Science.

Description

Course Objective

The course aims to familiarize students with advanced machine learning and data mining methods for designing and developing solutions for data sets characterized by complexity and large volume (Big Data).

Real-world cases from the Web and social media/networks will be presented.

Course contents

Data Preprocessing: Linear and nonlinear dimensionality reduction, spectral methods, Feature selection, Cross-validation.

Supervised learning: Linear Regression, Support vector machines (SVMs).

Unsupervised learning: Gaussian Mixture models, EM algorithm, Spectral Clustering.

Learning in Graphs: ranking algorithms, evaluation measures, degeneracy and community mining methods.

Text Mining
Feature extraction (measures), Indexing (pros and cons) with regard to Bigtable, Retrieval functions (tf, idf, BM25 intuition, etc.), Ad hoc retrieval – Filtering, Classification
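
As an illustration of the tf and idf retrieval measures listed above, here is a minimal sketch in Python. It assumes whitespace tokenization, raw term counts for tf, and idf = log(N / df); these are common textbook choices, not necessarily the variants used in the course, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: tf is the raw term count and idf = log(N / df).
    Other weighting variants (smoothed idf, BM25, ...) exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus: "mining" occurs in every document.
docs = ["big data mining", "graph mining", "text mining retrieval"]
scores = tf_idf(docs)
```

With this weighting, a term that appears in every document gets weight zero, while a term unique to one document gets the highest idf.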

Big Data
An introduction (Hadoop, MapReduce), NoSQL databases, algorithms in MapReduce
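
To give a flavour of what an algorithm in MapReduce looks like, the sketch below simulates the map, shuffle, and reduce phases of a word count in a single Python process. It does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the input documents are hypothetical and only illustrate the programming model.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster the map and reduce calls run in parallel on
    # different machines; here they run sequentially in memory.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big graph mining", "data mining"])
```

The same map/shuffle/reduce decomposition carries over to a distributed framework, where only the map and reduce functions are written by the user.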

Graph Mining & Community Evaluation algorithms - applications in social networks.

References

- Bayesian Reasoning and Machine Learning, David Barber (University College London), Cambridge University Press, February 2012, ISBN 978-0-521-51814-7

- Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, 1st ed., 2006, XX, 740 p., ISBN 978-0-387-31073-2

Applicable degree program(s)

Grading format

Numerical, out of 20

Letter grade / reduced grade

For students in the Diplôme d'ingénieur de l'Ecole polytechnique program

Resits are permitted

For students in the Echanges PEI program
