Description
Exploration and statistical analysis of complex datasets
The analysis of environmental samples has benefited greatly from recent developments in molecular ecology and analytical chemistry. These high-throughput methodologies quickly generate a large volume of information to characterize the samples. However, this information is often high-dimensional. Moreover, for a robust analysis, multiple samples must be processed and analyzed in combination with each other. Extracting information from these complex datasets is very challenging, but it is necessary in order to summarize the information and draw relevant conclusions. Appropriate statistical methods have been developed to manage this large amount of data. The aim of the Exploration and statistical analysis of complex datasets course is to present the issues involved in analyzing complex datasets, as well as different statistical methods that can be used to explore these data. It will provide concepts useful for analyzing datasets from environmental samples. The different tools will be illustrated with practical examples.
Teaching staff
Anthony Boulanger, Chief Executive Officer, Greentropism
Olivier Chapleur, Researcher, IRSTEA
Douglas Rutledge, Professor, AgroParisTech
Christophe Cordella, Research Engineer, INRA
Course outline
● Analysis of complex datasets: issues and solutions
- How to handle large datasets?
- Chemometrics and statistical analysis
● Extracting information from large biological datasets
- Multivariate approaches to reduce the dimensions of datasets and highlight the relevant information
- Data integration methodologies to link different categories of data obtained on the same set of samples
● Models for spectral data interpretation
- How to make the most of spectroscopy with statistical data?
- Spectral data interpretation, bio-computing, predictive analysis, and learning algorithms
- Application to classification and quantification
● Complex data fusion
- How to organize, analyze, gain insight from, and use the data for predictive, design, and operational purposes, such as improving the function of specific engineered bioprocesses?
- Common Components and Specific Weights Analysis (CCSWA or “ComDim”) to take into account the common and complementary information contained in multiple datasets
This module includes 20 hours of lectures and 20 hours of tutorial classes.
--------------------------------------------
Level required: Basic knowledge in mathematics and statistics
Language: English
ECTS credits: 4
Supervisor: Olivier Chapleur
Relevant degree(s)
Associated track
Grading format
Numerical, out of 20
Letter/reduced grade
For students of the Echanges PEI degree
For students of the Environmental Engineering and Sustainability Management degree
Resits are allowed (the resit grade is retained)
ECTS credits earned: 5 ECTS
The grade obtained counts toward your GPA.