PA - C8 - CHI568: Exploration and Statistical Analysis of Complex Datasets

Field: Chemistry


Exploration and statistical analysis of complex datasets

The analysis of environmental samples has benefited greatly from recent developments in molecular ecology and analytical chemistry. These high-throughput methodologies quickly generate a large volume of information to characterize the samples. However, this information is often high-dimensional. Moreover, a robust analysis requires that multiple samples be processed and analyzed in combination with one another. Extracting information from these complex datasets is very challenging, but it is necessary in order to summarize the information and draw relevant conclusions. Appropriate statistical methods have been developed to manage this large amount of data. The aim of the Exploration and Statistical Analysis of Complex Datasets course is to present the issues involved in analyzing complex datasets, as well as different statistical methods that can be used to explore these data. It will provide concepts useful for analyzing datasets from environmental samples. The different tools will be illustrated with practical examples.


Teaching staff

Anthony Boulanger, Chief Executive Officer, Greentropism

Olivier Chapleur, Researcher, IRSTEA

Douglas Rutledge, Professor, AgroParisTech

Christophe Cordella, Research Engineer, INRA


Course outline

● Analysis of complex datasets: issues and solutions

- How to handle large datasets?

- Chemometrics and statistical analysis

● Extracting information from large biological datasets

- Multivariate approaches to reduce the dimensionality of datasets and highlight the relevant information

- Data integration methodologies to link different categories of data obtained from the same set of samples

● Models for spectral data interpretation

- How to make the most of spectroscopic data using statistical methods?

- Spectral data interpretation, bio-computing, predictive analysis, and learning algorithms

- Use for classification and quantification

● Complex data fusion

- How to organize, analyze, gain insight from, and use the data for predictive, design, and operational purposes, such as improving the function of specific engineered bioprocesses?

- Common Components and Specific Weights Analysis (CCSWA or “ComDim”) to take into account the common and complementary information contained in multiple datasets


This module includes 20 hours of lectures and 20 hours of tutorial classes.


Required level: Basic knowledge of mathematics and statistics

Language: English

ECTS credits: 4

Supervisor: Olivier Chapleur

Grading format

Numerical, out of 20

Letter grade/reduced grade

For students of the Environmental Engineering and Sustainability Management Master degree program

Resits are allowed (the resit grade is retained)
    The course unit is passed if the converted final grade is >= C
    • ECTS credits earned: 5 ECTS

    The grade obtained counts toward your GPA.

For students of the Echanges PEI degree program
