Structured Representation Learning for Cytometry: Cell Annotation and Population Discovery
Bodart, Fanny
Supervisor(s): Louppe, Gilles
Defense date: 30-jui-2025/1-jui-2025 • Permanent URL: http://hdl.handle.net/2268.2/23237
Details
| Title: | Structured Representation Learning for Cytometry: Cell Annotation and Population Discovery |
| Translated title: | [fr] Apprentissage par représentation structurée pour la cytométrie : Annotation des cellules et découverte de populations |
| Author: | Bodart, Fanny |
| Defense date: | 30-jui-2025/1-jui-2025 |
| Supervisor(s): | Louppe, Gilles |
| Jury member(s): | DE VOEGHT, Adrien; Geurts, Pierre; Huynh-Thu, Vân Anh |
| Language: | English |
| Number of pages: | 78 |
| Keywords: | Generative AI; Deep Learning; Cytometry; Representation Learning |
| Discipline(s): | Engineering, computing & technology > Multidisciplinary, general & others |
| Target audience: | Researchers; Professionals in the field; Students |
| Institution(s): | Université de Liège, Liège, Belgium |
| Degree: | Master in Biomedical Engineering (ingénieur civil), specialized focus |
| Faculty: | Master's theses of the Faculty of Applied Sciences |
Abstract
Flow cytometry enables the characterization of cell types based on the expression of specific surface and intracellular markers. It is widely used in both research and clinical settings to analyze cell populations. Recent advances in the field now allow the simultaneous measurement of numerous markers, resulting in high-dimensional datasets. As a result, the conventional manual gating approach is no longer suitable for analyzing such complex data. While several machine learning methods have been proposed for automated cell classification, most focus solely on known populations. Conversely, unsupervised methods can discover novel subpopulations but lack interpretability and do not support direct annotation.
In this work, we propose a model capable of addressing these complementary goals within a unified semi-supervised framework. Our approach leverages structured representation learning through a deep generative model to achieve (1) accurate classification of known immune cell populations, (2) discovery of novel subpopulations, and (3) characterization of immune population dynamics across experimental conditions.
We introduce MARVIN - Structured Representation Learning for Cytometry: Cell Annotation and Population Discovery, a mixture-based variational autoencoder with a latent space explicitly structured by cell type. By modeling the latent space as a Gaussian mixture, MARVIN enables both annotation and subpopulation discovery within a unified framework.
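To make the idea of a cell-type-structured Gaussian mixture latent space concrete, the following is a minimal sketch and not MARVIN's actual implementation, which the abstract does not detail. It assumes latent vectors have already been produced by some encoder, that each known cell type corresponds to one diagonal-covariance Gaussian component, and that cells with low likelihood under every known component are candidates for a novel subpopulation. The function names (`gmm_responsibilities`, `annotate`), the threshold value, and the toy numbers are all hypothetical.

```python
import numpy as np

def gmm_responsibilities(z, means, variances, weights):
    """Posterior p(component | z) under a diagonal-covariance Gaussian mixture.

    z: (n, d) latent vectors; means, variances: (k, d); weights: (k,).
    Returns (resp, log_lik) with resp of shape (n, k) and log_lik of shape (n,).
    """
    diff = z[:, None, :] - means[None, :, :]                       # (n, k, d)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (k,)
    log_pdf = log_norm[None, :] - 0.5 * np.sum(diff**2 / variances[None, :, :], axis=2)
    log_joint = np.log(weights)[None, :] + log_pdf                 # (n, k)
    # Log-sum-exp over components for numerical stability.
    m = log_joint.max(axis=1, keepdims=True)
    log_lik = (m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True)))[:, 0]
    resp = np.exp(log_joint - log_lik[:, None])
    return resp, log_lik

def annotate(z, means, variances, weights, novelty_threshold):
    """Assign each cell to its most probable component, or -1 if unlikely under all."""
    resp, log_lik = gmm_responsibilities(z, means, variances, weights)
    labels = resp.argmax(axis=1)
    labels[log_lik < novelty_threshold] = -1  # hypothetical novelty rule
    return labels

# Toy example: two known "cell types" in a 2-D latent space (made-up values).
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.ones((2, 2))
weights = np.array([0.5, 0.5])
z = np.array([[0.1, -0.2], [4.8, 5.1], [20.0, -20.0]])

resp, log_lik = gmm_responsibilities(z, means, variances, weights)
labels = annotate(z, means, variances, weights, novelty_threshold=-20.0)
print(labels)
```

In this toy setup the first two cells are assigned to components 0 and 1, while the distant third cell falls below the likelihood threshold and is flagged with -1. A real model would learn the component parameters jointly with the encoder and decoder, and the novelty criterion would need to be calibrated on data.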
To evaluate its performance, we benchmark MARVIN on public cytometry datasets and compare it to Scyan (Blampey et al.), a recent generative model designed for cytometry data. We assess MARVIN's ability to recover masked subpopulations specific to peanut allergy and analyze immune response dynamics before and after allergen exposure. MARVIN reliably identifies relevant novel (unseen) subpopulations and captures their shifts across different experimental conditions.
This dual functionality makes MARVIN a powerful tool for both exploratory research and routine clinical analysis. We plan to apply this framework to investigate immune activation patterns in an ongoing clinical trial focused on vaccine response in immunocompromised patients.
File(s)
Master_Thesis_FB.pdf
Abstract.pdf

The University of Liège does not guarantee the scientific quality of these student works or the accuracy of all the information they contain.