Structured Representation Learning for Cytometry: Cell Annotation and Population Discovery
Bodart, Fanny
Promotor(s) : Louppe, Gilles
Date of defense : 30-Jun-2025/1-Jul-2025 • Permalink : http://hdl.handle.net/2268.2/23237
Details
| Title : | Structured Representation Learning for Cytometry: Cell Annotation and Population Discovery |
| Translated title : | [fr] Apprentissage par représentation structurée pour la cytométrie : Annotation des cellules et découverte de populations |
| Author : | Bodart, Fanny |
| Date of defense : | 30-Jun-2025/1-Jul-2025 |
| Advisor(s) : | Louppe, Gilles |
| Committee's member(s) : | DE VOEGHT, Adrien; Geurts, Pierre; Huynh-Thu, Vân Anh |
| Language : | English |
| Number of pages : | 78 |
| Keywords : | Generative AI; Deep Learning; Cytometry; Representation Learning |
| Discipline(s) : | Engineering, computing & technology > Multidisciplinary, general & others |
| Target public : | Researchers; Professionals of domain; Student |
| Institution(s) : | Université de Liège, Liège, Belgique |
| Degree: | Master en ingénieur civil biomédical, à finalité spécialisée |
| Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] Flow cytometry enables the characterization of cell types based on the expression of specific surface and intracellular markers. It is widely used in both research and clinical settings to analyze cell populations. Recent advances in the field now allow the simultaneous measurement of numerous markers, resulting in high-dimensional datasets. Thus, the conventional manual gating approach is no longer suitable for analyzing such complex data. While several machine learning methods have been proposed for automated cell classification, most focus solely on known populations. Conversely, unsupervised methods can discover novel subpopulations but lack interpretability and do not support direct annotation.
In this work, we propose a model capable of addressing these complementary goals within a unified semi-supervised framework. Our approach leverages structured representation learning through a deep generative model to achieve (1) accurate classification of known immune cell populations, (2) discovery of novel subpopulations, and (3) characterization of immune population dynamics across experimental conditions.
We introduce MARVIN - Structured Representation Learning for Cytometry: Cell Annotation and Population Discovery, a mixture-based variational autoencoder with a latent space explicitly structured by cell type. By modeling the latent space as a Gaussian mixture, MARVIN enables both annotation and subpopulation discovery within a unified framework.
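To make the idea of a Gaussian-mixture latent space concrete, the following is a minimal sketch and not the thesis implementation. It assumes each cell has already been encoded into a latent vector `z`, that each mixture component is an isotropic Gaussian, and that components either carry a known cell-type label or are left unlabelled as candidates for novel subpopulations. All names (`Component`, `responsibilities`, `annotate`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Component:
    """One Gaussian component of a latent mixture (isotropic, for brevity)."""
    mean: Sequence[float]
    variance: float        # shared across latent dimensions (assumption)
    weight: float          # mixing proportion; weights are assumed to sum to 1
    label: Optional[str]   # known cell type, or None for an unlabelled component


def log_density(z: Sequence[float], comp: Component) -> float:
    """Return log(weight * N(z | mean, variance * I))."""
    d = len(z)
    sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, comp.mean))
    log_norm = -0.5 * d * math.log(2.0 * math.pi * comp.variance)
    return math.log(comp.weight) + log_norm - 0.5 * sq_dist / comp.variance


def responsibilities(z: Sequence[float], components: List[Component]) -> List[float]:
    """Posterior probability of each component given z (stable softmax)."""
    logs = [log_density(z, c) for c in components]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def annotate(z: Sequence[float], components: List[Component]) -> str:
    """Assign z to its most probable component and report that component's label."""
    probs = responsibilities(z, components)
    best = max(range(len(components)), key=probs.__getitem__)
    label = components[best].label
    # Unlabelled components stand in for candidate novel subpopulations.
    return label if label is not None else f"candidate_subpopulation_{best}"


# Toy 2-D latent space with two labelled components and one unlabelled one.
components = [
    Component(mean=(0.0, 0.0), variance=1.0, weight=0.4, label="T cell"),
    Component(mean=(4.0, 0.0), variance=1.0, weight=0.4, label="B cell"),
    Component(mean=(0.0, 4.0), variance=1.0, weight=0.2, label=None),
]

known_cell = annotate((0.1, -0.2), components)
novel_cell = annotate((0.2, 3.9), components)
```

In this toy setting, a latent point near a labelled component receives that component's cell type, while a point near the unlabelled component is flagged as a candidate subpopulation. A trained model would learn the encoder and the mixture parameters jointly; here they are fixed by hand.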
To evaluate its performance, we benchmark MARVIN on public cytometry datasets and compare it to Scyan (Blampey et al.), a recent generative model designed for cytometry data. We assess MARVIN's ability to recover masked subpopulations specific to peanut allergy and analyze immune response dynamics before and after allergen exposure. MARVIN reliably identifies relevant novel (unseen) subpopulations and captures their shifts across different experimental conditions.
This dual functionality makes MARVIN a powerful tool for both exploratory research and routine clinical analysis. We plan to apply this framework to investigate immune activation patterns in an ongoing clinical trial focused on vaccine response in immunocompromised patients.
File(s)
Document(s)
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.

Master_Thesis_FB.pdf
Abstract.pdf