Harnessing artificial intelligence to predict quantotypic property of peptides in targeted bottom-up proteomics
Walraff, Jimmy
Supervisor(s): Huynh-Thu, Vân Anh; Pierre, Nicolas
Defense date: 30-jui-2025/1-jui-2025 • Permanent URL: http://hdl.handle.net/2268.2/23220
Details
| Title: | Harnessing artificial intelligence to predict quantotypic property of peptides in targeted bottom-up proteomics |
| Translated title: | [fr] Utilisation de l’intelligence artificielle pour prédire la propriété quantotypique des peptides en protéomique ciblée bottom-up |
| Author: | Walraff, Jimmy |
| Defense date: | 30-jui-2025/1-jui-2025 |
| Supervisor(s): | Huynh-Thu, Vân Anh; Pierre, Nicolas |
| Jury member(s): | Geurts, Pierre; Louppe, Gilles |
| Language: | English |
| Number of pages: | 87 |
| Keywords: | Proteomics; Quantotypic Peptides; Machine Learning; Deep Learning; Transfer Learning; Self-Training |
| Discipline(s): | Engineering, computing & technology > Computer science |
| Target audience: | Researchers; Professionals in the field |
| Additional URL: | https://github.com/Ziboce/TFE-proteomics |
| Institution(s): | Université de Liège, Liège, Belgium |
| Degree: | Master: civil engineer in data science, specialized focus |
| Faculty: | Master's theses of the Faculty of Applied Sciences |
Abstract
Proteins are essential biomolecules that perform a wide range of functions in all living organisms. Proteomics, the large-scale study of proteins, aims to characterize their structure, function, and abundance in biological systems. Within this field, one major challenge is the accurate quantification of proteins through their peptides. In particular, there is a lack of tools capable of predicting the quantotypic property of peptides, a task which could reduce reliance on expensive and time-consuming experiments. This thesis explores the feasibility of using artificial intelligence to predict the quantotypic property of peptides based solely on their amino acid sequences. We first applied supervised learning techniques to a labeled dataset, testing classical machine learning models (Random Forest, XGBoost) and deep learning architectures (Multilayer Perceptron, Bidirectional Long Short-Term Memory). Both training from scratch and transfer learning strategies were evaluated. Transfer learning used models pre-trained on peptide-related tasks (AlphaPeptDeep) and protein-related tasks (Evolutionary Scale Modeling). Two transfer learning approaches were compared: fine-tuning and feature extraction. To deal with severe class imbalance in the data, cost-sensitive learning and Random Oversampling were applied. In the second part of the thesis, we explored self-training to leverage unlabeled data. We tested three pseudo-labeling strategies: threshold-based, proportion-based, and optimal thresholding. Additionally, we experimented with both hard and soft pseudo-labels, and introduced a dual loss function to further penalize the misclassification of true labels. Although the results are modest, this work represents a first step towards the development of predictive tools for the quantotypic property of peptides and provides valuable insights for future research in this field.
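As an illustration of the pseudo-labeling step mentioned in the abstract, the following is a minimal sketch of threshold-based and proportion-based selection for a binary classifier. It is not taken from the thesis or its repository: the function names, the default threshold of 0.9, and the default proportion of 0.2 are assumptions made for this example, and the "optimal thresholding" strategy is left out because the abstract does not define it.

```python
from typing import List, Tuple, Union

Label = Union[int, float]


def confidence(p: float) -> float:
    """Confidence of a binary prediction, where p is the predicted
    probability of the positive class (assumed convention)."""
    return max(p, 1.0 - p)


def make_label(p: float, soft: bool) -> Label:
    """Hard pseudo-label (0 or 1) or soft pseudo-label (the probability itself)."""
    return p if soft else int(p >= 0.5)


def threshold_based(probs: List[float], threshold: float = 0.9,
                    soft: bool = False) -> List[Tuple[int, Label]]:
    """Keep every unlabeled sample whose confidence reaches the threshold.
    The 0.9 default is illustrative only."""
    return [(i, make_label(p, soft))
            for i, p in enumerate(probs)
            if confidence(p) >= threshold]


def proportion_based(probs: List[float], proportion: float = 0.2,
                     soft: bool = False) -> List[Tuple[int, Label]]:
    """Keep the most confident fraction of the unlabeled samples.
    The 0.2 default is illustrative only."""
    k = int(len(probs) * proportion)
    ranked = sorted(range(len(probs)),
                    key=lambda i: confidence(probs[i]), reverse=True)
    return [(i, make_label(probs[i], soft)) for i in ranked[:k]]


# Hypothetical model outputs on five unlabeled peptides.
unlabeled_probs = [0.97, 0.55, 0.08, 0.40, 0.91]
selected_by_threshold = threshold_based(unlabeled_probs, threshold=0.9)
selected_by_proportion = proportion_based(unlabeled_probs, proportion=0.4)
```

In a self-training loop, the selected samples would be added to the labeled set with their pseudo-labels before the model is retrained; the hard variant commits to a class, while the soft variant keeps the predicted probability as the training target.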
File(s)
TFE_WALRAFF_JIMMY.pdf