Harnessing artificial intelligence to predict quantotypic property of peptides in targeted bottom-up proteomics
Walraff, Jimmy
Promotor(s) : Huynh-Thu, Vân Anh; Pierre, Nicolas
Date of defense : 30-Jun-2025/1-Jul-2025 • Permalink : http://hdl.handle.net/2268.2/23220
Details
| Title : | Harnessing artificial intelligence to predict quantotypic property of peptides in targeted bottom-up proteomics |
| Translated title : | [fr] Utilisation de l’intelligence artificielle pour prédire la propriété quantotypique des peptides en protéomique ciblée bottom-up |
| Author : | Walraff, Jimmy |
| Date of defense : | 30-Jun-2025/1-Jul-2025 |
| Advisor(s) : | Huynh-Thu, Vân Anh; Pierre, Nicolas |
| Committee's member(s) : | Geurts, Pierre; Louppe, Gilles |
| Language : | English |
| Number of pages : | 87 |
| Keywords : | [en] Proteomics; Quantotypic Peptides; Machine Learning; Deep Learning; Transfer Learning; Self-Training |
| Discipline(s) : | Engineering, computing & technology > Computer science |
| Target public : | Researchers; Professionals of domain |
| Complementary URL : | https://github.com/Ziboce/TFE-proteomics |
| Institution(s) : | Université de Liège, Liège, Belgique |
| Degree: | Master : ingénieur civil en science des données, à finalité spécialisée |
| Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] Proteins are essential biomolecules that perform a wide range of functions in all living organisms. Proteomics, the large-scale study of proteins, aims to characterize their structure, function, and abundance in biological systems. Within this field, one major challenge is the accurate quantification of proteins through their peptides. In particular, there is a lack of tools capable of predicting the quantotypic property of peptides, a task which could reduce reliance on expensive and time-consuming experiments. This thesis explores the feasibility of using artificial intelligence to predict the quantotypic property of peptides based solely on their amino acid sequences. We first applied supervised learning techniques to a labeled dataset, testing classical machine learning models (Random Forest, XGBoost) and deep learning architectures (Multilayer Perceptron, Bidirectional Long Short-Term Memory). Both training from scratch and transfer learning were evaluated. Transfer learning used models pre-trained on peptide-related tasks (AlphaPeptDeep) and on protein-related tasks (Evolutionary Scale Modeling), and two transfer learning approaches were compared: fine-tuning and feature extraction. To deal with severe class imbalance in the data, cost-sensitive learning and Random Oversampling were applied. In the second part of the thesis, we explored self-training to leverage unlabeled data. We tested three pseudo-labeling strategies: threshold-based, proportion-based, and optimal thresholding. Additionally, we experimented with both hard and soft pseudo-labels, and introduced a dual loss function to further penalize the misclassification of true labels. Although the results are modest, this work represents a first step towards the development of predictive tools for the quantotypic property of peptides and provides valuable insights for future research in this field.
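As a rough illustration of three techniques named in the abstract (Random Oversampling, threshold-based pseudo-labeling, and proportion-based pseudo-labeling), the following minimal Python sketch may help. It is not taken from the thesis or its repository: the function names, the binary 0/1 label encoding, the default threshold of 0.9, and the exact selection rules are all assumptions made for this example, and the thesis may define these strategies differently.

```python
import random
from typing import List, Sequence, Tuple


def random_oversample(X: Sequence[str], y: Sequence[int], seed: int = 0) -> Tuple[List[str], List[int]]:
    """Duplicate randomly chosen minority-class examples until both classes have equal size.

    Assumes binary labels encoded as 0/1 (an assumption of this sketch).
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        # Nothing to duplicate: return the data unchanged.
        return list(X), list(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def select_by_threshold(probs: Sequence[float], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Threshold-based selection with hard pseudo-labels.

    `probs` holds the predicted probability of the positive class for each
    unlabeled example. An example is kept only if the model is confident
    in either direction; the result is a list of (index, hard_label) pairs.
    """
    selected = []
    for i, p in enumerate(probs):
        if p >= threshold:
            selected.append((i, 1))
        elif p <= 1.0 - threshold:
            selected.append((i, 0))
    return selected


def select_by_proportion(probs: Sequence[float], proportion: float) -> List[Tuple[int, int]]:
    """Proportion-based selection: keep the most confident fraction of unlabeled examples."""
    # Confidence is the probability of the predicted class.
    ranked = sorted(range(len(probs)), key=lambda i: max(probs[i], 1.0 - probs[i]), reverse=True)
    k = int(len(probs) * proportion)
    return [(i, 1 if probs[i] >= 0.5 else 0) for i in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical peptide sequences and labels, for illustration only.
    peptides = ["AAK", "GGR", "LLK", "PEPTIDEK"]
    labels = [0, 0, 0, 1]
    Xb, yb = random_oversample(peptides, labels)
    print(len(Xb), sum(yb))
    print(select_by_threshold([0.95, 0.5, 0.02]))
    print(select_by_proportion([0.95, 0.5, 0.02, 0.7], 0.5))
```

In a self-training loop, the selected pairs would typically be added to the labeled set before retraining the model. Soft pseudo-labels would keep the probability itself as the target instead of rounding it to 0 or 1.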
File(s)
TFE_WALRAFF_JIMMY.pdf
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.