Exploration of machine learning methods for genomic selection in cattle
Hervers, Florent
Promotor(s) : Geurts, Pierre ; Druet, Tom
Date of defense : 30-Jun-2025/1-Jul-2025 • Permalink : http://hdl.handle.net/2268.2/23245
Details
| Title : | Exploration of machine learning methods for genomic selection in cattle |
| Translated title : | [fr] Exploration des méthodes de machine learning pour la sélection génomique chez les bovins |
| Author : | Hervers, Florent |
| Date of defense : | 30-Jun-2025/1-Jul-2025 |
| Advisor(s) : | Geurts, Pierre ; Druet, Tom |
| Committee's member(s) : | Huynh-Thu, Vân Anh ; Phillips, Christophe ; Van Steen, Kristel |
| Language : | English |
| Number of pages : | 78 |
| Keywords : | [en] Machine Learning ; Deep Learning ; Genomic selection ; Artificial intelligence |
| Discipline(s) : | Engineering, computing & technology > Computer science |
| Target public : | Researchers ; Professionals of domain ; Other |
| Institution(s) : | Université de Liège, Liège, Belgique |
| Degree: | Master en ingénieur civil en informatique, à finalité spécialisée en "intelligent systems" |
| Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] Genomic selection is a method developed to help breeders choose the parents whose progeny is most likely to exhibit the targeted characteristics. The selection is based on the genotype of the animals. In practice, genomic selection consists of designing a model that is trained on a training population and can then predict the chosen phenotypes from the genotypes alone.
Machine learning has become increasingly popular over the last decade. Its algorithms build complex models from large amounts of data and have produced very powerful models in a wide variety of fields, from image segmentation to natural language processing.
As more and more animals are genotyped, the amount of available data is becoming large enough to consider machine learning methods. Some architectures trained on cattle datasets have been proposed in the literature, but many possible algorithms remain to be evaluated. In this thesis, we explore a wide range of machine learning methods, from linear ridge regression and XGBoost to more complex neural networks such as convolutional neural networks and transformers. The objective is to find models that perform better than the state-of-the-art GBLUP linear model, and to run experiments on these models to better understand how to apply machine learning methods to genomic selection.
Of all the models we trained, none performed better than the GBLUP models, but three of them, the MLP, the ridge regression, and the SVM, reached similar performance. The experiments also yield interesting results. Using maximum or average pooling layers seems to decrease the performance of the convolutional neural network. Models trained to predict several phenotypes at the same time reach the same performance as models trained on a single phenotype, which can reduce the time required to develop a model: only a single architecture has to be tuned instead of one per phenotype. The impact of the training set size was also studied, and the results show that using more data could increase the performance of the different architectures. Finally, some suggestions are proposed for future research on the use of machine learning methods for genomic selection.
File(s)
TFE_Hervers.pdf
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.