Biomedical Text Classification Using LSTM, GRU and Bahdanau Attention
Mvomo Eto, Wilfried
Promotor(s) : Ittoo, Ashwin
Date of defense : 8-Sep-2025/9-Sep-2025 • Permalink : http://hdl.handle.net/2268.2/24931
Details
| Title : | Biomedical Text Classification Using LSTM, GRU and Bahdanau Attention |
| Translated title : | [fr] Classification de textes biomédicaux à l'aide des réseaux neuronaux LSTM, GRU et Bahdanau Attention |
| Author : | Mvomo Eto, Wilfried |
| Date of defense : | 8-Sep-2025/9-Sep-2025 |
| Advisor(s) : | Ittoo, Ashwin |
| Committee's member(s) : | Geurts, Pierre; Huynh-Thu, Vân Anh; Singh, Akash |
| Language : | English |
| Number of pages : | 65 |
| Keywords : | [en] Biomedical Text Classification; Natural Language Processing; Deep Learning; Few-Shot Learning; SMOTE (Synthetic Minority Over-sampling Technique); Class Weighting |
| Discipline(s) : | Engineering, computing & technology > Civil engineering |
| Target public : | Researchers; Professionals of domain; Students; General public; Other |
| Institution(s) : | Université de Liège, Liège, Belgique |
| Degree : | Master en science des données, à finalité spécialisée |
| Faculty : | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] This research evaluates the performance of deep learning models, namely Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks augmented with Bahdanau attention, for biomedical text classification on two datasets: the Memorial Sloan Kettering Cancer Center (MSKCC) dataset for binary classification of genetic mutations and the Medical Abstracts dataset for multi-class disease categorization. The experiments showed that GRU models generally offer better training efficiency and balanced accuracy than LSTM models, and that class weighting proved more effective than the Synthetic Minority Over-sampling Technique (SMOTE) in handling class imbalance. While few-shot learning remains challenging, models combining Biomedical Natural Language Processing (BioMedNLP) contextual embeddings with attention mechanisms demonstrated promising generalization, particularly in low-resource scenarios. These findings support the development of more robust and equitable Natural Language Processing (NLP) systems for biomedical applications.
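The abstract names the main building blocks evaluated in the thesis: a recurrent encoder (GRU or LSTM), Bahdanau-style additive attention, and class weighting for imbalanced data. The sketch below is a minimal, hypothetical illustration of how these pieces can fit together, assuming a PyTorch implementation with a bidirectional GRU over learned token embeddings; the layer sizes, vocabulary size, and class counts are illustrative placeholders, not values taken from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    """Additive (Bahdanau-style) attention over a sequence of hidden states."""
    def __init__(self, hidden_dim: int, attn_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, attn_dim)       # W_a h_t + b_a
        self.score = nn.Linear(attn_dim, 1, bias=False)   # v_a^T tanh(.)

    def forward(self, hidden_states, mask=None):
        # hidden_states: (batch, seq_len, hidden_dim)
        energy = self.score(torch.tanh(self.proj(hidden_states))).squeeze(-1)  # (batch, seq_len)
        if mask is not None:
            energy = energy.masked_fill(~mask, float("-inf"))                  # ignore padding
        weights = F.softmax(energy, dim=-1)                                    # attention weights
        context = torch.bmm(weights.unsqueeze(1), hidden_states).squeeze(1)    # weighted sum
        return context, weights

class GRUAttentionClassifier(nn.Module):
    """Bidirectional GRU encoder + Bahdanau attention + linear classification head."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attention = BahdanauAttention(2 * hidden_dim)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids, mask=None):
        states, _ = self.gru(self.embedding(token_ids))   # (batch, seq_len, 2*hidden_dim)
        context, _ = self.attention(states, mask)
        return self.classifier(context)                   # logits: (batch, num_classes)

# Class weighting: weight each class inversely to its (hypothetical) training frequency.
class_counts = torch.tensor([900.0, 100.0])               # illustrative imbalance only
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=class_weights)

model = GRUAttentionClassifier(vocab_size=30_000, embed_dim=128, hidden_dim=128, num_classes=2)
logits = model(torch.randint(1, 30_000, (4, 50)))         # dummy batch of 4 token sequences
loss = criterion(logits, torch.tensor([0, 1, 0, 1]))
```

For the low-resource experiments described in the abstract, the learned `nn.Embedding` lookup would presumably be replaced by pre-computed BioMedNLP contextual embeddings fed directly to the recurrent encoder; an LSTM variant differs only in swapping `nn.GRU` for `nn.LSTM`.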
Master Thesis Online
s226625.pdf