Balancing Durability, Performance, and Interpretability in Unbalanced Data as Fraud Detection
Saulas, Adrien
Promotor(s) : Debruyne, Christophe
Date of defense : 5-Sep-2024/6-Sep-2024 • Permalink : http://hdl.handle.net/2268.2/21035
Details
Title : | Balancing Durability, Performance, and Interpretability in Unbalanced Data as Fraud Detection |
Translated title : | [fr] Équilibrer la durabilité, la performance et l'interprétabilité dans les données déséquilibrées pour la détection de la fraude |
Author : | Saulas, Adrien |
Date of defense : | 5-Sep-2024/6-Sep-2024 |
Advisor(s) : | Debruyne, Christophe |
Committee's member(s) : | Geurts, Pierre; Louppe, Gilles |
Language : | English |
Number of pages : | 111 |
Keywords : | [en] MLOps [en] Machine Learning [en] Fraud detection [en] Imbalance [en] Interpretability |
Discipline(s) : | Engineering, computing & technology > Computer science |
Funders : | Intech S.A |
Target public : | Other |
Institution(s) : | Université de Liège, Liège, Belgique |
Degree : | Master : ingénieur civil en science des données, à finalité spécialisée |
Faculty : | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] Fraud detection is one of the most discussed problems in machine learning. This study addresses four areas essential to a fraud detection platform: prediction accuracy on imbalanced datasets, interpretability of predictions, deployment and sustainability of the platform, and the monetary costs associated with model errors. To tackle these issues, we first conducted extensive research in the field, then proposed and evaluated our own solutions. These include sampling with a WCGAN (Wasserstein Conditional Generative Adversarial Network) or cost-sensitive learning with newer models such as Light Gradient Boosting; interpretable models such as Explainable Boosting; deployment and automated training with Kubernetes and Kubeflow; and approaches such as thresholding or tuning metrics that account for monetary costs. Each of these solutions shows promising results and improves upon existing research in the field.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.