Performance Optimization of a Multi-GPU Discontinuous Galerkin Solver
Smagghe, Clément
Supervisor(s):
Geuzaine, Christophe
Defense date: 30 June 2025 to 1 July 2025 • Permanent URL: http://hdl.handle.net/2268.2/23356
Details
| Title: | Performance Optimization of a Multi-GPU Discontinuous Galerkin Solver |
| Author: | Smagghe, Clément |
| Defense date: | 30 June 2025 to 1 July 2025 |
| Supervisor(s): | Geuzaine, Christophe |
| Jury member(s): | Cicuttin, Matteo; Louant, Orian; Fontaine, Pascal |
| Language: | English |
| Number of pages: | 69 |
| Keywords: | gpu; discontinuous galerkin; maxwell; multi-gpu; supercomputer |
| Discipline(s): | Engineering, computing & technology > Computer science |
| Target audience: | Researchers; Professionals in the field; Students |
| Additional URL: | https://gitlab.onelab.info/gmsh/dg/-/tree/clem_dev |
| Institution(s): | Université de Liège, Liège, Belgium |
| Degree: | Master's degree in computer science and engineering (ingénieur civil en informatique), specialized focus in "computer systems security" |
| Faculty: | Master's theses of the Faculty of Applied Sciences |
Abstract
The Applied and Computational Electromagnetics research group of the University of Liège has previously developed a solver based on the Discontinuous Galerkin (DG) method for solving Maxwell's equations. This master's thesis focuses on its performance analysis and optimization on multi-GPU architectures when solving large-scale problems.
Initial profiling on the Lucia supercomputer revealed that inter-GPU communication, rather than computation, was the primary bottleneck. We addressed this by switching the MPI communication from blocking to non-blocking. An extensive set of benchmark tests conducted on Lucia (NVIDIA GPUs) showed significantly reduced communication overhead and improved scalability, with up to 6 times faster execution in some cases. The same tests were then run on LUMI (AMD GPUs) with even more computing power (up to 512 GPUs), confirming the robustness of our improvements across architectures, despite a 30% performance penalty due to hardware differences, notably cache size.
A second and final optimization step masked communication with computation, which required restructuring the application of the DG operation. Although this step was benchmarked only on Lucia due to time and resource constraints, the results showed a further considerable improvement: up to 11 times faster execution than the solver with blocking communication.
Our work focused only on the solver's resolution phase, leaving the initialization phase, currently a major performance and memory bottleneck, largely untouched. In future work, addressing this phase could make it possible to solve even larger problems and to use a greater number of GPUs for a given problem.
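The idea of masking communication with computation, mentioned in the abstract, can be illustrated with a toy sketch. This is not the thesis code and uses neither MPI nor GPUs: a background thread stands in for a non-blocking halo exchange, a 1D averaging stencil stands in for the DG operation, and all function names (`exchange_halo`, `update_interior`, `update_boundary`, `step_overlapped`) are hypothetical. It only shows the restructuring pattern: start the exchange, update the elements that need no remote data, wait, then update the boundary elements.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def exchange_halo(neighbor_values, delay=0.05):
    """Stand-in for a non-blocking halo exchange (assumption: not real MPI).

    The sleep only mimics network latency; the "received" values are
    simply the ones passed in.
    """
    time.sleep(delay)
    return tuple(neighbor_values)


def update_interior(local):
    """Update elements whose neighbors are all local (no remote data needed)."""
    return [0.5 * (local[i - 1] + local[i + 1]) for i in range(1, len(local) - 1)]


def update_boundary(local, halo_left, halo_right):
    """Update the two end elements, which depend on values from neighbors."""
    left = 0.5 * (halo_left + local[1])
    right = 0.5 * (local[-2] + halo_right)
    return left, right


def step_overlapped(local, neighbor_values):
    """One update step with the exchange overlapped with interior work."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Start the "communication" without waiting for it.
        pending = pool.submit(exchange_halo, neighbor_values)
        # 2. Compute the interior while the exchange is in flight.
        interior = update_interior(local)
        # 3. Wait for the halo values (the analogue of a wait call).
        halo_left, halo_right = pending.result()
    # 4. Finish with the elements that needed remote data.
    left, right = update_boundary(local, halo_left, halo_right)
    return [left] + interior + [right]


if __name__ == "__main__":
    print(step_overlapped([1.0, 2.0, 3.0, 4.0], (0.0, 5.0)))
```

In a blocking variant, step 3 would come before step 2, so the interior update could not start until the exchange had finished. Splitting the update into interior and boundary parts is what makes the overlap possible.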
File(s)
Document(s)
Multi-GPU Galerkin solver(3).pdf
Description: Full thesis + acknowledgements + abstract.
Size: 4 MB
Format: Adobe PDF
Multi-GPU Galerkin solver - Abstract.pdf
Description: Abstract - 1 page.
Size: 37.66 kB
Format: Adobe PDF
The University of Liège does not guarantee the scientific quality of these student works or the accuracy of all the information they contain.
