
Faculté des Sciences appliquées
MASTER THESIS

Performance Optimization of a Multi-GPU Discontinuous Galerkin Solver

Smagghe, Clément ULiège
Promotor(s) : Geuzaine, Christophe ULiège
Date of defense : 30-Jun-2025/1-Jul-2025 • Permalink : http://hdl.handle.net/2268.2/23356
Details
Title : Performance Optimization of a Multi-GPU Discontinuous Galerkin Solver
Author : Smagghe, Clément ULiège
Date of defense  : 30-Jun-2025/1-Jul-2025
Advisor(s) : Geuzaine, Christophe ULiège
Committee's member(s) : Cicuttin, Matteo 
Louant, Orian ULiège
Fontaine, Pascal ULiège
Language : English
Number of pages : 69
Keywords : [en] gpu
[en] discontinuous galerkin
[en] maxwell
[en] multi-gpu
[en] supercomputer
Discipline(s) : Engineering, computing & technology > Computer science
Target public : Researchers
Professionals of domain
Student
Complementary URL : https://gitlab.onelab.info/gmsh/dg/-/tree/clem_dev
Institution(s) : Université de Liège, Liège, Belgique
Degree: Master : ingénieur civil en informatique, à finalité spécialisée en "computer systems security"
Faculty: Master thesis of the Faculté des Sciences appliquées

Abstract

[en] The Applied and Computational Electromagnetics research group of the University of Liège has previously developed a solver based on the Discontinuous Galerkin (DG) method for solving Maxwell's equations. This master's thesis focuses on its performance analysis and optimization on multi-GPU architectures when solving large-scale problems.

Initial profiling on the Lucia supercomputer revealed that inter-GPU communication, rather than computation, was the primary bottleneck. We addressed this by switching the MPI communication from blocking to non-blocking calls. An extensive set of benchmark tests conducted on Lucia (NVIDIA GPUs) showed significantly reduced communication overhead and improved scalability, with up to 6 times faster execution in some cases. The same tests were then run on LUMI (AMD GPUs) with even more computing power (up to 512 GPUs), confirming the robustness of our improvements across architectures, despite a 30% performance penalty due to hardware differences, notably cache size.

A second and final optimization step masked communication with computation, which required restructuring the application of the DG operation. Although this step was benchmarked only on Lucia due to time and resource constraints, the results showed another considerable improvement: up to 11 times faster execution than the solver with blocking communication.
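
To illustrate the idea of masking communication with computation, here is a minimal sketch in Python. It is not the solver's code: the function names (`exchange_halo`, `interior_update`, `boundary_update`) are hypothetical, the arithmetic is a placeholder for the DG operation, and a background thread stands in for a non-blocking MPI exchange (in a real MPI program this role would be played by calls such as `MPI_Isend`, `MPI_Irecv` and `MPI_Waitall`). The assumption is that the work can be split into a part that needs no neighbour data and a part that does.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def exchange_halo(boundary_values, latency_s=0.01):
    """Hypothetical stand-in for a halo exchange with a neighbouring GPU."""
    time.sleep(latency_s)                    # simulated network latency
    return list(reversed(boundary_values))   # placeholder for the neighbour's data


def interior_update(values):
    """Placeholder for work that needs no data from neighbours."""
    return [2.0 * v for v in values]


def boundary_update(values, halo):
    """Placeholder for work that needs the received neighbour data."""
    return [v + h for v, h in zip(values, halo)]


def step_blocking(interior, boundary):
    # Communication completes before any computation starts.
    halo = exchange_halo(boundary)
    return interior_update(interior), boundary_update(boundary, halo)


def step_overlapped(interior, boundary, pool):
    # Start the exchange, then compute the interior while it is in flight.
    pending = pool.submit(exchange_halo, boundary)
    new_interior = interior_update(interior)
    halo = pending.result()                  # wait only when the data is needed
    return new_interior, boundary_update(boundary, halo)


interior = [1.0, 2.0, 3.0]
boundary = [10.0, 20.0]

with ThreadPoolExecutor(max_workers=1) as pool:
    overlapped = step_overlapped(interior, boundary, pool)
blocking = step_blocking(interior, boundary)
```

Both variants produce the same values; only the ordering differs. In the overlapped variant the wait happens after the interior work, so the communication latency is hidden whenever the interior computation takes at least as long as the exchange.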

Our work focused only on the solver's resolution phase, leaving the initialization phase, currently a major performance and memory bottleneck, largely untouched. Addressing this phase in future work could make it possible to solve even larger problems and to use a greater number of GPUs for a given problem.


File(s)

Document(s)

File
Access Multi-GPU Galerkin solver(3).pdf
Description: Full thesis + acknowledgements + abstract.
Size: 4 MB
Format: Adobe PDF
File
Access Multi-GPU Galerkin solver - Abstract.pdf
Description: Abstract - 1 page.
Size: 37.66 kB
Format: Adobe PDF

Author

  • Smagghe, Clément ULiège Université de Liège > Master ing. civ. inf. fin. spéc. comp. syst. secur

Promotor(s)

Committee's member(s)

  • Cicuttin, Matteo
  • Louant, Orian ULiège Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Applied and Computational Electromagnetics (ACE)
  • Fontaine, Pascal ULiège Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes informatiques distribués








All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.