Performance Optimization of a Multi-GPU Discontinuous Galerkin Solver

Smagghe, Clément

Date of defense : 30-Jun-2025/1-Jul-2025 • Permalink : `http://hdl.handle.net/2268.2/23356`

Details

Title :	Performance Optimization of a Multi-GPU Discontinuous Galerkin Solver
Author :	Smagghe, Clément
Date of defense :	30-Jun-2025/1-Jul-2025
Advisor(s) :	Geuzaine, Christophe
Committee's member(s) :	Cicuttin, Matteo; Louant, Orian; Fontaine, Pascal
Language :	English
Number of pages :	69
Keywords :	gpu; discontinuous galerkin; maxwell; multi-gpu; supercomputer
Discipline(s) :	Engineering, computing & technology > Computer science
Target public :	Researchers; Professionals of domain; Student
Complementary URL :	https://gitlab.onelab.info/gmsh/dg/-/tree/clem_dev
Institution(s) :	Université de Liège, Liège, Belgique
Degree:	Master in Computer Science and Engineering (ingénieur civil en informatique), professional focus in "computer systems security"
Faculty:	Master thesis of the Faculté des Sciences appliquées

Abstract

The Applied and Computational Electromagnetics research group of the University of Liège has previously developed a solver based on the Discontinuous Galerkin (DG) method for solving Maxwell's equations. This master's thesis focuses on its performance analysis and optimization on multi-GPU architectures when solving large-scale problems.

Initial profiling on the Lucia supercomputer revealed that inter-GPU communication, rather than computation, was the primary bottleneck. We addressed this by switching the MPI communication from blocking to non-blocking. An extensive set of benchmark tests conducted on Lucia (NVIDIA GPUs) showed significantly reduced communication overhead and improved scalability, with execution up to 6 times faster in some cases. The same tests were then run on LUMI (AMD GPUs) with even more computing power (up to 512 GPUs), confirming the robustness of our improvements across architectures, despite a 30% performance penalty due to hardware differences, notably cache size.
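As a rough illustration of why moving from blocking to non-blocking exchanges can reduce communication overhead, the toy cost model below compares the two. It is a sketch under strong assumptions, not the thesis's performance model: the function names and all timings are hypothetical, and the non-blocking case is idealized by assuming that exchanges posted together progress fully concurrently.

```python
# Toy cost model for one time step on one GPU.
# All names and numbers are hypothetical; they are not measurements from the thesis.

def step_time_blocking(t_compute, t_exchanges):
    """Blocking exchanges complete one after another, then the computation runs."""
    return t_compute + sum(t_exchanges)


def step_time_nonblocking(t_compute, t_exchanges):
    """Idealized non-blocking exchanges: all are posted at once and assumed to
    progress concurrently, so the wait is bounded by the slowest exchange."""
    return t_compute + max(t_exchanges, default=0.0)


if __name__ == "__main__":
    t_compute = 2.0                     # hypothetical compute time per step
    t_exchanges = [1.0, 1.0, 1.0, 1.0]  # hypothetical time per neighbour exchange
    print("blocking    :", step_time_blocking(t_compute, t_exchanges))
    print("non-blocking:", step_time_nonblocking(t_compute, t_exchanges))
```

In this model the gain grows with the number of neighbours a GPU exchanges data with; real networks share bandwidth between concurrent messages, so measured gains will differ.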

A second and final optimization step masked communication with computation, which required restructuring the way the DG operation is applied. Although this step was benchmarked only on Lucia because of time and resource constraints, the results showed a further considerable performance improvement, with execution up to 11 times faster than the solver with blocking communication.
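The kind of restructuring that makes such masking possible can be sketched as follows. This is a minimal illustration, not the solver's code: a 1D three-point average stands in for the DG operation, and `start_exchange` / `finish_exchange` are hypothetical callbacks standing in for posting and completing a non-blocking halo exchange (in an MPI code, typically `MPI_Isend`/`MPI_Irecv` followed by `MPI_Waitall`). The point is that entries depending only on local data can be updated while the exchange is in flight, and only the entries next to a partition boundary wait for it.

```python
# Sketch of splitting an operator into interior and boundary parts so that the
# interior work can overlap with a halo exchange. A 3-point average is used as
# a stand-in for the DG operation; all names here are hypothetical.

def apply_stencil(u, left_halo, right_halo):
    """Reference version: needs both halo values before any work starts."""
    padded = [left_halo] + u + [right_halo]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]


def apply_interior(u):
    """Entries 1..n-2 depend only on local data."""
    return [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]


def apply_boundary(u, left_halo, right_halo):
    """The first and last entries need the neighbours' halo values."""
    first = (left_halo + u[0] + u[1]) / 3.0
    last = (u[-2] + u[-1] + right_halo) / 3.0
    return first, last


def apply_overlapped(u, start_exchange, finish_exchange):
    """Restructured version; assumes len(u) >= 2."""
    handle = start_exchange()              # 1. post the halo exchange
    interior = apply_interior(u)           # 2. local work while data is in flight
    left, right = finish_exchange(handle)  # 3. wait for the halo values
    first, last = apply_boundary(u, left, right)  # 4. boundary work
    return [first] + interior + [last]


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0, 4.0]
    # Stand-ins for a real exchange: the halo values are simply hard-coded.
    result = apply_overlapped(u, lambda: None, lambda handle: (0.0, 5.0))
    print(result == apply_stencil(u, 0.0, 5.0))
```

The split version returns the same values as the reference; the benefit in a real solver comes only from step 2 running while the exchange started in step 1 is still in progress.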

Our work focused only on the solver's resolution phase, leaving the initialization phase, currently a major performance and memory bottleneck, largely untouched. In future work, addressing this phase could make it possible to solve even larger problems and to use a greater number of GPUs for a given problem.

File(s)

Document(s)

Multi-GPU Galerkin solver(3).pdf
Description: Full thesis + acknowledgements + abstract.
Size: 4 MB
Format: Adobe PDF

Multi-GPU Galerkin solver - Abstract.pdf
Description: Abstract - 1 page.
Size: 37.66 kB
Format: Adobe PDF

All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.