BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20220812T074335Z
LOCATION:Osaka Room
DTSTART;TZID=Europe/Stockholm:20220628T153000
DTEND;TZID=Europe/Stockholm:20220628T160000
UID:submissions.pasc-conference.org_PASC22_sess176_pap120@linklings.com
SUMMARY:Distributed-Memory Simulations of Turbulent Flows on Modern GPU Sy
 stems using an Adaptive Pencil Decomposition Library
DESCRIPTION:Paper\n\nDistributed-Memory Simulations of Turbulent Flows on 
 Modern GPU Systems using an Adaptive Pencil Decomposition Library\n\nRomer
 o, Costa, Fatica\n\nThis paper presents a performance analysis of pencil d
 omain decomposition methodologies for three-dimensional Computational Flui
 d Dynamics (CFD) codes for turbulence simulations, on several large GPU-ac
 celerated clusters. The performance was assessed for the numerical solutio
 n of the Navier-Stokes equations in two codes which require the calculatio
 n of Fast-Fourier Transforms (FFT): a tri-periodic pseudo-spectral solver 
 for isotropic turbulence, and a finite-difference solver for canonical tur
 bulent flows, where the FFTs are used in its Poisson solver. Both codes us
 e a newly developed transpose library that automatically determines the op
 timal domain decomposition and communication backend on each system. We co
 mpared the performance across systems with very different node topologies 
 and available network bandwidth, to show how these characteristics impact 
 decomposition selection for best performance. Additionally, we assessed th
 e performance of several communication libraries available on these system
 s, such as OpenMPI, IBM Spectrum MPI, Cray MPI, the NVIDIA Collective Comm
 unication Library (NCCL), and NVSHMEM. Our results show that the optimal c
 ombination of communication backend and domain decomposition is highly sys
 tem-dependent, and that the adaptive decomposition library is key in ensur
 ing efficient resource usage.\n\nDomain: Engineering, Physics
END:VEVENT
END:VCALENDAR