BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20220812T074335Z
LOCATION:Rio Room
DTSTART;TZID=Europe/Stockholm:20220629T113000
DTEND;TZID=Europe/Stockholm:20220629T120000
UID:submissions.pasc-conference.org_PASC22_sess163_msa232@linklings.com
SUMMARY:Designing MD Simulations for Modern Heterogeneous Parallel Archit
 ectures
DESCRIPTION:Minisymposium\n\nDesigning MD Simulations for Modern Heterog
 eneous Parallel Architectures\n\nGray\, Páll\n\nCutting-edge molecul
 ar dynamics (MD) simulations strive for the fastest possible evoluti
 on of fixed-size biological systems\, requiring strong scaling acros
 s all available heterogeneous parallel resources in modern accelerat
 ed systems. We will showcase such achievements for the popular GROMA
 CS MD package\, including co-design with the NVIDIA CUDA software en
 vironment. We expect that similar techniques can be applied in other a
 reas\, and will therefore make this presentation highly accessible. G
 ROMACS features a hierarchy of heterogeneous parallelism that allows f
 ine-grained concurrency to be scheduled asynchronously across dispar
 ate hardware components. Task-level CPU parallelism manages the vari
 ous required calculations\, with optional offload to GPUs allowing a t
 uneable heterogeneous runtime schedule. CPU threading allows not only d
 ecomposition of calculations across multiple cores\, but also paral
 lel execution of GPU scheduling activities. CUDA streams allow concu
 rrent calculations and communications\, with asynchronous GPU-direct c
 ommunications using fast hardware links while allowing the CPU to pr
 ogress with other tasks. CUDA threading exploits the highly parallel G
 PU architecture for compute-intensive parts. Novel asynchronous CUDA t
 ask graphs are also showing promise for further enhancing performance b
 y allowing the GPU to be more thoroughly decoupled. We will present d
 ramatic performance improvements and discuss plans for further enhan
 cements.\n\nDomain: Chemistry and Materials\, Computer Science and Ap
 plied Mathematics\, Life Sciences\, Physics
END:VEVENT
END:VCALENDAR
