BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20220812T074335Z
LOCATION:Samarkand Room
DTSTART;TZID=Europe/Stockholm:20220629T143000
DTEND;TZID=Europe/Stockholm:20220629T150000
UID:submissions.pasc-conference.org_PASC22_sess141_msa280@linklings.com
SUMMARY:Scalable Multi-FPGA Design for Distributed Processing of Scientifi
 c Workloads in Shallow Water Simulations
DESCRIPTION:Minisymposium\n\nScalable Multi-FPGA Design for Distributed Pr
 ocessing of Scientific Workloads in Shallow Water Simulations\n\nFaj\n\nRe
 presenting data in unstructured meshes is a common approach to discretize 
 complex geometries for scientific simulations. Since adjacency information
  must be explicitly defined in such meshes, efficiently processing them re
 quires high memory bandwidth, in order to keep computational resources con
 stantly occupied. For shallow water simulations, this challenge has been a
 ddressed on FPGAs before by utilizing low-latency on-chip memory resource
 s to improve computational throughput. However, on different levels of com
 putational complexity the limited availability of these resources can eith
 er prevent high utilization of compute resources or limit maximum processa
 ble mesh sizes. The presented work aimed to overcome these limitations thr
 ough partitioned processing of meshes. A design concept that maintains
  continuous throughput by integrating the communication of halo elements
  into the data stream, using OpenCL channels for inter- and intra-FPGA
  communication, was developed and evaluated with an accurate performance
  model. The resulting design distributed the simulation workloa
 d over up to 10 Intel Stratix 10 FPGAs on BittWare 520N accelerator cards 
 of the HPC cluster Noctua. A peak performance of up to 3119 GFLOPs was rea
 ched, and further scaling possibilities are explored in ongoing research.\
 n\nDomain: Chemistry and Materials, Climate, Weather and Earth Sciences, C
 omputer Science and Applied Mathematics
END:VEVENT
END:VCALENDAR
