BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20220812T074334Z
LOCATION:Singapore Room
DTSTART;TZID=Europe/Stockholm:20220627T143000
DTEND;TZID=Europe/Stockholm:20220627T150000
UID:submissions.pasc-conference.org_PASC22_sess118_msa113@linklings.com
SUMMARY:Performance Portable Modernizations for the Albany Land Ice Model
DESCRIPTION:Minisymposium\n\nPerformance Portable Modernizations for the A
 lbany Land Ice Model\n\nCarlson, Watkins, Tezaur\n\nTo accommodate the per
 formance needs for the new generation of supercomputers, Sandia's Albany L
 and-Ice (ALI) code base is being refactored to use the Kokkos performance 
 portability framework. We identified the main performance bottleneck for t
 he finite element assembly on GPU architectures as inefficient memory acce
 ss for boundary conditions. By reworking how boundary conditions are repre
 sented in Albany, all memory accesses on the GPU are now properly coalesce
 d, resulting in performant GPU kernels. The Kokkos refactor alone led to a
  finite element assembly speedup of 1.6x on 64 KNL CPU processors, and 4.5
 x on 64 V100 GPUs on a Greenland ice sheet problem solving the first-order
  Stokes equations and discretized using a variable resolution 1km-10km mes
 h. With the reworking of boundary conditions, we achieved an additional 1.
 5x speedup on V100 GPUs and an approximately 5.5x reduction in memory usag
 e for a 1km-7km mesh. The overall performance improvements that came from 
 this work are a large step towards the goal of full end-to-end ALI GPU run
 s on the Summit supercomputer. Additionally, we present preliminary result
 s for using automatic performance tuning to improve the performance of the
  linear solve phase on GPU architectures.\n\nDomain: Computer Science and 
 Applied Mathematics, Engineering
END:VEVENT
END:VCALENDAR
