Semilagrangian Hybrid Kinetic/Driftkinetic Code for the Studying of Fusion Plasmas

Modeling the tokamak edge plasma is one of the most important problems to solve in order to understand the physics taking place in the device. Many existing, well-known codes used by the community employ the gyrokinetic (GK) system of equations. This framework resolves kinetic equations on a reduced 5D space; it is applicable to charged particles moving in a strong background magnetic field and is valid as long as the scale of the phenomena remains larger than the Larmor radius. The presence of steep gradients in the edge region prevents us from using GK models in their present state. One way to avoid this complication is to use a fully kinetic 6D framework. However, the immense computational cost of such a direct approach makes it ill-suited for long-time simulations. Here we discuss the hybrid framework and its implementation in ssV, a semi-Lagrangian code with fully kinetic ions and drift-kinetic electrons that completely resolves the ion physics and saves computational resources on the electrons while retaining the most important kinetic effects.

Author(s): Aleksandr Mustonen (Max Planck Institute for Plasma Physics)

Domain: Physics


Robust Decision-Making under Risk and Ambiguity

Economists often estimate economic models on data and use the point estimates as a stand-in for the truth when studying the model’s implications for optimal decision-making. This practice ignores model ambiguity, exposes the decision problem to misspecification, and ultimately leads to post-decision disappointment. Using statistical decision theory, we develop a framework to explore, evaluate, and optimize robust decision rules that explicitly account for estimation uncertainty. We show how to operationalize our analysis by studying robust decisions in a stochastic dynamic investment model in which a decision-maker directly accounts for uncertainty in the model’s transition dynamics.

Author(s): Maximilian Blesch (Humboldt University Berlin), and Philipp Eisenhauer (University of Bonn)

Domain: Humanities and Social Sciences


Pylspack: Fast Parallel Algorithms, Data Structures and Software for Sparse Matrix Sketching, Column Subset Selection, Regression and Leverage Scores

In recent work, we developed novel parallel algorithms, data structures and software implementations for three fundamental operations in Numerical Linear Algebra: (i) matrix sketching, (ii) computation of the Gram matrix and (iii) computation of the squared row norms of the product of two matrices. This presentation focuses on the ubiquitous Gaussian and CountSketch random projections, as well as their combination. We present data structures for storing such random projection matrices that are memory efficient and fully parallelizable, both to construct and to multiply with a dense or sparse input matrix. We show how these results can be applied to other important problems, namely column subset selection, least squares regression and leverage score estimation. We also present details of our publicly available implementation (https://github.com/IBM/pylspack), the Pylspack Python package, whose core is written in C++ and parallelized with OpenMP. We show that our implementations outperform existing state-of-the-art libraries, namely the corresponding implementations from libskylark (https://xdata-skylark.github.io/libskylark/) and scikit-learn (https://scikit-learn.org) for the same tasks. Pylspack is fully compatible with standard numerical packages like SciPy and NumPy and is easily obtained via a single command: pip install git+https://github.com/IBM/pylspack.
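
As an illustration of the random projections discussed above, the following is a minimal SciPy/NumPy sketch of a CountSketch transform followed by a Gaussian projection. It is not the Pylspack API: the function names (countsketch_matrix, gaussian_matrix, sketch) and the 1/sqrt(m) scaling are assumptions made for this example, and the projection matrices are stored explicitly rather than in the memory-efficient structures the abstract describes.

```python
import numpy as np
from scipy import sparse


def countsketch_matrix(r, n, seed=0):
    """Build an r x n CountSketch matrix with one random +/-1 entry per column."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, r, size=n)        # random target row for each column
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign for each column
    cols = np.arange(n)
    return sparse.csr_matrix((signs, (rows, cols)), shape=(r, n))


def gaussian_matrix(m, r, seed=1):
    """Build a dense m x r Gaussian projection, scaled by 1/sqrt(m) (assumed scaling)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, r)) / np.sqrt(m)


def sketch(A, r, m, seed=0):
    """Reduce the rows of A from n to r with CountSketch, then from r to m with a Gaussian."""
    S = countsketch_matrix(r, A.shape[0], seed)
    G = gaussian_matrix(m, r, seed + 1)
    SA = S @ A
    if sparse.issparse(SA):
        SA = SA.toarray()  # r x d is small, so densifying here is cheap
    return G @ SA
```

For a tall input A with n rows, sketch(A, r, m) returns an m x d matrix. Applying the sparse CountSketch first keeps the cost proportional to the number of nonzeros of A, and the dense Gaussian then acts only on the much smaller r x d intermediate.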

Author(s): Aleksandros Sobczyk (IBM Research, ETH Zurich), and Efstratios Gallopoulos (University of Patras)

Domain: Computer Science and Applied Mathematics


Scaling the Plasma Simulation while Conserving the Mass: A Massively-Parallel Semi-Lagrangian Solver with the Sparse Grid Combination Technique

Grid-based direct plasma physics simulations suffer from the curse of dimensionality in compute time and memory complexity, making the simulation of modern fusion devices extremely expensive and lengthy. Consequently, the curse also applies to the semi-Lagrangian code selalib, which solves the 6-dimensional Vlasov-Poisson equation at high efficiency and scalability while conserving the plasma mass. The sparse grid combination technique can alleviate the curse of dimensionality, but former approaches have not respected the conservation of solver invariants such as mass. To overcome this limitation, the massively-parallel distributed combination technique code DisCoTec was extended to include two mass-preserving schemes, based on full weighting and biorthogonal wavelets. Our poster introduces the mass-conserving approach. It compares the DisCoTec+selalib solution with mass-conserving hierarchical functions to the standard hat function approach as well as to the monolithic selalib solver on a full grid. Results are shown for a plasma two-stream instability in 6D. The full weighting and biorthogonal basis functions not only conserve the mass, but also stabilize the solution. This comes at a run time cost, since more data needs to be communicated. However, the extra parallelism introduced by the combination technique is not affected, so the code still scales up to 8192 worker processes on Hawk.
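
To make the combination technique mentioned above more concrete, here is a small sketch that enumerates the component grids and their coefficients for the classical (textbook) combination formula, in which the solution on level n in d dimensions is the sum over q = 0, ..., d-1 of (-1)^q * binom(d-1, q) times the solutions on all grids with level sum n + (d-1) - q. The function name combination_scheme and the convention that levels start at 1 are assumptions for this example. It does not reproduce DisCoTec's interface or its mass-conserving basis functions.

```python
from itertools import product
from math import comb


def combination_scheme(dim, n):
    """Return {level_vector: coefficient} for the classical combination technique.

    Level vectors have entries >= 1 (assumed convention); n is the target level.
    """
    scheme = {}
    for q in range(dim):
        coeff = (-1) ** q * comb(dim - 1, q)
        target = n + (dim - 1) - q  # required sum of the level vector
        for levels in product(range(1, target + 1), repeat=dim):
            if sum(levels) == target:
                scheme[levels] = coeff
    return scheme
```

Each component grid can be solved independently, which is the source of the extra parallelism; the coefficients always sum to one, so a constant function is reproduced by the combined solution.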

Author(s): Theresa Pollinger (University of Stuttgart), Katharina Kormann (Max Planck Institute for Plasma Physics), and Dirk Pflüger (University of Stuttgart)

Domain: Computer Science and Applied Mathematics


A Circular Harmonic Oscillator Basis for Image Compression

Polar coordinates are frequently used to transform 2D images appearing in 4D scanning transmission electron microscopy (4D-STEM) as the dominant feature of the ronchigram is a central spot where the undeflected electron beam hits the detector. The information of interest resides in the deviations from a circular shape of the spot.
The function basis of the quantum mechanical harmonic oscillator consists of Hermite polynomials and a Gaussian envelope function for the one-dimensional problem. For the two-dimensional isotropic problem, the basis can be represented either as a Cartesian product of two 1D basis functions or in polar coordinates. A unitary transformation connects both representations.
To allow fast and affordable compression of STEM images, we incorporate the Cartesian product representation as it leads to two successive matrix-matrix multiplications. This compression method is particularly suitable for single-side-band (SSB) ptychography.
We present the explicit shape of the associated radial functions of a circular harmonic oscillator and compression factors in relation to computational costs for a typical SSB ptychography application.
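
The Cartesian product representation described above can be sketched as follows, assuming a square image sampled on a uniform grid and a simple Riemann-sum quadrature. The function names (hermite_functions, compress, decompress) and the grid parameters are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np


def hermite_functions(n_basis, x):
    """Orthonormal 1D harmonic-oscillator eigenfunctions on x, shape (n_basis, len(x)).

    Uses the three-term recurrence
    psi_{n+1} = sqrt(2/(n+1)) * x * psi_n - sqrt(n/(n+1)) * psi_{n-1}.
    """
    psi = np.zeros((n_basis, x.size))
    psi[0] = np.pi ** -0.25 * np.exp(-0.5 * x**2)
    if n_basis > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for n in range(1, n_basis - 1):
        psi[n + 1] = (np.sqrt(2.0 / (n + 1)) * x * psi[n]
                      - np.sqrt(n / (n + 1)) * psi[n - 1])
    return psi


def compress(image, basis_y, basis_x, dy, dx):
    """Project the image onto the product basis: two matrix-matrix multiplications."""
    return (basis_y @ image @ basis_x.T) * dy * dx


def decompress(coeffs, basis_y, basis_x):
    """Reconstruct the image from its coefficient matrix."""
    return basis_y.T @ coeffs @ basis_x
```

An N x N image is thus reduced to an n_basis x n_basis coefficient matrix, giving a compression factor of (N / n_basis)^2, at the cost of two dense matrix products per image.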

Author(s): Paul F Baumeister (JSC, Forschungszentrum Jülich), Arya Bangun (ER-C, Forschungszentrum Jülich), and Dieter Weber (ER-C, Forschungszentrum Jülich)

Domain: Computer Science and Applied Mathematics


Enabling Ab-Initio Molecular Dynamics at the Exascale with the CP2K Software Package

Recent efforts have been made to prepare the CP2K software package for exascale computing. Highly accurate electronic structure methods such as double-hybrid density functional theory (DHDFT) have been implemented, allowing for the simulation of large and periodic systems. These high-level methods exhibit a high computational complexity, which has limited their application in the past. Because CP2K’s implementation scales to hundreds of GPU nodes, such calculations are now feasible. DHDFT can be used to generate high-quality training data for machine learning models, eventually leading to highly accurate and affordable molecular dynamics simulations.

Author(s): Augustin Bussy (University of Zurich), Frederick Stein (University of Zurich), and Juerg Hutter (University of Zurich)

Domain: Chemistry and Materials


Welcome to a new World, through Heroic Journeys: Heroine's Learning Journey applied to the Machine Learning, Mathematics and Ethics course.

In recent years, organizations such as UNESCO and the UN have been developing policies in line with the Sustainable Development Goals, namely Goal 4 - Quality Education and Goal 5 - Gender Equality.
The "Heroine's Learning Journey" was developed as a motivation framework that helps young women overcome the challenges present in STEM courses. Our target audience is young women aged between 15 and 21. The first application is in a Machine Learning with Mathematics and Ethics course on the IST MOOC platform.
The goal is to help young people acquire the basics of the standard Machine Learning process, while also recognizing the math that is embedded in some of the algorithms. Being very practical, and introducing Python programming, the content also addresses relevant ethical issues in data preparation and exploration.
The data from the course shows that the journey resulted in an increase in the motivation of the girls to participate in the course.
The research shows that it is possible to apply the Heroine’s Learning Journey to a STEM MOOC in order to improve the enrollment of girls, increase their participation during the course, and reduce their dropout.

Author(s): Luis Costa (Federal University of Rio de Janeiro), Yuri Lima (Federal University of Rio de Janeiro), Ana Moura Santos (Instituto Superior Técnico), and Geraldo Xexéo (Federal University of Rio de Janeiro)

Domain: Computer Science and Applied Mathematics


DFTK: A Differentiable Julia Toolkit Enabling Joint Multidisciplinary Research on Efficient and Error-Controlled Electronic-Structure Simulations

High-throughput electronic structure calculations involving millions of systematic simulations are an indispensable tool in materials science, physics and chemistry to design and discover novel materials. In this regime the challenges are manifold, including the selection of a physical model with an appropriate cost/accuracy balance, the automatised setup of the simulation and the efficiency and robustness of the numerical implementation. Tackling these challenges inevitably concerns multiple research fields and requires interdisciplinary cooperation. To support the research approaches of multiple domains jointly in a single software platform, we started the density-functional toolkit (DFTK, dftk.org) [M. F. Herbst, A. Levitt, E. Cancès. JuliaCon Proceedings, 3, 69 (2021)]. DFTK supports both analytical models accessible to numerical analysis and mathematical physics as well as state-of-the-art density-functional theory (DFT) simulations on solid-state systems up to 1000 electrons. Moreover, DFTK is algorithmically differentiable, permitting advanced techniques in uncertainty quantification and data-enhanced models of scientific machine learning to be explored. With only a few thousand lines of high-level Julia code, the package features a low entrance barrier across backgrounds. Recent advances with respect to error control, robustness and efficiency of DFT simulations achieved with DFTK demonstrate its suitability for interdisciplinary research in this domain.

Author(s): Michael F. Herbst (RWTH Aachen University)

Domain: Computer Science and Applied Mathematics


Dynamos in a Rapidly Rotating Full Sphere

The Earth’s magnetic field has existed for 4 Gyr. Before the onset of crystallisation of the solid inner core, and for possibly 90% of Earth history, the relevant geometry was that of a full sphere. Numerical studies of the geodynamo in a full sphere are thus critical for understanding the paleomagnetic field. However, direct numerical simulations of the dynamo problem in a whole sphere have been rare, partly due to the difficulty of properly treating the singularity at the sphere center. Using a fully spectral method with basis functions that are smooth everywhere, we run a series of numerical simulations to investigate dynamos in a whole sphere. For various Ekman numbers E, measuring the effect of viscosity compared to the Coriolis force, we determine the dependence of the dynamo regime on the Rayleigh number Ra (measuring thermal forcing) and the magnetic Prandtl number Pm (controlling electrical conductivity), and we find it to be different from that of dynamos operating in a spherical shell. The regime of stable dipolar magnetic field seems to be narrower than in the spherical shell case. We also report a scaling analysis of the input/output quantities of our simulations.

Author(s): Jiawen Luo (ETH Zurich), Colin Hardy (ETH Zurich), Philippe Marti (ETH Zurich), and Andrew Jackson (ETH Zurich)

Domain: Climate, Weather and Earth Sciences


Data-Driven Analysis of the Elder Problem Using Big Data and Machine Learning

In this work, the d3f software is used to numerically solve problems in Computational Fluid Dynamics. We have ported the d3f software to a Spark cluster. This modification enables massively parallel runs of the d3f software, efficient post-processing, and further analysis of vast amounts of data using Big Data tools and Machine Learning approaches. Specifically, our Spark-d3f setup is used to simulate and analyze the Elder problem. For this problem, we achieved the following scientific results.
- Investigated the steady-state solutions of the Elder problem with regards to the Rayleigh numbers (Ra), grid sizes, perturbations, etc.
- Analyzed the complexity of solutions regarding time, solution types, and other factors.
- Created a tool for visual exploration of large solution ensembles of the Elder problem.
- Developed predictive models for the Elder problem using different classification methods.
Our predictive models are divided into three types, depending on how we designed the model's predictors (features). The best of them can predict a steady-state of the Elder problem (i.e., when time t > 50 years) with 95% accuracy at t=8-9 years.

Author(s): Roman Khotyachuk (NORCE Norwegian Research Center AS, University of Bergen), and Klaus Johannsen (NORCE Norwegian Research Center AS)

Domain: Computer Science and Applied Mathematics


Reinvigorating WRF I/O with ADIOS2 - Enabling High Performance Parallel I/O and In-Situ Analysis for Numerical Weather Prediction

As the computing power of large-scale HPC clusters approaches the Exascale, the gap between compute capabilities and storage systems is ever widening. In particular, the ubiquitous High Performance Computing application, the Weather Research and Forecasting Model (WRF), is currently being used for high-resolution weather forecasting and research, which generates very large datasets. However, the I/O modules within WRF have not been updated in a decade, resulting in lackluster overall parallel I/O performance. This work demonstrates the impact of integrating a next-generation data management I/O framework, ADIOS2, as a new I/O backend option in WRF. The I/O write times are compared with those of currently available WRF I/O options and show a speedup of up to two orders of magnitude when using ADIOS2 compared to classic MPI-I/O based solutions. Additionally, the node-local burst buffer write capabilities as well as in-line lossless compression capabilities of ADIOS2 are showcased. Finally, usage of the novel ADIOS2 in-situ analysis capabilities for weather forecasting is demonstrated using a WRF forecasting pipeline, showing a seamless end-to-end processing pipeline that runs concurrently with the execution of the WRF model, leading to a dramatic improvement in total time to solution.

Author(s): Michael Laufer (Toga Networks, a Huawei Company), and Erick Fredj (Toga Networks, a Huawei Company)

Domain: Climate, Weather and Earth Sciences


Supercritical Thermal Convection in a Sphere

Thermal convection, i.e. fluid flow due to buoyancy forces, sets in when the thermal forcing exceeds a critical value. As the forcing is increased, more heat is transferred by convection and the flow develops small-scale patterns. We present results from direct numerical simulations of highly supercritical thermal convection in both full sphere and spherical shell geometries and investigate the scaling of heat transfer and global flow properties with the thermal forcing. The simulations are performed using highly accurate and efficiently parallelised fully spectral methods for solving the relevant equations of motion and of heat transfer.

Author(s): Tobias Sternberg (ETH Zürich), Andrew Jackson (ETH Zurich), Philippe Marti (ETH Zurich), and Giacomo Gastiglioni (ETH Zurich)

Domain: Climate, Weather and Earth Sciences


ALPINE: A Set of Portable Plasma Physics Particle-in-Cell Mini-Apps for Exascale

Alpine consists of a set of mini-apps which provide a test bed for implementing new algorithms and/or novel implementations of existing algorithms related to particle-in-cell (PIC) schemes in the context of exascale architectures in a portable way. Alpine is based on IPPL (Independent Parallel Particle Layer), a framework that is designed around performance-portable and dimension-independent particles and fields. We consider the following mini-apps, which are most commonly used in electrostatic PIC studies: linear and non-linear Landau damping, bump-on-tail or two-stream instability, and a Penning trap. These mini-apps are benchmarked with varying grid sizes (512^3-2048^3) and numbers of simulation particles (10^9-10^{11}). We show strong and weak scaling and analyse the performance of different components on several pre-exascale architectures such as Piz Daint, Cori, Perlmutter and Summit, up to thousands of CPU cores and GPUs. This work will serve as guidance for the plasma PIC community to identify the major reasons for performance limitations and to better prepare for exascale architectures. So far, portable exascale PIC studies have mostly been carried out in the context of electromagnetic PIC schemes. To the best of our knowledge, this is the first study which considers the performance of electrostatic PIC in such a context.

Author(s): Sriramkrishnan Muralikrishnan (Paul Scherrer Institute), Matthias Frey (University of St Andrews), Alessandro Vinciguerra (ETH Zurich), Michael Ligotino (ETH Zurich), Antoine Cerfon (New York University), and Andreas Adelmann (Paul Scherrer Institute)

Domain: Computer Science and Applied Mathematics


Shallow Water Simulations on Complex Ocean Domains using Block-Structured Grids

Real-world ocean domains often have complex geometry and topography and are thus best suited for unstructured-mesh discretizations. However, structured grids offer performance advantages on cache-based architectures due to regular memory accesses. To combine the geometrical flexibility of unstructured meshes with the performance benefits of structured ones, we developed a block-structured grid generator for realistic ocean domains. To correctly represent small features such as small islands and narrow channels, we enhance our methodology by allowing the generated grids to cover a larger area than the actual computational domain and introduce masking to exclude the excess grid elements. The automatically generated block-structured grids are used for simulations with GHODDESS, a code generation framework based on ExaStencils which discretizes the shallow water equations by a quadrature-free discontinuous Galerkin method. A key feature of the grid generation with regard to high performance computing is the ability to specify exactly the required number of blocks, so that load imbalances can be avoided. We validate our approach by comparing the simulation results on unstructured and masked block-structured grids for complex ocean domains and present performance studies for our new methodology.

Author(s): Sara Faghih-Naini (University of Bayreuth, Friedrich-Alexander-Universität Erlangen-Nürnberg)

Domain: Computer Science and Applied Mathematics


Real-Time Large Deformation Simulations using Probabilistic Deep Learning Framework

Several engineering applications rely on the predictive capabilities of computational models. Some of these applications, like biomedical simulations, require computationally efficient or even real-time solutions. Conventional methods for solving the underlying nonlinear problems, such as the Finite Element Method, are computationally far too expensive. In this work, we propose a probabilistic deep learning surrogate framework that is capable of accurately and efficiently predicting non-linear deformations of bodies together with the predictions’ uncertainties. The framework uses a special convolutional neural network architecture (U-Net), which strongly resembles Finite Element multigrid methods and proves to be capable of capturing non-linear responses characteristic of large deformation regimes. Our surrogate framework directly takes the Finite Element nodal forces at the neural network input to give nodal displacements at its output. The probabilistic part of the framework is based on a dedicated Variational Inference formulation, with which we are not only able to efficiently capture uncertainties related to noisy data, but also gain knowledge about the model uncertainties, which is especially important in regions not well supported by the data (e.g., the extrapolated region). Hence our framework acts as an important step towards making real-time large deformation simulations more trustworthy.

Author(s): Saurabh Deshpande (University of Luxembourg), Jakub Lengiewicz (University of Luxembourg; Institute of Fundamental Technological Research, Polish Academy of Sciences), and Stephane Bordas (University of Luxembourg)

Domain: Engineering


Secure Data Handling with Smart Contracts in Federated Distributed Systems for the Digital Economy of Europe

The greatest amount of data involved in the digital industry, society and cloud markets is processed in massive autonomous datacenters. In the context of a new sustainable and effortless economy, evolving to an edge-centric approach is the new computing paradigm. This happens by splitting the computational tasks between fog nodes (devices, set-top boxes, and datacenters) in a distributed manner. Finding ways to surpass cloud computing technologies, which offer a variety of services for data processing, storage, and security, is a major research challenge, especially with current analytical models and experimental research. Using real-world experiments is too costly because of the limited quantity of data. We aim to build highly competitive heterogeneous ecosystems that handle data in a secure fashion based on three state-of-the-art pillars: distributed file storage using straightforward concepts such as Non-Fungible Tokens and Interplanetary File System, federated learning, and smart contracts. This poster presents our security-first vision of distributed data handling along with the architecture, design principles, and protocols of a complex interoperable middleware for large volumes of heterogeneous data. Our solution is suitable for complex cyber-physical systems, modern and sustainable industries, or governmental systems, in line with the Digital Compass strategy of the European Commission.

Author(s): Dragoș Mihai RĂDULESCU (University Politehnica of Bucharest), Bogdan-Costel MOCANU (University Politehnica of Bucharest), and Florin POP (University Politehnica of Bucharest, National Institute for Research & Development in Informatics - ICI Bucharest)

Domain: Computer Science and Applied Mathematics


Real Time Scheduling in Mission Critical Systems

In the context of massive digitalization according to the Digital Compass strategy of the European Commission for 2030, we identify a major research challenge in the real-time processing of streams or other large amounts of data. Our research shows that scheduling algorithms are largely limited to two objectives and use a priori methods that do not approach the Pareto boundary (a set of solutions to a multi-criteria optimization problem, representing compromises between objectives). They aim to find a single compromise solution through aggregation or constrained planning. Our proposal aims to be the first truly distributed multi-objective approach, capable of optimizing the set of compromise solutions for real-time applications such as natural language processing, traffic routing or social media interactions. We base our research on state-of-the-art distributed approaches such as Interplanetary File System and propose an interoperable, cost-effective middleware for task scheduling with real-time constraints. This poster presents a new vision of task scheduling algorithms and a comprehensive architecture for an interoperable middleware for task scheduling in distributed systems that extends the real-time heuristic to other dimensions such as data storage, processing cost and QoS in a distributed fashion.
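
The notion of a Pareto boundary used above can be illustrated with a short sketch that filters a set of candidate schedules down to its non-dominated members. The objective tuples (for example latency and cost, both to be minimized) and the function names are hypothetical; this is a brute-force illustration of the concept, not the scheduling approach proposed in the poster.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that are not dominated by any other point (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For instance, with candidates given as (latency, cost) pairs, pareto_front keeps every schedule for which no alternative is at least as good in both objectives and better in one, which is the set of compromise solutions the abstract refers to.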

Author(s): Bogdan-Costel MOCANU (University Politehnica of Bucharest), Alexandra-Elena MOCANU (University Politehnica of Bucharest), Ion-Dorinel FILIP (University Politehnica of Bucharest), and Florin POP (University Politehnica of Bucharest, National Institute for Research & Development in Informatics - ICI Bucharest)

Domain: Computer Science and Applied Mathematics


Large-Scale Hybrid/Heterogeneous Platform for Climate Modeling

Climate modeling is a hypercomplex problem requiring huge computational resources and careful combination of several sub-models and disciplines into a coherent whole in order to produce climate forecasts that can be trusted to inform policy decisions on human adaptation to the changing climate.

Highly abstracted, climate models consist of physical models of the atmosphere and oceans based on hydrodynamics and thermodynamics, interplaying with chemical and biological models based on quantum mechanics of radiation and molecules. The many different orders of magnitude of length and time scales involved make direct computation from basic principles impracticable. Research groups around the world are attempting to help bridge the gaps between micro- and macro-scale science in climate models through application of methods involving inferences from artificial neural networks trained on observed data. Consequently, we postulate that the platform for global modeling should be decomposed into a hybrid architecture with sub-platforms differing in nature as they represent theory-rich and data-intensive parts of modeling.

We propose a hybrid architecture consisting of computational and cognitive subsystems corresponding to a decomposed climate model requiring different approaches. Consequently, models have computation-intensive and training-inference-heavy submodels which are executed and then combined/synthesized into complete solutions.

Author(s): Kemal Delic (The Open University), and Martin Walker (ACM Ubiquity)

Domain: Climate, Weather and Earth Sciences


Building a Physics-Constrained, Fast and Stable Machine Learning-Based Radiation Emulator

In climate models, the transfer of radiation is approximated by parameterizations. The current operational radiative transfer solver in the Icosahedral Nonhydrostatic Weather and Climate Model (ICON) is ecRad. It is an accurate radiation parameterization but remains computationally expensive. Therefore, the radiation solver is only run on a reduced spatial grid, which can affect prediction accuracy. In this project, we are trying to develop a radiative transfer solver improved by machine learning to speed up the computation without loss of accuracy. Our research focuses on two methods: random forests and physics-informed neural networks. We continue to call ecRad at constant though significantly reduced time intervals and on a reduced spatial grid thereby using it as a regularizer while reducing computation costs. The underlying idea is to avoid unphysical climate drifts and to support the generalization capabilities of the ML method.

Author(s): Guillaume Bertoli (ETH Zurich), Sebastian Schemm (ETH Zurich), Firat Ozdemir (Swiss Data Science Center), Eniko Székely (Swiss Data Science Center), and Fernando Perez-Cruz (Swiss Data Science Center, ETH Zurich)

Domain: Climate, Weather and Earth Sciences


HighFive: An Easy-To-Use, Header-Only C++ Library for HDF5

The use of portable scientific data formats is vital for managing complex workflows, reliable data storage, knowledge transfer, and long-term maintainability and reproducibility. Hierarchical Data Format (HDF) 5 is considered the de facto industry standard for this purpose. While the official HDF5 library is versatile and well supported, it only provides a low-level C/C++ interface. The lack of proper high-level C++ abstractions discourages the use of HDF5 in scientific applications. There are a number of C++ wrapper libraries available. Many, however, are domain-specific, incomplete or not actively maintained.
To address these challenges we present HighFive, an easy-to-use, header-only C++11 library that simplifies data management in HDF5 while maintaining the flexibility of the data format. HighFive is designed with performance in mind, and reaches near-zero runtime overhead thanks to compiler inlining on header-only templates. The library features automatic C++ type-mapping, automatic memory management via RAII, and adjustable data selections for partial I/O. It is thread-safe and supports the HDF5 MPI backend. Finally, it integrates smoothly with other projects via the CMake build system. HighFive is developed as an open-source library and can be downloaded from: github.com/BlueBrain/HighFive .

Author(s): Adrien Devresse (EPFL), Omar Awile (EPFL), Jorge Blanco Alonso (EPFL), Tristan Carel (EPFL), Nicolas Cornu (EPFL), Tom de Geus (EPFL), Pramod Kumbhar (EPFL), Fernando Pereira (EPFL), Sergio Rivas Gomez (EPFL), Matthias Wolf (EPFL), and James Gonzalo King (EPFL)

Domain: Computer Science and Applied Mathematics


Performance Modelling of Generated Stencil Kernels within the HyTeG Framework

In this work, we present how code generation techniques significantly improve the performance of the computational kernels in the HyTeG framework. This HPC framework combines the performance and memory advantages of matrix-free multigrid solvers with the flexibility of unstructured meshes. With the use of the pystencils code generation toolbox, the original abstract C++ kernels are replaced with highly optimized loop nests. The performance of one of those kernels (the matrix-vector multiplication) is thoroughly analyzed using the Execution-Cache-Memory (ECM) performance model. These predictions are validated by measuring execution on the SuperMUC-NG supercomputer. Overall, the results agree with the predicted performance, and the discrepancies are discussed. Additionally, we conduct a node-level scaling study which shows the expected behavior for a memory-bound compute kernel.

Author(s): Dominik Thönnes (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Domain: Computer Science and Applied Mathematics


DisCosTiC: A DSL-based Parallel Simulation Framework using First-Principles Analytic Performance Models

DisCosTiC (Distributed Cost in Clusters) is a lightweight message passing simulation toolkit that simulates large-scale applications by taking the socket-level performance properties of the hardware-software interaction into account. It can reproduce and explore the dynamics of parallel programs on current and future supercomputers in a well-controlled environment, thereby saving resources and time. In contrast to existing trace-based simulators, we propose a domain-specific language (DSL) since traces do not comprise inter-process dependency information and are superimposed by many effects coming from the real system, such as system noise, variations in MPI implementations, etc. DisCosTiC has no dependencies on external libraries and uses analytical, first-principle models for execution and communication time predictions, taking socket-level bandwidth contention into account. For the execution part, it supports Roofline and ECM (execution-cache-memory) models, while communication is covered by Hockney and LogGOPS models. The structure of the parallel program is formulated within the DSL, and a configuration file holds hardware attributes. The resulting simulated traces can be visualized via Chromium’s Trace Event Profiling tool. In the simulator design, we tried to find a trade-off among modeling complexity, simulation accuracy, and user-friendliness.
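
Two of the analytic models named above have simple closed forms, sketched here in their textbook versions: the Roofline model bounds performance by min(peak, bandwidth x intensity), and the Hockney model estimates message time as latency + size / bandwidth. The function and parameter names are assumptions for this example and are not taken from DisCosTiC or its DSL.

```python
def roofline_performance(peak_flops, mem_bandwidth, intensity):
    """Roofline bound in flop/s.

    peak_flops: peak compute rate (flop/s)
    mem_bandwidth: memory bandwidth (byte/s)
    intensity: computational intensity of the kernel (flop/byte)
    """
    return min(peak_flops, mem_bandwidth * intensity)


def hockney_time(message_bytes, latency, bandwidth):
    """Hockney point-to-point time in seconds: latency (s) + size / bandwidth (byte/s)."""
    return latency + message_bytes / bandwidth
```

A kernel whose intensity lies below peak_flops / mem_bandwidth is memory-bound under this model, and its predicted execution time is simply the flop count divided by the returned bound.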

Author(s): Ayesha Afzal (Erlangen National High Performance Computing Center), Georg Hager (Erlangen National High Performance Computing Center), and Gerhard Wellein (Erlangen National High Performance Computing Center)

Domain: Computer Science and Applied Mathematics


ARM-Powered Numerical Weather Prediction: Running the ECMWF Model on Fugaku

The current top supercomputer in the world is Fugaku, based at the RIKEN Centre for Computational Science (R-CCS) in Japan. Fugaku is notable not only for its size, with 160,000 nodes providing a peak performance of almost half an exaFLOPS, but also for having achieved this speed entirely through ARM CPU technology. Taking advantage of a collaboration between the European Centre for Medium-Range Weather Forecasts (ECMWF) and R-CCS, we have been evaluating the IFS global atmospheric model on Fugaku for the purposes of numerical weather prediction. We have ported the IFS to Fugaku and carried out preliminary benchmarking exercises, with the goal of eventually running computationally-efficient storm-resolving kilometer-scale global atmospheric simulations. In this poster we will recount our experiences in porting the IFS to Fugaku and provide benchmark comparisons with ECMWF's brand new AMD-powered supercomputer, the Atos BullSequana XH2000. We will also present our attempts to exploit some of Fugaku's unique features for IFS simulations, including its very wide vector length and hardware support for 16-bit floating-point operations.

Author(s): Samuel Hatfield (European Centre for Medium-Range Weather Forecasts), Peter Dueben (European Centre for Medium-Range Weather Forecasts), Ioan Hadade (European Centre for Medium-Range Weather Forecasts), Seiya Nishizawa (RIKEN Centre for Computational Science), Tsuyoshi Yamaura (RIKEN Centre for Computational Science), and Hirofumi Tomita (RIKEN Centre for Computational Science)

Domain: Climate, Weather and Earth Sciences


Physics-Inspired Representations for Atomistic Machine Learning

In the last decade, machine learning (ML) methods have been used to predict properties of molecules and materials with great success, reducing the cost of these predictions while keeping the accuracy high. A crucial step in atomistic ML methods is the mapping of atomic configurations to a set of features. By encoding physical symmetries, sum rules, asymptotic tails, and other physical concepts directly in the representation of the atomic systems, we can dramatically improve the accuracy of the resulting ML model and the data efficiency of the training exercise.
Our group is incorporating more physical symmetries in these representations. We recently proposed a way to deal with different variances with respect to atomic permutations, which allowed us to learn properties defined on multiple centers. This has proven a very useful tool for learning Hamiltonian matrix elements directly, with the inherent symmetry of the model making it very robust to noise and errors.
Such fundamental developments go hand in hand with software development, as we are writing software libraries to compute atomic representations efficiently on HPC hardware. We are integrating these libraries within well-known simulation tools such as LAMMPS, making atomistic machine learning available to the whole community.

Author(s): Guillaume Fraux (EPFL), Sergey Pozdnyakov (EPFL), and Michele Ceriotti (EPFL)

Domain: Chemistry and Materials


Parallel Algorithms for Solution Transfer on GPUs

In this work, we present parallel algorithms for solution transfer between solvers. Solution transfer, otherwise known as coupling, is a widely used method in coupled multi-physics simulations where data from one solver is interpolated to another. There are two main computations involved in solution transfer: computing intersections and determining interpolation weights. We present hierarchical parallel algorithms for interpolation that can be ported to many-core architectures and GPUs. The algorithms are low contention and show good performance on multi-GPU nodes. Kd-trees are constructed for the input meshes. Intersection computations are performed at two levels of the trees, coarse and fine. Empty regions are removed at each stage. The final set of intersecting pairs of tree nodes is partitioned equally across GPUs. Fine grids are overlaid on the elements belonging to these pairs. Mesh elements are assigned to grid corners and broadcast to grid cells within a cutoff region defined by the ratio of mesh resolutions. When broadcast to other grid cells, they compute interpolation weights with intersecting elements from the other mesh. These weights are gathered to owning grid cells. The entire computation is performed on GPUs. We used a simple interpolation scheme based on node averages for demonstration purposes.

Author(s): Aparna Sasidharan (ANSYS)

Domain: Computer Science and Applied Mathematics


Progress Towards Extending GPU Support in GROMACS with SYCL

GROMACS is an open-source, widely used molecular dynamics package delivering high-performance, scalable biomolecular simulations on a wide range of hardware and software platforms. GROMACS consists of almost half a million lines of C++ code and supports different CPU architectures, different SIMD instruction sets via a unified interface, and offload to GPUs from the three major vendors: NVIDIA, AMD, and Intel. Starting with GROMACS 2021, we began adding SYCL GPU offloading support, alongside the existing CUDA and OpenCL support, to have a unified, portable, and performant acceleration layer harmonized with the rest of our C++17 codebase. In GROMACS 2022, the SYCL code supports most of the GPU offloading features. We present the achieved results and discuss the challenges encountered when porting the mature and highly optimized codebase to the new GPU framework, as well as the decisions made to balance code maintainability and portability, including performance portability.

Author(s): Andrey Alekseenko (KTH Royal Institute of Technology, Science for Life Laboratory)

Domain: Computer Science and Applied Mathematics


Surrogate Modeling of Laser-Plasma-Based Ion Acceleration with Invertible Neural Networks

The interaction of overdense and/or near-critical plasmas with ultra-intense laser pulses presents a promising approach to enable the development of very compact sources for high-energy ions. However, current records for maximum proton energies are still below the required values for many applications, and challenges such as stability and spectral control remain unsolved to this day. In particular, the significant effort per experiment and a high-dimensional design space render naive sampling approaches ineffective. Furthermore, due to the strong nonlinearities of the underlying laser-plasma physics, synthetic observations by means of particle-in-cell (PIC) simulations are computationally very costly, and the maximum distance between two sampling points is strongly limited as well. Consequently, in order to build useful surrogate models for future data generation and experimental understanding and control, a combination of highly optimized simulation codes (we employ PIConGPU), powerful data-based methods such as artificial neural networks, and modern sampling approaches is essential. Specifically, we employ invertible neural networks for bidirectional learning of parameters and observables, and an autoencoder to reduce intermediate field data to a lower-dimensional latent representation.

Author(s): Thomas Miethlinger (Helmholtz-Zentrum Dresden-Rossendorf, TU Dresden)

Domain: Physics


An Innovative and Automated Vortex Identification Method Based on the Estimation of the Center of Rotation

An unambiguous method for the detection of vortices in hydrodynamical flows has not been found yet.
We aim at developing a robust method for the automated identification of vortices. Local and global rotations in the flow should be considered as both are necessary for the detection of coherent vortical structures. Moreover, the use of a threshold should be avoided to not exclude slow vortices in the identification process.
We present a new method, based on the velocity field alone, that combines the rigor of mathematical criteria and the global perspective of morphological techniques. The core of the method is the estimation of the center of rotation for every point in the flow that presents some degree of local rotation. For that, we employ the Rortex criterion and the morphology of the neighboring velocity field. We then identify coherent vortical structures by clustering the estimated centers of rotation with a grid-adapted version of the clustering by fast search and find of density peaks (CFSFDP) algorithm.
We apply the method to different examples of vortical velocity fields and demonstrate its reliability and accuracy in the solar atmosphere.
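
To make the clustering step more concrete, the sketch below computes the two quantities at the heart of the original CFSFDP algorithm: a local density rho for each point, and the distance delta to the nearest point of higher density. It is not the grid-adapted variant described above; the sample points, the cutoff `d_c`, and the choice of keeping exactly two centers are assumptions made for illustration.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) as used in clustering by fast search and find of density peaks."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # By convention, the densest point gets its largest distance to any other point.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical estimated centers of rotation scattered around two vortices.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
rho, delta = density_peaks(points, d_c=0.12)

# Cluster centers combine high density with a large distance to any denser point.
gamma = [r * d for r, d in zip(rho, delta)]
centers = sorted(sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2])
print(rho, centers)
```

Points 0 and 5 stand out because they are both dense and far from any denser point, which is the signature of a cluster center in this scheme.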

Author(s): José Roberto Canivete Cuissa (Istituto Ricerche Solari Locarno, University of Zurich), and Oskar Steiner (Istituto Ricerche Solari Locarno, Leibniz-Institut für Sonnenphysik (KIS))

Domain: Physics


A FAIR Digital Object-Based Data Lake Architecture to Support Various User Groups and Scientific Domains

Across various domains, data lakes are successfully utilized to centrally store all data of an organization in their raw format. Doing this with overarching governance for all the collected data and the developed processes prevents the creation of isolated data silos, which can quickly arise if small research teams operate independently of each other. A central data lake, in contrast, promises high reusability of the stored data since a schema is implied on reading, which prevents information loss due to ETL processes. Despite this schema-on-read approach, some modeling is mandatory to ensure proper data integration, comprehensibility, and quality. These data models are maintained within a central data catalog which can be queried. To further organize the data in the data lake, different architectures have been proposed, the most widely known being the zone architecture. Here, data is assigned to different zones according to the processing it has been subjected to. In this work, we present a novel data lake architecture based on FAIR Digital Objects (FDO) with (high-performance) processing capabilities. The FAIR Digital Objects are connected by a provenance-centered graph. Users can define generic workflows, which are reproducible by design, making this data lake implementation ideally suited for science.

Author(s): Hendrik Nolte (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen), Piotr Kasprzak (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen), Julian Kunkel (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen), and Philipp Wieder (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)

Domain: Computer Science and Applied Mathematics


Utopia: a Hardware Portable Library for Large Scale Simulations in Computational Geophysics

We present Utopia, an open-source C++ library for parallel non-linear multilevel solution strategies. Utopia provides the advantages of high-level programming interfaces while at the same time offering a framework to access low-level data structures without breaking code encapsulation. Complex numerical procedures can be expressed with few lines of code, and evaluated by different implementations, libraries, or computing hardware. In this poster, first, we illustrate a globally convergent solution strategy designed to solve non-convex constrained minimization problems. Second, we provide an overview of its implementation and its parallel performance. Third, we provide simulation examples of pressure-induced phase-field fracture propagation in large and complex fracture networks. Solving such problems is deemed challenging even for a few fractures. Here, however, we consider networks of realistic size with up to 1000 fractures.

Author(s): Patrick Zulian (Università della Svizzera italiana), Alena Kopanicakova (Università della Svizzera italiana), Maria Nestola (Università della Svizzera italiana, ETH Zurich), Daniel Ganellari (ETH Zurich / CSCS), Nur Fadel (ETH Zurich / CSCS), Andreas Fink (ETH Zurich / CSCS), Joost VandeVondele (ETH Zurich / CSCS), and Rolf Krause (Università della Svizzera italiana)

Domain: Computer Science and Applied Mathematics


Challenges of SINFONY - the Combination of Nowcasting and Numerical Weather Prediction on the Convective Scale at DWD

There are different "optimal" forecast methods for different lead-times and weather phenomena.
For precipitation and severe convection up to a few hours ahead, radar extrapolation techniques (Nowcasting) show good skill up to 1-2h ahead, while numerical weather prediction (NWP) outperforms Nowcasting only at later hours. At DWD, both systems are developed independently and run operationally by different teams on completely different computer architectures. NWP runs as an ensemble, while Nowcasting is deterministic. There are no integrated ("combined") forecast products for warning meteorologists and hydrological authorities, making simultaneous use of both methods difficult.
To overcome this situation and provide meaningful seamless forecasts from minutes to 12h, DWD's new Seamless INtegrated FOrecastiNg sYstem (SINFONY) will come to life in the next two years.
Nowcasting- and NWP-ensembles are developed, enhanced and integrated in one large interdisciplinary team. Topics:
a) Nowcasting ensembles for precipitation, radar reflectivity and convective cell-objects,
b) hourly km-scale Rapid-Update-Cycle Ensemble-NWP, assimilating new high-resolution observations (3D-radar-volumes, Meteosat-VIS-channels, cell-objects),
c) lead-time-weighted optimal combinations of Nowcasting- and NWP ensembles in observation space.
The poster will discuss some major challenges, e.g., those arising from the different inherent characteristics of Nowcasting and NWP forecasts, and the real-time production of high-frequency forecast updates across different computing platforms.

Author(s): Ulrich Blahak (Deutscher Wetterdienst), and Team SINFONY (Deutscher Wetterdienst)

Domain: Climate, Weather and Earth Sciences


Efficient Discrete Cosine and Polynomial Transforms on GPUs using VkFFT

This poster will focus on the latest advancements in the field of fast GPU algorithms for various types of discrete transforms. We present an extension to VkFFT - GPU Fast Fourier Transform library for Vulkan, CUDA, HIP and OpenCL, that allows calculating Discrete Cosine Transforms of types I-IV. They are often used in image processing, data compression and numerous scientific tasks, like calculating various discrete transformations on Chebyshev grids. So far, this is the first publicly available optimized GPU implementation of DCTs. We also present our advances in the GPU implementation of efficient spherical harmonic transforms and radial transforms in a spherical geometry. We will present Jones-Worland and Associated Legendre Polynomial Transforms for modern GPU architectures, implemented based on the VkFFT runtime kernel optimization model. These new implementations will be used to create a GPU-enabled version of the fully spectral CFD framework QuICC in spherical geometry.
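
For readers unfamiliar with the transforms named above, the following is a direct O(N^2) reference for the type-II DCT and its inverse, a scaled type-III DCT. It is a definition-level sketch only: it does not use VkFFT, the function names are hypothetical, and the unnormalized scaling chosen here is one of several conventions in use.

```python
import math


def dct2(x):
    """Unnormalized DCT-II by direct summation: X_k = sum_j x_j cos(pi (j + 1/2) k / N)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
            for k in range(n)]


def idct2(X):
    """Inverse of dct2, i.e. a DCT-III scaled by 2/N with the k = 0 term halved."""
    n = len(X)
    return [X[0] / n + (2.0 / n) * sum(X[k] * math.cos(math.pi * (j + 0.5) * k / n)
                                       for k in range(1, n))
            for j in range(n)]


constant = dct2([1.0, 1.0, 1.0, 1.0])   # a constant signal has only a k = 0 component
signal = [0.5, -1.0, 2.0, 3.5, 0.0]
roundtrip = idct2(dct2(signal))
```

A fast implementation replaces these sums with an FFT-based algorithm, which is where a GPU library becomes relevant.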

Author(s): Dmitrii Tolmachev (ETH Zurich), Andrew Jackson (ETH Zurich), Philippe Marti (ETH Zurich), Giacomo Castiglioni (ETH Zurich), and Daniel Ganellari (ETH Zurich / CSCS)

Domain: Computer Science and Applied Mathematics


Towards a Task Based GPU Enabled Distributed Eigenvalue Solver

Developing and implementing an efficient GPU-enabled eigenvalue solver is a complex operation, which becomes a challenge when a task-based approach is used. A fine balance between the number of tasks and their execution time has to be found, to maintain enough parallelism and to avoid increasing the scheduler overhead.
However, the benefits of task-based linear algebra implementations are important, as our previous work on distributed Cholesky decomposition and triangular solver has shown.
The reduction of the number of synchronisation points (compared to the fork-join approach used by LAPACK and ScaLAPACK) and the possibility of scheduling multiple algorithms to run concurrently are two of the main benefits.
Here we present a task-based GPU-enabled eigenvalue solver based on the pika library. Pika was chosen because it follows the developments in concurrency and parallelism proposed as part of the ongoing C++ standardization process. In particular, pika provides an implementation of the latest sender/receiver proposal.

Author(s): Alberto Invernizzi (ETH Zurich / CSCS), Teodor Nikolov (ETH Zurich / CSCS), Auriane Reverdell (ETH Zurich / CSCS), Mikael Simberg (ETH Zurich / CSCS), and Raffaele Solcà (ETH Zurich / CSCS)

Domain: Computer Science and Applied Mathematics


Hybrid Parallelization for the Fully Spectral CFD Framework QuICC

Our CFD framework QuICC, based on a fully spectral method, has been successfully used for various dynamo simulations in a spherical geometry. It runs efficiently on a few thousand cores using a 2D data distribution based on a distributed memory paradigm (MPI). The implicit treatment of the Coriolis force is an important feature of QuICC and it is critical to reduce the time to solution. However, it requires solving large sparse systems at each time step. In order to better harness the computing power of current and upcoming HPC systems, which are increasingly based on heterogeneous nodes built from multi-core processors and accelerators (GPUs), we present our work on refactoring the framework to introduce a hybrid distributed and shared memory parallelization (MPI + X). The interplay between the data flow of the sparse direct solver and the spectral transforms needed for the nonlinear computations requires a careful design of the parallelization. Different strategies will be presented.

Author(s): Philippe Marti (ETH Zurich), Giacomo Castiglioni (ETH Zurich), Dmitrii Tolmachev (ETH Zurich), Daniel Ganellari (ETH Zurich / CSCS), and Andy Jackson (ETH Zurich)

Domain: Climate, Weather and Earth Sciences


Dynamic Workflows for Smart Earth System Models Computation and Advanced Extreme Event Analysis

Earth System Model (ESM) simulations represent one of the most challenging High Performance Computing (HPC) use cases because of their high computational cost, intensive Input/Output patterns, extreme data volumes produced, and the necessity of post-processing for the extraction of relevant information. ESM ensemble workflow experiments typically consist of several start dates and members divided into sequential simulation chunks. Being able to dynamically prune members which are not useful to the whole simulation would allow using computing and storage resources more efficiently. On the other hand, the huge amounts of data produced by state-of-the-art models represent an invaluable opportunity for the study of complex phenomena, such as extreme events. High Performance Data Analytics (HPDA) and Machine Learning (ML) techniques could allow effective analysis of these data at scale. Thus, novel workflow solutions capable of joining HPC, data analytics and ML are needed to support next-generation ESM computing and analysis. In the context of the eFlows4HPC project, we are exploring the use of integrated workflows exploiting approaches from HPC, Big Data and ML for smart and efficient ESM simulations and extreme event analysis (e.g. Tropical Cyclones). This contribution presents the workflows and the approaches being developed to tackle these requirements.

Author(s): Alessandro D'Anca (CMCC - Centro Euro-Mediterraneo sui Cambiamenti Climatici), Donatello Elia (CMCC - Centro Euro-Mediterraneo sui Cambiamenti Climatici), Julian Rodrigo Berlin (Barcelona Supercomputing Center), Nikolay Koldunov (AWI - Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung), Francesco Immorlano (CMCC - Centro Euro-Mediterraneo sui Cambiamenti Climatici, Università del Salento), Gabriele Accarino (CMCC - Centro Euro-Mediterraneo sui Cambiamenti Climatici, Università del Salento), Suvarchal K. Cheedala (AWI - Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung), Enrico Scoccimarro (CMCC - Centro Euro-Mediterraneo sui Cambiamenti Climatici), Miguel Castrillo (Barcelona Supercomputing Center), Jorge Ejarque (Barcelona Supercomputing Center), Rosa M. Badia (Barcelona Supercomputing Center), and Giovanni Aloisio (CMCC - Centro Euro-Mediterraneo sui Cambiamenti Climatici, Università del Salento)

Domain: Climate, Weather and Earth Sciences


Fusion of Massively-Parallel Simulation Frameworks and Code Generation Methodologies for Lattice Boltzmann and Multigrid Applications

In various application domains, large-scale simulations are required for accurate and meaningful results. For optimal usage of the target system, simulation codes are often tailored towards its hardware components. Implementing such specialized codes by hand, however, can be a challenging task. Code generation can provide a remedy for this task. One successful example is ExaStencils, a whole-program code generation framework for stencil codes, in particular multigrid, on block-structured grids. From its domain-specific language ExaSlang, optimal C++ code can be produced. waLBerla, on the other hand, is a more traditional HPC framework for multi-physics with a focus on CFD. It specializes in LBM and particle simulation on octrees. Here, code generation is also employed for performance-critical kernels working on grids and particles. In this work, we couple the two frameworks and examine how multi-physics applications, requiring specialized components from both worlds, can be implemented. One promising example is a charged particle application, comprising fluid flows, simulated using LBM, and electric potentials, solved via multigrid, in addition to their interaction with particles. We demonstrate how such a coupling can be achieved and illustrate interface data structures.

Author(s): Richard Angersbach (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Domain: Computer Science and Applied Mathematics


Docker Container in DWD's Seamless INtegrated FOrecastiNg sYstem (SINFONY)

At Deutscher Wetterdienst (DWD), the SINFONY project has been set up to develop a seamless ensemble prediction system for convective-scale forecasting with forecast ranges of up to 12 hours. It combines Nowcasting (NWC) techniques with numerical weather prediction (NWP) in a seamless way. So far, NWC and NWP run on two different IT infrastructure levels. This separation slows down SINFONY, due to data transfer between both infrastructures, and makes it complex and prone to disturbances. Both disadvantages are resolved by running the interconnected part of the SINFONY system on one single architecture.
With this aim in view, a Docker container of the respective NWC components is created and executed on the infrastructure of NWP, the high-performance Linux computing cluster (HPC) of DWD. In test applications we have already observed a speedup of roughly 20% by using the K3D-Container on the HPC cluster instead of using konrad3d on the initial NWC IT architecture. As next steps we plan to test the K3D-Container in DWD’s experimental tools and hope to observe a further speedup since data transfer is no longer required.
This poster will show preliminary results and discuss the challenges we face in using Docker containers at DWD.

Author(s): Matthias Zacharuk (DWD), and Team SINFONY (DWD)

Domain: Climate, Weather and Earth Sciences


Distributed Training of Deep Neural Networks

Deep neural networks (DNNs) are nowadays used in a wide range of application areas and scientific fields. Since the representation capacity of DNNs is tightly coupled to their width and depth, networks have grown considerably over the last years. As this growing trend is expected to continue, the development of novel, highly scalable training algorithms becomes an important task.
In this work, we propose novel distributed-training strategies for large-scale DNNs. The developed training algorithms are based on multilevel and domain decomposition sub-space correction techniques, well-known from numerical computing. A hierarchy of suitable sub-spaces, related to different levels or subdomains, is constructed by exploiting the underlying structure of the loss function and the network architecture. For the implementation, we leverage the PyTorch framework and take advantage of CUDA and NCCL technologies.
The convergence properties and scaling behavior of the proposed training methods will be demonstrated using several state-of-the-art benchmark problems. Moreover, a comparison with the widely-used stochastic gradient optimizer will be presented, showing a significant reduction in the number of iterations and the execution time.

Author(s): Alena Kopanicakova (Università della Svizzera italiana), Samuel Cruz (Università della Svizzera italiana), Hardik Kothari (Università della Svizzera italiana), and Rolf Krause (Università della Svizzera italiana)

Domain: Computer Science and Applied Mathematics


Benchmarking Memory-Bound Computational Physics Codes with In-House Developed Cloud-Bursting Solution

At CERN, the Theoretical Physics department (TH) and the Accelerator Technology Sector (ATS) rely heavily on HPC usage for developing next-generation LHC technology. Due to the varying computational needs of each department, we observe a bursty compute demand pattern that depends on our HPC project deadlines and conference schedules. In order to tackle compute demand periods that exceed our on-premise compute capacity, we have implemented a cloud-bursting solution leveraging Microsoft Azure. This cloud-bursting solution is integrated into our Slurm-based on-premise infrastructure to allow users to submit their jobs to the cloud without any major modifications to their MPI programs or job submission scripts. It provides the elasticity to grow our HPC resources in the cloud when more compute capacity is needed, and to scale down the associated cloud costs when the on-premise capacity is sufficient to satisfy demand. In addition, we have carried out a cost-effectiveness evaluation of different VM sizes, leveraging common workloads from our ATS and TH use cases. More precisely, we evaluate different Azure HPC VM sizes equipped with AMD EPYC 7551 (Naples), 7742 (Rome), 7003 (Milan), targeting memory-bound codes. For each of these we run the following workloads: FDS (Fire Dynamics Simulator), Ansys Fluent, and OpenQCD (open Quantum Chromodynamics).

Author(s): Pablo Llopis Sanmillan (CERN), Fernández Álvarez (CERN), Tomasz Józefiak (Microsoft Azure), and Lukasz Miroslaw (Microsoft Azure)

Domain: Physics


GT4Py: High Performance Stencil Computations in Weather and Climate Applications using Python

All major weather and climate prediction models in operation today are developed in Fortran or C++, intermixing optimization directives for specific hardware architectures with numerical algorithms. The resulting codes tend to be verbose, difficult to extend and maintain, and difficult to port to new architectures. In GT4Py, a Python framework for the development of weather and climate applications, we take a different approach by separating optimization concerns from the domain scientists' algorithmic development. Stencil computations are expressed using a high-level Python interface and subsequently transformed into high-performance implementations using an optimizing toolchain integrated into GT4Py. We will give an overview of the design of GT4Py, its components and the language used by scientists to express their stencil-like numerical algorithms. We further showcase a performance comparison of the FV3 dynamical core between the original Fortran implementation and a GT4Py-based application developed by a team at the Allen Institute for AI.

Author(s): Anton Afanasyev (ETH Zurich / CSCS), Mauro Bianco (ETH Zurich / CSCS), Till Ehrengruber (ETH Zurich / CSCS), Enrique González Paredes (ETH Zurich / CSCS), Linus Groner (ETH Zurich / CSCS), Rico Häuselmann (ETH Zurich / CSCS), Felix Thaler (ETH Zurich / CSCS), and Hannes Vogt (ETH Zurich / CSCS)

Domain: Computer Science and Applied Mathematics


Methodology for Estimating the Effective Dissipation Coefficients in Magnetohydrodynamic Simulations of Stellar Plasmas

A crucial step in the post-processing of astrophysical magnetohydrodynamic (MHD) numerical simulations is the accurate determination of the effective viscosity and magnetic diffusivity affecting the MHD flow. Once these are known, one can determine the dimensionless numbers that characterise the flow, such as the Reynolds and Prandtl numbers. These are of particular significance for simulations of the solar and stellar small-scale dynamo. The proposed methodology relies on a post-processing step carried out with numerical operators of higher-order accuracy than the ones in the simulation code. The poster explains the methodology and presents its application to a number of radiative MHD simulations of various effective viscosities and plasma resistivities. The proposed methodology provides a solid estimate of the dissipation coefficients affecting the momentum and induction equations of MHD simulations. It is found that small-scale dynamos are active and can amplify a small seed magnetic field up to significant values in simulations of the solar convection zone with a grid spacing finer than 12 km, even at a magnetic Prandtl number as small as 0.65.

Author(s): Fabio Riva (Istituto Ricerche Solari Locarno (IRSOL), Università della Svizzera italiana (USI)), and Oskar Steiner (Istituto Ricerche Solari Locarno (IRSOL), Leibniz-Institut für Sonnenphysik (KIS))

Domain: Physics


Performance Analysis of Nonlinear Optimization Problems from Radiation Therapy Treatment Planning on HPC Systems

Nonlinear optimization is widely used in the planning process for modern radiation therapy. The goal is to determine control parameters for the treatment machine in order to conform the dose delivered to the patient to the tumour volume as well as possible. Treatment planning is a time-consuming process, which in part is due to the optimization problem being computationally intensive. As such, the performance of the optimization solver is crucial, which makes it natural to consider how HPC resources, such as GPU accelerators or computational clusters, could be used to accelerate the process. Many software libraries, e.g. IPOPT and PETSc/TAO, exist for solving nonlinear optimization problems, with varying support for accelerators and distributed computing. In this work, we analyze the performance of different optimization codes on nonlinear optimization problems from radiation therapy. We present a performance analysis of the different computational kernels involved in order to better understand computational bottlenecks in different problems, as well as to understand how different optimizers can utilize HPC hardware today. These results can help guide future research efforts on HPC codes for nonlinear optimization tailored for radiation therapy problems.

Author(s): Felix Liu (KTH Royal Institute of Technology, RaySearch Laboratories), Albin Fredriksson (RaySearch Laboratories), and Stefano Markidis (KTH Royal Institute of Technology)

Domain: Computer Science and Applied Mathematics


Topology Aware Collective Communication based on Cyclic Shift and Recursive Exchange

The cyclic shift and recursive exchange algorithms for collective communication on parallel computers were comprehensively investigated recently [1]. With suitable parameters of the schemes determined with a benchmark at installation time and a heuristic at runtime, high-performance implementations for the message passing interface (MPI) can be obtained. For the collective communication patterns reduce_scatter, allgatherv, and allreduce on hybrid shared and distributed memory computers, the topology is mostly addressed using a hierarchical approach. We show how a combination of the cyclic shift and recursive exchange algorithms can match the topology of architectures with multiple CPUs or GPUs per node. The algorithm applied is recursive exchange with higher radix and different factors for each step, where the steps are performed with cyclic shift and multiple ports per node are used. Out of many algorithmic options, the communication is arranged such that the largest data volumes occur in the fast shared memory while smaller volumes are sent over the network. Comparisons with the hierarchical implementation are made for persistent collective communication, but our approach is not limited to this case. [1] "An optimisation of allreduce communication in message-passing systems" A.Jocksch, N.Ohana, E.Lanti, E.Koutsaniti, V.Karakasis, L.Villard, Parallel Comput. 2021
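
The basic communication pattern of recursive exchange can be illustrated with a small serial simulation. The sketch below is a simplification and not the authors' implementation: it assumes radix 2, a power-of-two number of ranks, and a sum reduction, whereas the abstract describes higher radices with a different factor per step. The function name is hypothetical.

```python
def recursive_exchange_allreduce(values):
    """Simulate a radix-2 recursive exchange allreduce (sum), one value per rank."""
    p = len(values)
    assert p > 0 and p & (p - 1) == 0, "this sketch assumes a power-of-two rank count"
    data = list(values)
    distance = 1
    steps = 0
    while distance < p:
        # Every rank r exchanges its partial sum with partner r XOR distance
        # and adds the received value to its own.
        data = [data[r] + data[r ^ distance] for r in range(p)]
        distance *= 2
        steps += 1
    return data, steps


result, steps = recursive_exchange_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
print(result, steps)
```

After log2(p) steps every rank holds the full sum. A topology-aware variant would choose the partners of the data-heavy steps within a node.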

Author(s): Andreas Jocksch (ETH Zurich / CSCS), and Vasileios Karakasis (ETH Zurich / CSCS)

Domain: Computer Science and Applied Mathematics
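
As a rough illustration of the two building blocks named in the abstract above, the following sketch simulates them in plain Python without MPI. It is not the authors' implementation: the function names are hypothetical, the recursive exchange is shown only for radix 2 and a power-of-two number of ranks, and the cyclic shift is shown as a simple ring-style allgather, which may differ from the variants studied in [1].

```python
def recursive_exchange_allreduce(values):
    """Simulate a radix-2 recursive exchange allreduce (sum).

    values[rank] is the contribution of each rank. Assumption of this
    sketch: the number of ranks is a power of two.
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    data = list(values)
    distance = 1
    while distance < p:
        # Every rank exchanges its partial sum with the partner whose
        # index differs in exactly one bit, then both add the results.
        data = [data[rank] + data[rank ^ distance] for rank in range(p)]
        distance *= 2
    return data


def cyclic_shift_allgather(values):
    """Simulate a ring-style cyclic shift allgather over len(values) ranks."""
    p = len(values)
    # gathered[rank][src] holds the block that originated at rank src.
    gathered = [[None] * p for _ in range(p)]
    for rank in range(p):
        gathered[rank][rank] = values[rank]
    for step in range(p - 1):
        # In each step a rank forwards the block it obtained `step` steps
        # ago to its right neighbour (rank + 1, modulo p).
        outgoing = [
            ((rank - step) % p, gathered[rank][(rank - step) % p])
            for rank in range(p)
        ]
        for rank in range(p):
            src, block = outgoing[(rank - 1) % p]
            gathered[rank][src] = block
    return gathered


if __name__ == "__main__":
    print(recursive_exchange_allreduce([1, 2, 3, 4]))
    print(cyclic_shift_allgather(["a", "b", "c"]))
```

The recursive exchange needs log2(p) steps but moves the full message in each one, whereas the ring needs p - 1 steps with one block each. Trade-offs of this kind are what the choice of radix and factors per step would have to balance against the node topology.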


Recent Atlas Library Developments for Earth System Modelling

ECMWF's strategy is to run its Integrated Forecasting System (IFS), or at least some of its Earth system model components, on non-traditional hardware such as GPUs. In this strategy, the ECMWF Atlas library plays a central role in managing data structures, distributed parallelisation, and memory spaces. Atlas can optionally delegate its field data allocations, which link host CPU memory spaces and GPU device memory spaces, to GridTools, developed at CSCS/ETH/MeteoSwiss. GridTools is a domain-specific language (DSL) for stencil-based numerical algorithms, and Atlas can then encapsulate the fields used in such algorithms. Orthogonally, IFS consists of various Earth system model components such as atmosphere, radiation, ocean, and ocean waves. These components may each operate on different grids and use different parallel distributions. Adding GPU device memory spaces to this mix inevitably makes the coupling of these components more challenging. ECMWF therefore aims to integrate Atlas more deeply into each Earth system model component, and thus take advantage of a common data structure library to efficiently interpolate fields between components. The presented poster will elaborate on the recent Atlas developments that bring us closer to these ambitious goals.

Author(s): Willem Deconinck (ECMWF)

Domain: Climate, Weather and Earth Sciences


Rapid Update Cycle in DWD's Seamless INtegrated FOrecastiNg sYstem (SINFONY)

The SINFONY project at Deutscher Wetterdienst (DWD) aims to produce seamless precipitation and radar reflectivity ensemble forecast products for a time range from minutes up to 12 hours. It combines numerical weather prediction (NWP) and nowcasting. Nowcasts are initialized with an update frequency of 5 minutes, while standard short-range numerical weather prediction (SRNWP) systems are initialized every three hours. Hence, in the worst case, a three-hour-old SRNWP forecast has to be combined with an up-to-date nowcast.
To overcome this issue, a rapid update cycle (RUC) is implemented, which initializes forecasts every hour with a shorter observation cutoff due to time criticality. To avoid growing differences between the atmospheric states in the RUC and the SRNWP, the RUC is time-limited and branches off from the SRNWP cycle every 24 h. The prediction of extreme convective events benefits from additional observation systems that deliver large data volumes (e.g. satellites). In addition, the RUC applies a more sophisticated microphysics scheme than the SRNWP, which leads to a spin-up phase after branching off.
The large amount of additional observational data involved is a challenge for stable data production and handling at DWD. We present ideas in terms of atmospheric parameters, related data flow, involved infrastructure, and results for a trial period.

Author(s): Sven Ulbrich (Deutscher Wetterdienst), Christian Andreas Welzbacher (Deutscher Wetterdienst), Thomas Hanisch (Deutscher Wetterdienst), Roland Potthast (Deutscher Wetterdienst; University of Reading, Department of Mathematics, United Kingdom), and Team Sinfony (Deutscher Wetterdienst)

Domain: Climate, Weather and Earth Sciences


Developing a Performance-Portable Finite-Volume Core for Numerical Weather Prediction

We highlight the ongoing development of a performance-portable version of the finite-volume dynamical core IFS-FVM for high-resolution global weather prediction at ECMWF and its partners at ETH Zurich and CSCS. Starting from the science expressed in traditional Fortran with hybrid MPI/OpenMP parallelization targeting CPUs, the challenges associated with diverse emerging and future supercomputing architectures are addressed by a comprehensive software redesign based on GT4Py. The GT4Py framework includes a high-level Python-embedded DSL to implement stencil computations in weather and climate applications. The toolchain integrated in GT4Py enables automatic code generation and optimization to achieve high-performance execution across a range of computing architectures and adaptation to the latest energy-efficient hardware. The systematic separation of the scientific model implementation from the performance engineering improves development, debugging, and maintenance, and eases the introduction of new developers and users in their specific fields. The IFS-FVM employs non-oscillatory forward-in-time (NFT) semi-implicit numerical integration of the compressible equations with robust high-resolution capabilities in regional and global domains. The poster presentation will emphasize the efforts and goals of the PASC-funded project KILOS (“Kilometre-scale nonhydrostatic global weather forecasting with IFS-FVM”) in 2021-2024 at ETH Zurich.

Author(s): Christian Kühnlein (ECMWF), Till Ehrengruber (ETH Zurich / CSCS), Enrique González Paredes (ETH Zurich / CSCS), Nicolai Krieger (ETH Zurich), Lukas Papritz (ETH Zurich), Stefano Ubbiali (ETH Zurich), Hannes Vogt (ETH Zurich / CSCS), and Heini Wernli (ETH Zurich)

Domain: Climate, Weather and Earth Sciences


Deep Learning-Based Forecast of Space Weather Indices

Space weather science is an important emerging field investigating events and processes developing in space between the Sun and the Earth. The development of space weather forecasting capabilities is crucial for designing strategies to protect human assets in space and on the Earth, as space weather events may damage power lines, transformers, and pipelines and disrupt communication. The Disturbance Storm Time (Dst) index, one such space weather index, characterizes magnetic activity in Earth’s ring current and aids in identifying geomagnetic storms. This work first establishes a time series dataset consisting of historical Dst data and spacecraft observations. We then use the Temporal Fusion Transformer deep-learning architecture for a 12-hour forecast of the Dst index and compare it to other deep-learning and traditional approaches in terms of accuracy and computational performance. We demonstrate that Temporal Fusion Transformer networks are an emerging and promising technology for application to space weather.

Author(s): Jeremy Williams (KTH Royal Institute of Technology), and Stefano Markidis (KTH Royal Institute of Technology)

Domain: Climate, Weather and Earth Sciences


A Partially Meshfree Galerkin Scheme for Representing Highly Anisotropic Fields

A method for representing highly anisotropic fields is presented, based on a partially meshfree Galerkin formulation. A mapping function is used to provide information about the local direction of the anisotropy, with one of the global coordinates chosen to parameterize the ‘parallel’ position along the mapping in a one-to-one manner. Standard unstructured finite element meshes are used on planes of constant parallel coordinate to represent the necessary small-scale variations perpendicular to the mapping direction, with large spacings then possible between these planes because of the small variation along the mapping. This greatly reduces the number of degrees of freedom required to represent fields in this space and the associated computational cost of simulations involving such fields. No mesh connectivity is defined between planes, and field-aligned basis functions are constructed using the mapping function to extend the standard finite element bases into the full domain. Integration of the basis functions is addressed with reference to techniques developed for fully meshfree methods, and the scheme (as well as other similar element-free Galerkin schemes) is shown to be locally conservative under certain conditions. Robust convergence of several test problems is demonstrated.

Author(s): Samuel Maloney (University of Warwick), and Ben McMillan (University of Warwick)

Domain: Physics


Bayesian Parameter Estimation of Galactic Binaries in LISA Data with Gaussian Process Regression

The Laser Interferometer Space Antenna (LISA), which is currently under construction, aims to measure gravitational waves in the millihertz frequency band. Tens of millions of Galactic binaries are expected to be the dominant sources of gravitational waves. At mHz frequencies, Galactic binaries emit quasi-monochromatic gravitational waves that LISA will measure continuously. Resolving as many Galactic binaries as possible is a central challenge of the upcoming LISA data set. Although it is estimated that tens of thousands of these overlapping gravitational wave signals are resolvable, while the rest blur into a Galactic foreground noise, extracting tens of thousands of signals using Bayesian approaches is still computationally expensive. Hence, in this contribution we describe an end-to-end pipeline with a new approach that uses Gaussian Process Regression to model the likelihood function in order to rapidly compute Bayesian posterior distributions.

Author(s): Stefan Herbert Strub (ETH Zurich), Luigi Ferraioli (ETH Zurich), Simon Christian Stähler (ETH Zurich), Cédric Schmelzbach (ETH Zurich), and Domenico Giardini (ETH Zurich)

Domain: Physics


Image Deconvolution for Next-Generation Radio Interferometry

Next-generation radio interferometers such as the Square Kilometre Array (SKA) will produce massive datasets that will need to be processed and analysed with efficient imaging techniques. The thousands of serial cleaning iterations required by traditional imaging algorithms cannot scale to the requirements of Big Data radio astronomy. With Bluebild we are developing efficient and user-friendly software for radio astronomy imaging as a modern alternative to the state-of-the-art CLEAN algorithm. Our method employs Principal Component Analysis (PCA) to linearly decompose visibilities from interferometric radio telescopes into different energy levels of detected sources in the sky. Bluebild is GPU accelerated, decreasing the time for image processing by several orders of magnitude when compared with the contemporary approach. Decomposition of the sky into separate energy levels also allows for a more efficient, parallelized application of the deconvolution process. Here, we present possible extensions to Bluebild that include deconvolution solutions of the energy levels, and a comparison to serial CLEAN deconvolution. We also explore a deep learning method that takes advantage of Bluebild's linear decomposition of the sky image to denoise and re-generate a clean image. We show that the deconvolution successfully recovers the signal from the data processor pipeline of SKA precursor telescopes.

Author(s): Michele Bianco (EPFL)

Domain: Physics


Preparing a High-Order Incompressible Flow Solver for Next Generation Supercomputers

Graphics Processing Units (GPUs) are commonplace in modern high-performance computing systems. Of particular interest to us, several upcoming supercomputers in Europe will predominantly use AMD GPUs to accelerate their computations. One of the application domains particularly affected by this shift to GPUs is high-fidelity computational fluid dynamics, where large-scale simulations are a necessity. To leverage the computing power of future supercomputers, we need our solvers to cater to these new processing units. In this work, we describe our development process for the high-fidelity incompressible flow solver Neko, which is based on the spectral element method, and how we design Neko to accommodate modern accelerators. We discuss how we offload our computations through a modern Fortran framework and describe the differences we have observed between Nvidia and AMD GPUs. We provide results with regard to scaling and performance on modern AMD GPUs for this type of solver. In addition to performance results, we show an overview of current production runs and efforts using Neko, where we utilize the new computer hardware to perform new large-scale, detailed simulations of turbulent flows at high Reynolds numbers.

Author(s): Martin Karp (KTH Royal Institute of Technology), Niclas Jansson (KTH Royal Institute of Technology), Philipp Schlatter (KTH Royal Institute of Technology), and Stefano Markidis (KTH Royal Institute of Technology)

Domain: Engineering


Designing a Modern 3D FFT Library for HPC with Data-Centric Parallel Programming

The fast Fourier transform (FFT) is one of the essential algorithms in scientific computing, and many applications, from CFD to MD, rely on fast implementations of it. There are many FFT libraries; one is the classic FFTW, which is still considered state of the art. FFTW solves n-dimensional FFTs on many-core systems by efficiently optimizing the algorithm to the current hardware. For Nvidia GPUs, the de facto standard is cuFFT, available in the CUDA Toolkit. Several 3D FFT libraries utilize cuFFT and FFTW for the 1D calculation while controlling decomposition and communication for higher dimensions. DaCe is a parallel programming framework that uses SDFGs as a transformable intermediate representation. We propose a modern FFT library written in DaCe that is portable and optimizable to most HPC hardware, such as multi-core CPUs, GPUs and FPGAs. We aim to leverage the SDFGs and their transformations for better hardware optimization by exploiting the many levels of parallelism found in FFT algorithms, from SIMD vectorization on CPUs to efficient inter-node 2D decompositions of 3D FFTs. Using DaCe, we aim to be faster and more portable than FFTW and cuFFT while still being simple to maintain and develop. Finally, we demonstrate the new library within the GROMACS molecular dynamics code.

Author(s): Måns Andersson (KTH Royal Institute of Technology), and Stefano Markidis (KTH Royal Institute of Technology)

Domain: Computer Science and Applied Mathematics
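
The abstract above mentions that 3D FFT libraries typically reuse 1D transforms and handle decomposition and communication for the higher dimensions themselves. The following minimal sketch shows why this works: a 3D discrete Fourier transform separates into 1D transforms applied along each axis in turn. It is an illustration only and does not use DaCe, FFTW or cuFFT; the function names are hypothetical, and a naive O(n^2) DFT stands in for a real 1D FFT kernel.

```python
import cmath


def dft_1d(signal):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [
        sum(signal[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_3d(cube):
    """3D DFT of cube[x][y][z] built from 1D transforms along z, y, then x."""
    nx, ny, nz = len(cube), len(cube[0]), len(cube[0][0])
    out = [[[complex(v) for v in row] for row in plane] for plane in cube]
    # Pass 1: transform every pencil along z.
    for x in range(nx):
        for y in range(ny):
            out[x][y] = dft_1d(out[x][y])
    # Pass 2: transform every pencil along y.
    for x in range(nx):
        for z in range(nz):
            pencil = dft_1d([out[x][y][z] for y in range(ny)])
            for y in range(ny):
                out[x][y][z] = pencil[y]
    # Pass 3: transform every pencil along x.
    for y in range(ny):
        for z in range(nz):
            pencil = dft_1d([out[x][y][z] for x in range(nx)])
            for x in range(nx):
                out[x][y][z] = pencil[x]
    return out


if __name__ == "__main__":
    cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    spectrum = dft_3d(cube)
    print(spectrum[0][0][0])  # zero-frequency term, the sum of all entries
```

In a distributed 2D (pencil) decomposition, each pass would operate on locally complete pencils, with a data redistribution between passes. That communication step is omitted here.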