Presentation

Parallel Space-Time Likelihood Optimization for Air Pollution Prediction on Large-Scale Systems
Description
Gaussian geostatistical space-time modeling is an effective tool for statistical inference on data evolving in space and time. It generalizes purely spatial modeling at the cost of greater complexity in operations and storage, pushing geostatistical modeling even further into the arms of high-performance computing. The model infers missing data by leveraging space-time measurements of one or more fields. We propose a high-performance implementation of a space-time model for large-scale systems using a two-level parallelization technique. At the inner level, we rely on parallel linear algebra libraries and runtime systems to perform the complex matrix operations required for maximum likelihood estimation (MLE). At the outer level, we parallelize the optimization process using a distributed implementation of the particle swarm optimization (PSO) algorithm. At this level, parallelization is accomplished using MPI sub-communicators, such that the nodes in each sub-communicator perform a single MLE iteration at a time. We assess the accuracy of the implemented space-time model on large-scale synthetic space-time datasets. Moreover, we use the proposed implementation to model two air pollution datasets from the Middle East and US regions with 550 spatial locations × 730 time slots and 945 spatial locations × 500 time slots, respectively. The evaluation shows that the proposed implementation achieves high prediction accuracy on both synthetic and real particulate matter (PM) datasets in the context of the air pollution problem. We achieve up to 757.16 TFLOP/s on 1024 nodes (75% of peak performance) with 490K geospatial locations on a Cray XC40 system.
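The two-level idea can be illustrated with a minimal serial sketch. It is not the authors' implementation: the exponential kernel that treats time as a third coordinate, the pure-Python Cholesky factorization, the PSO coefficients, and all function names (`exp_cov`, `neg_log_likelihood`, `group_of`, `pso`) are assumptions made for illustration. In a distributed setting, each objective evaluation would be carried out by one MPI sub-communicator; here `group_of` only shows how a rank could be mapped to such a group.

```python
import math
import random

def exp_cov(locs, variance, length):
    """Covariance matrix from an exponential kernel on (x, y, t) points (assumed model)."""
    n = len(locs)
    return [[variance * math.exp(-math.dist(locs[i], locs[j]) / length)
             + (1e-8 if i == j else 0.0)      # small nugget for numerical stability
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular factor L with a = L L^T (inner-level linear algebra stand-in)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l

def neg_log_likelihood(theta, locs, z):
    """Negative Gaussian log-likelihood of zero-mean data z for theta = (variance, length)."""
    variance, length = theta
    l = cholesky(exp_cov(locs, variance, length))
    n = len(z)
    y = [0.0] * n
    for i in range(n):                         # forward substitution: solve L y = z
        y[i] = (z[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    return 0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in y))

def group_of(rank, ranks_per_group):
    """Group index ("color") a rank would use when splitting into sub-communicators."""
    return rank // ranks_per_group

def pso(objective, bounds, n_particles=8, n_iters=30, seed=0):
    """Basic particle swarm minimizer; each objective call is one likelihood evaluation."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):           # in parallel: one particle per group
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

if __name__ == "__main__":
    rng = random.Random(1)
    locs = [(rng.random(), rng.random(), rng.random()) for _ in range(20)]
    factor = cholesky(exp_cov(locs, 1.0, 0.3))
    w = [rng.gauss(0.0, 1.0) for _ in locs]
    z = [sum(factor[i][k] * w[k] for k in range(i + 1)) for i in range(len(locs))]
    theta, nll = pso(lambda t: neg_log_likelihood(t, locs, z), [(0.1, 3.0), (0.05, 1.0)])
    print("estimated (variance, length):", theta, "negative log-likelihood:", nll)
```

The sketch keeps the two levels separate on purpose: `neg_log_likelihood` stands in for the inner level, where the abstract relies on parallel linear algebra libraries, and the particle loop in `pso` stands in for the outer level, where particles would be distributed across sub-communicators.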
Slides
PDF
Time
Monday, June 27, 11:15 - 11:45 CEST
Location
Sydney Room
Event Type
Paper
Domains
Climate, Weather and Earth Sciences