
The world’s fastest ocean model

By Simone Silvestri, Gregory Wagner, and Raffaele Ferrari, for the MIT CliMA group.

Ocean eddies—the ocean equivalent of atmospheric cyclones and anticyclones—play a key role in the Earth’s climate system. However, they are not explicitly simulated by climate models: their scale of 10 to 100 km lies below the resolution of standard ocean models. To approximate the climate impact of the missing eddies, modelers employ parameterizations—empirical equations that estimate the collective effect of eddies from resolved model variables such as ocean current strength, temperature, and salinity. Yet this approach is fraught with uncertainties. For example, a 2002 study in Science [1] reported a strengthening of the Southern Hemisphere Westerlies over the preceding decades. Subsequently, several climate models predicted that this intensification of surface winds would bring carbon-rich water from the deep ocean to the surface, raising atmospheric carbon dioxide levels and exacerbating climate warming. This prediction, however, was challenged when higher-resolution ocean-only simulations indicated that ocean eddies mitigate the effects of stronger winds on upwelling, thus reducing the additional carbon dioxide release. The parameterized changes in eddy activity in climate models turned out to be inaccurate, a problem that persists to this day.

The CliMA group at MIT made great strides in addressing this challenge by developing an ocean model that takes advantage of the rapid advancement of computational power through Graphics Processing Units (GPUs). The new model can be run with meshes as fine as 10 km for multi-decadal climate simulations. At this resolution, ocean eddies are relatively well resolved, obviating the need to parameterize them. Fig. 1 shows the surface speed from such a 10 km resolution simulation; eddies are clearly visible as round, coherent features. There are valuable broader lessons in how we achieved this performance gain.

Figure 1. Visualization of the ocean’s surface speed in an eddying simulation with CliMA-Ocean at 1/12 degree resolution.
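For a sense of what such a configuration looks like in code, here is a minimal sketch of a 1/12-degree global grid in Oceananigans, the Julia solver underlying CliMA-Ocean (introduced below). It uses Oceananigans' public grid constructor, but the sizes and extents are illustrative, not the production configuration:

    using Oceananigans

    # A 1/12-degree latitude-longitude grid (roughly 9 km at the equator),
    # built directly on a GPU. Swapping GPU() for CPU() runs the same
    # script on a conventional processor.
    grid = LatitudeLongitudeGrid(GPU();
                                 size = (4320, 1800, 100),  # longitude, latitude, vertical points
                                 longitude = (0, 360),
                                 latitude = (-75, 75),
                                 z = (-6000, 0))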

GPUs, specialized processors designed to handle complex mathematical and graphical calculations, derive their speed from thousands of cores executing many tasks concurrently. Their performance is increasing exponentially, at a rate outpacing that of CPUs, which underscores the necessity of designing climate models optimized for GPU architectures. However, the adoption of GPUs within the ocean and climate modeling community has been hindered by the need to rethink algorithms: fully harnessing GPUs requires more than a mere translation of legacy CPU code.
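To make this data-parallel execution model concrete, here is a minimal sketch in Julia with the CUDA.jl package; the field and its update rule are invented for illustration. Each GPU thread updates one grid point, and thousands of threads run concurrently:

    using CUDA

    # A toy kernel: each thread relaxes one grid point toward a target value.
    function relax!(T, T_target, rate)
        i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if i <= length(T)
            @inbounds T[i] += rate * (T_target[i] - T[i])
        end
        return nothing
    end

    N = 10_000_000                     # ten million grid points
    T        = CUDA.fill(0.0f0, N)     # a field living in GPU memory
    T_target = CUDA.fill(1.0f0, N)

    # Launch enough 256-thread blocks to cover every grid point; the blocks
    # execute concurrently across the GPU's thousands of cores.
    @cuda threads=256 blocks=cld(N, 256) relax!(T, T_target, 0.1f0)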

The ocean component of the CliMA model is based on Oceananigans, a finite-volume solver of the dynamical equations governing incompressible flows, written in the Julia programming language. Oceananigans has been built from the ground up to be fast on GPUs and easy to use. Starting from a clean slate allowed us to adopt design choices customized for GPUs, which often differ from standard ocean modeling practice. Both the model structure and the numerical algorithms have been formulated to exploit the many parallel cores that GPUs provide, while remaining mindful that the high-bandwidth memory GPUs can access is limited. An effective strategy for leveraging these resources with a finite-volume code is to maximize the number of grid points processed by each GPU. Fitting that many grid points requires a small memory footprint, which we achieved by recalculating variables that would typically be pre-computed and stored on CPUs, at the cost of a higher arithmetic load. We further reduced the need for temporary storage of intermediate variables by incorporating functions into other functions (aggressive inlining) and by combining multiple computational tasks into single units (kernel fusion), as illustrated in the sketch below. These choices allow us to handle seemingly large computations, such as eddying ocean simulations with as many as one billion grid cells, with as few as eight GPUs.
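The following minimal sketch, in Julia (illustrative, not Oceananigans source code), contrasts the two patterns. On GPU arrays, Julia's dot-broadcast fuses an entire right-hand side into a single kernel:

    using CUDA

    N = 1_000_000
    κ, Δt, Δx = 1.0f-2, 0.1f0, 1.0f0
    T      = CUDA.rand(Float32, N)   # a prognostic field on the GPU
    T_east = CUDA.rand(Float32, N)   # its eastward-shifted neighbor values (illustrative)

    # Store-then-reuse, as is typical on CPUs: the flux is pre-computed into
    # an intermediate array, costing extra memory and an extra pass over it.
    flux      = @. -κ * (T_east - T)
    T_unfused = @. T + (Δt / Δx) * flux

    # Recompute-and-fuse, the GPU-friendly pattern described above: the flux
    # is recalculated inside a single fused kernel, so no intermediate array
    # is ever allocated.
    T_fused = @. T + (Δt / Δx) * (-κ * (T_east - T))

In the full model, the same idea operates at the kernel level: stencil operators are aggressively inlined into the tendency kernels, so fluxes are recomputed where needed instead of stored.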

Parallelization across multiple GPUs, necessary for global climate simulations, also required careful design, because data exchange between GPUs can easily become the bottleneck of a computational workflow. To address this, we leveraged asynchronous memory copies that overlap communication with computation, hiding communication latency and achieving excellent scalability. The sketch below illustrates the pattern.
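Here is a schematic sketch of the overlap strategy in Julia, assuming the nonblocking interface of a recent MPI.jl; compute_interior! and compute_boundary! are hypothetical stand-ins for the model's kernels, and in the real model the halo buffers would live in GPU memory:

    using MPI

    # Hypothetical stand-ins for the real kernels:
    compute_interior!(field) = nothing   # cells that need no neighbor data
    compute_boundary!(field) = nothing   # cells that depend on the halos

    MPI.Init()
    comm   = MPI.COMM_WORLD
    rank   = MPI.Comm_rank(comm)
    nranks = MPI.Comm_size(comm)
    east   = mod(rank + 1, nranks)       # neighbors in a periodic 1D domain split
    west   = mod(rank - 1, nranks)

    field     = zeros(Float32, 128)      # this rank's slab (illustrative size)
    send_halo = field[end-1:end]         # copy of the cells the neighbor needs
    recv_halo = similar(send_halo)

    # 1. Start the halo exchange asynchronously (nonblocking send/receive).
    reqs = [MPI.Isend(send_halo, comm; dest = east, tag = 0),
            MPI.Irecv!(recv_halo, comm; source = west, tag = 0)]

    # 2. While the messages are in flight, compute the interior, which does
    #    not depend on halo data: communication is hidden behind computation.
    compute_interior!(field)

    # 3. Only then wait for the halos and finish the halo-dependent cells.
    MPI.Waitall(reqs)
    compute_boundary!(field)

    MPI.Finalize()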

These algorithmic choices illustrate that optimizing for GPUs requires careful design of the entire codebase. We achieved this by writing a new model from scratch; retrofitting a legacy model designed for CPUs would have been much more difficult.

These careful design choices have paid dividends: Oceananigans is the world’s fastest ocean model, capable of previously unthinkable performance. As an example, the largest global ocean simulation run 10 years ago, with a grid spacing of two kilometers corresponding to around 10 billion computational cells, required the whole NASA Pleiades supercomputer, with 70,000 CPU cores, to achieve 0.05 simulated years per wall-clock day. Oceananigans achieves the same time-to-solution on the same problem size with only 32 GPUs, consuming roughly a factor of 40 less power than a CPU-based supercomputer delivering similar performance. To put these numbers in context, current supercomputers (such as Perlmutter, Piz Daint, or Frontier) host up to O(10,000) GPUs, enabling several hundred concurrent eddy-resolving ocean simulations to broadly explore future climate scenarios (Fig. 2).
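For a rough sense of scale, the numbers above imply (a back-of-the-envelope calculation, not a benchmark):

    cells     = 10e9      # ≈ 10 billion cells at 2 km grid spacing
    cpu_cores = 70_000    # the Pleiades run, circa 10 years ago
    gpus      = 32        # the Oceananigans run, same time-to-solution

    cells_per_gpu = cells / gpus       # ≈ 3.1e8 cells resident on each GPU
    cores_per_gpu = cpu_cores / gpus   # each GPU stands in for ≈ 2,200 CPU cores

Keeping roughly 300 million cells resident on a single GPU is precisely what the low-memory-footprint strategy described above makes possible.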

Figure 2. Scaling of CliMA-Ocean on multiple GPUs on the DOE’s Perlmutter cluster. The numbers near the curves show the horizontal resolution of the model. The green line shows the performance of the 1/12 degree solution visualized in Fig. 1.

References:
[1] Thompson, David W. J., and Susan Solomon. “Interpretation of Recent Southern Hemisphere Climate Change.” Science 296 (2002): 895–899.