Machine-learning based river models for climate science

Land surface water routing describes how water flows down hillslopes and through river channels on the Earth’s surface. These processes close the water cycle on Earth, which is why climate models must represent them. Rivers deliver freshwater to the ocean at river deltas, sustain local ecosystems, supply communities with water, and present flood risks during periods of high flow. Modeling river flow and other surface routing processes is therefore an important task for any Earth system model.

**Figure 1:** The river network of South America (from HydroSHEDS), illustrating the concept of a river basin. A single large river basin like the Amazon (top left) can be broken up into smaller basins, including the Madre de Dios Basin (bottom left), which can then be broken up into even smaller basins, like the Upper Madre de Dios basin (top right), and so on to smaller and smaller basins (e.g., bottom right).

Many river models used in land surface models are rooted in the physical laws governing the flow of shallow water across a complex topographical surface [1]; these laws are then discretized over the river network (Figure 1), typically with a spatial resolution ranging from ~3 km to 50 km [e.g., 2]. This representation introduces errors because it ignores the effects of small-scale topography, soil type, vegetation cover, and other “sub-basin” features that affect flow. As a result, these models often must be calibrated independently for different river networks, and they may not perform well in regions where they were not calibrated. Indeed, a longstanding challenge in hydrological modeling is basin generalizability, also known as model performance in ungauged basins [3].

A promising new approach for modeling streamflow is based on machine learning models called long short-term memory recurrent neural networks, or LSTMs. When provided with features about each basin (such as soil type, mean slope, local weather, etc.), these models are able to encode that information and use it to predict how streamflow depends on local features [4]. These local-environment-aware models can outperform physical models and generalize to basins not used in training of the model [5,6,7]. One crucial feature of existing models of this type is that they map precipitation to streamflow, and hence simulate all hydrological processes on land (including snow, soil moisture, evapotranspiration, etc.). In contrast, land surface models in climate models typically simulate each of these processes using physical laws and predict the water runoff from the land; river models in land surface models then route that runoff to streamflow. Figure 2 shows the two types of runoff modeled by land surface models, and the corresponding observed streamflow for a particular basin. Finally, many of these LSTMs were only tested in the USA, rather than globally, while river models in land surface models must be applied globally.
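One common way to give a sequence model both kinds of information is to repeat the static basin attributes at every time step and append them to the time-varying inputs. The sketch below illustrates only that input layout. It is an assumption for illustration, not necessarily the scheme used in [4–7], and the function name `build_model_inputs` and the example values are hypothetical.

```python
def build_model_inputs(dynamic_series, static_attributes):
    """Append the same static basin attributes to every time step.

    dynamic_series: list of per-time-step feature lists (e.g., runoff values).
    static_attributes: list of basin descriptors (e.g., mean slope, soil index).
    Returns a list with one combined feature list per time step.
    """
    static = [float(value) for value in static_attributes]
    # Each time step keeps its own dynamic features, followed by the
    # unchanged static descriptors of the basin.
    return [[float(v) for v in step] + static for step in dynamic_series]


# Hypothetical example: 3 time steps of (surface, subsurface) runoff,
# plus 2 static attributes for the basin.
runoff = [[0.1, 0.4], [0.3, 0.5], [0.0, 0.6]]
attributes = [12.5, 0.7]
inputs = build_model_inputs(runoff, attributes)
```

Each row of `inputs` would then be one time step of the sequence fed to the recurrent network.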

**Figure 2:** Surface runoff and subsurface runoff for a basin [8], and the corresponding streamflow in the same basin as measured at a downstream gauge (data from [9]). River models must capture the delay between the production of runoff and the streamflow resulting from that runoff. Land models must capture the delay between precipitation or snowmelt and the generation of runoff. The delay can be due to the time it takes water to travel down a hillslope to a stream, as for surface runoff, or due to longer-term residency within the soil, as for subsurface runoff. Figure taken from [7].

Recently, in [7], we took concrete steps toward using an LSTM model as the river model within a land surface model: first, we retrained the LSTM using runoff as input, rather than precipitation, and second, we trained and tested it using basins globally, rather than just in a specific region. Our experiments followed two designs: “time-split”, which trains a model on a set of basins during one period of time and tests it on the same basins during a different period, and “basin-split”, which trains on one set of basins and tests on different basins. For incorporation into the land surface model, the model must perform well on both of these tasks using runoff as input.
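The difference between the two designs can be made concrete with a minimal sketch. The function names, the year-based granularity, the random choice of held-out basins, and the use of all years in the basin-split are assumptions made for illustration; they are not taken from [7].

```python
import random


def time_split(basin_ids, years, first_test_year):
    """Same basins in both sets; earlier years train, later years test."""
    train = [(b, y) for b in basin_ids for y in years if y < first_test_year]
    test = [(b, y) for b in basin_ids for y in years if y >= first_test_year]
    return train, test


def basin_split(basin_ids, years, test_fraction, seed=0):
    """Disjoint basins in the two sets; every year is used in both."""
    shuffled = sorted(basin_ids)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = max(1, round(test_fraction * len(shuffled)))
    test_basins = set(shuffled[:n_test])
    train = [(b, y) for b in basin_ids if b not in test_basins for y in years]
    test = [(b, y) for b in basin_ids if b in test_basins for y in years]
    return train, test


# Hypothetical basin identifiers and years.
basins = ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"]
years = range(2000, 2010)
ts_train, ts_test = time_split(basins, years, first_test_year=2008)
bs_train, bs_test = basin_split(basins, years, test_fraction=0.25)
```

In the time-split, every basin appears in both sets, so the model has seen each test basin's behavior before. In the basin-split, the test basins are entirely new to the model, which mimics prediction in ungauged basins.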

Figure 3 shows the results of our approach for five different model experiments. First, we show the results for models trained only in the USA in the time-split experiment, with either modeled runoff as input (solid light blue) or modeled precipitation as input (solid purple). These models perform very similarly to each other (similar area under the curves), showing that runoff can be used as input for the models instead of precipitation [10].

Second, we show that a model trained with basins globally in a time-split configuration (solid red) can outperform a model trained only with basins from the USA (solid light blue). We hypothesize that this is due to an increase in performance when the model is shown more varied training data; this is a positive result for applying these models globally.

Finally, we show that moving from a time-split experiment to a basin-split experiment (dashed lines), using basins either globally or in the USA, leads to a performance loss. This is expected, given the challenge of predicting flow in basins unseen during training (“performance in ungauged basins”). The model run that most closely reflects our intended use case is the basin-split global model (dashed red line), because our river model must be evaluated globally within the land surface model, and the majority of basins worldwide are ungauged. While this is the worst-performing model in our test suite, we show that it still outperforms physical models of river flow [7].

Based on our results, we believe that machine-learning based models with an awareness of local basin features are a promising route for improving river models in land surface models. We are currently working to integrate this model with the ClimaLand land surface model.

**Figure 3:** Performance of the LSTM-based models created in our work. The x-axis shows the Nash-Sutcliffe Efficiency (NSE), which is a measure of model performance; an NSE of 1 corresponds to a perfect model. The y-axis shows the cumulative distribution function (CDF) of NSE over all the test basins. A perfect model would have a CDF of zero up to an NSE of 1 (i.e., a perfect score on all basins), where the CDF would jump to 1. The main takeaway is that a curve with less area under it reflects a better model, as denoted by the arrow and the labels “Worse” and “Better”. From [7].
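For readers unfamiliar with the metric, the NSE compares the squared error of a simulation to the variance of the observations: NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))². The sketch below computes it for a single basin and then builds the empirical CDF over per-basin scores, as plotted in Figure 3. It uses the standard NSE definition; the function names and the example numbers are hypothetical and are not results from [7].

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency of one simulated streamflow series."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance


def empirical_cdf(scores):
    """Sorted scores and the fraction of basins at or below each score."""
    ordered = sorted(scores)
    fractions = [(i + 1) / len(ordered) for i in range(len(ordered))]
    return ordered, fractions


# Hypothetical streamflow for one basin.
obs = [1.0, 2.0, 3.0]
perfect = nse(obs, [1.0, 2.0, 3.0])   # simulation equals observations
baseline = nse(obs, [2.0, 2.0, 2.0])  # simulation equals the observed mean

# Hypothetical per-basin NSE values for four test basins.
sorted_scores, cdf = empirical_cdf([0.8, 0.2, 0.6, 0.4])
```

A simulation that always predicts the observed mean scores an NSE of 0, so negative values indicate a model that does worse than that simple baseline.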

References:

[1] Shaad, K.: Evolution of river-routing schemes in macro-scale models and their potential for watershed management, Hydrological Sciences Journal, 63, 1062–1077 (2018)

[2] Li, H., et al.: A physically based runoff routing model for land surface and earth system models, Journal of Hydrometeorology, 14, 808–828 (2013)

[3] Hrachowitz, M., et al.: A decade of Predictions in Ungauged Basins (PUB)—a review, Hydrological Sciences Journal, 58, 1198–1255 (2013)

[4] Kratzert, F., et al.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrology and Earth System Sciences, 22, 6005–6022 (2018)

[5] Kratzert, F., et al.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrology and Earth System Sciences, 23, 5089–5110 (2019)

[6] Koch, J. and Schneider, R.: Long short-term memory networks enhance rainfall-runoff modelling at the national scale of Denmark, GEUS Bulletin, 49 (2022)

[7] Lima, M., et al.: Toward Routing River Water in Land Surface Models with Recurrent Neural Networks, under review at HESS (2024) https://arxiv.org/abs/2404.14212

[8] Muñoz Sabater, J. (2019): ERA5-Land hourly data from 1950 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.e2161bac

[9] GRDC (2020): GRDC Major River Basins. Global Runoff Data Centre. 2nd, rev. ed. Koblenz: Federal Institute of Hydrology (BfG).

[10] Note that the model performs best when the input is observed precipitation [4], rather than modeled precipitation or runoff, but observed runoff is not available.