Accurate rainfall forecasting is essential for agriculture, water resource management, and disaster preparedness. Even state-of-the-art numerical weather prediction (NWP) models are known to struggle to produce skillful rainfall forecasts in tropical regions of Africa. See, for example, this study.
Over the last decade or so, the increased availability of large-scale meteorological datasets and the development of powerful machine learning models have opened up new opportunities for weather forecasting.
As a proof of concept, focusing on Ghana in West Africa (see map below), we use these tools to develop models that forecast 24-hour rainfall at 12-hour and 18-hour lead times. The resulting models noticeably outperform the state-of-the-art NWP model of the European Centre for Medium-Range Weather Forecasts (ECMWF).
This is the Area of Interest (AOI) for our study.
We collect data over Ghana from the following sources, covering the period from June 1, 2000 to September 30, 2021. Additional variables include the time of year and the latitude and longitude coordinates.
We collect a dataset \( \{({\bf x}_{t-h},{\bf y}_t),\; 1\leq t\leq N\}\) as described above, where \( {\bf x}_{t-h}\in \mathbb{R}^{57\times 64\times 64}\) is the ERA5 input (57 channels on a \(64\times 64\) grid) collected \(h=12\) or \(h=18\) hours before date \(t\), and \( {\bf y}_t\in \mathbb{R}^{64\times 64}\) is the GPM-IMERG rainfall image for date \(t\). We fit the regression model \[ {\bf y}_t = {\cal F}_W({\bf x}_{t-h}) + \epsilon_t,\] where \( {\cal F}_W\) is a U-Net with parameters \(W\), a type of neural network architecture (depicted below), that predicts the rainfall images from the meteorological images.
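A minimal sketch of how the lagged training pairs \(({\bf x}_{t-h}, {\bf y}_t)\) could be assembled, assuming (hypothetically) that the ERA5 snapshots and GPM-IMERG images are already regridded to the same \(64\times 64\) AOI grid and stored as NumPy arrays indexed by Python `datetime` timestamps. The function name `make_lagged_pairs` and the timestamp convention are illustrative assumptions, not the authors' pipeline.

```python
from datetime import datetime, timedelta

import numpy as np


def make_lagged_pairs(x_times, x_data, y_times, y_data, h_hours):
    """Pair each rainfall target y_t with the input x_{t-h} taken h hours earlier.

    Hypothetical helper (not the authors' code).
    x_times, y_times: lists of datetime timestamps (assumed convention).
    x_data: array of shape (len(x_times), C, H, W), e.g. C=57, H=W=64.
    y_data: array of shape (len(y_times), H, W).
    """
    lag = timedelta(hours=h_hours)
    x_index = {t: i for i, t in enumerate(x_times)}  # timestamp -> row of x_data
    xs, ys = [], []
    for j, t in enumerate(y_times):
        i = x_index.get(t - lag)
        if i is None:
            continue  # no input snapshot exactly h hours before this target
        xs.append(x_data[i])
        ys.append(y_data[j])
    if not xs:
        raise ValueError("no (input, target) pairs found for this lag")
    return np.stack(xs), np.stack(ys)


# Toy usage with zero-filled arrays of the shapes used in the text.
x_times = [datetime(2021, 6, 1, 12), datetime(2021, 6, 2, 12)]
x_data = np.zeros((2, 57, 64, 64))
y_times = [datetime(2021, 6, 2), datetime(2021, 6, 3), datetime(2021, 6, 4)]
y_data = np.zeros((3, 64, 64))
X, Y = make_lagged_pairs(x_times, x_data, y_times, y_data, h_hours=12)
print(X.shape, Y.shape)  # (2, 57, 64, 64) (2, 64, 64)
```

Calling the same helper with `h_hours=12` or `h_hours=18` would give the training sets for the two lead times; targets with no snapshot exactly \(h\) hours earlier are simply dropped.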
After training, we evaluate our models on a test dataset by comparing their predictions with the actual rainfall amounts from GPM-IMERG. Specifically, given the estimated parameters \( \widehat{W}\) and a new test input \( {\bf x}_{t'-h} \), we predict the rainfall image across the AOI at time \(t'\) using \[\hat {\bf y}_{t'} = \mathcal{F}_{\widehat{W}}\left({\bf x}_{t'-h}\right).\]
We evaluate and compare the following models:
| Model | MAE | SD of MAE |
|---|---|---|
| CLIM | 3.90 | 1.15 |
| NWP | 3.92 | 1.00 |
| UNET_18 | 3.81 | 1.22 |
| UNET_12 | 3.74 | 1.13 |
| HYB | 3.69 | 1.01 |
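The text does not state how the MAE and its standard deviation are aggregated. One plausible reading, sketched below purely as an assumption, computes the MAE over all pixels of each test day and then reports the mean and sample standard deviation of these daily values. `mae_summary` is a hypothetical helper.

```python
import numpy as np


def mae_summary(y_pred, y_true):
    """Mean and sample standard deviation of the per-day MAE.

    Hypothetical helper; the per-day aggregation is an assumption.
    y_pred, y_true: arrays of shape (T, H, W), one 24h rainfall image per test day.
    Requires T >= 2 so that the sample standard deviation is defined.
    """
    daily_mae = np.abs(y_pred - y_true).mean(axis=(1, 2))  # one MAE value per day
    return float(daily_mae.mean()), float(daily_mae.std(ddof=1))


# Toy check: constant daily errors of 1, 2 and 3 at every pixel.
y_true = np.zeros((3, 64, 64))
y_pred = np.stack([np.full((64, 64), v) for v in (1.0, 2.0, 3.0)])
print(mae_summary(y_pred, y_true))  # (2.0, 1.0)
```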
If \(\hat{y}_{it'}\) is the prediction of \(y_{it'}\) at pixel \(i\in[64]\times [64]\), we also use a methodology developed here to estimate the conditional cumulative distribution function (cdf) of \(y_{it'}\) given \(\hat{y}_{it'}\). This allows us to produce confidence intervals for the predictions, and also to compare different methods using the continuous ranked probability score (CRPS) \[{\rm CRPS}(F,y) = \int_{-\infty}^{+\infty}\left(F(u) - \textbf{1}_{\{y\leq u\}}\right)^2 du.\]
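When a cdf is available on a grid of rainfall values, the CRPS integral can be approximated by quadrature. When the predictive distribution is represented by samples instead, the identity \({\rm CRPS}(F,y) = \mathbb{E}|X-y| - \tfrac{1}{2}\mathbb{E}|X-X'|\), with \(X, X'\) independent draws from \(F\), gives the CRPS of the empirical cdf exactly. The sketch below implements both; it is agnostic to how \(F\) was estimated and does not reproduce the cited methodology. The function names are hypothetical.

```python
import numpy as np


def crps_from_cdf(u, F_u, y):
    """Approximate CRPS(F, y) = integral of (F(u) - 1{y <= u})^2 du (trapezoidal rule).

    Hypothetical helper. u: increasing grid assumed to cover the bulk of F and y;
    F_u: cdf values F(u) on that grid. Contributions outside the grid are ignored.
    """
    integrand = (F_u - (u >= y).astype(float)) ** 2
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u)))


def crps_from_samples(samples, y):
    """Exact CRPS of the empirical cdf of `samples`: E|X - y| - 0.5 E|X - X'|."""
    x = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(x - y))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # all pairs, incl. i == j
    return float(term1 - term2)


# Toy check: empirical cdf of {0, 1} against an observation of 0.5.
samples = np.array([0.0, 1.0])
u = np.linspace(-1.0, 2.0, 30001)
F_u = (samples[None, :] <= u[:, None]).mean(axis=1)  # empirical cdf on the grid
print(crps_from_samples(samples, 0.5))  # 0.25
print(round(crps_from_cdf(u, F_u, 0.5), 3))  # 0.25
```

For a deterministic forecast (a single sample \(a\)), the CRPS reduces to the absolute error \(|a-y|\), which is why the CRPS can be read as a probabilistic counterpart of the MAE.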
Let \(\hat{F}_{i,t'}^{(m)}(\cdot)\) denote the estimated cdf for method \(m\) at time \(t'\) and pixel \(i\), and let \(\mathcal{D}^{''}\) denote the set of test dates. The CRPS of model \(m\) at pixel \(i\) is defined as \[ {\rm CRPS}(i,m) = \frac{1}{|\mathcal{D}^{''}|}\sum_{t'\in\mathcal{D}^{''}} {\rm CRPS}(\hat{F}_{i,t'}^{(m)}, y_{i,t'}).\] Using the CLIM prediction as a reference, we compute the pixel-by-pixel CRPS skill score of model \(m\) as \[ {\rm Skill}(i,m) = \frac{ {\rm CRPS}(i,{\rm CLIM}) - {\rm CRPS}(i,m)}{{\rm CRPS}(i,{\rm CLIM})}. \]
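Given per-day CRPS maps for each method, the pixel-level average and the skill score above reduce to simple array operations. The sketch assumes (hypothetically) arrays of shape `(T, 64, 64)` indexed by test day, and a strictly positive CLIM CRPS at every pixel, since otherwise the ratio is undefined. Both function names are illustrative.

```python
import numpy as np


def pixel_crps(crps_per_day):
    """CRPS(i, m): average of the per-day CRPS over the test days (axis 0).

    Hypothetical helper. crps_per_day: array (T, H, W) whose entry [t', i]
    holds CRPS(F_hat_{i,t'}^{(m)}, y_{i,t'}).
    """
    return crps_per_day.mean(axis=0)


def crps_skill(crps_model, crps_clim):
    """Skill(i, m) = (CRPS(i, CLIM) - CRPS(i, m)) / CRPS(i, CLIM), pixel by pixel.

    Positive values mean the model beats the CLIM reference at that pixel.
    Assumes crps_clim > 0 everywhere.
    """
    return (crps_clim - crps_model) / crps_clim


# Toy check: a model whose CRPS is half of CLIM's has skill 0.5 everywhere.
clim = pixel_crps(np.full((10, 64, 64), 2.0))
model = pixel_crps(np.full((10, 64, 64), 1.0))
skill = crps_skill(model, clim)
print(skill.shape, float(skill.min()), float(skill.max()))  # (64, 64) 0.5 0.5
```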
The maps below show the CRPS values (first row) and CRPS skill scores across the AOI. Positive skill values indicate better performance than the CLIM model. Our model performs noticeably better than the NWP model.
CRPS values (first row) and CRPS skill scores across the AOI. Positive skill values indicate better performance than the CLIM model.
We also compare the models in their ability to correctly predict whether the upcoming day is rainy (total rainfall \(>0.5\) mm). In the collected dataset, 34.9% of all pixels have rainfall above 0.5 mm, and 46.1% of rainy pixels (i.e., pixels exceeding 0 mm) exceed 0.5 mm.
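Both percentages above are simple pixel-day counts. If they refer to the same set of pixel-days, they also imply that roughly \(34.9/46.1 \approx 75.7\%\) of pixel-days record some rain, since every pixel-day above 0.5 mm is also rainy. The hypothetical helper below computes the two fractions from an array of daily rainfall totals.

```python
import numpy as np


def rain_fractions(y, threshold=0.5):
    """Return (share of all pixel-days above `threshold`,
    share of rainy pixel-days, i.e. > 0 mm, that exceed `threshold`).

    Hypothetical helper. y: array of daily rainfall totals in mm, any shape
    (e.g. T x 64 x 64). Assumes at least one rainy pixel-day.
    """
    above = y > threshold
    rainy = y > 0.0
    return float(above.mean()), float(above.sum() / rainy.sum())


# Toy example: four pixel-days, three rainy, two above 0.5 mm.
y = np.array([0.0, 0.2, 0.6, 1.0])
print(rain_fractions(y))  # (0.5, 0.6666666666666666)
```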
Given a threshold level \(\tau\) and a model \(m\in\{\mathrm{CLIM}, \mathrm{UNET}_{12}, \mathrm{UNET}_{18}, \mathrm{NWP}, \mathrm{HYB}\}\), its precision and recall at pixel \(i\) are defined respectively as \[ \mathcal{P}_i(m) = \frac{\sum_{t'\in\mathcal{D}^{''}}\textbf{1}_{\left\{|\hat y_{it'}|>\tau\right\}}\textbf{1}_{\left\{|y_{it'}|>\tau\right\}}}{\sum_{t'\in\mathcal{D}^{''}}\textbf{1}_{\{|\hat y_{it'}|>\tau\}}},\] and \[ \mathcal{R}_i(m) = \frac{\sum_{t'\in\mathcal{D}^{''}}\textbf{1}_{\{|\hat y_{it'}|>\tau\}}\textbf{1}_{\{|y_{it'}|>\tau\}}}{\sum_{t'\in\mathcal{D}^{''}}\textbf{1}_{\{|y_{it'}|>\tau\}}}. \] The figures below show the precision (first row) and recall (second row) across the AOI at threshold \(\tau = 0.5\) mm. Again, our method comes out on top in the comparison.
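A vectorized sketch of the per-pixel precision \(\mathcal{P}_i(m)\) and recall \(\mathcal{R}_i(m)\), with the sums over \(t'\in\mathcal{D}^{''}\) taken along axis 0 of `(T, H, W)` arrays. The function name and the choice to return NaN where a denominator is zero (a pixel never predicted, or never observed, above \(\tau\)) are assumptions; the text does not specify that case.

```python
import numpy as np


def precision_recall(y_pred, y_true, tau):
    """Per-pixel precision and recall at threshold tau.

    Hypothetical helper. y_pred, y_true: arrays (T, H, W) over the test days.
    Pixels with an empty denominator get NaN (assumed convention).
    """
    pred_pos = np.abs(y_pred) > tau  # 1{|y_hat_{it'}| > tau}
    true_pos = np.abs(y_true) > tau  # 1{|y_{it'}| > tau}
    hits = (pred_pos & true_pos).sum(axis=0)
    n_pred = pred_pos.sum(axis=0)
    n_true = true_pos.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(n_pred > 0, hits / n_pred, np.nan)
        recall = np.where(n_true > 0, hits / n_true, np.nan)
    return precision, recall


# Toy example: 4 test days on a 1 x 2 grid; the second pixel never rains.
y_true = np.zeros((4, 1, 2))
y_pred = np.zeros((4, 1, 2))
y_true[:, 0, 0] = [1.0, 0.0, 1.0, 0.0]
y_pred[:, 0, 0] = [1.0, 1.0, 0.0, 0.0]
p, r = precision_recall(y_pred, y_true, tau=0.5)
print(p, r)  # pixel 0: precision 0.5, recall 0.5; pixel 1: NaN for both
```

The same function applies unchanged to the heavy-rainfall comparison by passing `tau=10.0`.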
Precision values (first row) and recall (second row) across the AOI at threshold \(\tau = 0.5\) mm.
We also compare the models in their ability to correctly predict upcoming heavy rainfall (total rainfall \(>10\) mm), using the same precision and recall defined above at threshold \(\tau=10\) mm. Such amounts of rainfall in 24 hours are relatively rare in the area: about 13.2% of rainy days record more than 10 mm.
Precision values (first row) and recall (second row) across the AOI at threshold \(\tau = 10\) mm.
We also develop a statistical methodology to probe the relative importance of the meteorological variables used as input in our model, leading to useful insights into the factors driving precipitation in Ghana.