Data-driven rainfall prediction at a regional scale: a case study with Ghana

Indrajit Kalita (a,*), Lucia Vilallonga (b), Yves Atchade (a,b)
(a) Faculty of Computing and Data Sciences, Boston University, Boston, USA
(b) Department of Mathematics and Statistics, Boston University, Boston, USA

*Indicates corresponding author

Accurate rainfall forecasting is essential for agriculture, water resource management, and disaster preparedness. Even state-of-the-art numerical weather prediction (NWP) models are known to struggle to produce skillful rainfall forecasts in tropical regions of Africa. See, for example, this study.

Over the last decade or so, the increased availability of large-scale meteorological datasets and the development of powerful machine learning models have opened up new opportunities for weather forecasting. As a proof of concept, focusing on Ghana in West Africa, we explore the potential of these tools to predict 24h rainfall at 12h and 30h lead times.

We trained a deep neural network to predict rainfall over Ghana. We found that our 12h lead-time model performs on par with, and by some measures better than, the 18h lead-time forecasts produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). We also found that combining our data-driven model with classical NWP further improves forecast accuracy.

We give a brief description of the study below. To learn more, read our paper here.

Area of Interest (AOI)

This is the Area of Interest (AOI) for our study.

Data sets

We collected data over Ghana from the following sources, covering June 1st, 2000 to September 30th, 2021. Additional input variables include the time of year and the latitude/longitude coordinates.

  • GPM-IMERG Precipitation Data: Satellite data providing global rainfall measurements. We collected daily rainfall measurements over Ghana, regridded to 64x64 images. We treated these images as ground truth.
  • ERA5 Meteorological Variables: A database of environmental and meteorological data from ECMWF. As predictors, we collected 55 variables from ERA5 (wind, temperature, pressure, etc.) at 12h and 30h before the 24h rainfall window, regridded to 64x64 images.
  • TIGGE Forecast Data: ECMWF's rainfall forecasts used for comparison, providing 18-hour lead-time predictions regridded to 64x64 images.
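
To make the timing concrete, the short sketch below works out when the predictors are sampled for a given 24h rainfall window. It assumes the window starts at 6AM, as in the model descriptions on this page; the helper name `predictor_time` and the example date are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def predictor_time(window_start: datetime, lead_hours: int) -> datetime:
    """Return the time at which predictors are sampled for a given lead time."""
    return window_start - timedelta(hours=lead_hours)


# Hypothetical 24h rainfall window starting at 6AM on 2 July 2021.
window_start = datetime(2021, 7, 2, 6, 0)
window_end = window_start + timedelta(hours=24)

t12 = predictor_time(window_start, 12)  # 6PM on the previous day
t30 = predictor_time(window_start, 30)  # midnight at the start of the previous day

print("rainfall window:", window_start, "to", window_end)
print("12h lead-time predictors sampled at:", t12)
print("30h lead-time predictors sampled at:", t30)
```

Under these assumptions, both lead times target the same rainfall window; only the age of the predictor snapshot differs.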

Methodology

We trained a U-Net, a type of neural network architecture (depicted below), to predict rainfall images from meteorological images.

After training, we evaluated our models by comparing their predictions with GPM-IMERG rainfall amounts.

Figure 1: Block diagram of the proposed DL architecture for regional precipitation forecasting.

Models Compared

We evaluate and compare the following models:

  • UNET12: A trained U-Net model that uses meteorological variables at 6PM to predict 24h rainfall starting 6AM next day.
  • UNET30: A trained U-Net model that uses meteorological variables at midnight to predict 24h rainfall starting 30h later.
  • NWP: The 18-hour lead-time predictions from the ECMWF model, obtained from the TIGGE database.
  • Ens: An ensemble model that takes a weighted average of the predictions from UNET12 and NWP.
  • CLIM: A reference model based on climatological averages.
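
As a rough illustration of the two simplest models in this list, the sketch below builds a weighted ensemble and a climatological baseline on synthetic 64x64 arrays. The weight `w = 0.6`, the function names, and the per-pixel mean used for the climatology are assumptions made for the example; the weighting scheme and climatology definition actually used in the study are described in the paper.

```python
import numpy as np


def weighted_ensemble(unet12: np.ndarray, nwp: np.ndarray, w: float) -> np.ndarray:
    """Convex combination w * UNET12 + (1 - w) * NWP of two rainfall forecasts."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * unet12 + (1.0 - w) * nwp


def climatology(past_rainfall: np.ndarray) -> np.ndarray:
    """Per-pixel mean over past days; input has shape (days, height, width)."""
    return past_rainfall.mean(axis=0)


rng = np.random.default_rng(0)

# Synthetic stand-ins for 24h rainfall forecasts (mm) on the 64x64 grid.
unet12 = rng.gamma(shape=1.0, scale=5.0, size=(64, 64))
nwp = rng.gamma(shape=1.0, scale=5.0, size=(64, 64))

ens = weighted_ensemble(unet12, nwp, w=0.6)  # 0.6 is an illustrative weight
clim = climatology(rng.gamma(shape=1.0, scale=5.0, size=(30, 64, 64)))

print(ens.shape, clim.shape)
```

In practice the weight would be tuned on held-out data, and a climatology would typically be computed per calendar day or season rather than over an arbitrary block of days.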

Examples of forecasts

Below are the sample forecasts (NWP, CLIM, UNET30, UNET12, and Ens), along with the corresponding ground truth (GT) from GPM-IMERG:

Forecast Comparison Image 1
Forecast Comparison Image 2
Forecast Comparison Image 3

Mean absolute error skill

Mean Absolute Error (MAE)
Skill Map

This map shows the skill values across the area. Positive values mean performance better than the traditional NWP model.
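
One common way to build such a map, sketched below, is the skill score 1 - MAE_model / MAE_NWP computed per pixel over the evaluation days. This formula is an assumption chosen because it matches the sign convention stated above (positive means better than NWP); the exact definition used in the study is given in the paper. All arrays are synthetic and the function names are hypothetical.

```python
import numpy as np


def mae_map(forecasts: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """Per-pixel mean absolute error over time; inputs have shape (days, H, W)."""
    return np.abs(forecasts - truth).mean(axis=0)


def mae_skill_map(model, reference, truth, eps=1e-12):
    """Per-pixel skill 1 - MAE_model / MAE_reference (assumed definition)."""
    mae_model = mae_map(model, truth)
    mae_ref = mae_map(reference, truth)
    # eps guards against division by zero where the reference is perfect.
    return 1.0 - mae_model / np.maximum(mae_ref, eps)


rng = np.random.default_rng(0)

# Synthetic "ground truth" rainfall and two noisy forecasts of it.
truth = rng.gamma(shape=1.0, scale=5.0, size=(50, 64, 64))
model = truth + rng.normal(scale=1.0, size=truth.shape)  # smaller errors
nwp = truth + rng.normal(scale=2.0, size=truth.shape)    # larger errors

skill = mae_skill_map(model, nwp, truth)
print("mean skill over the grid:", float(skill.mean()))
```

With this definition, a skill of 1 corresponds to a perfect forecast, 0 to a forecast no better than the reference, and negative values to a forecast worse than the reference.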

Comparison in terms of rain detection

Mean Absolute Error (MAE)
Precision and Recall (Threshold = 0.5)

Figures 5 and 6 show the precision and recall skill in detecting rainfall at a 0.5 mm threshold. Positive values mean performance better than the traditional NWP model.
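
The sketch below shows how precision and recall for rain detection can be computed at the 0.5 mm threshold on a tiny hand-made example. Two details are assumptions made for illustration: a pixel counts as rainy when rainfall is at least 0.5 mm (rather than strictly above it), and skill is taken as the model's score minus the NWP score. The function name and the sample values are hypothetical.

```python
import numpy as np


def precision_recall(forecast, truth, threshold=0.5):
    """Precision and recall for detecting rainfall >= threshold (mm)."""
    pred_rain = np.asarray(forecast) >= threshold
    true_rain = np.asarray(truth) >= threshold
    tp = np.sum(pred_rain & true_rain)    # rain forecast and observed
    fp = np.sum(pred_rain & ~true_rain)   # rain forecast, none observed
    fn = np.sum(~pred_rain & true_rain)   # rain observed, none forecast
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    return float(precision), float(recall)


# Six pixels of made-up 24h rainfall (mm).
truth = np.array([0.0, 1.2, 3.0, 0.1, 0.7, 0.0])
model = np.array([0.6, 0.9, 0.2, 0.0, 2.0, 0.0])
nwp = np.array([0.8, 0.0, 0.0, 0.9, 1.0, 0.0])

p_model, r_model = precision_recall(model, truth)
p_nwp, r_nwp = precision_recall(nwp, truth)

# Assumed skill definition: difference relative to NWP.
precision_skill = p_model - p_nwp
recall_skill = r_model - r_nwp
print(precision_skill, recall_skill)
```

In this toy example the model has two hits, one false alarm, and one miss, while the NWP stand-in has one hit, two false alarms, and two misses, so both skill values are positive.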

Model interpretation

We also developed a statistical methodology to probe the relative importance of the meteorological variables used as inputs to our model, leading to useful insights into the factors driving precipitation in Ghana.

Important Variables
  • The results show that the most important predictive variable in the U-Net model is the space-time variable. This is hardly surprising since rainfall in Ghana is strongly seasonal, with seasons that vary with latitude.
  • Evaporation drives rainfall, and our method indeed highlights specific humidity (𝑞925), relative humidity (𝑟950) and total column water vapor (𝑡𝑐𝑤𝑣) as important inputs.
  • The wind variable (𝑢300) also appears important. This is possibly related to the African Easterly Jet (AEJ), which plays an important role in the West African monsoon.
  • Our methodology also highlights several convection-related variables as key inputs: convective inhibition (𝑐𝑖𝑛), the K-index (𝑘𝑥), and convective available potential energy (𝑐𝑎𝑝𝑒).
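
For readers unfamiliar with variable-importance analysis, the sketch below illustrates the general idea with permutation importance, a generic and widely used technique. It is a stand-in chosen for illustration and is not the statistical methodology developed in the paper. The toy predictor, the array shapes, and all names are hypothetical.

```python
import numpy as np


def permutation_importance(predict, X, y, rng):
    """Increase in MAE when each input channel is shuffled across samples.

    X has shape (samples, channels, H, W); y has shape (samples, H, W).
    """
    base = np.abs(predict(X) - y).mean()
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        # Break the link between channel j and the target by permuting samples.
        Xp[:, j] = X[rng.permutation(X.shape[0]), j]
        scores[j] = np.abs(predict(Xp) - y).mean() - base
    return scores


def toy_predict(X):
    """Toy stand-in for a trained model: channel 1 is ignored entirely."""
    return 3.0 * X[:, 0] + 0.0 * X[:, 1] + 1.0 * X[:, 2]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3, 8, 8))  # 200 samples, 3 input variables
y = toy_predict(X)                   # targets the toy model fits exactly

scores = permutation_importance(toy_predict, X, y, rng)
print(scores)
```

Here the ignored channel receives an importance of zero, while the channel with the largest coefficient receives the highest score, which is the kind of ranking a variable-importance analysis aims to recover.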