CHIRPS3 Diagnostics

The CHIRPS v3.0 processing stream integrates rainfall data from tens of thousands of stations globally. Each month, station observations from private and public data streams are blended with satellite-based estimates (CHIRP3) to produce the CHIRPS v3.0 dataset. A key improvement in CHIRPS v3.0 is the increase in the number of stations used in the processing stream. CHIRPS v3.0 incorporates nearly twice as many stations as CHIRPS v2, sourced from over 90 data providers, which is nearly four times as many sources as CHIRPS v2. The historical and near-real-time station observations are archived in the Climate Hazards Center (CHC) database. Currently, the CHC's archive is the best rapidly updated rainfall station database available, containing more station observations than the Global Precipitation Climatology Centre. This document summarizes the screening process used to quality-control station data inputs, as well as the diagnostic tools used to evaluate various aspects of CHIRP and CHIRPS. The screening routines and resulting products are described in more detail below.

Screening and Diagnostics

 

Screening

Automated screening and human visual inspection are used to identify and remove potentially bad station data from the CHIRPS process. The screening process and available outputs are explained below.
 

Station Comparison

As part of our quality control effort, whenever we ingest a new source of station precipitation time series, we compare the historical record of each new station to the stations already in our database. The goal is to identify, and potentially remove, stations that are statistically and/or visually different from their neighbors. To this end, we developed routines that calculate correlation and difference statistics and combine them into an index used to rank each station's goodness-of-fit with its neighbors. We also generate graphics displaying the station's time series alongside its neighbors' time series during wet months. These graphics let us visually inspect stations whose statistics suggest a poor fit with their neighbors during the station's wettest 3-month period. We examine each plot individually and flag those we suspect contain erroneous data. The plots include elevation, climatology, geographic location, and station source, so that we can determine whether the differences we see can be explained by these variables. See the example plot below.
 
 
 
 
 
To start, the algorithm selects a target station from a given source in our precipitation database and searches for neighboring stations within an increasing radius until a minimum number of "quality" stations is found. This minimum is usually four, but it can be adjusted where stations are sparse. A quality station is one with enough observations (5-15, depending on the data set's temporal length) within the time period of interest, which for CHIRPS is 1981 to present. At each search iteration, if the minimum number of quality stations is not found, the search radius is increased, up to a maximum of 150 km. If the minimum is still not met at the maximum radius, the station is skipped and noted in a log file. When enough quality neighbors are found, the distance-weighted mean (DWM) of the neighbors is calculated. Next, the correlation coefficient (R) and the difference between the target station and the DWM are calculated. The difference is then divided by the DWM to express it as a fraction of average error (FAE). These values are combined into a composite index of the station's overall "badness" relative to its neighbors' DWM. The badness index (BI) is calculated as:
 
BI = (1.0 - max(R, 0.0) + min(FAE, 10.0)) * 50.0
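The neighbor search and BI calculation described above can be sketched in Python as follows. This is a minimal illustration, not the CHC implementation. It rests on several assumptions that the text does not specify: haversine distances, inverse-distance weights for the DWM, a 25 km starting radius grown in 25 km steps, and FAE computed as the mean absolute difference divided by the mean DWM. Station records are hypothetical dicts with `lat`, `lon`, and a `values` list aligned in time, where `None` marks missing data.

```python
import math

# Assumptions for illustration only (not the CHC implementation): haversine
# distances, inverse-distance weights, a 25 km starting radius grown in 25 km
# steps, and FAE = mean absolute difference / mean DWM.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_neighbors(target, candidates, min_count=4, min_obs=5,
                   start_km=25.0, step_km=25.0, max_km=150.0):
    """Grow the search radius until min_count quality neighbors are found.

    A candidate is a quality station if it has at least min_obs non-missing
    (non-None) values. Returns (distance_km, station) pairs sorted by
    distance, or None if the target must be skipped (and logged).
    """
    radius = start_km
    while radius <= max_km:
        found = []
        for stn in candidates:
            d = haversine_km(target["lat"], target["lon"], stn["lat"], stn["lon"])
            n_obs = sum(v is not None for v in stn["values"])
            if d <= radius and n_obs >= min_obs:
                found.append((d, stn))
        if len(found) >= min_count:
            return sorted(found, key=lambda pair: pair[0])
        radius += step_km
    return None


def distance_weighted_mean(neighbors, n_steps):
    """Inverse-distance-weighted mean of the neighbors at each time step."""
    dwm = []
    for t in range(n_steps):
        num = den = 0.0
        for d, stn in neighbors:
            v = stn["values"][t]
            if v is not None:
                w = 1.0 / max(d, 1.0)  # floor at 1 km to avoid division by zero
                num += w * v
                den += w
        dwm.append(num / den if den > 0 else None)
    return dwm


def badness_index(target_values, dwm):
    """BI = (1 - max(R, 0) + min(FAE, 10)) * 50 over paired non-missing values."""
    pairs = [(x, y) for x, y in zip(target_values, dwm)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired values")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    r = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0
    fae = (sum(abs(x - y) for x, y in pairs) / n) / my if my > 0 else 10.0
    return (1.0 - max(r, 0.0) + min(fae, 10.0)) * 50.0
```

Note that the clipping in the formula bounds BI between 0 (perfect correlation, no error) and 550 (R at or below zero and FAE at or above 10), so a negatively correlated station is penalized the same as an uncorrelated one.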
 
The BI is calculated for each of the 3 wettest months at the station location, and the three values are summed to produce a "total badness index" (TBI) for each station.

For each of the three wettest months, the time series of the target station (blue) and its neighbors (line thickness weighted by distance) are plotted along with the DWM (green). Each graphic lists the station's latitude and longitude, elevation, country name, and source name. To the right of each time series plot, the neighboring stations are listed in order of distance, with their distance from the target station, difference in elevation, station and source IDs, and station names; these entries are color-coded to match the corresponding time series lines. To the upper right of each time series plot, a map with the same color coding shows the locations of the stations selected within the 150 km radius, with the target station plotted in the middle as a triangle and its neighbors around it. The climatological precipitation value (CHPclim2) for each station is notched along the right side of each plot as a reference for judging station validity. The upper right of the graphic also shows the country containing the station, with the station location and the outline of the station location box for reference. Below this is a colorized elevation map of the searched area to help determine whether topography is affecting the station comparisons. The TBI is printed in the lower right corner. Graphic file names are prefixed with the TBI so that the file system sorts them numerically. Typically, only stations with a TBI greater than 100 are saved for viewing, since stations with lower TBI values are in very good agreement with their neighbors.
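Summing BI over the wettest months and naming the saved plots can be sketched as below. This is a hedged illustration: it treats the "3 wettest months" as the three individually wettest calendar months of the station's climatology (the text also mentions a "wettest 3-month period", which could instead mean three consecutive months), and the file-name layout `<TBI>_<station_id>.png` is hypothetical. Zero-padding the TBI to a fixed width is one way to make a plain lexicographic file listing match numeric order.

```python
def wettest_months(climatology_mm, k=3):
    """Calendar months (1-12) with the k highest climatological totals."""
    ranked = sorted(range(12), key=lambda i: climatology_mm[i], reverse=True)
    return sorted(i + 1 for i in ranked[:k])


def total_badness_index(bi_by_month, climatology_mm):
    """Sum the per-month BI values over the station's three wettest months."""
    return sum(bi_by_month[m] for m in wettest_months(climatology_mm))


def plot_filename(tbi, station_id, threshold=100.0):
    """Hypothetical naming: a zero-padded TBI prefix so that a lexicographic
    directory listing sorts plots by TBI. Returns None for stations at or
    below the threshold, which are not saved for viewing."""
    if tbi <= threshold:
        return None
    return f"{tbi:07.2f}_{station_id}.png"
```

Because each BI is at most 550, the TBI never exceeds 1650, so a width of seven characters (`1650.00`) is enough for the padded prefix.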

Identify False Zeros

The daily GTS (WMO Global Telecommunication System) and GSOD (Global Surface Summary of the Day) values are screened to flag potential “false zeros”. These are missing values that were incorrectly coded as zeros and passed through the automated GTS and GSOD networks. False zeros can produce artificially low values in the middle of a rainy season. If a GTS or GSOD station recorded zero on a day when the daily CHIRP3 value was above the long-term (1991-2020) average daily rainfall intensity at that pixel, that daily station value is treated as missing instead of zero.
 
Additional tests are performed on pentadal and monthly accumulations. If a station reported a zero pentadal or monthly total, but CHIRP indicated 7 mm or more for the pentad or 20 mm or more for the month, the station value is treated as missing.
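The daily, pentadal, and monthly false-zero rules above can be expressed as a small sketch. It assumes the per-pixel 1991-2020 average daily rainfall intensity is already available as an input, and it represents a value "treated as missing" as `None`; both are illustrative choices rather than the CHC's data structures.

```python
PENTAD_THRESHOLD_MM = 7.0   # "7 mm or more" for a pentad
MONTH_THRESHOLD_MM = 20.0   # "20 mm or more" for a month


def screen_daily_zero(station_mm, chirp_mm, mean_intensity_mm):
    """Treat a daily zero as missing (None) when CHIRP exceeds the pixel's
    1991-2020 average daily rainfall intensity; otherwise keep the value."""
    if station_mm == 0 and chirp_mm > mean_intensity_mm:
        return None
    return station_mm


def screen_accumulated_zero(station_mm, chirp_mm, period):
    """Same idea for pentadal ("pentad") or monthly ("month") totals."""
    if period == "pentad":
        threshold = PENTAD_THRESHOLD_MM
    elif period == "month":
        threshold = MONTH_THRESHOLD_MM
    else:
        raise ValueError(f"unknown period: {period!r}")
    if station_mm == 0 and chirp_mm >= threshold:
        return None
    return station_mm
```

Only exact zeros are affected; a small nonzero report under heavy CHIRP rainfall passes this test and would have to be caught by other checks.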

Maps of Quality Control Excluded Stations

For each new run of CHIRPS, a number of quality control steps are performed. During this process, maps are made showing the location and number of stations excluded due to:
False Zeros: the station reports a zero, but CHIRP > 7 mm for a pentad or > 20 mm for a month.
Bad Z-score: the station's Z-score is more extreme than ±4.0.
Extreme Values: the station value is greater than 2000 mm, or greater than 5 times CHIRP (when CHIRP > 20 mm).
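The three exclusion rules listed above can be combined into one classifier, sketched here under stated assumptions. The Z-score is taken as a precomputed input because the text does not say how it is derived, and the comparisons follow the list literally (strict `>`), which differs slightly from the "7 mm or more" wording used in the false-zero screening description.

```python
def qc_flags(station_mm, chirp_mm, zscore, period="month"):
    """Return the exclusion reasons (possibly none) for one station value.

    zscore is assumed to be precomputed; period is "pentad" or "month".
    """
    flags = []
    false_zero_limit = 7.0 if period == "pentad" else 20.0
    if station_mm == 0 and chirp_mm > false_zero_limit:
        flags.append("false_zero")
    if abs(zscore) > 4.0:  # more extreme than +/- 4.0
        flags.append("bad_zscore")
    if station_mm > 2000.0 or (chirp_mm > 20.0 and station_mm > 5.0 * chirp_mm):
        flags.append("extreme_value")
    return flags
```

The CHIRP > 20 mm condition on the ratio test keeps dry-season stations from being flagged simply because CHIRP is near zero, where a factor of five is a tiny absolute difference.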
 
The monthly maps can be found here:
 
In each of these directories, the products are named as follows, where YYYY is the 4-digit year and MM is the 2-digit month:
Bad Z-score maps: “bad_zscores.YYYY.MM.png”
False zeros maps: “false_zeros.YYYY.MM.png”
Extreme values maps: “stn.gt.5xchirp.YYYY.MM.png”
These graphics show the number and location of stations flagged by each quality control test.
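When scripting downloads, it can help to build and parse these names programmatically. The following is a small sketch based only on the naming patterns given above; the product keys (`bad_zscore`, `false_zero`, `extreme_value`) are hypothetical labels, and no directory URL is assumed.

```python
import re

# Hypothetical product keys mapped to the file-name prefixes given above.
QC_PRODUCTS = {
    "bad_zscore": "bad_zscores",
    "false_zero": "false_zeros",
    "extreme_value": "stn.gt.5xchirp",
}

_PATTERN = re.compile(r"^(bad_zscores|false_zeros|stn\.gt\.5xchirp)\.(\d{4})\.(\d{2})\.png$")


def qc_map_filename(product, year, month):
    """Build the map file name for one product, year, and month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be 1-12")
    return f"{QC_PRODUCTS[product]}.{year:04d}.{month:02d}.png"


def parse_qc_map_filename(name):
    """Return (prefix, year, month) for a recognized name, else None."""
    m = _PATTERN.match(name)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))
```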

Reality Checks

The Reality Checks (R-Checks) process is a hands-on approach that helps produce a quality product for hazards monitoring and other scientific activities. In R-Checks, a team of experts inspects the data visually via the Early Warning Explorer and statistically using automated analytics. Ancillary information such as FEWS NET data sets, news reports, and government meteorological reports is frequently used in the process. The R-Checks process has been successful in: 1) validating anomalous wet and dry events around the world shown by CHIRPS; 2) catching inaccurate station reports that would otherwise have degraded the data set, for example by creating false droughts; 3) checking that the semi-automated CHIRPS production flow is working correctly; and 4) identifying weaknesses and strengths of the algorithm and data inputs, which helps in planning improvements for future versions.
 
As part of this process, a unique collection of images (CHIRPS v3.0 RCHECKS) is made available through the Early Warning EXplorer (EWX). The RCHECKS images display station values on top of the CHIRPS map to identify values that may not align with neighboring stations or the underlying CHIRPS estimates. They can be used to identify false station data, as well as to assess agreement between stations and the CHIRPS fields. Every month before the release of CHIRPS v3.0-final, a team of data analysts quality-checks that month's station-overlay images to identify stations that are suspect or should be eliminated. A report of this R-Checks effort is posted on the CHC Wiki page each month. The report contains information that CHIRPS users may find helpful, for example notes about major rainfall events shown by the data and validation of some of those events. Users can explore these combined station and CHIRPS images in the CHC EWX viewer.
 
In addition to the station-overlay images, analysts also examine several measures of the new CHIRPS values over the entire time series for consistency. For each month, regional and global statistics are plotted: the mean CHIRPS value, maximum value, standard deviation, number of pixels with rain, mean Z-score, the difference between CHIRPS and CHIRP, and the minimum and maximum Z-scores and anomalies in each region for the current month. A sample graphic is below, and other sample graphics can be found here.
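The per-region statistics listed above can be sketched with Python's standard `statistics` module. Several details are assumptions because the text does not define them: inputs are flat lists of pixel values for one region and month with missing pixels already removed, a "pixel with rain" is one above a configurable threshold (0 mm by default), the CHIRPS minus CHIRP difference is averaged over pixels, and the standard deviation is the population form.

```python
import statistics


def regional_summary(chirps, chirp, zscores, anomalies, rain_threshold_mm=0.0):
    """Summary statistics for one region and month.

    All inputs are equal-length lists of pixel values with missing (-9999)
    pixels already removed. A pixel counts as rainy if CHIRPS exceeds
    rain_threshold_mm (an assumed threshold).
    """
    return {
        "mean": statistics.fmean(chirps),
        "max": max(chirps),
        "std": statistics.pstdev(chirps),  # population standard deviation
        "rain_pixels": sum(v > rain_threshold_mm for v in chirps),
        "zscore_mean": statistics.fmean(zscores),
        "chirps_minus_chirp": statistics.fmean(a - b for a, b in zip(chirps, chirp)),
        "zscore_min": min(zscores),
        "zscore_max": max(zscores),
        "anomaly_min": min(anomalies),
        "anomaly_max": max(anomalies),
    }
```

Plotting these values month by month across the full record makes a break in any one statistic (for example, a sudden jump in the CHIRPS minus CHIRP difference) stand out against its history.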

Diagnostics

Fill Maps

Because of satellite coverage gaps and cloud obstructions, the satellite-only precipitation estimate, CHIRP, is sometimes discontinuous. In such cases, missing infrared (IR) data is filled using unbiased ERA5 reanalysis data. This gap-filling is performed at the pentad (5-day) timescale and is carried through to the corresponding dekad and monthly accumulations. Importantly, missing IR data in CHIRP and CHIRPS is not gap-filled after 2022: any missing IR data from 2023 onward appears as -9999 in both the CHIRP and CHIRPS products. The fill maps are available for download here:
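Because missing IR data from 2023 onward appears as -9999, any averaging or accumulation should exclude that value first; otherwise a single missing pixel pulls a regional mean far below zero. A minimal sketch, assuming values have already been read into a plain Python list (raster I/O is not shown):

```python
MISSING = -9999.0  # missing-IR fill value in CHIRP/CHIRPS from 2023 onward


def mask_missing(values, missing=MISSING):
    """Replace the -9999 fill value with None so it is not treated as rainfall."""
    return [None if v == missing else v for v in values]


def valid_mean(values, missing=MISSING):
    """Mean over non-missing values, or None if every value is missing."""
    good = [v for v in mask_missing(values, missing) if v is not None]
    return sum(good) / len(good) if good else None
```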

Legates-Willmott-corrections

Systematic errors in precipitation measurements often arise from factors such as wind-induced gauge undercatch, evaporation, and splashing. In CHIRPS v3.0, the Legates-Willmott correction factor is applied to gauge data to account for these errors. Monthly maps of the Legates-Willmott correction factor are available in the CHC data repository and the EWX.
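The text does not state exactly how the factor is applied. A minimal sketch, assuming a multiplicative factor that varies by calendar month at a given location (so a factor of 1.1 raises a gauge total by 10%), might look like this; the record layout is hypothetical:

```python
def apply_legates_correction(monthly_mm, factors):
    """Scale each monthly gauge total by that month's correction factor.

    monthly_mm: list of (year, month, total_mm or None) tuples.
    factors: dict mapping calendar month (1-12) to an assumed
    multiplicative correction factor for this location.
    Missing totals (None) stay missing.
    """
    return [
        (year, month, None if mm is None else mm * factors[month])
        for year, month, mm in monthly_mm
    ]
```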

Global Monthly Station Density

For every month, we provide GeoTIFF files of the number of stations within each pixel at 0.05-degree and 0.25-degree resolution. This product is useful for identifying station-rich and station-poor regions of the globe. The GeoTIFF files are available for download here, where you can select the resolution of interest (0.05-degree or 0.25-degree).
Station density maps are also available for CHIRPS preliminary and final pentads, and can be accessed here.
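Counting stations per pixel amounts to binning station coordinates onto a regular grid. The sketch below assumes a global grid anchored at 90°S, 180°W with row 0 at the south edge; the actual CHIRPS grid extent and row orientation in the GeoTIFFs may differ, and writing the counts to GeoTIFF is not shown.

```python
import math
from collections import Counter


def station_density(stations, res_deg):
    """Count stations per grid cell on an assumed global lat/lon grid.

    stations: iterable of (lat, lon) pairs in degrees.
    res_deg: cell size, e.g. 0.05 or 0.25.
    Returns a Counter keyed by (row, col), with row 0 at 90S and col 0 at
    180W; points on the north or east edge are clamped into the last cell.
    """
    n_rows = round(180 / res_deg)
    n_cols = round(360 / res_deg)
    counts = Counter()
    for lat, lon in stations:
        row = min(math.floor((lat + 90.0) / res_deg), n_rows - 1)
        col = min(math.floor((lon + 180.0) / res_deg), n_cols - 1)
        counts[(row, col)] += 1
    return counts
```

The same station list binned at 0.25 degrees merges up to 25 of the 0.05-degree cells, which is why the coarser product shows fewer empty pixels in sparsely gauged regions.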
 
In addition to these spatial representations of the global data, we also provide three types of products that capture the station inputs in different ways.

Prelim-pentads

CHC provides a list of all the stations that make it into the CHIRPS v3.0 preliminary product. The number of station data sources going into CHIRPS v3.0 prelim is much larger than the two sources used in CHIRPS v2.0 prelim. This significant improvement has helped minimize discrepancies between CHIRPS v3.0 preliminary and final data. The prelim-pentad files can be downloaded here.

List of Stations Used

CHC provides a list of all the stations that make it into the CHIRPS v3.0 final product. These files are available here. In this directory you can find files named “global.stationsUsed.v3.YYYY.MM.csv”, which together cover the full time series of CHIRPS v3.0. These files are updated monthly, after each month's CHIRPS v3.0 final product is released.
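To work with a range of months, the file names can be generated from the pattern above and each file's rows counted. This is a sketch under assumptions: the CSV column layout is not documented here, so `count_stations` simply assumes one header row followed by one row per station.

```python
import csv


def stations_used_filenames(start, end):
    """File names for every month from start to end inclusive.

    start and end are (year, month) tuples.
    """
    year, month = start
    names = []
    while (year, month) <= end:
        names.append(f"global.stationsUsed.v3.{year:04d}.{month:02d}.csv")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


def count_stations(lines):
    """Count non-empty data rows, assuming one header row (an assumption)."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row
    return sum(1 for row in reader if row)
```

Usage: open a downloaded file and pass the file object, e.g. `count_stations(open(name, newline=""))`, to get that month's station count.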

Stations per Country

There are two diagnostic products for evaluating the stations going into CHIRPS v3.0 for an individual country. The first is a map of station locations within a country's boundaries and in neighboring areas, for every month of the historical record. You can find a listing of the mapped countries here, and then click on the country of interest to find the map for each historical month. At the top of each graphic is a count of the stations within the country and within the whole mapped area (including the country's surrounding regions) for reference. This is an example of one of the maps, displaying station support for Ethiopia in January 2025.
 
 
The second product is a count of all stations used in CHIRPS for a country, covering the full CHIRPS v3.0 time series from 1981 to present and updated monthly. The plots also compare the number of stations going into CHIRPS v3.0 with that of CHIRPS v2.0. These plots show how the number of reporting stations has changed over time and reveal where station support may be increasing or decreasing.
 

This time series plot shows the changing station support over time, as some sources have decreased coverage while others have come online. Files are available for download here.
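A per-country count comparing CHIRPS v3.0 and v2.0 station support can be sketched as follows. The record layout `(version, country, period, station_id)` is hypothetical; a period could be a year or a `(year, month)` pair. Distinct station IDs are counted so that a station appearing twice in the same period is not double-counted.

```python
from collections import defaultdict


def stations_per_period(records, country):
    """Count distinct station IDs per (version, period) for one country.

    records: iterable of (version, country, period, station_id) tuples,
    a hypothetical layout.
    """
    ids = defaultdict(set)
    for version, ctry, period, station_id in records:
        if ctry == country:
            ids[(version, period)].add(station_id)
    return {key: len(stations) for key, stations in ids.items()}


def v3_minus_v2(counts, periods):
    """Change in station support (v3 count minus v2 count) for each period."""
    return {p: counts.get(("v3", p), 0) - counts.get(("v2", p), 0) for p in periods}
```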