NetCDF is a widely used data storage format, capable of storing high-dimensional, array-oriented data. Climatic variables such as precipitation, temperature, soil moisture, atmospheric concentrations and many more, produced from sources like satellite observations, reanalysis and climate models, require spatio-temporal analysis and hence are often stored in the netCDF (.nc) file format.
Fig 1: Structure of a netcdf file
This blog helps you get started with xarray, a powerful tool for reading, writing and manipulating N-dimensional data files, with particular emphasis on handling netCDF files. xarray is very similar to pandas, so if you already have a basic grip on pandas, it will feel a lot more intuitive.
LET’S CODE!
First of all, import all the required packages. If you don’t have xarray installed, install it with pip (pip install xarray) or conda (conda install -c conda-forge xarray).
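The original import list is not shown here, so the following is only a minimal sketch of a typical setup. Importing numpy and pandas alongside xarray is an assumption, and the install commands appear as comments because they belong in a terminal rather than in Python.

```python
# Install first if needed (run in a terminal, not inside Python):
#   pip install xarray
#   conda install -c conda-forge xarray
import numpy as np      # assumed helper: array handling
import pandas as pd     # assumed helper: dates and tabular export
import xarray as xr     # the package this post is about

# Assumed extras, not stated in the post: matplotlib for plotting and a
# netCDF backend such as netCDF4 for reading and writing .nc files.
print("xarray version:", xr.__version__)
```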
Load and have a look at the data
Next, we load the data using the xr.open_dataset() function. Here, we are using precipitation data from APHRODITE. You can change the path to point to your own .nc file.
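Since the APHRODITE file itself is not bundled with this post, here is a small sketch of the loading step that runs on its own. It writes a synthetic stand-in file (random numbers, not real precipitation) to a temporary folder and opens it with xr.open_dataset(). The file name, grid and values are all made up for illustration, and writing the file assumes a netCDF backend such as netCDF4 or scipy is installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a real precipitation file (random values, NOT APHRODITE data).
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
lat = np.arange(26.0, 31.0, 0.5)
lon = np.arange(80.0, 89.0, 0.5)
values = rng.gamma(2.0, 2.0, (time.size, lat.size, lon.size)).astype("float32")
demo = xr.Dataset(
    {"precip": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_precip_2000.nc"  # hypothetical file name
    demo.to_netcdf(path)

    # The call from the text: open the .nc file as an xarray Dataset.
    with xr.open_dataset(path) as opened:
        ds = opened.load()  # read into memory before the temporary folder is removed

print(ds)  # summary of dimensions, coordinates and data variables
```

With your own data, only the xr.open_dataset(path) line is needed, with path pointing to your .nc file.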
We have loaded the netCDF file as an xarray dataset. The dataset summary can be viewed by simply running ds. It shows that our file has 3 dimensions: longitude (lon), latitude (lat) and time. The Coordinates section provides the values of the dimensions. The Data variables section shows 2 variables, but we are interested in the precip variable, which stores the precipitation amount. A variable can be returned as an array by using ds.variable_name or ds["variable_name"].
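To make the two access styles concrete, here is a short sketch on a tiny in-memory dataset. The data are random, and the second variable name (second_var) is a placeholder, since the post does not name the file's other variable.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", periods=5, freq="D")
lat = np.arange(26.0, 28.0, 0.5)
lon = np.arange(80.0, 83.0, 0.5)
shape = (time.size, lat.size, lon.size)

ds = xr.Dataset(
    {
        "precip": (("time", "lat", "lon"), rng.gamma(2.0, 2.0, shape)),
        "second_var": (("time", "lat", "lon"), rng.random(shape)),  # placeholder name
    },
    coords={"time": time, "lat": lat, "lon": lon},
)

print(ds)               # dataset summary: dimensions, coordinates, data variables

by_attr = ds.precip     # attribute-style access
by_key = ds["precip"]   # dictionary-style access, also works for names with spaces
print(type(by_attr).__name__, by_attr.dims, by_attr.shape)
```

Both styles return the same DataArray. The dictionary style is the safer habit when a variable name clashes with an existing Dataset method or attribute.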
Quick plotting
We can quickly plot the data for a single day.
Since our file stores spatial data for each time step (every day of the year 2000), we first select a single day. Here, ds.precip[180] selects the 181st day of the year (remember that Python indexing starts at 0). The argument origin="lower" places the first row of the array at the bottom of the image instead of the top, so a grid stored with ascending latitudes is not drawn upside down.
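The quick plot can be sketched as follows. This is not necessarily the exact plotting code from the original post: it assumes matplotlib's imshow is used (which is where origin="lower" comes from), runs on synthetic random data, and uses an off-screen backend so it also works without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data for the year 2000 (random values, for illustration only).
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
lat = np.arange(26.0, 31.0, 0.5)
lon = np.arange(80.0, 89.0, 0.5)
values = rng.gamma(2.0, 2.0, (time.size, lat.size, lon.size))
ds = xr.Dataset(
    {"precip": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

day = ds.precip[180]  # position 180 along time, i.e. the 181st day of the year

fig, ax = plt.subplots()
image = ax.imshow(day.values, origin="lower")  # row 0 (lowest latitude) at the bottom
fig.colorbar(image, ax=ax, label="precip")
ax.set_title(str(day.time.values)[:10])        # date of the selected day
ax.set_xlabel("lon index")
ax.set_ylabel("lat index")
# plt.show()  # uncomment in an interactive session
```

xarray also offers its own shortcut, day.plot(), which labels the axes with the coordinate values.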
Combining files with same spatial extent but different time periods
The file we are using has data for a single year only. Most of the time, we have multiple files for different time periods (say, one for each year) but with the same spatial extent. Processing each file separately is time-consuming and inefficient. xarray provides a function for merging such files into a continuous spatio-temporal series.
We store the full paths of all the .nc files that we want to merge in the files variable. Next, we merge all the files by coordinates (the combine argument). Notice that the number of longitude and latitude values is the same as in the previous dataset, but the time dimension has grown and now covers the years 2000-2015. Also note that the data variable precip is now backed by a dask.array instead of an in-memory NumPy array, so the values are only loaded when they are needed.
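Here is a sketch of what combining by coordinates does, using in-memory synthetic datasets so that it runs without any files. It calls xr.combine_by_coords directly, which is the routine xr.open_mfdataset uses when combine="by_coords" is passed. The file-based call is shown as a comment with a hypothetical path pattern, and it additionally needs dask installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

lat = np.arange(26.0, 28.0, 0.5)
lon = np.arange(80.0, 83.0, 0.5)


def make_year(year):
    """Build a synthetic one-year daily dataset on the shared lat/lon grid."""
    time = pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D")
    rng = np.random.default_rng(year)
    data = rng.gamma(2.0, 2.0, (time.size, lat.size, lon.size))
    return xr.Dataset(
        {"precip": (("time", "lat", "lon"), data)},
        coords={"time": time, "lat": lat, "lon": lon},
    )


# Deliberately out of order: the time coordinate decides the final ordering.
yearly = [make_year(y) for y in (2002, 2000, 2001)]
combined = xr.combine_by_coords(yearly)

# With files on disk, the analogous call would be:
#   files = sorted(glob.glob("path/to/precip_*.nc"))   # hypothetical pattern
#   combined = xr.open_mfdataset(files, combine="by_coords")
print(combined.sizes)
```

The lat and lon sizes stay the same while time grows to cover all three years, which mirrors the behaviour described above.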
Clip using spatial extent and time
We can clip the file to contain only a specific area, or only a subset of time, using the sel method together with slice. Make sure that you use the coordinate names from your own .nc file (in place of lat, lon and time).
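A minimal sketch of the clipping step on synthetic data is shown below. The bounding box and dates are arbitrary examples, and the slices assume that lat and lon are stored in ascending order (for a descending coordinate, the slice bounds would need to be swapped).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data for the year 2000 (random values, for illustration only).
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
lat = np.arange(26.0, 31.0, 0.5)
lon = np.arange(80.0, 89.0, 0.5)
values = rng.gamma(2.0, 2.0, (time.size, lat.size, lon.size))
ds = xr.Dataset(
    {"precip": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Label-based selection: both ends of each slice are included.
subset = ds.sel(
    lat=slice(27.0, 29.0),                   # example latitude range
    lon=slice(82.0, 85.0),                   # example longitude range
    time=slice("2000-06-01", "2000-08-31"),  # example period: June to August
)
print(subset.sizes)
```

Unlike positional indexing, sel works with coordinate values, so the same code keeps working if the grid resolution of the file changes.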
The clipped file can again be saved as a netCDF file.
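Saving can be sketched with the to_netcdf method. The example below clips a synthetic dataset, writes it to a temporary folder under a hypothetical file name, and reads it back to confirm the round trip. As before, it assumes a netCDF backend such as netCDF4 or scipy is installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic data and an example clip (random values, for illustration only).
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
lat = np.arange(26.0, 31.0, 0.5)
lon = np.arange(80.0, 89.0, 0.5)
values = rng.gamma(2.0, 2.0, (time.size, lat.size, lon.size))
ds = xr.Dataset(
    {"precip": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)
subset = ds.sel(lat=slice(27.0, 29.0), lon=slice(82.0, 85.0))

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "clipped_precip.nc"  # hypothetical output name
    subset.to_netcdf(out_path)                  # write the clipped dataset

    # Read it back to check that nothing was lost.
    with xr.open_dataset(out_path) as reopened:
        check = reopened.load()

print(check.sizes)
```

For real work, replace the temporary folder with the location where you want to keep the file.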
We can also extract time-series data for a particular point of interest, for example the location of a ground station, and export it as a csv file or plot it.
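The point extraction can be sketched like this. The station coordinates are hypothetical, and method="nearest" is one reasonable choice that picks the grid cell closest to the requested location (the original post may have done this differently). The data are synthetic.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data for the year 2000 (random values, for illustration only).
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
lat = np.arange(26.0, 31.0, 0.5)
lon = np.arange(80.0, 89.0, 0.5)
values = rng.gamma(2.0, 2.0, (time.size, lat.size, lon.size))
ds = xr.Dataset(
    {"precip": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

station_lat, station_lon = 27.7, 85.3  # hypothetical ground-station location

# Pick the grid cell nearest to the station; the result is a 1-D time series.
point = ds.precip.sel(lat=station_lat, lon=station_lon, method="nearest")
series = point.to_series()             # pandas Series indexed by time

csv_text = series.to_csv()             # CSV as a string
# series.to_csv("station_precip.csv")  # or write to a file (hypothetical name)
# series.plot()                        # or plot it (needs matplotlib)
print(csv_text.splitlines()[:3])
```

Note that the selected cell centre (point.lat, point.lon) will generally differ slightly from the station location, so it is worth printing both when comparing against ground observations.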
I hope this post helps you get started with the powerful xarray package for handling and manipulating netCDF files. The xarray package provides many more advanced functionalities, with user-friendly and extensive documentation, so do check them out at https://docs.xarray.dev/en/stable/index.html.
Until next time!
-Nischal Karki