ALL THINGS WATER

Handling and Manipulating netCDF Files in Python


NetCDF is a widely used data storage format that can store high-dimensional, array-oriented data. Climatic variables such as precipitation, temperature, soil moisture and atmospheric concentrations, produced from sources like satellite observations, reanalyses and climate models, require spatio-temporal analysis and are therefore often stored in the netCDF (.nc) file format.

Fig 1: Structure of a netCDF file
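To make the structure in Fig 1 concrete, here is a minimal sketch that builds a small in-memory xarray Dataset with the same building blocks a netCDF file holds: named dimensions, coordinate variables, a data variable with its own attributes, and global attributes. The grid, the dates, the "units" string and the random values are all made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical coordinate values; a real file supplies its own.
time = pd.date_range("2000-01-01", periods=3, freq="D")
lat = np.array([25.125, 25.375, 25.625])
lon = np.array([79.125, 79.375, 79.625, 79.875])

# Random stand-in values for a daily precipitation field.
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=4.0, size=(time.size, lat.size, lon.size))

ds = xr.Dataset(
    data_vars={
        # A data variable: an N-D array tied to named dimensions, plus attributes.
        "precip": (("time", "lat", "lon"), precip, {"units": "mm/day"}),
    },
    coords={"time": time, "lat": lat, "lon": lon},  # coordinate variables
    attrs={"title": "Synthetic example"},           # global attributes
)

print(ds)
```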

This post helps you get started with xarray, a powerful tool for reading, writing and manipulating N-dimensional data files, with a particular emphasis on handling netCDF files. xarray is very similar to pandas, so if you already have a basic grip on pandas, it will feel a lot more intuitive.
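As a quick illustration of the pandas analogy, the sketch below uses a tiny, made-up daily series. `sel` selects by coordinate label, much like `Series.loc`, while `isel` selects by integer position, much like `Series.iloc`; `to_series` converts the result back to pandas.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily values; a 1-D DataArray indexed by time behaves much like a pandas Series.
time = pd.date_range("2000-01-01", periods=5, freq="D")
da = xr.DataArray(np.arange(5.0), coords={"time": time}, dims="time", name="precip")

by_label = da.sel(time="2000-01-03")  # label-based, like Series.loc
by_position = da.isel(time=2)         # position-based, like Series.iloc

# Converting to pandas is a single call.
series = da.to_series()
print(series.loc["2000-01-03"], by_label.item(), by_position.item())
```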

LET'S CODE!

First of all, import all the required packages:

import numpy as np                        # numerical computations
from matplotlib import pyplot as plt      # visualisation
import xarray as xr                       # for netcdf
import pandas as pd                       
import os

If you don’t have xarray installed, install it with pip or conda:

pip install xarray netCDF4 dask
conda install -c conda-forge xarray dask netCDF4 bottleneck

Load and have a look at the data

Next, we load the data using the xr.open_dataset() function. Here, we are using daily precipitation data from APHRODITE for the year 2000; change the path to point to your own .nc file.

ds = xr.open_dataset("./data/APHRO_MA_025deg_V1901.2000.nc")  # forward slashes also work on Windows
ds


We have loaded the netCDF file as an xarray Dataset. In a Jupyter notebook, a summary of the dataset is displayed by simply running ds. It shows that our file has 3 dimensions: longitude (lon), latitude (lat) and time. The Coordinates section provides the values along each dimension. The Data variables section shows 2 variables, but we are interested in precip, which stores the precipitation amount. Any variable can be returned as an array with ds.variable_name or ds["variable_name"].

ds["time"]


ds.precip

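The same information shown in the summary can also be inspected programmatically. The sketch below builds a small synthetic stand-in for the file with two data variables on a (time, lat, lon) grid; the second variable name, `station_count`, and all values are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: two data variables on a (time, lat, lon) grid.
# "station_count" is a hypothetical name for the second variable.
time = pd.date_range("2000-01-01", periods=4, freq="D")
lat = np.array([25.125, 25.375])
lon = np.array([79.125, 79.375, 79.625])
shape = (time.size, lat.size, lon.size)
ds = xr.Dataset(
    {
        "precip": (("time", "lat", "lon"), np.zeros(shape)),
        "station_count": (("time", "lat", "lon"), np.ones(shape)),
    },
    coords={"time": time, "lat": lat, "lon": lon},
)

print(dict(ds.sizes))      # dimension names and their lengths
print(list(ds.coords))     # coordinate names
print(list(ds.data_vars))  # data variable names

# Attribute-style and key-style access return the same DataArray.
assert ds.precip.identical(ds["precip"])
```

Key-style access is the safer habit: it also works when a variable name contains characters that are not valid in a Python identifier or clashes with a Dataset method name.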

Quick plotting

We can quickly plot the data for a single day.

pr_plot = plt.imshow(ds.precip[180], origin="lower")  # field for a single day
plt.colorbar(pr_plot)
plt.show()


Since our file stores a spatial field for each time step (every day of the year 2000), we first select a single day. In the code above, ds.precip[180] selects the 181st day of the year, because Python indexing starts at 0. The argument origin="lower" places the first row of the array (index [0, 0]) at the lower-left corner of the plot; since the latitudes in this file are stored from south to north, this keeps north at the top of the map.
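The sketch below checks the indexing claim on a synthetic daily grid for the leap year 2000 (the grid and zero values are placeholders). Plain positional indexing, `isel` with a named dimension, and label-based `sel` with the corresponding date all pick out the same day.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily grid for the leap year 2000 (366 days); values are placeholders.
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
lat = np.array([25.125, 25.375, 25.625])  # stored south -> north (ascending)
lon = np.array([79.125, 79.375])
precip = xr.DataArray(
    np.zeros((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="precip",
)

day_pos = precip[180]                      # positional indexing on the first axis
day_isel = precip.isel(time=180)           # same position, but naming the dimension
day_label = precip.sel(time="2000-06-29")  # the date label of that position

print(day_pos.time.values)  # the 181st day of 2000

# With ascending latitude, row 0 is the southernmost row, which is why
# imshow(..., origin="lower") keeps north at the top of the map.
```

Selecting by date with `sel` avoids counting days by hand. As an alternative to `plt.imshow`, xarray's built-in `ds.precip[180].plot()` draws the same field with lon and lat labels on the axes.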

Combining files with same spatial extent but different time periods

The file we are using has data for a single year only. Often we have multiple files covering different time periods (say, one per year) but sharing the same spatial extent. Processing each file separately is slow and inefficient, so xarray provides a function, xr.open_mfdataset(), that merges such files into one continuous spatio-temporal series.

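import glob

data_dir = "./data"  # folder holding the yearly files
# Match only the yearly input files, so outputs saved later in ./data are not picked up
files = sorted(glob.glob(os.path.join(data_dir, "APHRO_MA_025deg_V1901.*.nc")))

ds = xr.open_mfdataset(files, combine="by_coords")  # requires dask
ds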


We store the full paths of all the .nc files that we want to merge in the files variable and then merge them by their coordinates (the combine argument). Notice that the lengths of the lon and lat dimensions are the same as in the previous dataset, but the time dimension has grown to cover the years 2000-2015. Also note that precip is now backed by a dask array instead of an in-memory NumPy array: its values are loaded lazily, only when they are actually needed.
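With combine="by_coords", open_mfdataset opens each file and combines the resulting datasets with the same logic as `xr.combine_by_coords`. The sketch below calls that function directly on small in-memory datasets, so it runs without any files or dask; the `yearly_dataset` helper and its values are hypothetical stand-ins for yearly files.

```python
import numpy as np
import pandas as pd
import xarray as xr


def yearly_dataset(year: int) -> xr.Dataset:
    """Hypothetical stand-in for one yearly file: same grid, different time span."""
    time = pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D")
    lat = np.array([25.125, 25.375])
    lon = np.array([79.125, 79.375])
    data = np.zeros((time.size, lat.size, lon.size))
    return xr.Dataset(
        {"precip": (("time", "lat", "lon"), data)},
        coords={"time": time, "lat": lat, "lon": lon},
    )


# Deliberately passed out of order: combine_by_coords orders the pieces
# by their coordinate values, not by their position in the list.
pieces = [yearly_dataset(2002), yearly_dataset(2000), yearly_dataset(2001)]
merged = xr.combine_by_coords(pieces)

print(dict(merged.sizes))
print(merged.time.values[0], merged.time.values[-1])
```

Because the ordering comes from the time coordinate itself, sorting the file list is not strictly required for this mode, although it keeps the file order predictable.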

Clip using spatial extent and time

We can clip the dataset to a specific area or time period using the sel method with slice objects. For your own .nc file, replace lat, lon and time with the coordinate names it actually uses. Note that slice bounds must follow the order in which a coordinate is stored: if latitude runs from north to south, write slice(31, 25) instead.

start_date = "2005-01-01"
end_date = "2008-12-31"
clip_ds = ds.sel(lon=slice(79, 90), lat=slice(25, 31), time=slice(start_date, end_date))
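The sketch below illustrates the slice-order pitfall on a hypothetical grid whose latitude is stored from north to south. Ascending bounds silently return an empty selection rather than an error; giving the bounds in stored order, or sorting the coordinate first with `sortby`, both work.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical grid whose latitude is stored north -> south (descending).
time = pd.date_range("2005-01-01", periods=3, freq="D")
lat = np.array([31.0, 29.0, 27.0, 25.0])
lon = np.array([79.0, 84.0, 89.0])
ds = xr.Dataset(
    {"precip": (("time", "lat", "lon"), np.zeros((3, 4, 3)))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Ascending bounds on a descending coordinate select nothing.
empty = ds.sel(lat=slice(25, 31))

# Either give the bounds in stored order ...
clip_a = ds.sel(lat=slice(31, 25))
# ... or sort the coordinate first and keep the ascending bounds.
clip_b = ds.sortby("lat").sel(lat=slice(25, 31))

print(empty.sizes["lat"], clip_a.sizes["lat"], clip_b.sizes["lat"])
```

Checking that the clipped dataset is not empty (for example, that `clip_ds.sizes["lat"] > 0`) before saving it is a cheap way to catch this mistake.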

The clipped file can again be saved as a netCDF file.

clip_ds.to_netcdf('./data/Clipped.nc')

Extract values for a single point

We can also extract the time series for a particular point of interest, such as the location of a ground station, and then plot it or export it as a CSV file. With method='nearest', xarray picks the grid cell closest to the given coordinates.

stn_lat = 29.3689
stn_lon = 81.2062
# Pick the grid cell nearest to the station
stn_val = ds.precip.sel(lon=stn_lon, lat=stn_lat, method='nearest')

df = stn_val.to_dataframe()
df.head()
df.to_csv('./data/stn_precip.csv')


stn_val.plot()
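It is worth checking which grid cell the nearest-neighbour lookup actually returned. The sketch below uses a hypothetical 0.25-degree grid with cell centres at .125, .375, .625 and .875 (an assumption, not read from the APHRODITE file) and the station coordinates from above. The selected cell is kept as scalar lat and lon coordinates, and the `tolerance` argument of `sel` makes the lookup raise a KeyError instead of silently returning a distant cell.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 0.25-degree grid with cell centres at .125, .375, .625, .875.
lat = np.round(np.arange(25.125, 31.0, 0.25), 3)
lon = np.round(np.arange(79.125, 90.0, 0.25), 3)
time = pd.date_range("2000-01-01", periods=5, freq="D")
precip = xr.DataArray(
    np.zeros((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="precip",
)

stn_lat, stn_lon = 29.3689, 81.2062
stn_val = precip.sel(lat=stn_lat, lon=stn_lon, method="nearest")

# The chosen grid cell is kept as scalar coordinates.
print(stn_val.lat.item(), stn_val.lon.item())

# With a tolerance (in coordinate units), a point far outside the grid
# raises KeyError instead of snapping to the edge of the grid.
try:
    precip.sel(lat=10.0, lon=stn_lon, method="nearest", tolerance=0.25)
    far_point_found = True
except KeyError:
    far_point_found = False

series = stn_val.to_series()  # time-indexed pandas Series of the values only
```

`to_series()` returns just the values indexed by time, whereas `to_dataframe()`, as used above, also carries the selected lat and lon as columns, which documents the grid cell in the exported CSV.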


I hope this post helps you get started with the powerful xarray package for handling and manipulating netCDF files. xarray offers many more advanced features and has extensive, user-friendly documentation, so do check it out at https://docs.xarray.dev/en/stable/index.html.

Until next time!

-Nischal Karki