# xarray - Multi-Dimensional Geoscience Data

## Quick Reference
```python
import xarray as xr

# Read
ds = xr.open_dataset('data.nc')

# Access data
temp = ds['temperature']  # DataArray
values = temp.values      # numpy array
df = ds.to_dataframe()    # pandas DataFrame

# Structure info
print(ds)            # Overview
print(ds.dims)       # Dimensions
print(ds.data_vars)  # Variables

# Write
ds.to_netcdf('output.nc')
```
## Key Classes

| Class | Purpose |
|---|---|
| `Dataset` | Collection of aligned DataArrays (like a NetCDF file) |
| `DataArray` | Single variable with labeled dimensions |
| Coordinates | Dimension labels (time, lat, lon) |
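The relationship between the three classes can be sketched in a few lines (the variable name `temperature` and the coordinate values here are illustrative):

```python
import numpy as np
import xarray as xr

# A DataArray: one variable with labeled dimensions and coordinates
da = xr.DataArray(
    np.zeros((2, 3)),
    dims=['lat', 'lon'],
    coords={'lat': [10.0, 20.0], 'lon': [100.0, 110.0, 120.0]},
)

# A Dataset: a collection of aligned DataArrays sharing those coordinates
ds = xr.Dataset({'temperature': da})
```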
## Essential Operations

### Select Data

```python
# By coordinate value
temp_jan = ds['temperature'].sel(time='2020-01-15')
temp_region = ds['temperature'].sel(lat=slice(-30, 30), lon=slice(-60, 60))

# Nearest value
temp_point = ds['temperature'].sel(lat=35.5, lon=-120.3, method='nearest')

# By index
temp_first = ds['temperature'].isel(time=0)
```
### Compute Statistics

```python
import numpy as np

temp = ds['temperature']
temp_mean_time = temp.mean(dim='time')           # Spatial map
temp_mean_space = temp.mean(dim=['lat', 'lon'])  # Time series

# Area-weighted mean (weights by cosine of latitude)
weights = np.cos(np.deg2rad(ds.lat))
temp_weighted = temp.weighted(weights).mean(dim=['lat', 'lon'])
```
### GroupBy and Resample

```python
temp = ds['temperature']

# Temporal aggregations
monthly_mean = temp.groupby('time.month').mean()
annual_mean = temp.groupby('time.year').mean()

# Climatology and anomalies
climatology = temp.groupby('time.month').mean('time')
anomalies = temp.groupby('time.month') - climatology

# Resample time series
monthly = temp.resample(time='1M').mean()
rolling_30d = temp.rolling(time=30, center=True).mean()
```
### Create New Dataset

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2020-01-01', periods=365, freq='D')
lats = np.linspace(-90, 90, 180)
lons = np.linspace(-180, 180, 360)

da = xr.DataArray(
    data=np.random.randn(365, 180, 360),
    dims=['time', 'lat', 'lon'],
    coords={'time': times, 'lat': lats, 'lon': lons},
    attrs={'units': 'degC', 'long_name': 'Temperature'}
)
ds = xr.Dataset({'temperature': da})
ds.to_netcdf('output.nc')
```
### Masking

```python
temp_warm = temp.where(temp > 20)       # Keep values > 20, mask the rest as NaN
temp_clipped = temp.where(temp > 0, 0)  # Replace non-positive values with 0

tropics = (ds.lat > -23.5) & (ds.lat < 23.5)
temp_tropics = temp.where(tropics, drop=True)  # Mask by coordinate
```
### Large Datasets (Dask)

```python
# Open with chunking (lazy loading)
ds = xr.open_dataset('large_file.nc', chunks={'time': 100})
ds = xr.open_mfdataset('data_*.nc', chunks='auto')

# Operations are lazy until .compute()
result = ds['temperature'].mean(dim='time').compute()
```
## When to Use vs Alternatives
| Tool | Best For | Limitations |
|---|---|---|
| xarray | Labeled multi-dim arrays, NetCDF/Zarr, Dask integration | Learning curve for newcomers from numpy |
| iris | Met Office climate workflows, UGRID mesh support | Smaller community, UK-centric conventions |
| CDO | Fast command-line climate data operations | Not Python-native, limited custom analysis |
| NCO | Quick NetCDF file manipulation and arithmetic | Command-line only, no visualization |
Use xarray when you need labeled dimension handling, seamless NetCDF/Zarr I/O, groupby/resample operations, or Dask-based parallel processing of large datasets.
Consider alternatives when you need fast one-off command-line operations on NetCDF files (use CDO/NCO), or you work within the Met Office ecosystem with UGRID meshes (use iris).
## Common Workflows

### Climate data analysis with temporal aggregation
1. Open NetCDF dataset with `xr.open_dataset()` (use `chunks=` if large)
2. Inspect dimensions, coordinates, and variables with `print(ds)`
3. Select region of interest with `.sel(lat=slice(), lon=slice())`
4. Compute climatology with `.groupby('time.month').mean('time')`
5. Calculate anomalies by subtracting climatology from data
6. Compute area-weighted spatial mean using cosine latitude weights
7. Resample to desired temporal resolution (monthly, annual)
8. Save results to NetCDF with `.to_netcdf()`
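The steps above can be sketched end to end; a synthetic in-memory dataset stands in for the NetCDF file, and the region bounds are illustrative:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for xr.open_dataset('data.nc'): two years of daily data
rng = np.random.default_rng(0)
times = pd.date_range('2020-01-01', periods=730, freq='D')
lats = np.linspace(-90, 90, 19)
lons = np.linspace(-180, 180, 37)
ds = xr.Dataset(
    {'temperature': (['time', 'lat', 'lon'],
                     15 + 10 * rng.standard_normal((730, 19, 37)))},
    coords={'time': times, 'lat': lats, 'lon': lons},
)

# 3. Select a region of interest
region = ds['temperature'].sel(lat=slice(-30, 30), lon=slice(-60, 60))

# 4-5. Monthly climatology, then anomalies relative to it
climatology = region.groupby('time.month').mean('time')
anomalies = region.groupby('time.month') - climatology

# 6. Area-weighted spatial mean using cosine-latitude weights
weights = np.cos(np.deg2rad(region.lat))
series = anomalies.weighted(weights).mean(dim=['lat', 'lon'])

# 7. Resample to monthly means ('MS' = month-start frequency)
monthly = series.resample(time='1MS').mean()
```

Step 8 would then be `monthly.to_dataset(name='temperature_anomaly').to_netcdf(...)`.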
## Common Issues

| Issue | Solution |
|---|---|
| Memory error | Use `chunks=` for lazy loading |
| Time decoding fails | Open with `decode_times=False`, then decode manually |
| Missing coordinates | Check `ds.coords` and `ds.dims` |
| Alignment errors | Check that coordinate values match |
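The time-decoding workaround can be sketched as follows; a small in-memory dataset stands in for a file opened with `decode_times=False` (with a real file, fix the `units` attribute before decoding if it is malformed):

```python
import xarray as xr

# Stand-in for a dataset opened with decode_times=False: the time
# coordinate is raw numbers plus a CF 'units' attribute.
raw = xr.Dataset(
    {'temperature': ('time', [10.0, 11.0, 12.0])},
    coords={'time': ('time', [0, 1, 2],
                     {'units': 'days since 2020-01-01'})},
)

# Decode manually once the units attribute is known to be correct
decoded = xr.decode_cf(raw)
```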
## References

- I/O Formats - NetCDF, Zarr, and other formats
- Computation - Aggregation and analysis methods

## Scripts

- `scripts/climate_analysis.py` - Climate data analysis

## Related skills