I am trying to convert a dataset that uses a Lambert Conformal Conic projection to plain WGS 84 with Python. Unfortunately, the dataset does not carry a defined EPSG code, but its general definition mentions two standard parallels, i.e. the Lambert Conic Conformal (2SP) method, EPSG:9802.
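Since the file lacks an EPSG code, my understanding is that the CRS would have to be assembled from the projection parameters in the metadata. As a sketch (the standard parallels, origin, and ellipsoid below are placeholders, not the dataset's real values), pyproj can build an LCC 2SP CRS and a transformer to WGS 84:

```python
# Sketch only: lat_1/lat_2/lat_0/lon_0 below are placeholder values;
# the real ones must be read from the dataset's grid-mapping metadata.
from pyproj import CRS, Transformer

lcc = CRS.from_proj4(
    "+proj=lcc +lat_1=35.0 +lat_2=65.0 +lat_0=46.0 +lon_0=10.0 "
    "+x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs"
)
to_wgs84 = Transformer.from_crs(lcc, "EPSG:4326", always_xy=True)

# The projection origin (0, 0) in metres maps back to (lon_0, lat_0).
lon, lat = to_wgs84.transform(0.0, 0.0)
```

In my case the file already stores 2-D latitude/longitude arrays, so the CRS would mainly matter for tools that need an explicit grid mapping rather than for computing the coordinates.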
The dimensions have this format: (valid_time: 2432, y: 1069, x: 1069). However, y and x do not refer to latitude and longitude; they are arbitrary grid indices running from 0 to 1068. What is strange is that latitude and longitude are stored as variables with (y, x) dimensions (excluding time). Here is the structure:
Dimensions: (valid_time: 2432, y: 1069, x: 1069)
Coordinates:
* valid_time (valid_time) datetime64[ns] 19kB 2024-01-01T01:00:00 ... 2024...
latitude (y, x) float64 9MB ...
longitude (y, x) float64 9MB ...
Dimensions without coordinates: y, x
Data variables:
tp (valid_time, y, x) float32 11GB ...
What I want to achieve is for the variable tp to be transformed onto a regular 0.1° × 0.1° WGS 84 grid whose coordinates are the actual lat/lon values (the domain is Europe, so roughly lat (33, 72) and lon (-12, 35)).
I have managed to do this manually by building a new regular grid, flattening the lat/lon arrays, and linearly interpolating the variable for every snapshot, but the process is very slow and has high memory demand.
import time

import numpy as np
import xarray as xr
from scipy.interpolate import griddata

ds = xr.open_dataset("original_dataset.nc")

# Target domain (Europe) and resolution in degrees
x1, x2 = -12, 35
y1, y2 = 33, 72
dx, dy = 0.1, 0.1

tp = ds['tp']
lat = ds['latitude']
lon = ds['longitude']
lon = ((lon + 180) % 360) - 180  # wrap longitudes to [-180, 180)

# Regular 0.1-degree target grid
lon_new = np.arange(np.floor(x1), np.ceil(x2) + dx, dx)
lat_new = np.arange(np.floor(y1), np.ceil(y2) + dy, dy)
lon_grid, lat_grid = np.meshgrid(lon_new, lat_new)

# The source points are the same for every snapshot, so flatten them once
lat_flat = lat.values.ravel()
lon_flat = lon.values.ravel()

snapshots = ds.valid_time.size
grid = np.empty((snapshots, len(lat_new), len(lon_new)))

total_time = 0.0
for t in range(snapshots):
    start_time = time.time()
    # Load one snapshot at a time to keep memory bounded
    tp_flat = tp.isel(valid_time=t).values.ravel()
    grid[t, :, :] = griddata(
        (lon_flat, lat_flat),
        tp_flat,
        (lon_grid, lat_grid),
        method='linear',
    )
    total_time += time.time() - start_time
    print(f"Finished snapshot {t+1}/{snapshots}: {total_time:.2f} seconds")
print(f"Total interpolation time for all snapshots: {total_time:.2f} seconds")

var_da = xr.DataArray(
    grid,
    dims=['time', 'y', 'x'],
    coords={'time': ds['valid_time'][:snapshots], 'y': lat_new, 'x': lon_new},
    attrs=tp.attrs,
)
ds_regrid = xr.Dataset({'tp': var_da})
ds_regrid.to_netcdf("regridded.nc")
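One thing I suspect makes this slow is that griddata rebuilds the Delaunay triangulation of the source points on every call, even though the points never change. A sketch (with tiny synthetic stand-ins for my flattened lat/lon and data) of building the triangulation once and reusing it per snapshot:

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

# Tiny synthetic stand-ins for the flattened source lon/lat and data
rng = np.random.default_rng(0)
lon_flat = rng.uniform(-12, 35, 500)
lat_flat = rng.uniform(33, 72, 500)
values = rng.random((3, 500))          # 3 "snapshots"

lon_new = np.arange(-12, 35, 0.5)
lat_new = np.arange(33, 72, 0.5)
lon_grid, lat_grid = np.meshgrid(lon_new, lat_new)

# Build the triangulation once; this is the expensive step that
# plain griddata repeats for every snapshot
tri = Delaunay(np.column_stack([lon_flat, lat_flat]))

out = np.empty((values.shape[0], len(lat_new), len(lon_new)))
for t in range(values.shape[0]):
    out[t] = LinearNDInterpolator(tri, values[t])(lon_grid, lat_grid)
```

This should give the same result as griddata(method='linear'), just without the repeated setup cost, but I have not verified it scales to the full 1069×1069 grid.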
Is there a tool or a different method I can use to speed things up? Maybe there is a way with cdo?
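From what I have read, cdo might be able to do this in one step if it recognizes the curvilinear lat/lon coordinates, given a target grid description. A sketch of writing the 0.1° Europe grid description I would feed it (the file name and the idea that remapbil accepts this grid are my assumptions):

```python
# Hypothetical target grid description for cdo, e.g.
#   cdo remapbil,europe_grid.txt original_dataset.nc regridded.nc
# xsize/ysize: -12..35 and 33..72 at 0.1 deg, end points included.
griddes = """\
gridtype = lonlat
xsize    = 471
ysize    = 391
xfirst   = -12.0
xinc     = 0.1
yfirst   = 33.0
yinc     = 0.1
"""
with open("europe_grid.txt", "w") as f:
    f.write(griddes)
```

I am not sure whether cdo would pick up the 2-D latitude/longitude variables automatically or whether the file needs CF grid-mapping attributes first.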
I would appreciate any ideas.