
I am trying to convert a dataset which has a Lambert Conformal Conic projection to the classic WGS 84 with Python. Unfortunately, the dataset does not have a defined EPSG code, but the general definition mentions two standard parallels, i.e. the EPSG:9802 (Lambert Conic Conformal 2SP) method.

The dimensions have this format: (valid_time: 2432, y: 1069, x: 1069). However, y and x do not refer to latitude and longitude; they are arbitrary grid indices running from 0 to 1068. It is strange that latitude and longitude are treated like variables, as they have (y, x) dimensions (excluding time). Here is their format:

Dimensions:     (valid_time: 2432, y: 1069, x: 1069)
Coordinates:
  * valid_time  (valid_time) datetime64[ns] 19kB 2024-01-01T01:00:00 ... 2024...
    latitude    (y, x) float64 9MB ...
    longitude   (y, x) float64 9MB ...
Dimensions without coordinates: y, x
Data variables:
    tp          (valid_time, y, x)  float32 11GB ...

What I want to achieve is for the variable tp to be transformed onto a 0.1° × 0.1° WGS 84 grid where the coordinates are the actual lat/lon values (we are talking about Europe, so something like lat ∈ [33, 72], lon ∈ [-12, 35]).

I have managed to do this manually by creating a new grid of regularly spaced lat/lon values, flattening the source coordinates, and linearly interpolating the variable for every snapshot, but it is a very slow process with high memory demand.

import numpy as np
import xarray as xr
from scipy.interpolate import griddata
import time

ds = xr.open_dataset("original_dataset.nc")

# Target bounding box (Europe) and resolution in degrees
x1, x2 = -12, 35
y1, y2 = 33, 72
dx, dy = 0.1, 0.1

tp = ds['tp']
lat = ds['latitude']
lon = ds['longitude']
lon = ((lon + 180) % 360) - 180  # wrap longitudes to [-180, 180)
lon_new = np.arange(np.floor(x1), np.ceil(x2) + dx, dx)
lat_new = np.arange(np.floor(y1), np.ceil(y2) + dy, dy)
lon_grid, lat_grid = np.meshgrid(lon_new, lat_new)
snapshots = ds.valid_time.size
grid = np.empty((snapshots, len(lat_new), len(lon_new)))

# The source coordinates do not change between snapshots,
# so flatten them once, outside the loop
lat_flat = lat.values.flatten()
lon_flat = lon.values.flatten()

total_time = 0

for t in range(snapshots):
    start_time = time.time()

    # Pull only the current snapshot into memory, not the full array
    tp_flat = tp.isel(valid_time=t).values.flatten()

    grid[t, :, :] = griddata(
        (lon_flat, lat_flat),
        tp_flat,
        (lon_grid, lat_grid),
        method='linear'
    )

    total_time += time.time() - start_time
    print(f"Finished snapshot {t+1}/{snapshots}: {total_time:.2f} seconds")
print(f"Total interpolation time for all snapshots: {total_time:.2f} seconds")

var_da = xr.DataArray(
    grid,
    dims=['time', 'y', 'x'],
    coords={'time': ds['valid_time'][:snapshots], 'y': lat_new, 'x': lon_new},
    attrs=tp.attrs
)


ds_regrid = xr.Dataset({'tp': var_da})

ds_regrid.to_netcdf("regridded.nc")

Is there a tool or a different method I can use to speed things up? Maybe there is a way with CDO?

I would appreciate any ideas.

  • Please don't update the code with feedback from answers. Doing so brings a disconnect between the question and answer for those who come to read the post later. Please read What to do after the question has been answered for possible alternatives. Commented Nov 27 at 23:21
  • Can you post a dataset for us to try? Commented Nov 27 at 23:43

2 Answers

Here are some general coding style suggestions.

Naming

The PEP 8 style guide recommends using all capital letters for constants. For example:

dx, dy = 0.1, 0.1

would be:

DX, DY = 0.1, 0.1

These names are a bit brief. You could give them more meaningful names, such as DELTA_X (if that is what it stands for). You could also add comments to describe the purpose of the constants.

The variable name var_da is also a bit vague. If "var" stands for "variable", that is generally considered a poor choice of a name. If it stands for something else, it would be better to spell it out.
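For instance, a minimal sketch of more descriptive names (DELTA_LON and DELTA_LAT are illustrative renamings, not identifiers from the original code):

```python
# Target grid spacing in degrees (illustrative renaming of dx, dy)
DELTA_LON = 0.1  # longitude step of the output WGS 84 grid
DELTA_LAT = 0.1  # latitude step of the output WGS 84 grid

print(DELTA_LON, DELTA_LAT)
```

A name like regridded_tp would likewise say more than var_da about what the DataArray holds.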

Documentation

The PEP 8 style guide recommends adding a docstring to the top of the code to summarize its purpose, for example:

  • What is the input
  • What is the output
  • How does it go about creating the output
"""
Convert a dataset which has a Lambert Conformal Conic projection
to the classic WGS 84. The dataset does not have a defined EPSG code, etc.
"""

size of significand

has high memory demand

I'm only seeing about 1.5 MB for each of the two grids; sys.getsizeof(lon_grid) reports 1_473_416 bytes.

You don't have to consume that much memory, as there are FP datatypes that could cut it to a half or a quarter of that. Consider using .astype(np.float16).
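A minimal sketch of the downcast, rebuilding the target grid from the question's bounds (the grid bounds and step are taken from the question; the exact element count is an assumption of the reconstruction):

```python
import numpy as np

# Rebuild the target grid from the question's bounds and 0.1-degree step
lon_new = np.arange(-12.0, 35.0 + 0.1, 0.1)
lat_new = np.arange(33.0, 72.0 + 0.1, 0.1)
lon_grid, lat_grid = np.meshgrid(lon_new, lat_new)

print(lon_grid.dtype, lon_grid.nbytes)  # float64, roughly 1.5 MB

lon32 = lon_grid.astype(np.float32)  # half the memory
lon16 = lon_grid.astype(np.float16)  # a quarter of the memory
print(lon32.nbytes, lon16.nbytes)
```

Note that float16 keeps only about three significant decimal digits, so near latitude 70 adjacent 0.1° grid points are rounded to a spacing of 0.0625; check that this precision is acceptable before downcasting coordinates.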

resolution

Now that you have interpolated results based on a one-tenth grid which are easily written to disk, you might retry at another resolution like 0.2 or 0.5, and compare results. Decide beforehand how much interpolation error your business use case is willing to tolerate, so you'll know when to stop adjusting the resolution.

sparsity

It's not clear if each snapshot of data is based on terrestrial observations made near populous regions, or if we have satellite observations that are uniform across sea and land for the whole grid.

Suppose there are no ocean readings. The continent is gridded out at 0.1, good. Choose a coarser grid, perhaps a 1.0 unit grid. Process a single snapshot, on the assumption that the locations of fixed measurement stations are fairly uniform across all snapshots.

Coarsely discretize each snapshot coordinate and .add() it to a set(). So e.g. passing (50.45, 30.52) through int() for a unit grid gives us seen.add((50, 30)). Create a new scipy.sparse.csr_matrix or similar sparse matrix, and for each seen 1 × 1 location, grid it out at 0.1 resolution. This combines an irregular collection of high resolution sub-grids. Unpopulated regions and oceans won't take up any storage space; we focus just on the part of the map that actually has readings.
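A toy sketch of the coarse "seen cells" pass, using a handful of hypothetical station coordinates (the coordinate values are made up for illustration):

```python
# Hypothetical station coordinates as (lat, lon) pairs
coords = [(50.45, 30.52), (50.91, 30.07), (41.39, 2.17)]

seen = set()
for lat, lon in coords:
    # Truncate to a 1-degree cell; the first two points share a cell
    seen.add((int(lat), int(lon)))

print(sorted(seen))  # two occupied cells out of the whole map
```

Only the cells in `seen` would then be gridded out at 0.1 resolution and stored in a sparse structure; everything else costs nothing.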

