Reindexing from 2D index

Felipe Whitaker

Jul 4, 2023, 10:11:48 PM
to xarray
Greetings,

I started using xarray recently to work with weather data. 

Today I downloaded precipitation data from two different sources: ECMWF's SEAS5 seasonal forecast, and observation data from a local weather institute (CPTEC/INPE).

The two have different spatial resolutions, but this was easily solved with the `xr.Dataset.interp` method.

The issue I am facing is that the seasonal forecast data (described below) is indexed by (number, time, step, latitude, longitude), with a 2D `valid_time` coordinate defined on (time, step). I would like to use just `valid_time` as an index instead of `time` and `step`. That would let me drop duplicates (to make fewer requests, I queried 31 steps ahead for every month, so there is some overlap) and make it easier to compare the two datasets (the observation data also has `valid_time` as a non-primary index).

Is there any way I can turn `valid_time` into an index for both datasets? I have tried `reindex`, and also hand-coding the selection of the right days in each month so the months could be merged without overlap (same effect as dropping duplicates), but had no success.

```
<xarray.Dataset>
Dimensions:     (number: 51, time: 12, step: 31, latitude: 8, longitude: 9)
Coordinates:
  * number      (number) int32 0 1 2 3 4 5 6 ... 45 46 47 48 49 50
  * time        (time) datetime64[ns] 2021-01-01 ... 2021-12-01
  * step        (step) timedelta64[ns] 1 days 2 days ... 30 days 31 days
    surface     float64 0.0
  * latitude    (latitude) float64 -16.0 -17.0 -18.0 ... -22.0 -23.0
  * longitude   (longitude) float64 -54.0 -53.0 -52.0 ... -47.0 -46.0
    valid_time  (time, step) datetime64[ns] 2021-01-02 ... 2022-01-01
Data variables:
    tp          (number, time, step, latitude, longitude) float32 0.0001335 0.0002556 ... 0.3486
Indexes: (5)
Attributes: (7)
```
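For reference, one possible way to do this in plain xarray is to flatten `time` and `step` into a single dimension, promote `valid_time` to the dimension coordinate, and then drop the repeated dates. The following is only a minimal sketch on synthetic data: it assumes `valid_time = time + step`, keeps just the `time` and `step` dimensions (the real dataset also has `number`, `latitude` and `longitude`), and the names `sample`, `flat` and `deduped` are made up for illustration. It has not been run against the actual GRIB file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the forecast: 3 monthly starts x 31 daily leads.
time = pd.date_range("2021-01-01", periods=3, freq="MS")
step = pd.to_timedelta(np.arange(1, 32), unit="D")
tp = np.arange(time.size * step.size, dtype="float32").reshape(time.size, step.size)

ds = xr.Dataset({"tp": (("time", "step"), tp)}, coords={"time": time, "step": step})
# 2D (time, step) coordinate, assumed here to be time + step.
ds = ds.assign_coords(valid_time=ds["time"] + ds["step"])

flat = (
    ds.stack(sample=("time", "step"))      # one entry per (time, step) pair
    .reset_index("sample")                 # drop the MultiIndex, keep time/step as coords
    .swap_dims({"sample": "valid_time"})   # valid_time becomes the dimension coordinate
)

# keep="first" retains the entry from the earlier run (longer lead time);
# keep="last" would retain the entry from the later run instead.
deduped = flat.drop_duplicates("valid_time", keep="first").sortby("valid_time")

print(deduped)
```

Because the stacked order is run by run, `keep="first"` versus `keep="last"` is what decides which forecast run supplies the value on an overlapping date.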

Felipe Whitaker

Jul 4, 2023, 10:44:42 PM
to xarray
(just noticed that the previous e-mail got badly formatted, sorry)

By chance I stumbled on a past discussion (https://github.com/ecmwf/cfgrib/issues/97#issuecomment-557190695) and noticed the `backend_kwargs` argument. Using it to tell cfgrib to use `valid_time` as the time dimension (below) seems to have done what I wanted. Thanks!

seasonal_backend = xr.open_dataset(
    seasonal_path, engine="cfgrib", backend_kwargs=dict(time_dims=("valid_time",))
)

What isn't clear to me is whether it dropped the first or the last duplicate. From what I tested (below), it seems to have dropped the last, which is exactly what I wanted. 

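import xarray as xr

# seasonal_path is the path to the SEAS5 GRIB file.

# Default layout: tp is indexed by (number, time, step, latitude, longitude).
seasonal_no_backend = xr.open_dataset(seasonal_path, engine="cfgrib")

# Run started 2021-02-01, step index 29 (the 30-day lead).
dup_fev = (
    seasonal_no_backend.sel(time="2021-02-01")
    .isel(number=0, latitude=0, longitude=0, step=29)
    .tp.item()
)
# Run started 2021-03-01, step index 0 (the 1-day lead).
dup_mar = (
    seasonal_no_backend.sel(time="2021-03-01")
    .isel(number=0, latitude=0, longitude=0, step=0)
    .tp.item()
)

# Same file, opened with valid_time as the only time dimension.
seasonal_backend = xr.open_dataset(
    seasonal_path, engine="cfgrib", backend_kwargs=dict(time_dims=("valid_time",))
)
# Value stored at valid_time 2021-03-01.
valid_mar = (
    seasonal_backend.sel(valid_time="2021-03-01T00:00:00")
    .isel(number=0, latitude=0, longitude=0)
    .tp.item()
)

dup_fev, dup_mar, valid_mar  # dup_fev == valid_mar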
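
A quick way to double-check which (time, step) pairs really share a valid date is to do the date arithmetic by hand. The sketch below assumes that `valid_time = time + step` and that the step axis is exactly the 1 to 31 day leads shown in the dataset description, so step index 0 is a 1-day lead. The helper `valid_date` is made up for illustration.

```python
from datetime import date, timedelta


def valid_date(init: date, step_index: int) -> date:
    """Valid date for a run start and a 0-based step index (index 0 = 1-day lead)."""
    return init + timedelta(days=step_index + 1)


feb_run, mar_run = date(2021, 2, 1), date(2021, 3, 1)

# Map every valid date reached by each run to the step index that reaches it.
feb = {valid_date(feb_run, i): i for i in range(31)}
mar = {valid_date(mar_run, i): i for i in range(31)}

# Dates covered by both runs, with the (February, March) step indices for each.
overlap = {d: (feb[d], mar[d]) for d in sorted(feb.keys() & mar.keys())}
for d, (i_feb, i_mar) in overlap.items():
    print(d, "feb step index", i_feb, "mar step index", i_mar)
```

Under these assumptions the two runs overlap only on 2021-03-02, 2021-03-03 and 2021-03-04. Step index 29 of the February run lands on 2021-03-03 and step index 0 of the March run on 2021-03-02, while 2021-03-01 is reached only by the February run. The three values compared above may therefore not refer to the same day, so it is worth repeating the check on one of the overlapping dates before concluding which duplicate was kept.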

Thanks for the project,
Felipe