Solution to problems with xarray and GOTM output

109 views
Skip to first unread message

Markus Reinert

unread,
Nov 3, 2021, 5:21:45 AM11/3/21
to GOTM-users
Dear GOTM users,

if you are using Python to analyze GOTM results, the following will be of interest to you.

When I wanted to load a NetCDF file created by GOTM with the Python library xarray, I was confronted with a MissingDimensionsError because the GOTM output uses the same names (z and z_i) for dimensions and for variables. While this is allowed in the NetCDF standard, xarray cannot handle it. Fortunately, there is a fast way around this problem, which consists of renaming the variables with the following command:
ncrename -v z,z_coord -v zi,zi_coord output.nc output_rename.nc
where output.nc is the name of the GOTM output file, and output_rename.nc is the name of the new created file with renamed variables [1]. The new created NetCDF file can then be loaded with xarray as usual, e.g. ds = xr.open_dataset("output_rename.nc").

If the vertical coordinates in the dataset (called ds in the following) are in fact time-independent, which you can check in the following way:

# Check that z-coordinate is independent of time
for t in range(ds.time.size):
    assert np.all(ds.z_coord.isel(time=t) == ds.z_coord.isel(time=0))
    assert np.all(ds.zi_coord.isel(time=t) == ds.zi_coord.isel(time=0))

Then you can assign the correct z-values to the z-dimension of the dataset with this code:

# Assign the z-coordinate of the dataset
ds["z"] = ds.z_coord.isel(time=0).isel(lon=0, lat=0)
ds["zi"] = ds.zi_coord.isel(time=0).isel(lon=0, lat=0)

After doing this transformation, you can use the usual plotting commands of xarray like ds.salt.plot(x="time") and they will show the correct vertical axis, just like you would expect.

Best regards
Markus


Karsten Bolding

unread,
Nov 3, 2021, 5:40:14 AM11/3/21
to gotm-...@googlegroups.com
Hi Markus

Thanks for the report.

What happens if z is in fact time dependent - is there a solution then?

Karsten

--
You received this message because you are subscribed to the Google Groups "GOTM-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gotm-users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gotm-users/bfa0ea13-1dc5-4d1e-8215-6ecec6cd5940n%40googlegroups.com.


--

Martin Schmidt

unread,
Nov 3, 2021, 8:37:34 AM11/3/21
to gotm-...@googlegroups.com

Dear Markus,

this should be a shortcoming of xarray. The netcdf library permits dedicated access to dimensions, axes and data. So the names of them should be fully by the way. May be it helps to include the xarray developers into the discussion and to let them know about the problem.

You write:

# Check that z-coordinate is independent of time

As far as I understood netcdf, coordinates cannot depend on each other. So the z-coordinate cannot depend on time. The grid dimension must be fixed too. I know what you mean, in many ocean models the vertical coordinate can be time dependent. Modellers still call it "coordinate". But in the sense of the netcdf data model I would not call it a coordinate any more, but a variable. This variable stores the information on the time depended depth of grid points and helps to overcome the limitation of the netcdf data model. So reading these data like coordinates should be the source of the confusion.

I do not know xarray. If the xarray syntax for plots in general coordinates requires to read the variable z-values like coordinates in the sense of the netcdf data model I would consider it as a programming error. The developers should think about this. But may be the problem arises from that time dependend vertical coordinates are not coordinates in the sense of netcdf but variables.

Cheers,

Martin

Bjarne Büchmann

unread,
Nov 3, 2021, 9:02:51 AM11/3/21
to gotm-...@googlegroups.com

> What happens if z is in fact time dependent - is there a solution then?

 

For netcdf, it is allowed (in fact recommended) that so-called “coordinate variables” are named as the dimensions. They *must* then have same size as the dimensions, and just span that dimension. Thus, z should be z[z] – and not z[z,t] (disallowed).

 

Apart from this, the proposed solution (hack, really) should still stand.

 

Best,

 

Bjarne

 

Joint GeoMETOC Support Center

TLF: +45 7281 5624 ; MOBIL: +45 4078 7320

To view this discussion on the web visit https://groups.google.com/d/msgid/gotm-users/CAFvy56X-%3DgG22mrwT%2BzOVpB254d2zdKxqGzkFrYOiq%3DunqQefw%40mail.gmail.com.

 


This email was scanned by Bitdefender

Markus Reinert

unread,
Nov 5, 2021, 5:26:58 AM11/5/21
to GOTM-users
Hi Karsten, Martin, and Bjarne,

thanks for your follow-up questions and explanations.  In the following, I answer your questions and explain the problem in more detail.  In short:  I don't suggest to change GOTM, because it might cause problems for other users, and I don't think xarray will remove this limitation, because I assume they had their reasons to do it this way.  That's why I proposed the workaround.

Regarding Karsten's question:  If the z-coordinate in the model is time-dependent, then the first part of my answer with ncrename works nevertheless.  After ncrename, you can load the dataset into xarray as usual, but you will not have all the nice features that you would expect from xarray.  Instead, xarray would behave much more like the classical "netCDF4" Python library.  In particular, if you use the plot-command of xarray, the result will not have the z-axis in meter.  So you must go the manual way of plotting the data using, e.g., pcolormesh from the matplotlib library.  Note that when you copy all the Python code in my email into a script and run it with a dataset where the z-coordinate changes over time, then an AssertionError will be thrown in the for-loop.  That's my way of causing the script to stop, because otherwise the code labelled "Assign the z-coordinate of the dataset" would remove the time-dependence.

The reason, why xarray cannot handle "model coordinates" that are time-dependent, is in Bjarne's answer.  Any "coordinate" of a NetCDF dataset may only have one dimension, expressed by z[z].  When the "model coordinates" depend on more than one dimension, for example z[z,t] like in GOTM or z[x,y,z,t] like in GETM, then these "model coordinates" are only regular "variables" and not "coordinate variables" in the NetCDF file.  Yes, the names are confusing, as Martin pointed out; maybe my code would be clearer by distinguishing between coordinates of the model and of the dataset.  Nevertheless, the GOTM output files work fine for netCDF4 or pyncview, but xarray does not allow variables that sound like coordinates but do not behave like NetCDF coordinates.  In GOTM files, the variables z and zi sound like coordinates, because dimensions with the same name exist, but they cannot be handled like coordinates because they have an explicit time-dependence.

Martin is correct in saying that this is a short-coming of xarray.  I have no doubt that the developers are aware of this, because they implemented a dedicated error message for this case.  So I don't think it is necessary to tell them about it.  If we think that xarray is used by a big part of GOTM-users, then we could consider changing the GOTM output slightly to prevent this problem with xarray.  But this will probably break workflows or other scripts that expect the GOTM variables to have the names they currently have.  So that's why I propose to use my "hack" for xarray.

Note that this problem does not occur in GETM, because the vertical dimension is called zc (instead of z) and there is no zi.

Best regards,
Markus

Brandon Reichl - NOAA Federal

unread,
Nov 8, 2021, 4:34:53 PM11/8/21
to GOTM-users
Hi Markus and others,

I wanted to chime in quickly if folks are not aware, but there is already an open issue on xarray's github page documenting the issue and explicitly referencing GOTM output as an example (https://github.com/pydata/xarray/issues/2368). It is part of a much larger discussion, so you need to scroll down the page to find this particular issue.  If you search for "GOTM" in the page you should find it right away (November 30, 2018 is the timestamp).

If you follow the discussion, you can see that they are not only aware, but also had a solution in the works.  It looks like that solution code has not been updated in 3 years, so I think it is incomplete and would need some effort to finish.

Best,
Brandon
Reply all
Reply to author
Forward
0 new messages