OPeNDAP query via Python xarray using GLBy0.08-latest randomly fails with "NetCDF: File not found"


Carlo Barth

Nov 5, 2019, 5:38:58 PM11/5/19
to fo...@hycom.org
Hi everyone,

I am trying to load data from GLBy0.08-latest using the Python xarray package. Depending on the time of day, and without any obvious reason, it fails with the error message:
packages/xarray/backends/common.py", line 55, in robust_getitem
    return array[key]
  File "netCDF4/_netCDF4.pyx", line 4119, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 5036, in netCDF4._netCDF4.Variable._get
  File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: file not found
The traceback is not very helpful, as the error is likely raised in a lower-level C library. If I exclude timestamps that are in the past (i.e. not forecast), the queries succeed. Here is a minimal working example in Python 3, which sometimes succeeds and sometimes fails with the above error:
from datetime import datetime, time
import pytz
import pandas as pd
import xarray as xr

# Use coordinates on the supported data grid
lat, lon = -42.68000030517578, 286.96002197265625

# Open the GLBy0.08-latest + forecast data set
ds = xr.open_dataset(
    'http://tds.hycom.org/thredds/dodsC/GLBy0.08/latest',
    decode_times=False)[dict(tau_0=0)]

# Set time interval of interest to be all times today in UTC
today = datetime.now().date()

t_i = datetime.combine(today, time(0, 0, 0)).astimezone(pytz.utc)
t_f = datetime.combine(today, time(23, 0, 0)).astimezone(pytz.utc)
print(f'Querying for time interval: ({t_i}, {t_f})')

# Figure out the starting timestamp of the time coordinate
t_ds_start = pd.Timestamp(ds.time.attrs['units'][12:]).to_pydatetime()
print('Dataset start time:', t_ds_start)

assert t_i > t_ds_start

# Convert these timestamps to hour offsets from dataset start time
t_i_val = (t_i-t_ds_start).total_seconds() / 3600.
t_f_val = (t_f-t_ds_start).total_seconds() / 3600.
print(f'Querying for hour offsets: ({t_i_val}, {t_f_val})')

# Get data subset
ds_sub = ds.sel(lat=lat,
                lon=lon,
                depth=slice(0., 25., None),
                time=slice(t_i_val, t_f_val, None))

# Start actual data acquisition by converting to data frame
df = ds_sub.to_dataframe()
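As an aside, slicing the units attribute with `[12:]` assumes it always starts with the 12 characters "hours since ". A slightly more defensive parse, sketched below under the assumption that the attribute follows the CF convention "<unit> since <timestamp>" (the exact format on this server is an assumption):

```python
import re
import pandas as pd

def parse_time_origin(units):
    """Extract the origin timestamp from a CF-style time units string,
    e.g. 'hours since 2019-11-05 12:00:00.000 UTC' (assumed format)."""
    match = re.match(r'\w+\s+since\s+(.+)', units)
    if match is None:
        raise ValueError(f'Unrecognized time units: {units!r}')
    return pd.Timestamp(match.group(1)).to_pydatetime()
```

`t_ds_start = parse_time_origin(ds.time.attrs['units'])` would then replace the hard-coded slice.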

Help is appreciated :)
Carlo.

P.S.: here is some version info for the Python env:
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 5.0.0-27-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 4.1.1
pip: 19.3
setuptools: 40.8.0
Cython: 0.29.5
numpy: 1.16.1
scipy: 1.2.0
pyarrow: None
xarray: 0.11.3
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml: 4.3.1
bs4: None
html5lib: None
sqlalchemy: 1.2.16
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: 0.9.0
pandas_datareader: None

Michael McDonald

Nov 8, 2019, 1:46:00 PM11/8/19
to Carlo Barth, forum
Please describe what you are trying to obtain from this OPeNDAP query:

* region, variables, time range, etc.

If you happen to be querying this OPeNDAP URL while new data is arriving, THREDDS triggers a refresh, and this might cause the momentary "NetCDF: file not found" error. You can always verify via a web browser to get more details about the error.
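If the error really is momentary, a client-side retry wrapper might be enough to ride out the refresh window. A minimal sketch (the attempt count and delay are arbitrary, not an official recommendation):

```python
import time

def retry(fn, attempts=5, delay=10.0):
    """Call fn(), retrying on RuntimeError (e.g. a transient
    'NetCDF: file not found' raised during a THREDDS refresh)."""
    for attempt in range(attempts):
        try:
            return fn()
        except RuntimeError as err:
            if attempt == attempts - 1:
                raise  # out of attempts; propagate the error
            print(f'Attempt {attempt + 1} failed ({err}); '
                  f'retrying in {delay}s')
            time.sleep(delay)
```

For example, `df = retry(lambda: ds_sub.to_dataframe())` in the script from the first message.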







--
--
You received this message because you are a member of HYCOM.org
To ask a question, send an email to fo...@hycom.org



--
Michael McDonald
HYCOM.org Administrator

ca...@cageeye.no

Nov 9, 2019, 7:08:38 AM11/9/19
to HYCOM.org Forum, ca...@cageeye.no
Thank you very much for your reply Michael!

I think the script shows quite precisely what I am trying to do and what fails. In this example, I want to load data for "today" from http://tds.hycom.org/thredds/dodsC/GLBy0.08/latest for a specific coordinate. This is of course a minimal working example; in the actual script I first make sure to pick coordinates that are on the grid, etc. For these coordinates, the time range given by the start and end of the day in UTC, and a depth range of 0 to 50 meters, I want to load all the variables in the dataset.

I just tried it again at 07:06 am EST and it failed. According to this forum, the update window for this server is just one hour a day, from 12:00 pm to 1:00 pm EST. So this does not seem to be the problem?

Also note that, as explained above, the query succeeds if I exclude timestamps that are in the "past".

Michael McDonald

Nov 9, 2019, 9:22:12 AM11/9/19
to Carlo Barth, forum
Carlo,
The next time this happens, please check all of the various Access method URLs for this "latest" dataset via your client's web browser.



If *all* of these work (try refreshing each page 1-2 times), then there is likely something wrong with your Python code making the OPeNDAP subset.

Note: the latest data for *today* does not become available until after the system's rolling restarts, which begin around noon EST. So a safer bet would be to check around 1 PM EST for new data from *today's* model run.






ca...@cageeye.no

Nov 10, 2019, 6:08:19 AM11/10/19
to HYCOM.org Forum, ca...@cageeye.no
That was a very good idea :) I think it demonstrates that there is an actual problem, because querying the URLs directly fails in exactly the same way. I would love to learn how to solve this problem and understand what causes it! I was likewise unable to find a way to consistently query historic data (e.g. from 30 days ago) from the same model, so maybe you could give me a hint in your answer :)

Here is the result of that test. First, I reran the Python code and the error occurred. Then I checked the URLs, with the following results.

Query time for all of them (in UTC): '2019-11-10T10:59:35.597952+00:00'

# URL

# Response
Dataset {
    Grid {
     ARRAY:
        Float32 water_temp[time = 1][depth = 15][lat = 1][lon = 1];
     MAPS:
        Float64 time[time = 1];
        Float64 depth[depth = 15];
        Float64 lat[lat = 1];
        Float64 lon[lon = 1];
    } water_temp;
} GLBy0.08/latest;
---------------------------------------------
water_temp.water_temp[1][15][1][1]
[0][0][0], 11.471999
[0][1][0], 11.453
[0][2][0], 11.339
[0][3][0], 11.160999
[0][4][0], 11.11
[0][5][0], 11.080999
[0][6][0], 11.066
[0][7][0], 10.993
[0][8][0], 10.9939995
[0][9][0], NaN
[0][10][0], NaN
[0][11][0], NaN
[0][12][0], NaN
[0][13][0], NaN
[0][14][0], NaN

water_temp.time[1]
153.0

water_temp.depth[15]
0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0

water_temp.lat[1]
-42.68000030517578

water_temp.lon[1]
286.96002197265625


# URL

# Response
Error {
    code = 500;
    message = "opendap.dap.DataReadException: Inconsistent array length read: 1165128303 != 1914731274; water_temp -- 43:43,0:14,933:933,3587:3587";
};


# URL

# Response
Error {
    code = 500;
    message = "opendap.dap.DataReadException: Inconsistent array length read: 1165128303 != 1914731274; water_temp -- 43:54,0:14,933:933,3587:3587";
};


# URL

# Response
Dataset {
    Grid {
     ARRAY:
        Float32 water_temp[time = 4][depth = 15][lat = 1][lon = 1];
     MAPS:
        Float64 time[time = 4];
        Float64 depth[depth = 15];
        Float64 lat[lat = 1];
        Float64 lon[lon = 1];
    } water_temp;
} GLBy0.08/latest;
---------------------------------------------
water_temp.water_temp[4][15][1][1]
[0][0][0], 11.471999
[0][1][0], 11.453
[0][2][0], 11.339
[0][3][0], 11.160999
[0][4][0], 11.11
[0][5][0], 11.080999
[0][6][0], 11.066
[0][7][0], 10.993
[0][8][0], 10.9939995
[0][9][0], NaN
[0][10][0], NaN
[0][11][0], NaN
[0][12][0], NaN
[0][13][0], NaN
[0][14][0], NaN
[1][0][0], 11.32
[1][1][0], 11.308
[1][2][0], 11.269
[1][3][0], 11.217
[1][4][0], 11.177
[1][5][0], 11.146999
[1][6][0], 11.129999
[1][7][0], 11.037
[1][8][0], 11.014
[1][9][0], NaN
[1][10][0], NaN
[1][11][0], NaN
[1][12][0], NaN
[1][13][0], NaN
[1][14][0], NaN
[2][0][0], 11.179999
[2][1][0], 11.181
[2][2][0], 11.1779995
[2][3][0], 11.172999
[2][4][0], 11.165999
[2][5][0], 11.155999
[2][6][0], 11.139999
[2][7][0], 11.084
[2][8][0], 11.042
[2][9][0], NaN
[2][10][0], NaN
[2][11][0], NaN
[2][12][0], NaN
[2][13][0], NaN
[2][14][0], NaN
[3][0][0], 11.124
[3][1][0], 11.125999
[3][2][0], 11.125
[3][3][0], 11.122999
[3][4][0], 11.1189995
[3][5][0], 11.113999
[3][6][0], 11.108999
[3][7][0], 11.099999
[3][8][0], 11.087999
[3][9][0], NaN
[3][10][0], NaN
[3][11][0], NaN
[3][12][0], NaN
[3][13][0], NaN
[3][14][0], NaN

water_temp.time[4]
153.0, 156.0, 159.0, 162.0

water_temp.depth[15]
0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0

water_temp.lat[1]
-42.68000030517578

water_temp.lon[1]
286.96002197265625

So it consistently fails whenever time indices that are in the past are part of the query, while it works for time indices that are in the future.
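Given that pattern, one client-side workaround we might try is to request the archived and forecast portions separately and concatenate them. A sketch, assuming `now_offset` is the hour offset (in the dataset's time coordinate) that separates past from forecast steps, which is my own naming, not anything from the server:

```python
import numpy as np
import xarray as xr

def fetch_split(ds, t_i_val, t_f_val, now_offset):
    """Select the time range [t_i_val, t_f_val] in two pieces, split at
    now_offset, so past and forecast steps are never mixed in a single
    OPeNDAP request; the boundary step is deduplicated afterwards."""
    if t_f_val <= now_offset or t_i_val >= now_offset:
        # Entirely past or entirely forecast: one query suffices.
        return ds.sel(time=slice(t_i_val, t_f_val))
    past = ds.sel(time=slice(t_i_val, now_offset))
    future = ds.sel(time=slice(now_offset, t_f_val))
    combined = xr.concat([past, future], dim='time')
    # Label slicing is inclusive, so the boundary step may appear twice.
    _, first = np.unique(combined['time'].values, return_index=True)
    return combined.isel(time=first)
```

This is only a sketch; whether two smaller requests actually avoid the server-side error is exactly what would need testing.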

Any help much appreciated,
Thanks,
Carlo

ca...@cageeye.no

Nov 14, 2019, 5:14:56 AM11/14/19
to HYCOM.org Forum, ca...@cageeye.no, Michael McDonald
Hi Michael,

is there a chance I could get feedback on this? Our team has been investigating, and it seems there really are some issues with this API. Most important is the data unavailability described above: queries quite consistently fail when past and forecast timestamps are requested together. Another issue is that the time coordinate can change its start time spontaneously.

Looking forward to hearing from you,
Carlo.  

Michael McDonald

Nov 14, 2019, 5:52:38 PM11/14/19
to Carlo Barth, HYCOM.org Forum
Carlo,
We have seen this issue for a while now, but it seemed isolated to accessing forecast data shortly after it arrived, before THREDDS had rescanned the data files and updated the FMRC. So, as a fix, all of the Tomcat servers were restarted daily between 12 pm and 1 pm EST to do a full refresh once all new data had been unpacked. The FMRC was supposed to be scanning/updating the data dirs on the fly, without a restart, on a set schedule, but we could not determine why this was not happening.

It turns out that the featureCollection name used in the THREDDS catalogs is global in scope (even though ours were nested inside other dataset blocks and had unique paths defined), so we had dataset name collisions: we simply used "FMRC" as the name in different catalogs (i.e., the FMRC for the global model run and the FMRC for the Gulf of Mexico run). Both had the "name" attribute of their featureCollection set to "FMRC", which caused the backend cron/update scheduler to throw a WARN and an ERROR:

2019-11-14T17:22:02.089 -0500 WARN  - scheduler failed to add updateJob for UpdateCollection.FMRC. Another Job exists with that identification.
2019-11-14T17:22:02.089 -0500 ERROR - scheduleJob failed to schedule startup Job for FeatureCollectionConfig name ='FMRC' collectionName='FMRC' type='FMRC'

To address this issue we have spun up (in parallel) the same FMRC that you were accessing, only this one has a globally unique name and should, therefore, serve the correct data without the "Inconsistent array length" errors you were getting.

Please go here and see the replacement for the FMRC you were using,

e.g., use this OPeNDAP URL instead and test/verify:


Please give this a try for a while and see if this response is better. If it does fix the issue, then we will replace the "GLBy0.08/latest" shortcut so that it goes to the new one that is updating via the featureCollection rescan.



 

ca...@cageeye.no

Nov 27, 2019, 3:01:45 PM11/27/19
to HYCOM.org Forum, ca...@cageeye.no
Dear Michael,

sorry for the delay in my response. First of all, thank you very much for taking care of the issue; I really believe it is a bug that could affect many users. Thanks as well for the details you provided, they are very interesting.

On Nov 18th we tested the new THREDDS URL you provided and reproduced the same errors. Today, when I tried, the route was entirely inaccessible. So it seems the problem is not resolved yet. Do you have any updates for me?

Thanks a lot,
Carlo. 

Michael McDonald

Nov 27, 2019, 5:21:44 PM11/27/19
to Carlo Barth, HYCOM.org Forum
Carlo,
A new thread with the *new* details on this issue is here,


Start at the top of the catalog (https://tds.hycom.org/thredds/catalog.html) and find the new URL.





