OpenDAP HyCOM access inconsistencies


stephen....@gmail.com

Feb 23, 2017, 4:24:18 PM
to HYCOM.org Forum
Hello,
      I am trying to process data from the GLBu0.08 and GLBa0.08 HyCOM datasets via OpenDAP in Python. I request each variable (u, v, potential temperature, salinity, and SSH) separately, "opening" and "closing" the HyCOM dataset once for a given day. I subset the global grid to my region (South America) and request every third grid point, both to closely match my model simulation resolution and to reduce my burden on the data servers. I have also tried requesting the data at different times of day (morning, afternoon, nighttime). The Python code works and generates the data I need, but during the OpenDAP access phase the HyCOM server eventually becomes unresponsive to my requests. I checked my internet connection and it remains active. Sometimes the hang happens immediately and sometimes after a few model days; the timing is not consistent. Has anyone else experienced a similar issue, or can anyone offer a potential solution? I would appreciate any advice or help.
     Regards,
           --Stephen

Michael McDonald

Feb 23, 2017, 4:29:01 PM
to stephen....@gmail.com, HYCOM.org Forum
Stephen,

Please share your code with the group and we might be able to help debug your issue.

Also, please private message me your IP address and I can check the logs for any query issues. 

--
Michael McDonald
HYCOM.org Administrator

stephen....@gmail.com

Feb 23, 2017, 6:41:21 PM
to HYCOM.org Forum, stephen....@gmail.com

Michael,
    The original code depends upon pre-processed grid files, which are too large to share here. I have created a Python script that generates its own grid and uses the same OpenDAP commands as the original script. I did not see a way to attach the file in the forum, so I have uploaded it to Dropbox (https://dl.dropboxusercontent.com/u/107365301/get_HyCOM.py). I resubmitted my main HyCOM OpenDAP Python scripts around 10 minutes ago and they seem to be working great for the time being. I would be curious to hear from you and others whether my data hanging problem can be reproduced. To give some idea: one night last week I went home and twelve hours later I had three months of processed HyCOM data, while last night I came into the office to find that only three days of data had been processed in an eight-hour time frame.
     Cheers,
          --Stephen

P.S. Make sure to run the Python script for at least a week. Sometimes I will have no problem for a day or two of model data and then it stops.

P.P.S. I have also uploaded sample program output to Dropbox (https://dl.dropboxusercontent.com/u/107365301/HyCOM_20070101.nc)

Michael McDonald

Feb 25, 2017, 12:06:24 PM
to Stephen Nicholls, forum
> In terms of the Python commands, here is how it appears in my current script
>
> 1) Open the file
> from netCDF4 import Dataset
> nc = Dataset('http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1','r')
> 2) Read the variable
> var = nc.variables['water_temp'][188, :, 970:1602:3, 2842:3783:3]
>
> 3) The result of these two commands will be that var contains the "water_temp" data within the specified data range


> I also tried all of this as a single step, i.e.,
> var = Dataset('http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1?water_temp[188][0:39][970:3:1602][2842:3:3783]','r')
>
> Both methods get the data, but I figured that "opening" the HyCOM file once would be quicker.


Stephen,

If your application can query a subsetted OPeNDAP URL (e.g.,
.../expt_91.1?water_temp[188][0:39][970:3:1602][2842:3:3783]),
then that is *better* than the two-step method.


e.g., this OPeNDAP call

nc = Dataset('http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1','r')

*can* return the "entire dataset", comprising several terabytes
of data in total, and the header metadata is also larger, with all the
time indices, other variables, etc. Most OPeNDAP clients handle this
well (e.g., ferret, python, Panoply); other applications *do not*
always handle this well (e.g., matlab ocean toolbox). You will need to
experiment some.


The second method you use (all in one) is *safer* since from the start
you "limit the scope" of what OPeNDAP can return back to your client.

Opening a top-level (large) dataset requires more index/metadata to be
sent from server to client. Limiting the OPeNDAP query reduces the
index/metadata that must be sent from server to client (all of which
is stored in data structures within python). If you multiply this by
several hundred to several thousand requests, you can immediately see
how this optimization reduces overhead and will save you time and
bandwidth. The servers will also be a lot happier :-)

Does that make sense?
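
To make the "limit the scope" advice concrete, here is a minimal sketch of how a constrained OPeNDAP URL could be assembled from Python-style index ranges. The helper names dap_index and subset_url are hypothetical (they are not part of netCDF4 or of the script discussed in this thread). The one detail worth noting is that an OPeNDAP hyperslab is written [start:stride:stop] with an inclusive stop, whereas a Python slice is start:stop:step with an exclusive stop.

```python
def dap_index(index):
    """Convert an int or a Python slice into an OPeNDAP hyperslab string.

    Hypothetical helper for illustration. OPeNDAP writes [start:stride:stop]
    with an inclusive stop; Python slices use an exclusive stop.
    """
    if isinstance(index, int):
        return f"[{index}]"
    if index.stop is None:
        raise ValueError("an explicit stop index is required")
    start = index.start or 0
    step = index.step or 1
    last = index.stop - 1  # inclusive upper bound for the server
    if step == 1:
        return f"[{start}:{last}]"
    return f"[{start}:{step}:{last}]"


def subset_url(base_url, variable, indices):
    """Build a constrained URL of the form base?var[..][..][..]."""
    return f"{base_url}?{variable}" + "".join(dap_index(i) for i in indices)


base = "http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1"
url = subset_url(
    base,
    "water_temp",
    [188, slice(0, 40), slice(970, 1603, 3), slice(2842, 3784, 3)],
)
print(url)
# http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1?water_temp[188][0:39][970:3:1602][2842:3:3783]

# Opening the constrained URL would then look like this (needs network
# access and the netCDF4 package, so it is left commented out here):
# from netCDF4 import Dataset
# nc = Dataset(url, 'r')
```

Building the URL in one place keeps the server-side subset and any client-side indexing from drifting apart when the region or stride changes.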

stephen....@gmail.com

Mar 2, 2017, 4:11:43 PM
to HYCOM.org Forum, stephen.d...@nasa.gov
Michael,
      Thank you for your replies and suggestions. I have spent the last few days trying different variants of the Python OpenDAP commands, and I also tested these commands on NCAR's THREDDS server, Motherlode. Your suggestion to specify the variable certainly helped. When the HyCOM data retrieval works, specifying the variable reduces the OpenDAP retrieval time by 50%.

Example:
     Before: nc = Dataset('http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1','r')
     After: nc = Dataset('http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1?water_u','r')

 I also compared three variants of Python's OpenDAP commands (i.e., my one step and two step methods). See below.

1-step:
nc = Dataset('http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1?water_u[176][0:39][0:100][0:100]','r')
var = nc.variables['water_u'][:]

2-step
nc = Dataset('http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1?water_u[176][0:39][0:100][0:100]','r')
var = nc.variables['water_u'][176][:][0:100][0:100]

Both of these commands work and store the data to the variable 'var' at exactly the same rate. I also tried making a separate call for each vertical level. However, no approach solved the eventual "lock up" issue, which is ongoing at the moment. A colleague suggested that I use the netstat -a | more command to check my connection to the HyCOM data server during processing, and the connection is always reported as 'ESTABLISHED', even when hung. However, I did learn that the hang occurs when "opening" the OpenDAP file and occurs regardless of variable size. For all of this afternoon, I have been unable to read any variables, even a 1-D variable such as 'MT'.

Also, I tested my script against NCAR's Motherlode THREDDS OpenDAP server and experienced no problems processing the data from their server, even when I ran through multiple time steps and variables. I am now really at a loss to explain the 'on' or 'off' nature of my experience with OpenDAP access to HyCOM. I have also taken care to avoid the HyCOM server's daily updates between 10:30 and noon to eliminate them as a possible problem.

Also, in case it is useful: when my Python script "hangs", it always does so during the OpenDAP access phase (i.e., where I have all the nc = Dataset() commands). I have uploaded the latest version of my get_HyCOM.py script to Dropbox so that you have the same code I am using (https://dl.dropboxusercontent.com/u/107365301/get_HyCOM.py). If you have any other suggestions, please do let me know.
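
Since the hang happens inside the Dataset() open call while the connection still shows as 'ESTABLISHED', one possible workaround (a sketch only, not a fix for the underlying server issue) is to run each day's download in a separate child process with a wall-clock timeout, and retry if it stalls. The function names and the per-day script get_one_day.py mentioned in the usage note are hypothetical; only the standard-library subprocess module is used, and the child is killed when the timeout expires.

```python
import subprocess
import sys
import time


def run_with_timeout(args, timeout_s):
    """Run a command; return True only if it exits with code 0 in time.

    subprocess.run kills the child process when the timeout expires,
    so a hung OPeNDAP request cannot block the parent indefinitely.
    """
    try:
        result = subprocess.run(args, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def fetch_with_retries(args, timeout_s, max_attempts=3, pause_s=60.0):
    """Retry a command until it succeeds.

    Returns the attempt number that succeeded, or None if all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if run_with_timeout(args, timeout_s):
            return attempt
        time.sleep(pause_s)  # give the server a moment before retrying
    return None
```

A call might look like fetch_with_retries([sys.executable, "get_one_day.py", "20070101"], timeout_s=600), where get_one_day.py stands in for a script that downloads a single model day and writes it to disk. The pause between attempts is there to avoid hammering the server with immediate retries.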

stephen....@gmail.com

Mar 2, 2017, 4:14:14 PM
to HYCOM.org Forum, stephen.d...@nasa.gov
Correction to my previous response:


1-step:
nc = Dataset('http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1?water_u[176][0:39][0:100][0:100]','r')
var = nc.variables['water_u'][:]

2-step

Michael McDonald

Mar 5, 2017, 10:15:42 AM
to stephen....@gmail.com, HYCOM.org Forum, Stephen Nicholls
Good to know that Python is reading in the OPeNDAP datasets correctly.
Limiting the scope of what you can query still has the benefit of
reducing index transfers from server to client. It adds up quickly
with repetitive queries.

I just ran your code and it worked...up to a point that looks *not* to
be OPeNDAP related.

Traceback (most recent call last):
  File "/home/mcdonald/scripts/get_HyCOM.py", line 384, in <module>
    del temp
NameError: name 'temp' is not defined


Note: since we last spoke, we are now running on a single THREDDS
server with the aggregation cache set to never expire unless a
recheck value is set. Since you are querying 91.1 data, this should be
static now and never change; previously the server was having to
regenerate all the indexes and caches each time you re-ran your code.
Terminal Saved Output.txt