Trouble figuring out noaa co-ops opendap server


Kurt

Jan 3, 2007, 3:21:35 PM
to pydap
Hi All,

I have been trying to figure out how to use pydap to pull data from
http://opendap.co-ops.nos.noaa.gov/dods/ I want to make a little
client that pulls the raw 6 minute data, but am not understanding how
to construct the query. I haven't gotten anything figured out beyond
this...

dataset=dap.client.open('http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level')
wl = dataset.WATERLEVEL_RAW_PX

type(wl)
Out[15]: <class 'dap.dtypes.SequenceType'>

wl.keys()
Out[18]:
['_STATION_ID',
'_DATUM',
'_BEGIN_DATE',
'_END_DATE',
'DCP',
'SENSOR_ID',
'DATE_TIME',
'WL_VALUE',
'SIGMA',
'O',
'F',
'R',
'L']

I need to set the station id (e.g. 8639348), the datum to 'MLLW' and
the date to bracket the last hour or so. Could someone give me the
quick couple lines that do that and make a request? I am a complete
beginner at pydap and am not sure how to do this. I did a similar
query with SOAPpy, but am looking to switch to pydap.

Thanks,
-kurt
http://schwehr.org/


The SOAPpy query looks like this:

Results...
2007-01-03 19:53:40.720645 2007-01-03 20:23:40.720645 2007-01-03
20:13:40.720645
<SOAPpy.Types.structType item at 19087112>: {'F': '0', 'timeStamp':
'2007-01-03 20:06:00.0', 'WL': '-0.161', 'L': '0', 'O': '0', 'R': '0',
'sigma': '0.0030'}

import datetime

def getWaterLevelSoappyNow(stationId, debug=False):
    # Current time in UTC.
    d = datetime.datetime.utcnow()

    # Bracket "now" with a window from 20 minutes ago to 10 minutes ahead.
    startD = d + datetime.timedelta(minutes=-20)
    endD = d + datetime.timedelta(minutes=10)
    print startD, endD, d

    # Format both ends of the window as 'yyyymmdd hh:mi'.
    beginDate = (str(startD.year) + ('%02d' % startD.month) + ('%02d' % startD.day)
                 + ' ' + ('%02d' % startD.hour) + ':' + ('%02d' % startD.minute))
    endDate = (str(endD.year) + ('%02d' % endD.month) + ('%02d' % endD.day)
               + ' ' + ('%02d' % endD.hour) + ':' + ('%02d' % endD.minute))

    from SOAPpy import SOAPProxy
    url = 'http://opendap.co-ops.nos.noaa.gov/axis/services/WaterLevelRawSixMin'
    namespace = 'urn:WaterLevelRawSixMin'  # This really can be anything. It is ignored.
    server = SOAPProxy(url, namespace)
    if debug:
        server.config.debug = 1

    response = server.getWaterLevelRawSixMin(stationId=str(stationId), beginDate=beginDate, endDate=endDate, datum='MLLW', unit=0, timeZone=0)
    # The last item is the most recent reading in the window.
    return response.item[-1]
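
The begin and end strings above are assembled field by field. A minimal sketch, assuming Python 3 and the 'yyyymmdd hh:mi' layout used in this thread, of getting the same string from strftime. The helper names format_coops_date and format_by_hand are made up for illustration and are not part of SOAPpy or pydap.

```python
import datetime


def format_coops_date(moment):
    """Format a datetime as 'yyyymmdd hh:mi' with a single strftime call."""
    return moment.strftime('%Y%m%d %H:%M')


def format_by_hand(moment):
    """Build the same string by concatenating zero-padded fields."""
    return (str(moment.year) + '%02d' % moment.month + '%02d' % moment.day
            + ' ' + '%02d' % moment.hour + ':' + '%02d' % moment.minute)


# A fixed sample time so the result is reproducible.
sample = datetime.datetime(2007, 1, 3, 19, 53, 40)
print(format_coops_date(sample))  # 20070103 19:53
```

Both helpers give the same string for the sample time, so the strftime form could replace the concatenation without changing the request.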

Rob Cermak

Jan 3, 2007, 5:31:50 PM
to py...@googlegroups.com
Hi,

In the instructions for the server, you need to supply 4 required fields.

The four required fields - Station Id, Datum, Begin Date, and End Date -
must be surrounded in double quotes.
http://opendap.co-ops.nos.noaa.gov/dods/

This will obtain ASCII data for water level from 01/01/2006 to 01/02/2006.

http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level.ascii?WATERLEVEL_RAW_PX.WL_VALUE&WATERLEVEL_RAW_PX._STATION_ID=%221615680%22&WATERLEVEL_RAW_PX._DATUM=%22MLLW%22&WATERLEVEL_RAW_PX._BEGIN_DATE=%2220060101%22&WATERLEVEL_RAW_PX._END_DATE=%2220060102%22&WATERLEVEL_RAW_PX.WL_VALUE!=-999.9
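
A URL like the one above can be assembled from the four required fields. This is a rough sketch in Python 3, not anything provided by the server or by pydap; build_ascii_url is a hypothetical helper name, and the trailing WL_VALUE!=-999.9 clause is simply copied from the URL above.

```python
from urllib.parse import quote

BASE_URL = 'http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level'
SEQUENCE = 'WATERLEVEL_RAW_PX'


def build_ascii_url(station_id, datum, begin_date, end_date):
    """Return the .ascii request URL for WL_VALUE with the four required fields."""
    required = [
        ('_STATION_ID', station_id),
        ('_DATUM', datum),
        ('_BEGIN_DATE', begin_date),
        ('_END_DATE', end_date),
    ]
    # Projection first: the variable we want back.
    parts = [SEQUENCE + '.WL_VALUE']
    # Then one selection per required field, each value wrapped in
    # double quotes and percent-encoded (" becomes %22).
    for name, value in required:
        parts.append('%s.%s=%s' % (SEQUENCE, name, quote('"%s"' % value)))
    # Extra selection copied verbatim from the example URL.
    parts.append(SEQUENCE + '.WL_VALUE!=-999.9')
    return BASE_URL + '.ascii?' + '&'.join(parts)


url = build_ascii_url('1615680', 'MLLW', '20060101', '20060102')
print(url)
```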

How do you do this in dap? You have to set some sequence query variables
after the open statement. See "Sequential data":
http://pydap.org/docs/client.html

This seems to imply that you know what station, time and data you want up
front.

import dap.client
dataset=dap.client.open('http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level')

>>> dataset.keys()
['WATERLEVEL_RAW_PX']

Now do this:

seq = dataset['WATERLEVEL_RAW_PX']

filt_seq=seq.filter('_STATION_ID="1615680"&_BEGIN_DATE="20060101"&_END_DATE="20060102"&_DATUM="MLLW"')

Pop out values:
filt_seq['WL_VALUE'][:]

[0.436, 0.46600000000000003, 0.42799999999999999, 0.44400000000000001,
0.45100...

Same values I got using the HTML form:

http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level.ascii?WATERLEVEL_RAW_PX.WL_VALUE&WATERLEVEL_RAW_PX._STATION_ID=%221615680%22&WATERLEVEL_RAW_PX._DATUM=%22MLLW%22&WATERLEVEL_RAW_PX._BEGIN_DATE=%2220060101%22&WATERLEVEL_RAW_PX._END_DATE=%2220060102%22&WATERLEVEL_RAW_PX.WL_VALUE!=-999.9

Dataset {
Sequence {
Float64 WL_VALUE;
} WATERLEVEL_RAW_PX;
} WATERLEVEL_RAW_PX;
---------------------------------------------
WATERLEVEL_RAW_PX.WL_VALUE
0.436
0.466
0.428
0.444
0.451
0.468
0.465

Enjoy! Thanks for the post, this is the first time I have worked
successfully with sequence data.

I like the SOAP solution too! Thanks!

Rob


--
Alaska Ocean Observing System
Database Manager
907-474-7948 (skype:rob_cermak)

Kurt

Jan 3, 2007, 5:51:44 PM
to pydap
Awesome! Thanks much for the help. I will eventually get the
resulting code into my noaadata package...

http://vislab-ccom.unh.edu/~schwehr/ais/waterlevel/downloads/

-kurt

Kurt

Jan 3, 2007, 8:06:17 PM
to pydap
I think now, I am possibly fighting a bug on the server side where it
is not letting me select time windows within the data. I have written
up what I've got so far as a blog post. I still have a lot to figure
out, but this is a great start!

http://schwehr.org/blog/archives/2007-01.html#e2007-01-03T18_53_45.txt

-kurt

Rob Cermak

Jan 3, 2007, 8:18:22 PM
to py...@googlegroups.com
Kurt,

It looks like you have the same date for Begin and End. The full time
format for the variables is:

long_name: "REQUIRED 8- to 14-character Begin Date (yyyymmdd hh:mi)

&_BEGIN_DATE="20060101"&_END_DATE="20060101"

My guess is that the server cannot handle requests where the time string
is the same.

Maybe try getting one or two six minute values?

&_BEGIN_DATE="20060101 00:00"&_END_DATE="20060101 00:06"

One value might be :00 to :05, :06 to :11
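
A short sketch of building such a window in Python 3, assuming the readings fall on multiples of six minutes past the hour (as the "20060101 10:06" style timestamps in this thread suggest). The name six_minute_window is made up for illustration.

```python
import datetime


def six_minute_window(moment, steps=1):
    """Return (begin, end) as 'yyyymmdd hh:mi' strings.

    begin is moment rounded down to a six-minute boundary;
    end is `steps` six-minute intervals after begin.
    """
    floored = moment.replace(minute=moment.minute - moment.minute % 6,
                             second=0, microsecond=0)
    end = floored + datetime.timedelta(minutes=6 * steps)
    fmt = '%Y%m%d %H:%M'
    return floored.strftime(fmt), end.strftime(fmt)


begin, end = six_minute_window(datetime.datetime(2006, 1, 1, 0, 4, 30))
print('&_BEGIN_DATE="%s"&_END_DATE="%s"' % (begin, end))
```

For the sample time this gives begin "20060101 00:00" and end "20060101 00:06", the same pair as in the constraint above.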

Rob

Roberto De Almeida

Jan 4, 2007, 8:18:14 AM
to py...@googlegroups.com

Hi, Kurt.

The problem here is with the spaces in the constraint expression. If
you quote them using ``urllib.quote`` it will work:

filt_seq = seq.filter(urllib.quote('_STATION_ID="1615680"&_BEGIN_DATE="20060101 10:00"&_END_DATE="20060101 11:00"&_DATUM="MLLW"'))

I'll fix pydap to automatically quote the expressions when filtering
-- I thought I had fixed this when I came across this bug, but it
escaped me.
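
For anyone reading this on Python 3, where ``urllib.quote`` has moved to ``urllib.parse.quote``, here is a small sketch of what the quoting does to the expression. It only shows the string handling and does not contact the server.

```python
from urllib.parse import quote, unquote

# The constraint expression with spaces inside the date values.
expr = ('_STATION_ID="1615680"&_BEGIN_DATE="20060101 10:00"'
        '&_END_DATE="20060101 11:00"&_DATUM="MLLW"')

# With the default safe characters, quote() percent-encodes the
# double quotes, '=', '&', ':' and, importantly, the spaces.
quoted = quote(expr)
print(quoted)

# Decoding gives back the original expression unchanged.
assert unquote(quoted) == expr
```

After quoting there are no literal spaces left in the string (each one becomes %20), which is the part that matters for this bug.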

And Rob, thanks for helping with the mailing list! It's nice to see a
community growing around pydap. :)

--Rob

Kurt

Jan 4, 2007, 9:13:49 AM
to pydap
Rob,

Thanks for the urllib fix! Two more questions...

I am pulling data by calling _get_data() so that I get all the fields
at once, but _ functions are supposed to be private. Is there a better
way to do this?

#!/usr/bin/env python
import dap.client
import urllib
dataset=dap.client.open('http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level')
seq = dataset['WATERLEVEL_RAW_PX']
reqStr=urllib.quote('_STATION_ID="1615680"&_BEGIN_DATE="20060101 10:06"&_END_DATE="20060101 10:06"&_DATUM="MLLW"')
print 'reqStr:',reqStr
filt_seq=seq.filter(reqStr)
data = filt_seq._get_data()
print 'Found this many waterlevel points:',len(data)
print data[-1]


Which goes like this. It's a bummer that it takes 22 seconds. Could
the way I fetch the data be causing the query to take so long? The
soap call takes just 7.6 sec.

time ./test_pydap.py
reqStr:
_STATION_ID%3D%221615680%22%26_BEGIN_DATE%3D%2220060101%2010%3A06%22%26_END_DATE%3D%2220060101%2010%3A06%22%26_DATUM%3D%22MLLW%22
Found this many waterlevel points: 1
('1615680', 'MLLW', '20060101 10:06', '20060101 10:06', '1', 'A1', 'Jan
1 2006 10:06AM', 0.35799999999999998, 0.070000000000000007, 0, 0, 0,
0)

real 0m22.424s
user 0m0.360s
sys 0m2.183s

Thanks!
-kurt

Rob Cermak

Jan 4, 2007, 1:07:32 PM
to py...@googlegroups.com
Kurt,

1st question we can handle. 2nd question. Time? I noticed that too.
That isn't python, most likely the server end. Looks like if you need
fast, go the SOAP route. SOAP != OPeNDAP, though in the next year we
will see some SOAP adapters creep into OPeNDAP.

The SOAP server must have a more direct connection to the database. The
OPeNDAP server for Netcdf, HDF, etc is pretty fast. The relational
database connector for OPeNDAP uses java/tomcat so that adds a fair
amount of overhead to each transaction:

HTTP -> tomcat -> DB -> tomcat -> HTTP

Normally, this is pretty fast. I think the query against the database is
taking time. There are a fair number of folks interested in this data. I
would imagine you will also see the SOAP query bog down at times.

Back to the private vs. public...you are using _get_data(); the
public version is the data attribute (no parentheses).

Change this line...from:

data = filt_seq._get_data()

to:

data = filt_seq.data

>>> print 'Found this many waterlevel points:',len(data)

Found this many waterlevel points: 1

>>> print data[-1]


('1615680', 'MLLW', '20060101 10:06', '20060101 10:06', '1', 'A1', 'Jan
1 2006 10:06AM', 0.35799999999999998, 0.070000000000000007, 0, 0, 0, 0)

Rob

Roy Mendelssohn

Jan 4, 2007, 1:48:00 PM
to py...@googlegroups.com
We use the OPeNDAP database connector for fairly large files and it
is not that slow - nowhere's near. So my guess is that there is
something more going on here.

-Roy

**********************
"The contents of this message do not reflect any position of the U.S.
Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Men...@noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."

Roberto De Almeida

Jan 5, 2007, 6:51:06 AM
to py...@googlegroups.com
On 1/4/07, Roy Mendelssohn <Roy.Men...@noaa.gov> wrote:
> We use the OPeNDAP database connector for fairly large files and it
> is not that slow - nowhere's near. So my guess is that there is
> something more going on here.

It's a problem with the client. When accessing sequences, pydap 2.2
will do a separate request for each contained variable, instead of
downloading the whole sequence -- in this particular example, 13
requests are made. This was a design decision that made the client
easier to write.
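
To picture the difference, here is an illustration only, written in Python 3. It is not pydap's actual code, and the URL shapes are simplified (the station and date selections are left out). The 13 names are the fields of WATERLEVEL_RAW_PX listed at the top of this thread; per_variable_urls and whole_sequence_url are hypothetical helpers.

```python
BASE_URL = 'http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level'

# The 13 variables contained in the WATERLEVEL_RAW_PX sequence.
VARIABLES = ['_STATION_ID', '_DATUM', '_BEGIN_DATE', '_END_DATE', 'DCP',
             'SENSOR_ID', 'DATE_TIME', 'WL_VALUE', 'SIGMA', 'O', 'F', 'R', 'L']


def per_variable_urls(base_url, sequence, variables):
    """One data request per contained variable (the pattern described for 2.2)."""
    return ['%s.dods?%s.%s' % (base_url, sequence, name) for name in variables]


def whole_sequence_url(base_url, sequence):
    """A single data request for the whole sequence."""
    return '%s.dods?%s' % (base_url, sequence)


urls = per_variable_urls(BASE_URL, 'WATERLEVEL_RAW_PX', VARIABLES)
print(len(urls), 'requests versus 1:')
print(whole_sequence_url(BASE_URL, 'WATERLEVEL_RAW_PX'))
```

Each of the 13 URLs costs a full round trip to the server, so the per-variable pattern pays the server's latency 13 times instead of once.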

Fortunately, this is fixed in the 2.3 branch that should be out in a
month or so. (In 2.3 I changed the behavior of sequences a little so
that pydap works better with dapper servers. One new thing is that
sequences can be indexed, for example.) With 2.3 I can run your script
in approximately 6 seconds, versus 4 for the SOAP version.

--Rob

Kurt

Jan 8, 2007, 10:54:44 AM
to pydap

Can't wait! Sounds like some good new stuff.

-kurt

ejav...@gmail.com

Feb 14, 2014, 9:02:45 PM
to py...@googlegroups.com
Hello, sorry to bring this up again. I'm trying to download data from the NOAA CO-OPS server using Python with pydap. I followed the suggestions in this thread, but all I manage to get is a variable that says "SequenceProxy pointing to variable ...". Can anyone help with this? Here is what I do:


import sys
import netCDF4 as nc
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
import datetime
import pydap.client as pydap

lines = [line.strip() for line in open('c_tidecomp.in')]
buoyname=lines[0]
stanum=int(lines[1])

# NOAA #

noaadata=pydap.open_url('http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level')


WL=noaadata.WATERLEVEL_RAW_PX.WL_VALUE[
        (noaadata.WATERLEVEL_RAW_PX._STATION_ID=="9755371") &
        (noaadata.WATERLEVEL_RAW_PX._DATUM=="MSL") &
        (noaadata.WATERLEVEL_RAW_PX._BEGIN_DATE=="20140210") &
        (noaadata.WATERLEVEL_RAW_PX._END_DATE=="20140114")]



Thanks

James Hiebert

Feb 22, 2014, 3:31:35 PM
to py...@googlegroups.com
Hello,

I believe that the sequence proxy exists to be a layer between data selection and data retrieval, so that you only make a network request for the data that you actually want to use. To pull the data down with a SequenceProxy, you have to actually iterate over it. It might be possible to do it with __getitem__ (e.g. my_proxy[1]), but I'd have to look at the code in a bit more detail.

For your request parameters, I notice that your start and end dates are backwards, so the server returns no results:

>>> WL = noaadata.WATERLEVEL_RAW_PX.WL_VALUE[ (noaadata.WATERLEVEL_RAW_PX._STATION_ID=="9755371") &
(noaadata.WATERLEVEL_RAW_PX._DATUM=="MSL") &
(noaadata.WATERLEVEL_RAW_PX._BEGIN_DATE=="20140210") &
(noaadata.WATERLEVEL_RAW_PX._END_DATE=="20140114")]
>>> WL
<SequenceProxy pointing to variable "WATERLEVEL_RAW_PX.WL_VALUE" at "http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level?WATERLEVEL_RAW_PX._STATION_ID="9755371"&WATERLEVEL_RAW_PX._DATUM="MSL"&WATERLEVEL_RAW_PX._BEGIN_DATE="20140210"&WATERLEVEL_RAW_PX._END_DATE="20140114"&">
>>> foo = [x for x in WL ]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/james/pyenv/local/lib/python2.7/site-packages/pydap/proxy.py", line 202, in __iter__
    resp, data = request(url)
  File "/home/james/pyenv/local/lib/python2.7/site-packages/pydap/util/http.py", line 50, in request
    raise ServerError(msg)
pydap.exceptions.ServerError: 'Server error 0: "Your Query Produced No Matching Results."'

But if you swap them around, you get the water levels:

>>> WL = noaadata.WATERLEVEL_RAW_PX.WL_VALUE[ (noaadata.WATERLEVEL_RAW_PX._STATION_ID=="9755371") &
(noaadata.WATERLEVEL_RAW_PX._DATUM=="MSL") &
(noaadata.WATERLEVEL_RAW_PX._END_DATE=="20140210") &
(noaadata.WATERLEVEL_RAW_PX._BEGIN_DATE=="20140114")]
>>> foo = [x for x in WL ]
>>> foo
[0.024, 0.021999999999999999, 0.017000000000000001, 0.012, 0.0040000000000000001, -0.001, -0.0080000000000000002, -0.016, -0.02, -0.028000000000000001, -0.031, -0.040000000000000001, -0.043999999999999997, -0.050999999999999997,

...
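
Since a reversed window only shows up as a server error at iteration time, it can help to check the dates before building the query. A minimal Python 3 sketch, assuming the 'yyyymmdd' and 'yyyymmdd hh:mi' forms used in this thread; check_date_window is a made-up helper, not part of pydap.

```python
import datetime


def _parse(text):
    """Parse 'yyyymmdd' or 'yyyymmdd hh:mi' into a datetime."""
    fmt = '%Y%m%d %H:%M' if len(text) > 8 else '%Y%m%d'
    return datetime.datetime.strptime(text, fmt)


def check_date_window(begin_date, end_date):
    """Return (begin_date, end_date) unchanged, or raise if they are reversed."""
    if _parse(begin_date) > _parse(end_date):
        raise ValueError('begin date %r is after end date %r'
                         % (begin_date, end_date))
    return begin_date, end_date


# The corrected window from the example above passes the check.
begin, end = check_date_window('20140114', '20140210')

# The original, reversed window is rejected locally.
try:
    check_date_window('20140210', '20140114')
except ValueError as err:
    print('rejected:', err)
```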

Hope that helps!

~James Hiebert

Rob De Almeida

Feb 23, 2014, 2:51:38 PM
to py...@googlegroups.com
On Sat, Feb 22, 2014 at 12:31:35PM -0800, James Hiebert wrote:
> I believe that the sequence proxy exists to be a layer between data selection and data retrieval, so that you only make a network request for the data that you actually want to use. To pull the data down with a SequenceProxy, you have to actually iterate over it. It might be possible to do it with __getitem__ (e.g. my_proxy[1]), but I'd have to look at the code in a bit more detail.

This is correct, to get the data from the sequence proxy you need to
iterate over it like James said. Note that using __getitem__ is not
sufficient, it will still return a proxy object that maps to a slice of
the data.

Like I mentioned in my previous email, I usually use list() or
np.rec.fromrecords() to consume the sequence proxy into array data.
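
The lazy behavior can be pictured with a small stand-in class. This is a toy written in Python 3 for illustration only: LazyRows and fake_fetch are hypothetical and are not pydap's SequenceProxy, and the toy only handles slices. The sample values are rounded from the output earlier in the thread.

```python
class LazyRows:
    """Toy stand-in for a proxy: remembers a selection, fetches on iteration."""

    def __init__(self, fetch, selection=slice(None)):
        self._fetch = fetch          # callable standing in for the network request
        self._selection = selection  # slice to apply once data is fetched

    def __getitem__(self, selection):
        # Slicing does not fetch anything; it returns another lazy object.
        return LazyRows(self._fetch, selection)

    def __iter__(self):
        # Only iteration triggers the (pretend) request.
        return iter(self._fetch()[self._selection])


calls = []  # records each pretend request


def fake_fetch():
    calls.append(1)
    return [0.024, 0.022, 0.017, 0.012]


proxy = LazyRows(fake_fetch)
window = proxy[1:3]      # still a LazyRows, no request made yet
values = list(window)    # list() iterates, so the request happens here
print(type(window).__name__, values, len(calls))
```

In the toy, slicing returns another lazy object and nothing is fetched until list() iterates over it.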

Let me know if this works for you.
Cheers, and thanks James!
--Rob