I have been trying to figure out how to use pydap to pull data from
http://opendap.co-ops.nos.noaa.gov/dods/ and want to make a little
client that pulls the raw 6-minute data, but I don't understand how
to construct the query. I haven't gotten anything figured out beyond
this...
dataset=dap.client.open('http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level')
wl = dataset.WATERLEVEL_RAW_PX
type(wl)
Out[15]: <class 'dap.dtypes.SequenceType'>
wl.keys()
Out[18]:
['_STATION_ID',
'_DATUM',
'_BEGIN_DATE',
'_END_DATE',
'DCP',
'SENSOR_ID',
'DATE_TIME',
'WL_VALUE',
'SIGMA',
'O',
'F',
'R',
'L']
I need to set the station id (e.g. 8639348), the datum to 'MLLW' and
the date to bracket the last hour or so. Could someone give me the
quick couple of lines that do that and make a request? I am a complete
beginner at pydap and am not sure how to do this. I did a similar
query with SOAPpy, but am looking to switch to pydap.
Thanks,
-kurt
http://schwehr.org/
The SOAPpy query looks like this:
import datetime
from SOAPpy import SOAPProxy
def getWaterLevelSoappyNow(stationId, debug=False):
    # Bracket the current UTC time: 20 minutes back, 10 minutes ahead.
    d = datetime.datetime.utcnow()
    startD = d + datetime.timedelta(minutes=-20)
    endD = d + datetime.timedelta(minutes=10)
    print startD, endD, d
    # Dates are sent as 'yyyymmdd hh:mi'.
    beginDate = startD.strftime('%Y%m%d %H:%M')
    endDate = endD.strftime('%Y%m%d %H:%M')
    url = 'http://opendap.co-ops.nos.noaa.gov/axis/services/WaterLevelRawSixMin'
    namespace = 'urn:WaterLevelRawSixMin'  # This really can be anything. It is ignored.
    server = SOAPProxy(url, namespace)
    if debug:
        server.config.debug = 1
    response = server.getWaterLevelRawSixMin(
        stationId=str(stationId), beginDate=beginDate, endDate=endDate,
        datum='MLLW', unit=0, timeZone=0)
    # Return the last item in the response.
    return response.item[-1]
Results...
2007-01-03 19:53:40.720645 2007-01-03 20:23:40.720645 2007-01-03 20:13:40.720645
<SOAPpy.Types.structType item at 19087112>: {'F': '0', 'timeStamp': '2007-01-03 20:06:00.0', 'WL': '-0.161', 'L': '0', 'O': '0', 'R': '0', 'sigma': '0.0030'}
According to the instructions for the server, you need to supply four
required fields. These fields - Station Id, Datum, Begin Date, and End
Date - must be surrounded in double quotes.
http://opendap.co-ops.nos.noaa.gov/dods/
This will obtain ASCII data for water level from 01/01/2006 to 01/02/2006.
How do you do this in dap? You have to set some sequence query variables
after the open statement. See "Sequential data":
http://pydap.org/docs/client.html
This seems to imply that you need to know which station, time and data
you want up front.
import dap.client
dataset=dap.client.open('http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level')
>>> dataset.keys()
['WATERLEVEL_RAW_PX']
Now do this:
seq = dataset['WATERLEVEL_RAW_PX']
filt_seq=seq.filter('_STATION_ID="1615680"&_BEGIN_DATE="20060101"&_END_DATE="20060102"&_DATUM="MLLW"')
Pop out values:
filt_seq['WL_VALUE'][:]
[0.436, 0.46600000000000003, 0.42799999999999999, 0.44400000000000001,
0.45100...
Same values I got using the HTML form:
Dataset {
    Sequence {
        Float64 WL_VALUE;
    } WATERLEVEL_RAW_PX;
} WATERLEVEL_RAW_PX;
---------------------------------------------
WATERLEVEL_RAW_PX.WL_VALUE
0.436
0.466
0.428
0.444
0.451
0.468
0.465
Enjoy! Thanks for the post, this is the first time I have worked
successfully with sequence data.
I like the SOAP solution too! Thanks!
Rob
--
Alaska Ocean Observing System
Database Manager
907-474-7948 (skype:rob_cermak)
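As a rough sketch of how the constraint string passed to seq.filter() above could be assembled, the helper below wraps each of the four required fields in double quotes and joins them with '&'. The name build_constraint is hypothetical (it is not part of pydap), and the code only builds the string; it does not contact the server.

```python
def build_constraint(station_id, begin_date, end_date, datum="MLLW"):
    """Return a constraint string with the four required fields double-quoted.

    build_constraint is a hypothetical helper for illustration only.
    """
    fields = [
        ("_STATION_ID", station_id),
        ("_BEGIN_DATE", begin_date),
        ("_END_DATE", end_date),
        ("_DATUM", datum),
    ]
    # Each value must be surrounded by double quotes, per the server instructions.
    return "&".join('%s="%s"' % (name, value) for name, value in fields)


constraint = build_constraint(1615680, "20060101", "20060102")
print(constraint)
```

With these arguments the result is the same string that was typed by hand in the filter() call above.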
http://vislab-ccom.unh.edu/~schwehr/ais/waterlevel/downloads/
-kurt
http://schwehr.org/blog/archives/2007-01.html#e2007-01-03T18_53_45.txt
-kurt
It looks like you have the same date for Begin and End. The full time
format for the variables is:
long_name: "REQUIRED 8- to 14-character Begin Date (yyyymmdd hh:mi)
&_BEGIN_DATE="20060101"&_END_DATE="20060101"
My guess is that the server cannot handle requests where the time string
is the same.
Maybe try getting one or two six minute values?
&_BEGIN_DATE="20060101 00:00"&_END_DATE="20060101 00:06"
One value might cover :00 to :05, the next :06 to :11.
Rob
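For the "bracket the last hour or so" part of the original question, here is a minimal sketch of producing begin and end strings in the 'yyyymmdd hh:mi' layout described above. The function name format_window and its parameters are made up for illustration, and treating the times as UTC is an assumption carried over from the SOAP example (utcnow, timeZone=0), not something the OPeNDAP instructions quoted here confirm.

```python
from datetime import datetime, timedelta, timezone


def format_window(now, minutes_back=60, minutes_ahead=0):
    """Return (begin, end) strings in the 'yyyymmdd hh:mi' layout.

    format_window is a hypothetical helper, not part of pydap.
    """
    fmt = "%Y%m%d %H:%M"
    begin = now - timedelta(minutes=minutes_back)
    end = now + timedelta(minutes=minutes_ahead)
    return begin.strftime(fmt), end.strftime(fmt)


# A fixed timestamp keeps the example reproducible; for a live query,
# pass datetime.now(timezone.utc) instead.
begin, end = format_window(datetime(2006, 1, 1, 11, 0, tzinfo=timezone.utc))
print(begin, end)
```

Note that both strings contain a space, which matters for how the constraint is sent to the server.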
Hi, Kurt.
The problem here is with the spaces in the constraint expression. If
you quote them using ``urllib.quote`` it will work:
filt_seq = seq.filter(urllib.quote('_STATION_ID="1615680"&_BEGIN_DATE="20060101 10:00"&_END_DATE="20060101 11:00"&_DATUM="MLLW"'))
I'll fix pydap to automatically quote the expressions when filtering
-- I thought I had fixed this when I came across this bug, but it
escaped me.
And Rob, thanks for helping with the mailing list! It's nice to see a
community growing around pydap. :)
--Rob
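To see what the quoting step does to the constraint, here is a small standalone sketch. It uses urllib.parse.quote, which is where Python 3 keeps the function that the thread calls urllib.quote in Python 2; nothing here talks to the server or to pydap.

```python
from urllib.parse import quote, unquote  # Python 3 home of Python 2's urllib.quote

constraint = (
    '_STATION_ID="1615680"'
    '&_BEGIN_DATE="20060101 10:00"'
    '&_END_DATE="20060101 11:00"'
    '&_DATUM="MLLW"'
)

# Percent-encode everything outside the unreserved set, including the
# spaces inside the date strings that caused the problem.
encoded = quote(constraint)
print(encoded)

# Decoding gives back the original string unchanged.
print(unquote(encoded) == constraint)
```

The spaces become %20, and the quotes, '=', '&' and ':' characters are encoded as well.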
Thanks for the urllib fix! Two more questions...
I am pulling data by calling _get_data() so that I get all the fields
at once, but _ functions are supposed to be private. Is there a better
way to do this?
#!/usr/bin/env python
import dap.client
import urllib
dataset=dap.client.open('http://opendap.co-ops.nos.noaa.gov/dods/IOOS/Raw_Water_Level')
seq = dataset['WATERLEVEL_RAW_PX']
reqStr=urllib.quote('_STATION_ID="1615680"&_BEGIN_DATE="20060101 10:06"&_END_DATE="20060101 10:06"&_DATUM="MLLW"')
print 'reqStr:',reqStr
filt_seq=seq.filter(reqStr)
data = filt_seq._get_data()
print 'Found this many waterlevel points:',len(data)
print data[-1]
Running it gives the output below. It's a bummer that it takes 22
seconds. Could the way I fetch the data be causing the query to take so
long? The SOAP call takes just 7.6 sec.
time ./test_pydap.py
reqStr: _STATION_ID%3D%221615680%22%26_BEGIN_DATE%3D%2220060101%2010%3A06%22%26_END_DATE%3D%2220060101%2010%3A06%22%26_DATUM%3D%22MLLW%22
Found this many waterlevel points: 1
('1615680', 'MLLW', '20060101 10:06', '20060101 10:06', '1', 'A1', 'Jan 1 2006 10:06AM', 0.35799999999999998, 0.070000000000000007, 0, 0, 0, 0)
real 0m22.424s
user 0m0.360s
sys 0m2.183s
Thanks!
-kurt
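The shell's time command also counts interpreter start-up and imports, so one way to narrow things down is to time only the fetch itself. The sketch below shows a generic wall-clock wrapper; timed is a hypothetical helper, and the sum() call is only a stand-in for whichever pydap or SOAP call is being compared.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds) measured on a wall clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; replace with the data fetch being measured.
result, elapsed = timed(sum, range(1000))
print("result=%d elapsed=%.6fs" % (result, elapsed))
```

Timing the filter step and the data fetch separately would show which of the two accounts for most of the delay.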
The first question we can handle. As for the second question, the time:
I noticed that too. That isn't Python; it is most likely the server end.
It looks like if you need speed, you should go the SOAP route. SOAP !=
OPeNDAP, though in the next year we will see some SOAP adapters creep
into OPeNDAP.
The SOAP server must have a more direct connection to the database. The
OPeNDAP server for NetCDF, HDF, etc. is pretty fast. The relational
database connector for OPeNDAP uses Java/Tomcat, so that adds a fair
amount of overhead to each transaction:
HTTP -> tomcat -> DB -> tomcat -> HTTP
Normally, this is pretty fast. I think the query against the database is
what is taking the time. There are a fair number of folks interested in
this data. I would imagine you will also see the SOAP query bog down at
times.
Back to the private vs. public question: you are using _get_data();
the public version is the data attribute.
Change this line...from:
data = filt_seq._get_data()
to:
data = filt_seq.data
>>> print 'Found this many waterlevel points:',len(data)
Found this many waterlevel points: 1
>>> print data[-1]
('1615680', 'MLLW', '20060101 10:06', '20060101 10:06', '1', 'A1', 'Jan 1 2006 10:06AM', 0.35799999999999998, 0.070000000000000007, 0, 0, 0, 0)
Rob
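Since each row comes back as a plain tuple, a small sketch of pairing it with the field names may help when picking out values. This assumes the tuple order matches the order reported by wl.keys() earlier in the thread, which the sample row suggests but does not prove, and the timestamp format string is inferred from that single sample.

```python
from datetime import datetime

# Field names as listed by wl.keys() earlier in the thread.
FIELDS = ['_STATION_ID', '_DATUM', '_BEGIN_DATE', '_END_DATE', 'DCP',
          'SENSOR_ID', 'DATE_TIME', 'WL_VALUE', 'SIGMA', 'O', 'F', 'R', 'L']

# The row printed above, copied here as a literal (floats shortened).
row = ('1615680', 'MLLW', '20060101 10:06', '20060101 10:06', '1', 'A1',
       'Jan 1 2006 10:06AM', 0.358, 0.07, 0, 0, 0, 0)

# Assumption: values arrive in the same order as the field names.
record = dict(zip(FIELDS, row))

# Assumed timestamp layout, based only on the one sample value.
when = datetime.strptime(record['DATE_TIME'], '%b %d %Y %I:%M%p')
print(when.isoformat(), record['WL_VALUE'], record['SIGMA'])
```

Looking values up by name this way avoids depending on positions like data[-1][7] in later code.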
-Roy
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
It's a problem with the client. When accessing sequences, pydap 2.2
will do a separate request for each contained variable, instead of
downloading the whole sequence -- in this particular example, 13
requests are made. This was a design decision that made the client
easier to write.
Fortunately, this is fixed in the 2.3 branch that should be out in a
month or so. (In 2.3 I changed the behavior of sequences a little so
that pydap works better with dapper servers. One new thing, for example,
is that sequences can be indexed.) With 2.3 I can run your script in
approximately 6 seconds, versus 4 for the SOAP version.
--Rob
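To make the request-count difference concrete, here is a toy simulation. It does not reproduce pydap's internals or touch the network: CountingServer is a made-up stand-in that only counts calls, and the two fetch functions are hypothetical names for the two strategies described above.

```python
# Field names as listed by wl.keys() earlier in the thread.
FIELDS = ['_STATION_ID', '_DATUM', '_BEGIN_DATE', '_END_DATE', 'DCP',
          'SENSOR_ID', 'DATE_TIME', 'WL_VALUE', 'SIGMA', 'O', 'F', 'R', 'L']


class CountingServer:
    """Stand-in for a remote server: counts requests instead of doing I/O."""

    def __init__(self):
        self.requests = 0

    def fetch(self, projection):
        self.requests += 1
        return projection


def fetch_per_variable(server):
    # One request per contained variable (the 2.2 behavior described above).
    return [server.fetch('WATERLEVEL_RAW_PX.' + name) for name in FIELDS]


def fetch_whole_sequence(server):
    # A single request for the whole sequence.
    return server.fetch('WATERLEVEL_RAW_PX')


per_variable = CountingServer()
fetch_per_variable(per_variable)
whole = CountingServer()
fetch_whole_sequence(whole)
print(per_variable.requests, whole.requests)
```

If every round trip pays the same database overhead, the per-variable strategy pays it thirteen times for this sequence.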
-kurt