pydap client + thredds


laguerra

May 8, 2009, 7:08:50 AM
to pydap
Hi fellows!

How can I use the pydap client to access a THREDDS server?

I tried:

hycom = open_url('http://hycom.coaps.fsu.edu:8080/thredds/dodsC/glb_analysis')

... and got:

---------------------------------------------------------------------------
error Traceback (most recent call last)

/home/cl80/Projetos/xbt/src/python/geovel.py in <module>()
----> 1
2
3
4
5

/usr/lib/python2.5/site-packages/Pydap-3.0.b.4-py2.5.egg/pydap/client.pyc in open_url(url)
58 """
59 for response in [_ddx, _ddsdas]:
---> 60 dataset = response(url)
61 if dataset: break
62 else:

/usr/lib/python2.5/site-packages/Pydap-3.0.b.4-py2.5.egg/pydap/client.pyc in _ddsdas(url)
196 (scheme, netloc, path + '.das', query, fragment))
197
--> 198 respdds, dds = request(ddsurl)
199 respdas, das = request(dasurl)
200

/usr/lib/python2.5/site-packages/Pydap-3.0.b.4-py2.5.egg/pydap/util/http.pyc in request(url)
39 log.info('Opening %s' % url)
40 resp, data = h.request(url, "GET",
---> 41 headers = {'user-agent': pydap.lib.USER_AGENT})
42
43 # When an error is returned, we parse the error message from the

/var/lib/python-support/python2.5/httplib2/__init__.pyc in request(self, uri, method, body, headers, redirections, connection_type)
1048 content = new_content
1049 else:
-> 1050 (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
1051 except Exception, e:
1052 if self.force_exception_to_status_code:

/var/lib/python-support/python2.5/httplib2/__init__.pyc in _request(self, conn, host, absolute_uri, request_uri, method, body, headers, redirections, cachekey)
852 auth.request(method, request_uri, headers, body)
853
--> 854 (response, content) = self._conn_request(conn, request_uri, method, body, headers)
855
856 if auth:

/var/lib/python-support/python2.5/httplib2/__init__.pyc in _conn_request(self, conn, request_uri, method, body, headers)
821 for i in range(2):
822 try:
--> 823 conn.request(method, request_uri, body, headers)
824 response = conn.getresponse()
825 except socket.gaierror:

/usr/lib/python2.5/httplib.pyc in request(self, method, url, body, headers)
864
865 try:
--> 866 self._send_request(method, url, body, headers)
867 except socket.error, v:
868 # trap 'Broken pipe' if we're allowed to automatically reconnect

/usr/lib/python2.5/httplib.pyc in _send_request(self, method, url, body, headers)
887 for hdr, value in headers.iteritems():
888 self.putheader(hdr, value)
--> 889 self.endheaders()
890
891 if body:

/usr/lib/python2.5/httplib.pyc in endheaders(self)
858 raise CannotSendHeader()
859
--> 860 self._send_output()
861
862 def request(self, method, url, body=None, headers={}):

/usr/lib/python2.5/httplib.pyc in _send_output(self)
730 msg = "\r\n".join(self._buffer)
731 del self._buffer[:]
--> 732 self.send(msg)
733
734 def putrequest(self, method, url, skip_host=0, skip_accept_encoding=0):

/usr/lib/python2.5/httplib.pyc in send(self, str)
697 if self.sock is None:
698 if self.auto_open:
--> 699 self.connect()
700 else:
701 raise NotConnected()

/var/lib/python-support/python2.5/httplib2/__init__.pyc in connect(self)
713 break
714 if not self.sock:
--> 715 raise socket.error, msg
716
717 class HTTPSConnectionWithTimeout(httplib.HTTPSConnection):

error: (111, 'Connection refused')
-----------------------------------------------------------------------------------------------------------------

Thanks for your attention.

L. Alexandre Guerra

Roberto De Almeida

May 8, 2009, 7:32:58 AM
to py...@googlegroups.com
Hi, Guerra.

It works fine for me (don't you hate when developers say that?). It looks like you have a connection problem. Do you need to use a proxy to access external URLs? You can configure the proxy this way:

  http://pydap.org/client.html#configuring-a-proxy

Otherwise, it could be that the server was down at that moment.

Best regards,
--Rob
--
Dr. Roberto De Almeida
http://dealmeida.net/
http://lattes.cnpq.br/1858859813771449
:wq

Luiz Alexandre Guerra

May 8, 2009, 8:08:34 AM
to py...@googlegroups.com
Hi Rob!

Thanks for your quick reply! It took just the time for a cup of coffee!

You are right. I forgot the proxy. But... how can I authenticate with the proxy?

[]'s

Guerra

Roberto De Almeida

May 8, 2009, 9:12:27 AM
to py...@googlegroups.com
Back from my tea. For an authenticated proxy:

>>> import httplib2
>>> from pydap.util import socks
>>> import pydap.lib
>>> pydap.lib.PROXY = httplib2.ProxyInfo(
...     socks.PROXY_TYPE_HTTP, 'localhost', 8000,
...     proxy_user="laguerra", proxy_pass="***")

I'll update the website with this information.

Cheers,
--Rob

Luiz Alexandre Guerra

May 8, 2009, 7:47:35 PM
to py...@googlegroups.com
Hi Rob,

Unfortunately, it didn't work. For some reason it doesn't get access to the proxy server. I spent all day working with two guys from the IT team. We monitored the connection between the client and the proxy server using netstat, and it seems that it doesn't actually reach the proxy. All we got was a 403 Forbidden error message. They had some partial success using urllib (without pydap), where they got an error message about authentication. All attempts with httplib2 were fruitless. Well, we're going to keep trying on Monday. I'll try to compile some information to send you.

But now I'm home and there is no proxy to bother me. I followed all the examples in the pydap client manual. Everything goes well until I try to download some data. Here is the error message:

==============================================================
In [65]: print ssh[0,10:14,10:14]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/guerra/Projetos/python/XBTpy/src/python/<ipython console> in <module>()

/usr/lib/python2.5/site-packages/Pydap-3.0.b.4-py2.5.egg/pydap/model.pyc in __getitem__(self, key)
    560             return StructureType.__getitem__(self, key)
    561         else:
--> 562             return self.array[key]
    563
    564     @property

/usr/lib/python2.5/site-packages/Pydap-3.0.b.4-py2.5.egg/pydap/model.pyc in __getitem__(self, index)
    189     def __lt__(self, other): return self.data < other
    190     def __iter__(self): return iter(self.data)
--> 191     def __getitem__(self, index): return self.data[index]
    192     def __len__(self): return len(self.data)
    193

/usr/lib/python2.5/site-packages/Pydap-3.0.b.4-py2.5.egg/pydap/proxy.pyc in __getitem__(self, index)
    100
    101         resp, data = request(url)
--> 102         dds, xdrdata = data.split('\nData:\n', 1)
    103         dataset = DDSParser(dds).parse()
    104         data = data2 = DapUnpacker(xdrdata, dataset).getvalue()

ValueError: need more than 1 value to unpack
=================================================================

Do you have any idea what's going on? My dataset is:

hycom = open_url('http://hycom.coaps.fsu.edu/thredds/dodsC/glb_analysis')

Thanks for your attention, friend.

[]'s

Alex Guerra

Roberto De Almeida

May 8, 2009, 10:55:09 PM
to py...@googlegroups.com
Hi, Guerra.

Unfortunately the THREDDS server is broken and can't understand a valid request. Pydap tries to retrieve the data from the URL

    http://hycom.coaps.fsu.edu/thredds/dodsC/glb_analysis.dods?ssh.ssh[0:1:0][10:1:13][10:1:13]

but the server returns the error message "The variable `ssh.ssh' was not found in the dataset." The URL is perfectly valid and should return the SSH data alone (i.e., without the data for the dimensions). I've seen this before with some servers. There's a workaround for cases like this -- opening a given URL directly -- but it doesn't seem to work in this case either. I'll look for the best way to fix this and then get back to you.
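
For anyone hitting the same ValueError: when the server answers with an error body instead of a DODS response, there is no "Data:" marker, so data.split('\nData:\n', 1) returns a single-element list and unpacking it into two names fails. Below is a minimal sketch of a more defensive split. It is not pydap code: the name split_dods and both sample payloads are made up for illustration, and it is written for Python 3 rather than the Python 2.5 used in this thread.

```python
MARKER = b"\nData:\n"  # separates the DDS text from the XDR-encoded payload


def split_dods(body):
    """Split a raw .dods response into (dds_text, xdr_bytes).

    Raises ValueError quoting the start of the body when the marker is
    missing, which usually means the server sent an error message.
    """
    dds, sep, xdrdata = body.partition(MARKER)
    if not sep:
        raise ValueError("no 'Data:' marker; server replied: %r" % body[:200])
    return dds.decode("ascii"), xdrdata


# Illustrative payloads only; these are NOT real server responses.
good = b"Dataset {\n    Float32 ssh[MT = 1];\n} glb_analysis;\nData:\n\x00\x00\x00\x01"
bad = b"The variable `ssh.ssh' was not found in the dataset."

dds, xdr = split_dods(good)
print(dds)

try:
    split_dods(bad)
except ValueError as exc:
    print("request failed:", exc)
```

With a check like this, the failure would surface the server's own message instead of an unpacking error deep inside the client.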

As for the proxy, I have never used the client with an authenticated proxy before. I'll also do some testing so that we at least know whether the problem is with your proxy or with the syntax I suggested.

Cheers,
--Rob

Roy Mendelssohn

May 8, 2009, 11:05:38 PM
to py...@googlegroups.com
The hycom server appears to be throwing errors, period (put http://hycom.coaps.fsu.edu:8080/thredds/dodsC/glb_nrt_analysis.html in your browser).

I would suggest letting them know and getting that fixed before you try anything else.

-Roy
**********************
"The contents of this message do not reflect any position of the U.S.
Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Men...@noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"

Luiz Alexandre Guerra

May 9, 2009, 9:16:26 AM
to py...@googlegroups.com
Rob,

I think the server is working. Try this: http://hycom.coaps.fsu.edu/thredds/dodsC/glb_analysis.dods?ssh[0:1:0][10:1:13][10:1:13]

What do you think?

Cheers,

Alex Guerra

Roberto De Almeida

May 11, 2009, 9:08:35 AM
to pydap
It's partially working. When an Opendap client does a request for "?ssh", it's saying "give me all the data (in this case) for the variable ssh". Since ssh is a Grid, this will return the ssh data itself (a 3D array), together with the data for its dimensions (time, lat, lon). The response then contains 4 arrays -- a 3D one with the data, and three 1D vectors defining the dimensions.

Opendap has a syntax for requesting only a child variable. An Opendap client can do a request for "?ssh.MT", and the server should return only the time axis, in this case. It's also possible to do a request for "?ssh.ssh", which means that you really want only the data itself, but not any of the defining axes.

When you slice a Pydap variable, it does a request only for that particular data. When you slice your ssh variable, it does a request for, you guessed it, "?ssh.ssh". This works fine with 95% of the servers, but I've seen some servers that complain about this perfectly valid request. I usually do what Roy recommends: I send an email to the administrator notifying them of the problem.
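
To make the mapping from a Python slice to the request concrete, here is a rough sketch of how an index such as [0, 10:14, 10:14] turns into the hyperslab [0:1:0][10:1:13][10:1:13] seen in the URLs in this thread. It is not pydap's implementation: the helper names hyperslab and constraint and the array shape (5, 50, 50) are assumptions for illustration, and only non-negative integer indices are handled. It is written for Python 3.

```python
def hyperslab(index, shape):
    """Render ints/slices as DAP hyperslabs of the form [start:step:stop].

    The DAP stop value is inclusive, unlike Python's exclusive slice stop.
    """
    parts = []
    for idx, size in zip(index, shape):
        if isinstance(idx, int):
            idx = slice(idx, idx + 1)  # a single element, non-negative only
        start, stop, step = idx.indices(size)
        parts.append("[%d:%d:%d]" % (start, step, stop - 1))
    return "".join(parts)


def constraint(name, index, shape, child=None):
    """Build 'name[...]' (whole Grid) or 'name.child[...]' (one array only)."""
    target = name if child is None else "%s.%s" % (name, child)
    return target + hyperslab(index, shape)


index = (0, slice(10, 14), slice(10, 14))
shape = (5, 50, 50)  # hypothetical shape, not the real HYCOM grid

print(constraint("ssh", index, shape))               # Grid with its axes
print(constraint("ssh", index, shape, child="ssh"))  # data array alone
```

The first form is the one the server in question accepts; the second is the one it rejects.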

With this in mind, I wrote a function for Pydap that allows you to directly open a given URL, like the one you quoted below. From that URL it should be possible to get the data you want, but for some reason it's also not working with this dataset. I'm looking into why this is happening; it should be something trivial to fix, and at least you should be able to get the data you want.

I'll get back to you later in the afternoon, I'm currently on the bus travelling to INPE.

Cheers,
--Rob

Roberto De Almeida

May 11, 2009, 9:10:42 AM
to pydap
Hi, Guerra.

Can you send me the code you're using to connect to the proxy (minus the authentication info)? Are you setting the proxy type correctly (http vs. https)? Also, did you set the proxy before any other pydap imports? I'm not sure if it matters, but I think it's better to do the other imports after setting the proxy.

Thanks,
--Rob



Roberto De Almeida

May 11, 2009, 9:17:46 AM
to py...@googlegroups.com
2009/5/11 Roberto De Almeida <rob...@dealmeida.net>:
> With this in mind, I wrote a function for Pydap that allows you to directly open a given URL, like the one you quoted below. From that URL it should be possible to get the data you want, but for some reason it's also not working with this dataset. I'm looking into why this is happening; it should be something trivial to fix, and at least you should be able to get the data you want.

Funny, I just tried it now and it's working. Looks like their server is a little bit flaky.

Here's how you do it:

  from pydap.client import open_dods
  dataset = open_dods('http://hycom.coaps.fsu.edu/thredds/dodsC/glb_analysis.dods?ssh[0:1:0][10:1:13][10:1:13]')
  print dataset.ssh.data  # here's your data

Note that you're opening the complete ".dods" URL, with extension. This will open the URL and build a dataset without metadata, only the variables and their data. You can add the metadata this way (untested):

  from pydap.util.http import request
  from pydap.parsers.das import DASParser
  respdas, das = request('http://hycom.coaps.fsu.edu/thredds/dodsC/glb_analysis.das')
  dataset = DASParser(das, dataset).parse()

Basically, you retrieve the DAS response and parse it, applying it to the dataset you opened before.

Best,
--Rob

 


Roberto De Almeida

May 11, 2009, 12:44:09 PM
to py...@googlegroups.com
On Mon, May 11, 2009 at 10:17 AM, Roberto De Almeida <rob...@dealmeida.net> wrote:
> Here's how you do it:
>
>   from pydap.client import open_dods
>   dataset = open_dods('http://hycom.coaps.fsu.edu/thredds/dodsC/glb_analysis.dods?ssh[0:1:0][10:1:13][10:1:13]')
>   print dataset.ssh.data  # here's your data
>
> Note that you're opening the complete ".dods" URL, with extension. This will open the URL and build a dataset without metadata, only the variables and their data. You can add the metadata this way (untested):
>
>   from pydap.util.http import request
>   from pydap.parsers.das import DASParser
>   respdas, das = request('http://hycom.coaps.fsu.edu/thredds/dodsC/glb_analysis.das')
>   dataset = DASParser(das, dataset).parse()
>
> Basically, you retrieve the DAS response and parse it, applying it to the dataset you opened before.

I implemented this in the Mercurial repository (http://code.google.com/p/pydap/source/checkout). Just call the function with the optional ``get_metadata`` argument:

  dataset = open_dods(DODS_URL, get_metadata=True)

And the metadata from the DAS will be downloaded automatically and used to populate the dataset.

--Rob

 

Luiz Alexandre Guerra

May 15, 2009, 9:08:34 PM
to py...@googlegroups.com
> Guerra wrote:
> Unfortunately, it didn't work. For some reason it doesn't get access to the proxy server. I spent all day working
> together with two guys from IT team. We've monitored the connection between client and proxy server using netstat
> and it seems like it doesn't actually reach the proxy. All we got was 403 Forbidden error message. They've got some
> partial success using urllib (without pydap), when they had some error message about authentication. All tries with
> httplib2 were unfruitful. Well, we're going to keep trying on Monday. I'll try to compile some information to send
> you.



Hi Rob,

Back to the proxy problem... As I told you, the IT guys and I have spent time running tests to get access to external opendap servers through the company's proxy, which is based on the NTLM protocol. Here is a summary of the tests:

1. Tests with urllib2 succeeded in connecting to external servers. Note that ez_setup.py (used to install pydap, for example) downloads egg files without issues and probably uses environment variables to authenticate with the proxy (my OS is Ubuntu Linux 8.10).

2. Since all attempts with httplib2 were fruitless, we decided to try an alternative. We used ntlmaps-0.9.9.0.1 to mediate the communication with the company's NTLM proxy. Then something really weird happened... When we ran ntlmaps and the Python pydap script (see below) on Windows, it worked. When we tried exactly the same on a Linux box (Ubuntu 8.10 and 9.04), we got 403 Forbidden as the response. Then we ran ntlmaps on a Windows box and, from Linux, tried a connection with the environment variables (http_proxy=http://login:pass...@10.10.10.32:5865) pointing to the Windows box running ntlmaps... and once again it worked. But when we ran ntlmaps on Linux and set the variables to localhost:5865, we succeeded with wget. The problem happens only with pydap. We used Python 2.5 on both operating systems.

Why is urllib2 able to get the proxy info from environment variables and talk to the NTLM proxy server, while httplib2 is not? Do you think it would be possible to make pydap behave the same way urllib2 programs do?

#======================================================
from pydap.client import open_url
import pydap.lib, httplib2, socks
# Replace localhost with the IP of the remote server when the proxy runs elsewhere.
# We also tried passing proxy_user="laguerra", proxy_pass="***".
pydap.lib.PROXY = httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, 'localhost', 5865)

dataset = open_url('http://test.opendap.org/dap/data/nc/coads_climatology.nc')

var = dataset['SST']
var.shape
var.type
print var[0,10:14,10:14]
#======================================================
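
On the question of how urllib2 picks up the proxy: it reads variables such as http_proxy from the environment by default, whereas the script above has to hand httplib2 an explicit ProxyInfo. The sketch below shows the environment lookup using urllib.request, the Python 3 successor of urllib2 (the thread itself uses Python 2.5). The helper name proxy_from_environment is made up, and the proxy address is the local ntlmaps port mentioned above, used purely as an example value.

```python
import os
import urllib.request


def proxy_from_environment(scheme="http"):
    """Return the proxy URL urllib finds in the environment, or None."""
    return urllib.request.getproxies_environment().get(scheme)


# Example value only: a local ntlmaps instance as described in the thread.
os.environ["http_proxy"] = "http://localhost:5865"

print(proxy_from_environment("http"))
```

A ProxyHandler created without arguments uses the same discovery, which is presumably why plain urllib2 scripts worked behind the proxy without extra configuration.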

Cheers,

Guerra

Roberto De Almeida

May 21, 2009, 10:52:11 AM
to py...@googlegroups.com
Hi, Guerra!

If you can connect using urllib2, you can replace the httplib2 code like in this example:

  http://pydap.org/client.html#cas

Here's an untested way of doing this (also at http://etherpad.com/9ldgcACHH0):

import urllib2

def install_urllib2_client():
    def new_request(url):
        f = urllib2.urlopen(url)
        headers = dict(f.info().items())
        body = f.read()
        return headers, body
    
    from pydap.util import http
    http.request = new_request

Then run install_urllib2_client() before anything else. You can customize the function more, if you want.
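
A note on why the order can matter: if another module has already done "from some_module import request", it keeps its own reference to the old function, and replacing the attribute afterwards does not affect it. The sketch below shows this with a throwaway stand-in module; fake_http and every function in it are hypothetical, and whether pydap's modules bind request this way is an assumption based only on the bare request(...) calls visible in the tracebacks in this thread. It is written for Python 3.

```python
import types

# Hypothetical stand-in for a module such as pydap.util.http.
fake_http = types.ModuleType("fake_http")


def original_request(url):
    return {}, "original:" + url


fake_http.request = original_request

# Equivalent to `from fake_http import request` executed BEFORE the patch:
early_bound = fake_http.request


def client_via_name(url):
    return early_bound(url)        # still points at the old function


def client_via_module(url):
    return fake_http.request(url)  # attribute looked up at call time


def new_request(url):
    return {}, "patched:" + url


fake_http.request = new_request    # the monkeypatch

print(client_via_name("u")[1])
print(client_via_module("u")[1])
```

Installing the patch before importing anything else from the package avoids the first situation entirely.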

I'll open a ticket in the httplib2 website about this problem.

Cheers,
--Rob

Luiz Alexandre Guerra

May 21, 2009, 3:20:23 PM
to py...@googlegroups.com
Hi Rob,

Thank you very much! It worked perfectly. There is no need to change the http_proxy variable anymore, or even to run an alternative local proxy server like ntlmaps. That's cool and simple.

By the way, the solution you proposed to access the thredds opendap server using open_dods also worked perfectly! Thanks once again!

Cheers,

Alex Guerra

Roberto De Almeida

May 21, 2009, 3:24:03 PM
to py...@googlegroups.com
On Thu, May 21, 2009 at 4:20 PM, Luiz Alexandre Guerra <lalex...@gmail.com> wrote:
> Thank you very much! It worked perfectly. There is no need to change the http_proxy variable anymore, or even to run an alternative local proxy server like ntlmaps. That's cool and simple.
>
> By the way, the solution you proposed to access the thredds opendap server using open_dods also worked perfectly! Thanks once again!

And pydap saves the day!

(Sorry, couldn't resist... ;-) )

--Rob

Luiz Alexandre Guerra

May 21, 2009, 7:56:51 PM
to py...@googlegroups.com
:)))))))

You deserve it! I should say pydap has saved the entire week.

Thanks buddy.

Guerra

Roberto De Almeida

May 22, 2009, 9:41:15 AM
to py...@googlegroups.com
Thanks!

I updated the documentation on the website:

  http://pydap.org/client.html#configuring-a-proxy

Cheers,
--Rob

Luiz Alexandre Guerra

May 22, 2009, 7:15:15 PM
to py...@googlegroups.com
Houston, we've got a problem!

The proxy problem was fixed. But there is another issue that I did not notice yesterday: I opened the remote dataset and got the shape and type, but I forgot to get the data. Here is a small script that demonstrates the issue, followed by the error message.

================================================================================
# Debugging pydap client
import sys
import logging
logging.basicConfig(stream=sys.stdout)
logger = logging.getLogger('pydap')
logger.setLevel(logging.INFO)

# NTLM proxy fix

import urllib2
def install_urllib2_client():
    def new_request(url):
        f = urllib2.urlopen(url)
        headers = dict(f.info().items())
        body = f.read()
        return headers, body
    from pydap.util import http
    http.request =  new_request
   
install_urllib2_client()

# Pydap.client test
from pydap.client import open_url

dataset = open_url('http://test.opendap.org/dap/data/nc/coads_climatology.nc')
var = dataset['SST']
var.shape
var.type
print var[0,10:14,10:14]
================================================================================


In [1]: run pydap_test

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/guerra/pydap_test.py in <module>()
     25 var.shape
     26 var.type
---> 27 print var[0,10:14,10:14]
     28
     29

/usr/lib/python2.5/site-packages/Pydap-3.0.b.4-py2.5.egg/pydap/model.pyc in __getitem__(self, key)
    560             return StructureType.__getitem__(self, key)
    561         else:
--> 562             return self.array[key]
    563
    564     @property

/usr/lib/python2.5/site-packages/Pydap-3.0.b.4-py2.5.egg/pydap/model.pyc in __getitem__(self, index)
    189     def __lt__(self, other): return self.data < other
    190     def __iter__(self): return iter(self.data)
--> 191     def __getitem__(self, index): return self.data[index]
    192     def __len__(self): return len(self.data)
    193

/usr/lib/python2.5/site-packages/Pydap-3.0.b.4-py2.5.egg/pydap/proxy.pyc in __getitem__(self, index)
    100
    101         resp, data = request(url)
--> 102         dds, xdrdata = data.split('\nData:\n', 1)
    103         dataset = DDSParser(dds).parse()
    104         data = data2 = DapUnpacker(xdrdata, dataset).getvalue()

ValueError: need more than 1 value to unpack
WARNING: Failure executing file: <pydap_test.py>

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

It looks like something is missing.

Cheers,

Alex Guerra

Roberto De Almeida

May 22, 2009, 7:32:44 PM
to py...@googlegroups.com
Hmmm...

The logging code didn't do any good, because the monkeypatched function doesn't use it. I first changed it to log the requested URLs:

import urllib2, logging
def install_urllib2_client():
    def new_request(url):
        # log the requested URL
        log = logging.getLogger('pydap')

        log.info('Opening %s' % url)

        f = urllib2.urlopen(url)
        headers = dict(f.info().items())
        body = f.read()
        return headers, body
    from pydap.util import http
    http.request =  new_request

And then I noticed that the client is requesting the data from the url:

  http://test.opendap.org/dap/data/nc/coads_climatology.nc.dods?SST.SST[0:1:0][10:1:13][10:1:13]&

But the server is complaining about the ampersand "&" at the end of the URL (which should be valid, IMHO). I noticed that the original request function strips "&" and "?" from the end of the URL, so I changed the new function to:

import urllib2, logging
def install_urllib2_client():
    def new_request(url):
        # log the requested URL
        log = logging.getLogger('pydap')

        log.info('Opening %s' % url)

        # strip & and ? from the end of the url
        f = urllib2.urlopen(url.rstrip('?&'))

        headers = dict(f.info().items())
        body = f.read()
        return headers, body
    from pydap.util import http
    http.request =  new_request

And then it worked! :)
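
For readers on Python 3, where urllib2 became urllib.request, a rough port of the function above might look like the sketch below. It is not part of pydap: the clean_url helper and the opener parameter are additions made here so the URL clean-up can be checked without a network connection, and unlike the version above it logs the URL after stripping.

```python
import logging
import urllib.request

log = logging.getLogger("pydap")


def clean_url(url):
    """Drop any trailing '?' or '&' characters from the URL."""
    return url.rstrip("?&")


def new_request(url, opener=urllib.request.urlopen):
    """Fetch `url` and return (headers, body), like the function above."""
    url = clean_url(url)
    log.info("Opening %s", url)
    with opener(url) as f:
        headers = dict(f.headers.items())
        body = f.read()
    return headers, body


print(clean_url(
    "http://test.opendap.org/dap/data/nc/coads_climatology.nc.dods"
    "?SST.SST[0:1:0][10:1:13][10:1:13]&"
))
```

In the Pydap 3.0 beta discussed here, the function is installed by assigning it to pydap.util.http.request as shown above; whether other pydap versions expose the same hook is not established in this thread.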

Best,
--Rob

Luiz Alexandre Guerra

May 22, 2009, 8:09:47 PM
to py...@googlegroups.com
Yes, it works!

Tomorrow I'll do more tests.

Thanks, Rob.

Cheers,

Alex Guerra