Tips for accessing THREDDS catalogs and data with Python?


Emilio Mayorga

unread,
Apr 11, 2012, 6:09:47 PM4/11/12
to ioos...@googlegroups.com, Charles Seaton
Howdy,

A couple of us in NANOOS are trying to figure out how to interact with
THREDDS/TDS with Python.

Any tips??

Of course, the OPeNDAP part we can take care of with pyDAP. But we'd
prefer to be able to take advantage of the dataset catalogs provided
by THREDDS in a more generic way.

So far we're not finding any convenient, pre-existing solutions.
Ideally, what we'd like is a python module that reads the TDS XML
catalogs into pythonic data structures that can be iterated, queried,
etc. So far we've found only two options, neither of which is
satisfying:

** Bare XML parsing. Yikes. I'd rather have something friendlier.
We found some python code developed for the GALEON project a few years ago; eg:
http://www.unidata.ucar.edu/projects/THREDDS/GALEON/Demos/python/WildCardGet/thredds.py
But it's pretty old, and uses clunky, un-pythonic XML parsing (minidom
rather than ElementTree).

** IS-ENES Thredds Harvester
https://verc.enes.org/collaboration/projects/datainfrastructure/wiki/Thredds_Harvester
It looks exactly like what we're looking for, except it's tied to other,
highly specific requirements -- eg, it requires SQLAlchemy, and
apparently a relational database configuration.
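[For context, the kind of friendlier ElementTree-based parsing being asked about might look roughly like the sketch below. It parses an inline sample catalog rather than fetching one over HTTP, and the service and dataset names in the sample are made up.]

```python
# Sketch: pulling dataset urlPaths out of a THREDDS catalog with the
# stdlib ElementTree. The sample catalog below is illustrative only.
import xml.etree.ElementTree as ET

NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

SAMPLE_CATALOG = """<?xml version="1.0"?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Example Model Output" urlPath="models/example.nc"/>
  <dataset name="Container dataset">
    <dataset name="Nested Output" urlPath="models/nested.nc"/>
  </dataset>
</catalog>"""

root = ET.fromstring(SAMPLE_CATALOG)
# Collect every dataset element that carries a urlPath attribute,
# in document order (nested datasets included).
url_paths = [el.get("urlPath") for el in root.iter(NS + "dataset")
             if el.get("urlPath")]
print(url_paths)  # → ['models/example.nc', 'models/nested.nc']
```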

I appreciate any help or pointers. Cheers,
-Emilio

Emilio Mayorga
NANOOS

Roy Mendelssohn

unread,
Apr 11, 2012, 6:16:18 PM4/11/12
to Emilio Mayorga, Charles Seaton, Roberto De Almeida, ioos...@googlegroups.com
I do not know this for certain, but I would be very surprised if something along these lines doesn't exist in cdat or cdat-lite.

-Roy

> --
> You received this message because you are subscribed to the Google Groups "ioos_tech" group.
> To post to this group, send email to ioos...@googlegroups.com.
> To unsubscribe from this group, send email to ioos_tech+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/ioos_tech?hl=en.
>

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Men...@noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

Roy Mendelssohn

unread,
Apr 11, 2012, 6:17:48 PM4/11/12
to Emilio Mayorga, Charles Seaton, ioos...@googlegroups.com
Also, now that I think of it, I believe the present version of the THREDDS harvester developed for the UAF program to make clean catalogs is written in Python. Contact Steve Hankin.

-Roy
On Apr 11, 2012, at 3:09 PM, Emilio Mayorga wrote:

Emilio Mayorga

unread,
Apr 11, 2012, 6:52:17 PM4/11/12
to Roy Mendelssohn, Charles Seaton, ioos...@googlegroups.com
Thanks for both tips, Roy! I wasn't aware of CDAT-Lite; it looks promising.

I'll contact Steve separately.

Cheers,
-Emilio

Rich Signell

unread,
Apr 12, 2012, 7:00:46 AM4/12/12
to ioos...@googlegroups.com, Roy Mendelssohn, Charles Seaton
Emilio,

I'm not sure what your ultimate goal is, but for the IOOS Modeling
Testbed, we are running a GI-CAT service on tomcat that harvests
metadata from multiple endpoints, including the THREDDS NcISO Service.

So we harvest metadata from multiple THREDDS servers, with each catalog
on a customizable harvesting schedule.

We then query GI-CAT's ISO metadata database using geo-enabled
OpenSearch (allowing geospatial and temporal searching combined with
free-text search), or OGC CSW (for queries involving key:value pairs
such as "title", "keywords", etc).

Both CSW and OpenSearch service calls and responses can be handled in
Python, Matlab or whatever.
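[A minimal sketch of assembling such a service call in plain Python follows. The endpoint and parameter names here are hypothetical placeholders, not GI-CAT's actual OpenSearch interface.]

```python
# Sketch: building a geo-enabled OpenSearch request URL with the stdlib.
# The base URL and parameter names are illustrative, not a real API.
from urllib.parse import urlencode

def opensearch_url(base, text=None, bbox=None, start=None, end=None):
    """Assemble a query URL from free-text, bounding-box, and time filters."""
    params = {}
    if text:
        params["searchTerms"] = text
    if bbox:  # (west, south, east, north)
        params["bbox"] = ",".join(str(v) for v in bbox)
    if start:
        params["timeStart"] = start
    if end:
        params["timeEnd"] = end
    return base + "?" + urlencode(params)

url = opensearch_url("http://example.org/opensearch",
                     text="salinity",
                     bbox=(-125, 42, -120, 49),
                     start="2012-01-01", end="2012-04-01")
print(url)
```

The response (Atom or RSS in most OpenSearch deployments) could then be fetched and parsed with any HTTP and XML library.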

-Rich

--
Dr. Richard P. Signell   (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598

Emilio Mayorga

unread,
Apr 15, 2012, 3:14:14 AM4/15/12
to Roberto De Almeida, ioos...@googlegroups.com, Charles Seaton
Hi Roberto,

Thanks so much for that code fragment. It'll be very helpful, and I'm
pretty sure I'll make use of it. I'll try to report back to this list
some other time.

Cheers,
-Emilio


On Wed, Apr 11, 2012 at 6:57 PM, Roberto De Almeida
<rob...@dealmeida.net> wrote:
> I have some code here that does what you want — read the catalog.xml and
> return the datasets so you can iterate over them. It even requests
> additional catalogs referenced by the root catalog.xml, in a recursive way.
> It's pretty simple:
>
> from urlparse import urljoin  # Python 3: from urllib.parse import urljoin
> from lxml import etree
> from httplib2 import Http
> from pydap.client import open_url
>
> NAMESPACE = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
> XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
>
> def crawl(catalog):
>     resp, content = Http().request(catalog)
>     xml = etree.fromstring(content)
>     base = xml.find('.//{%s}service' % NAMESPACE)
>     for dataset in xml.iterfind('.//{%s}dataset[@urlPath]' % NAMESPACE):
>         yield urljoin(base.attrib['base'], dataset.attrib['urlPath'])
>     for subdir in xml.iterfind('.//{%s}catalogRef' % NAMESPACE):
>         # catalogRef hrefs may be relative to the current catalog URL
>         for url in crawl(urljoin(catalog, subdir.attrib[XLINK_HREF])):
>             yield url
>
> catalog = "http://opendap.ccst.inpe.br/catalog.xml"
> datasets = [open_url(url) for url in crawl(catalog)]
>
>
> You mention "generic" datasets, do you mean datasets that are not served
> through Opendap?
>
> Cheers,
> --Rob

> --
> Dr. Roberto De Almeida
> http://dealmeida.net/
> http://lattes.cnpq.br/1858859813771449
> :wq

Emilio Mayorga

unread,
Apr 15, 2012, 3:17:03 AM4/15/12
to ioos...@googlegroups.com, Charles Seaton
Hi Rich,

Thanks for that description. Steve Hankin sent me a description of the
UAF work separately, and together with your message, I think I now
have a fairly good picture of how you're going about this. My own
goals are more modest, but it's very useful to learn about your
approach.

Cheers,
-Emilio

mag....@gmail.com

unread,
May 9, 2016, 10:40:53 AM5/9/16
to ioos_tech
Hi Emilio

Did you solve your problem? It sounds like we have similar requirements, and I'd appreciate any hints you can give as to a suitable approach.

Magnus

Emilio Mayorga

unread,
May 10, 2016, 8:47:08 AM5/10/16
to ioos...@googlegroups.com
Hi Magnus,

That thread is four years old. Back then, I ended up doing THREDDS catalog parsing "manually", by figuring out the XML catalog structure for just the elements we needed and traversing the XML documents. Not the best way to do things!

These days, I've heard good things about Unidata's Siphon library/utilities. I've never used it myself, but I saw a small demo a few months ago and was impressed. That's what I would recommend.

Cheers,
-Emilio



rsignell

unread,
May 18, 2016, 5:08:55 PM5/18/16
to ioos_tech
There are quite a few THREDDS crawler packages floating around on GitHub.

try googling this:

thredds crawler site:github.com

I would like to see if we could transfer https://github.com/asascience-open/thredds_crawler to http://github.com/ioos/thredds_crawler to encourage contributions from folks in the wider IOOS community.   Or perhaps http://github.com/unidata/thredds_crawler, for an even wider community?


-Rich