A couple of us in NANOOS are trying to figure out how to interact with
THREDDS/TDS with Python.
Any tips??
Of course, the OPeNDAP part we can take care of with pyDAP. But we'd
prefer to be able to take advantage of the dataset catalogs provided
by THREDDS in a more generic way.
So far we're not finding any convenient, pre-existing solutions.
Ideally, what we'd like is a python module that reads the TDS XML
catalogs into pythonic data structures that can be iterated, queried,
etc. So far we've found only two options, neither of which is
satisfying:
** Bare XML parsing. Yikes. I'd rather have something friendlier.
We found some python code developed for the GALEON project a few years ago; eg:
http://www.unidata.ucar.edu/projects/THREDDS/GALEON/Demos/python/WildCardGet/thredds.py
But it's pretty old, and uses clunky, un-pythonic XML parsing (minidom
rather than ElementTree).
** IS-ENES Thredds Harvester
https://verc.enes.org/collaboration/projects/datainfrastructure/wiki/Thredds_Harvester
It looks exactly like what we're looking for, except it's integrated
with other, highly specific requirements -- eg, it requires
SQLAlchemy, and apparently a relational database configuration.
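For concreteness, here is roughly the kind of thing we're after: a stdlib-only ElementTree sketch that turns a catalog into plain dicts. The function name and the inline sample catalog are made up for illustration, not from any existing module.

```python
# Minimal sketch (hypothetical helper, not an existing package): parse a
# THREDDS catalog document into a list of plain Python dicts using only
# the standard library's ElementTree.
import xml.etree.ElementTree as ET

NS = {"c": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

def parse_catalog(xml_text):
    """Return a list of {'name', 'urlPath'} dicts for accessible datasets."""
    root = ET.fromstring(xml_text)
    # [@urlPath] keeps only dataset elements that are directly accessible
    return [
        {"name": d.get("name"), "urlPath": d.get("urlPath")}
        for d in root.iterfind(".//c:dataset[@urlPath]", NS)
    ]

# Demonstrated on an inline snippet rather than a live server:
SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="demo" urlPath="test/demo.nc"/>
</catalog>"""
print(parse_catalog(SAMPLE))
```

A real module would also follow catalogRef elements and join urlPath against the service base, but even this much is friendlier than minidom.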
I appreciate any help or pointers. Cheers,
-Emilio
Emilio Mayorga
NANOOS
-Roy
> --
> You received this message because you are subscribed to the Google Groups "ioos_tech" group.
> To post to this group, send email to ioos...@googlegroups.com.
> To unsubscribe from this group, send email to ioos_tech+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/ioos_tech?hl=en.
>
**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097
e-mail: Roy.Men...@noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/
"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
On Apr 11, 2012, at 3:09 PM, Emilio Mayorga wrote:
I'll contact Steve separately.
Cheers,
-Emilio
I'm not sure what your ultimate goal is, but for the IOOS Modeling
Testbed, we are running a GI-CAT service on tomcat that harvests
metadata from multiple endpoints, including the THREDDS NcISO Service.
So we harvest data from multiple thredds servers, with each catalog on
a customizable harvesting schedule.
We then query GI-CAT's ISO metadata database using geo-enabled
OpenSearch (allowing for geospatial & temporal searching combined with
free-text search), or OGC CSW (for queries involving key:value
pairs such as "title", "keywords", etc.).
Both CSW and OpenSearch service calls and responses can be handled in
Python, Matlab or whatever.
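For example, composing an OpenSearch query URL is just parameter handling. The endpoint and parameter names below are illustrative assumptions, not a documented GI-CAT URL template; check your service's OpenSearch description document for the real ones.

```python
# Hedged sketch: build a geo-enabled OpenSearch query URL. The endpoint
# and parameter names are assumptions for illustration only.
from urllib.parse import urlencode

def opensearch_query(endpoint, text, bbox, start, end):
    """Compose a query URL with free-text, bounding-box, and time terms."""
    params = {
        "searchTerms": text,
        "bbox": ",".join(str(v) for v in bbox),  # west,south,east,north
        "timeStart": start,
        "timeEnd": end,
    }
    return endpoint + "?" + urlencode(params)

url = opensearch_query("http://example.org/gi-cat/opensearch", "salinity",
                       (-127.0, 42.0, -123.0, 49.0),
                       "2012-01-01", "2012-04-01")
print(url)
```

The response (Atom or RSS) can then be parsed with any XML/feed library.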
-Rich
--
Dr. Richard P. Signell (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598
Thanks so much for that code fragment. It'll be very helpful, and I'm
pretty sure I'll make use of it. I'll try to report back to this list
some other time.
Cheers,
-Emilio
On Wed, Apr 11, 2012 at 6:57 PM, Roberto De Almeida
<rob...@dealmeida.net> wrote:
> I have some code here that does what you want — read the catalog.xml and
> return the datasets so you can iterate over them. It even requests
> additional catalogs referenced by the root catalog.xml, in a recursive way.
> It's pretty simple:
>
> from urlparse import urljoin  # urllib.parse in Python 3
> from lxml import etree
> from httplib2 import Http
> from pydap.client import open_url
>
> NAMESPACE = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
>
> def crawl(catalog):
>     resp, content = Http().request(catalog)
>     xml = etree.fromstring(content)
>     # the service element's "base" is the URL prefix for dataset urlPaths
>     base = xml.find('.//{%s}service' % NAMESPACE)
>     for dataset in xml.iterfind('.//{%s}dataset[@urlPath]' % NAMESPACE):
>         yield urljoin(base.attrib['base'], dataset.attrib['urlPath'])
>     for subdir in xml.iterfind('.//{%s}catalogRef' % NAMESPACE):
>         # catalogRef hrefs are often relative to the current catalog URL
>         href = urljoin(catalog, subdir.attrib['{http://www.w3.org/1999/xlink}href'])
>         for url in crawl(href):
>             yield url
>
> catalog = "http://opendap.ccst.inpe.br/catalog.xml"
> datasets = [open_url(url) for url in crawl(catalog)]
>
>
> You mention "generic" datasets; do you mean datasets that are not served
> through OPeNDAP?
>
> Cheers,
> --Rob
> --
> Dr. Roberto De Almeida
> http://dealmeida.net/
> http://lattes.cnpq.br/1858859813771449
> :wq
Thanks for that description. Steve Hankin sent me a description of the
UAF work separately, and together with your message, I think I now
have a fairly good picture of how you're going about this. My own
goals are more modest, but it's very useful to learn about your
approach.
Cheers,
-Emilio
Did you solve your problem? It sounds like we have similar requirements, and I'd appreciate any hints you can give as to a suitable approach.
Magnus