Re: ULMO's specifics

Dharhas Pothina

May 29, 2014, 12:12:03 PM
to Rohit Khattar, ul...@googlegroups.com
Hey Rohit,

I'm copying the ulmo mailing list (ul...@googlegroups.com) so that there will be a record of this discussion.

Addressing your points in order.

First a meta point. 

ulmo isn't trying to solve the entire problem of multiple data formats and how to represent them all in a consistent way. That is a very hard problem which many folks have tried and failed to solve satisfactorily. Instead, ulmo solves the 'transport' problem in a pragmatic way, i.e. go get the data and give it back to me in a reasonable format, either as Python dictionaries or pandas dataframes (see pandas.pydata.org). It does the 'data getting' by web scraping, web services or hitting FTP sites, whichever makes the most sense for the dataset in question. On the 'local representation' side, it is still your responsibility to know what the parameters mean, etc.

On Wed, May 28, 2014 at 8:33 PM, Rohit Khattar <ro...@byu.edu> wrote:
Hey, 

First of all, a great application. Handling all those multiple datasets can be a huge pain in the butt. 
Anyways, I am a grad student at BYU and I am working on developing a translator module for various time series formats. What I was wondering is how ULMO works in the background. Is there any technical documentation I can refer to? 


for documentation see: http://ulmo.readthedocs.org/

the API reference should help, or you can just look at the python source code on github: https://github.com/ulmo-dev/ulmo

 
Some specific questions are : 

1.) What data formats does it support for CUAHSI WOF? Just WaterML 1.1? We are upgrading it to WaterML 2.0 and if we wanted to add support for that in ULMO, how would we go about doing that? 


WaterML 2.0 has come up several times on the mailing list and the answer is a bit tricky. With WaterML 1/1.1 there is something called WOF, WaterOneFlow. Think of it like this: WOF defines the questions you can ask (i.e. give me a list of sites, what parameters exist at a site, etc.) and what the responses look like (i.e. an XML format for the various downloaded files). With WaterML 2.0, only the timeseries download format was standardized. There is no standard for how to ask the questions (i.e. no WOF2). So you could write a parser that parses WaterML2 files, but that's about it. Also, there are hardly any WaterML2 services in the wild, so it's difficult to write a generalized parser without examples to test against. The WaterML2 spec is quite complicated and it is still up in the air how to implement it. You are welcome to give it a try. 
 
There were a few other folks on the mailing list who were interested in working on WaterML2. I'm not sure if they have done anything, but they might have made some progress.
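
To make the "parser that parses WaterML2 files" idea a bit more concrete, here is a minimal sketch using only the Python standard library. It is not part of ulmo and is not how ulmo does anything; the function name parse_wml2_points and the SAMPLE document are made up for illustration, and the element names (MeasurementTimeseries, point, MeasurementTVP, time, value) are assumed from the WaterML 2.0 timeseries schema. A real service response carries far more metadata than this.

```python
import xml.etree.ElementTree as ET

# Namespace of the OGC WaterML 2.0 timeseries schema (assumed for this sketch).
WML2 = "http://www.opengis.net/waterml/2.0"
NS = {"wml2": WML2}

# Hand-written sample, NOT taken from any real service.
SAMPLE = """\
<wml2:MeasurementTimeseries
    xmlns:wml2="http://www.opengis.net/waterml/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <wml2:point>
    <wml2:MeasurementTVP>
      <wml2:time>2014-05-01T00:00:00Z</wml2:time>
      <wml2:value>1.25</wml2:value>
    </wml2:MeasurementTVP>
  </wml2:point>
  <wml2:point>
    <wml2:MeasurementTVP>
      <wml2:time>2014-05-01T01:00:00Z</wml2:time>
      <wml2:value xsi:nil="true"/>
    </wml2:MeasurementTVP>
  </wml2:point>
</wml2:MeasurementTimeseries>
"""


def parse_wml2_points(xml_text):
    """Hypothetical helper: return time/value pairs as a list of dicts."""
    root = ET.fromstring(xml_text)
    points = []
    # Walk every time-value pair, wherever it sits in the document.
    for tvp in root.iter("{%s}MeasurementTVP" % WML2):
        time = tvp.findtext("wml2:time", namespaces=NS)
        value = tvp.findtext("wml2:value", namespaces=NS)
        # Empty or missing values (e.g. xsi:nil) become None.
        points.append({"datetime": time, "value": float(value) if value else None})
    return points


points = parse_wml2_points(SAMPLE)
```

The hard part described above is not this loop but everything around it: there is no standard way to ask a service for the file in the first place, and real documents vary in how they use the spec.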

2.) What data store are we using, just making an array of time series values? 


As I said earlier, this is fairly pragmatic. Data is generally converted to a Python dict or pandas dataframe. Local caches of data are stored either in the original format (for some datasets) or as HDF5 files.
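
As a rough illustration of the "cache in the original format" idea, the sketch below keeps a downloaded file on disk and only fetches it again if the local copy is missing. This is an assumption about the general pattern, not ulmo's actual caching code; cached_fetch, fake_download and the file name example.csv are all hypothetical, and the download is faked so the example runs without a network.

```python
import os
import tempfile


def cached_fetch(cache_dir, name, fetch):
    """Return the bytes for `name`, calling `fetch()` only if no local copy exists."""
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as f:
            f.write(fetch())  # store the data exactly as it was downloaded
    with open(path, "rb") as f:
        return f.read()


calls = []


def fake_download():
    """Stand-in for a web service, scraper or FTP request."""
    calls.append(1)
    return b"date,value\n2014-05-01,1.25\n"


cache_dir = tempfile.mkdtemp()
first = cached_fetch(cache_dir, "example.csv", fake_download)   # downloads
second = cached_fetch(cache_dir, "example.csv", fake_download)  # read from cache
```

Parsing the cached bytes into a dict or dataframe would then be a separate step on top of this.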
 
3.) Since you are mapping all these different sources, is there a way to reverse it? Can I use this to pull data from CIRS and turn it into CUAHSI-compatible WaterML? What would need to be done to add something like that? 


Not in a simple fashion, really. There was another project called WOFpy, which we have since abandoned, that aimed at doing this (https://github.com/swtools/WOFpy). A couple of folks are still using it to serve data in WOF/WaterML1.1 from other datasets, but it hasn't been actively developed in two or so years. You would have to do it on a case-by-case basis and it wouldn't really work for some datasets. For example, CIRS data is by climate division (if I recall correctly) and WaterML1.1 can only deal with point datasets.
 
Thanks

Rohit Khattar
Graduate Student
Civil & Environmental Engineering
BYU


Hope that helps. If you have further questions, I would recommend sending them to the mailing list so others can chime in and we can keep a record.

- dharhas 

Andy Wilson

May 29, 2014, 3:45:10 PM
to ul...@googlegroups.com

Hi Rohit.

Parsing WaterML2 from CUAHSI WOF services would be a very welcome addition. There is an open issue on our issue tracker with some information on what needs to be done: https://github.com/ulmo-dev/ulmo/issues/77 . It's kind of rambly, but there's at least some relevant discussion there if you want to follow up with more specific questions.


As Dharhas said, ulmo is just a collection of APIs for parsing data from various online sources into Python data structures, with some optimizations along the way. That's probably all it will ever be, but it could serve as a component of a larger data pipeline that includes transformations into common data formats and services.


Also, there is another project called dat in the works that you might be interested in. It is explicitly being designed to handle some of the things you are asking about in your third question: https://github.com/maxogden/dat/blob/master/docs/what-is-dat.md

It is still in pre-alpha and is a very ambitious project, but it has a lot of momentum and is being developed by very smart people who understand the problem space well and want to build a larger community of data translation infrastructure.

-andy




