Retrieving USGS Historical flow data using ULMO

Casey Smith

Apr 30, 2014, 11:20:35 AM4/30/14
to ul...@googlegroups.com
I have been using Ulmo to retrieve USGS NWIS Instantaneous data.  Great tool, thanks for the effort!

It doesn't seem that Ulmo can retrieve historical statistics from the USGS (long-term means, etc.), as available here:  http://waterservices.usgs.gov/rest/Statistics-Service-Test-Tool.html

Does this capability exist?  In the works?

Thanks,
Casey

Dharhas Pothina

Apr 30, 2014, 11:33:11 AM4/30/14
to ul...@googlegroups.com
Casey,

The capability to access that particular USGS service does not exist within ulmo at the moment, and it's not currently in the works. New services typically get added to ulmo when we need them for our day jobs (I'm working on a 'go get me raster data' service right now) or when someone builds one and contributes it back to ulmo.

That said, the statistics available from that particular service (long-term means, percentiles, etc.) can be calculated very easily with the Python 'pandas' package after pulling the historic data from NWIS, using one or two lines of code. That's what we did for the waterdatafortexas.org website. I can probably give you an example if you are interested. Note that these values may not exactly match the USGS statistics service, depending on any extra QA the USGS may have done (they might have, I've never checked).
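
For anyone following along, here is a minimal sketch of that pandas approach. The records are made up and stand in for a live NWIS download; their shape (a list of dicts with 'datetime' and 'value' strings) follows what the code later in this thread suggests ulmo returns.

```python
import pandas as pd

# Hypothetical NWIS-style records standing in for ulmo's output:
records = [
    {"datetime": "2014-04-01T00:00:00", "value": "120.0"},
    {"datetime": "2014-04-02T00:00:00", "value": "150.0"},
    {"datetime": "2014-04-03T00:00:00", "value": "90.0"},
]

df = pd.DataFrame(records)
df["datetime"] = pd.to_datetime(df["datetime"])
df["value"] = df["value"].astype(float)
df = df.set_index("datetime")

# Long-term statistics in one or two lines:
long_term_mean = df["value"].mean()
percentiles = df["value"].quantile([0.1, 0.5, 0.9])
```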

- dharhas



Casey Smith

Apr 30, 2014, 12:08:26 PM4/30/14
to ul...@googlegroups.com
Thanks Dharhas! 

If I understand correctly, I would pull down the historical data for the day in question and calculate the mean myself?  At the moment, my scripts run every hour (with no saved data) so that would probably be a lot of data transfer unless I changed my approach.  That said, I would love to see an example if you could!

thanks, casey

Dharhas Pothina

Apr 30, 2014, 12:19:01 PM4/30/14
to ul...@googlegroups.com

There is a version of the NWIS support that is for exactly this kind of use case. The first time you hit the NWIS service for a particular site, it downloads the entire historical record (which, depending on the period of record, can take some time) and caches it in an HDF5 binary file on your local machine. On subsequent downloads it only retrieves data newer than what you have locally, so it's pretty quick, and in Python you still have access to the entire historical dataset. The whole thing is transparent to you: you just need to use ulmo.usgs.nwis.hdf5 and set cache=True.
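
The incremental idea behind that cache can be sketched as follows. Plain DataFrames stand in for the HDF5 store so this runs offline; ulmo handles all of this for you internally.

```python
import pandas as pd

# Sketch of the incremental-update pattern described above. In ulmo itself
# this is transparent, roughly (calls as described in this thread):
#   from ulmo.usgs import nwis
#   nwis.hdf5.update_site_data('06043500')      # full download the first time
#   data = nwis.hdf5.get_site_data('06043500')  # served from the local cache

def update_cache(cached: pd.DataFrame, fresh: pd.DataFrame) -> pd.DataFrame:
    """Append only rows newer than what the cache already holds."""
    if cached.empty:
        return fresh
    newest = cached.index.max()
    return pd.concat([cached, fresh[fresh.index > newest]])

cached = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.to_datetime(["2014-04-01", "2014-04-02"]),
)
fresh = pd.DataFrame(
    {"value": [2.5, 3.0]},
    index=pd.to_datetime(["2014-04-02", "2014-04-03"]),
)
merged = update_cache(cached, fresh)  # only 2014-04-03 is taken from `fresh`
```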

If you give me a site_code and parameter that you are interested in I'll see if I can quickly code up an example.

- dharhas

Casey Smith

Apr 30, 2014, 12:22:28 PM4/30/14
to ul...@googlegroups.com
Great.  Thanks for the help. An example site_code would be '06043500'  and parameter would be '00060'

Thanks!

Dharhas Pothina

Apr 30, 2014, 12:32:34 PM4/30/14
to ul...@googlegroups.com

what are you wanting to calculate? annual means?

- dharhas

Casey Smith

Apr 30, 2014, 12:42:09 PM4/30/14
to ul...@googlegroups.com
Oh sorry, I am wanting to calculate the historical mean discharge for the day in question.

cs

Dharhas Pothina

Apr 30, 2014, 1:20:01 PM4/30/14
to ul...@googlegroups.com

In that case it is probably best to use the daily mean service from NWIS (it saves you having to calculate the daily means from the instantaneous data).

see:


for a simple example.
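
A minimal sketch of that calculation with pandas, assuming the daily-mean values are already in hand (synthetic numbers stand in for the NWIS download): group by calendar day across all years, then look up the day in question.

```python
import pandas as pd

# Synthetic daily-mean discharge values spanning two years:
daily = pd.Series(
    [100.0, 200.0, 110.0, 190.0],
    index=pd.to_datetime(["2012-05-01", "2012-05-02",
                          "2013-05-01", "2013-05-02"]),
)

# Historical mean for each calendar day across all years on record:
by_day = daily.groupby([daily.index.month, daily.index.day]).mean()

may_first_mean = by_day.loc[(5, 1)]  # mean of every May 1 value
```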

- dharhas



Casey Smith

May 1, 2014, 4:02:06 PM5/1/14
to ul...@googlegroups.com
Thanks Dharhas.  That's pretty amazing package support!  I ended up munging the data from the USGS NWIS Statistics service, since I got caught in dependency hell trying to get HDF5 installed.  Thanks for your help!

Dharhas Pothina

May 1, 2014, 4:19:15 PM5/1/14
to ul...@googlegroups.com

Anaconda or Canopy are a good way to get all the packages you need.

- dharhas

Emilio Mayorga

May 2, 2014, 12:23:18 AM5/2/14
to ul...@googlegroups.com
FYI, I've had no problems building ulmo for anaconda. Would be happy to provide instructions. It's straightforward.

-Emilio

Joseph Gutenson

Mar 19, 2015, 9:18:16 PM3/19/15
to ul...@googlegroups.com
Dharhas,

If I am working with water level and care only for a long term average, how would the code in your link change?

Thanks,
Joseph

Dharhas Pothina

Mar 20, 2015, 10:01:42 AM3/20/15
to ul...@googlegroups.com
Joseph,

the parameter code for daily mean gage height would be '00065:00003'

If all you want is the overall statistics of the time series, you can just use df.describe() or df.mean() after line 18 in the gist and ignore the rest.  df.describe() gives you the min, max, mean, standard deviation, various percentiles, etc.
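
On a toy series, that looks like:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

stats = s.describe()      # count, mean, std, min, 25%, 50%, 75%, max
overall_mean = s.mean()   # just the long-term average
```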

- dharhas

Joseph Gutenson

Mar 24, 2015, 6:01:38 PM3/24/15
to ul...@googlegroups.com
Dharhas,

Thank you for the quick reply.  I actually figured out my own backwoods approach to the solution.  I didn't use pandas in this case.  Take a look below for reference:

from ulmo.usgs import nwis

def find_daily_mean_historic_depth(gage, begin_date, end_date, shp):
    # download and cache site data (this will take a long time the first time)
    # currently downloads all available parameters
    #nwis.hdf5.update_site_data(gage)
    # read daily mean water level data (parameter 00065, statistics code 00003)
    data = nwis.get_site_data(gage, parameter_code='00065',
                              start=begin_date, end=end_date)
    site = data['00065:00011']['site']
    site_name = site['name']
    lat = site['location']['latitude']
    lon = site['location']['longitude']  # renamed from 'long', which shadows a builtin
    count = 0
    total_height = 0.0
    for item in data['00065:00003']['values']:
        count = count + 1
        total_height = total_height + float(item['value'])
    mean = total_height / count  # raises ZeroDivisionError if no values returned
    return site_name, lat, lon, mean

This served my ends.  Hope it can help someone else!  Great job with this though!  Solves a lot of issues!

-Joseph

Dharhas Pothina

Mar 25, 2015, 2:57:04 PM3/25/15
to ul...@googlegroups.com
Joseph,

Glad you found something that worked for you. As an FYI, your code will probably break or give spurious results if there is bad or missing data in what you get from the USGS.

Also, if you are going to be doing a lot of analysis/calculations in Python, I highly recommend taking the time to try out pandas. There are good tutorials online and it is a very powerful tool for data munging. It is also going to be a lot faster than using a Python loop.
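
A sketch of that robustness point: coercing the raw strings to numeric and dropping bad values avoids the spurious results a bare float() loop produces. The "-999999" sentinel here is an assumption for illustration (NWIS marks missing data with sentinel values; check the service docs for your parameter).

```python
import pandas as pd

# Raw value strings as they might come back, including a blank and a
# hypothetical missing-data sentinel:
raw = pd.Series(["10.0", "-999999", "12.0", ""])

values = pd.to_numeric(raw, errors="coerce")     # "" becomes NaN
values = values.mask(values == -999999).dropna()  # drop the sentinel too

mean = values.mean()  # 11.0: only the two good values are averaged
```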

- dharhas