FWIW, in our Green Button architecture we have data identified in packages called “resources”. The resources are linked via metadata, and they are available individually and in collections through a REST API. In addition, they can be accumulated into “bulk” data sets, also available by API and SFTP through the common interface.
In Green Button, the data provider is typically an electric utility, publishing electric/gas meter data daily. Third-party service providers, which can have many thousands of customers with a single utility, fetch the daily data via the bulk interface.
Green Button data is described in an XSD and usually transferred as XML, so that the data can be validated against the schema. A JSON export is planned for the future. We also provide an XSLT that can transform the XML to CSV for those who want what is essentially time-stamped measurement data in that form.
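In case it helps to see the moving parts, here is a minimal sketch of that validate-then-transform step in Python with lxml, assuming local copies of the schema and stylesheet; the file names below are placeholders rather than official Green Button artifact names.

from lxml import etree

# Hypothetical local copies of the Green Button schema and the CSV stylesheet.
schema = etree.XMLSchema(etree.parse("espi.xsd"))
to_csv = etree.XSLT(etree.parse("espi_to_csv.xslt"))

doc = etree.parse("usage_export.xml")

# Validate against the XSD before transforming, so bad data is caught up front.
if not schema.validate(doc):
    raise ValueError(str(schema.error_log.last_error))

# Apply the XSLT to get time-stamped measurement data as CSV.
with open("usage_export.csv", "wb") as f:
    f.write(bytes(to_csv(doc)))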
HTH,
Marty
Dr. Martin J. Burns,
National Institute of Standards and Technology
Smart Grid & Cyber Physical Systems Program Office
Tel: 301-975-6283
Cel: 202-379-8021
I'm looking for some resources on Bulk Data to supplement an API.
Our primary use case: New users of the existing API are often looking to 'download' a comprehensive data set for offline analysis. This creates a lot of work for them, and a lot of resource usage for us.
A few things I'm contemplating:
- How often to update? New data is continuously added to the data set. Currently ~5 million 'documents'; ~3k/week, 12k/month being added.
- Offer one large set, or break it down by some increment?
- I'm preferring JSON because of its self-describing nature. Also, we will have fields with potentially very long text, so I'm thinking CSV isn't really a good fit (see the sketch after this list).
- There is a lot of binary content - PDF, Word, etc. We have this text indexed in a full-text search engine, so we're considering including the contents as text, since our users are primarily looking to get at that text more than the metadata.
- Should we instead export the metadata with API links to the binary downloads?
- Should we also offer the binary files as bulk zipped downloads - say in a tree structure?
Any feedback is greatly appreciated!
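To make the JSON/metadata/links questions concrete, here is a rough sketch of one way the bulk export could look: monthly, newline-delimited JSON files where each record carries the metadata, the extracted text, and an API link back to the original binary. Everything here (field names, paths, the API URL pattern) is hypothetical, just to illustrate the shape.

import json
import os
from collections import defaultdict

def write_monthly_bulk_files(documents, out_dir="bulk"):
    """Group documents by YYYY-MM and write one newline-delimited JSON file per month."""
    os.makedirs(out_dir, exist_ok=True)
    by_month = defaultdict(list)
    for doc in documents:
        by_month[doc["published"][:7]].append(doc)  # e.g. "2014-06"

    for month, docs in by_month.items():
        path = os.path.join(out_dir, "documents-%s.jsonl" % month)
        with open(path, "w", encoding="utf-8") as f:
            for doc in docs:
                record = {
                    "id": doc["id"],
                    "published": doc["published"],
                    "metadata": doc["metadata"],
                    # Extracted text from the full-text index, since users mostly
                    # want the text rather than the metadata.
                    "text": doc["extracted_text"],
                    # API link back to the original PDF/Word binary.
                    "binary_url": "https://api.example.gov/documents/%s/file" % doc["id"],
                }
                f.write(json.dumps(record, ensure_ascii=False) + "\n")

Newline-delimited JSON keeps each record self-describing while still letting people stream or split the files, and a monthly split means returning users only re-download the increments they are missing; the binaries themselves could then be offered separately as zipped trees keyed by the same IDs.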