Hi everybody,
The Technical Working Group (https://www.imls.gov/sites/default/files/proposal_narritive_lg-71-15-0174_internet_archive.pdf?page=9) will hold its first face-to-face meeting of the grant at the end of March. With a month to go (and having consulted the handy project timeline: https://www.imls.gov/sites/default/files/proposal_narritive_lg-71-15-0174_internet_archive.pdf?page=13), now seems like a good time to revisit the motivating use cases and possible features for the proposed export API(s).
We articulated two high-level use cases in the proposal:
- Facilitating distributed local preservation: at the most basic level, this means making it easier to replicate W/ARC data from one repository to another. It may also logically include associated metadata to enable integration with the collection management layer.
- Standardizing research data delivery: enabling researchers to request and receive W/ARC and/or derivative web archive data over the network.
Some questions I have as I think about translating these use cases into the grant work package and/or candidate features for the future roadmap:
- Beyond the two use cases above, what other high-level use cases or practical examples do folks have of what the export API(s) would enable you to do? Two examples we're interested in are streamlining replication of data from Archive-It to our local Hydra repository, and packaging up the contents of the LOCKSS-USDOCS network into WARCs to upload into the Internet Archive Wayback Machine.
- What web archive data formats could/should be served by an export API, beyond W/ARC?
- It seems like a query and extraction API (one that would allow a user to request repackaged or derivative W/ARC data, which would require processing by the source repository) would be the next most logical complement to an export API. Do folks agree and, if so, do we think that is within the scope of work for the grant? On a related note, the OpenWayback team has been soliciting feedback on CDX Server APIs (https://github.com/iipc/openwayback/wiki/CDX-Server-requirements). These could be a good framework for a query and extraction API and, in fact, the ArchiveSpark (https://github.com/helgeho/ArchiveSpark) developers have been using the CDX Server in just that way.
Other questions or comments? Any input you can provide would be greatly appreciated and will help shape the first of (hopefully) many web archiving APIs we work on together.
Thanks!
~Nicholas