Stanford WASAPI downloader sprint 3 concluded


Nicholas Taylor

May 22, 2017, 2:33:33 PM
to WASAPI-Community
Hi everybody,

The team working on web archive crawl data download automation has concluded sprint 3 of its work cycle. Though substantial work was completed, the nature of the work didn’t lend itself to a demo – stay tuned on that front for sprint 4.

In this sprint, we advanced the utilities’ ability to download and parse manifests from the WASAPI data transfer API, perform checksum validation, and retrieve actual WARC data.

Milestones completed this sprint:
•    Retrieve JSON from WASAPI endpoint
•    Make an example request and get responses from the WASAPI endpoint
•    Parse response JSON into an object
•    Given a list of results from the WASAPI endpoint, select and organize the files we want by crawl, then by file
•    Be able to validate using one type of checksum
•    Parse checksum methods and values out of response JSON
•    Download a WARC file specified in a JSON response
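
For readers who want a concrete picture of the parsing, grouping, and checksum milestones above, here is a minimal Python sketch. It is not the downloader's actual code. The field names ("files", "filename", "crawl", "checksums", "locations"), the sample URL, and the helper names group_files_by_crawl and checksum_matches are assumptions made for illustration, and the real response format may differ.

```python
import hashlib
import json
from collections import defaultdict

# Hypothetical response body. The field names are assumptions, not a
# confirmed description of the WASAPI response. The md5 value is the
# digest of an empty byte string, so the example can be checked without
# a real WARC file.
SAMPLE_RESPONSE = """
{
  "count": 1,
  "files": [
    {
      "filename": "example-0.warc.gz",
      "crawl": 1001,
      "checksums": {"md5": "d41d8cd98f00b204e9800998ecf8427e"},
      "locations": ["https://example.org/webdatafile/example-0.warc.gz"]
    }
  ]
}
"""


def group_files_by_crawl(response_text):
    """Parse response JSON and index file entries by crawl, then by filename."""
    payload = json.loads(response_text)
    by_crawl = defaultdict(dict)
    for entry in payload.get("files", []):
        by_crawl[entry.get("crawl")][entry["filename"]] = entry
    return by_crawl


def checksum_matches(data, checksums, algorithm="md5"):
    """Return True if `data` hashes to the value listed for `algorithm`.

    Returns False when the response lists no checksum of that type.
    """
    expected = checksums.get(algorithm)
    if expected is None:
        return False
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected.lower()


if __name__ == "__main__":
    grouped = group_files_by_crawl(SAMPLE_RESPONSE)
    for crawl_id, files in grouped.items():
        for filename, entry in files.items():
            # A real run would fetch the bytes from entry["locations"];
            # here an empty payload stands in for the downloaded file.
            ok = checksum_matches(b"", entry["checksums"], "md5")
            print(crawl_id, filename, "valid" if ok else "INVALID")
```

The sketch skips the download step so that it runs without network access. A real run would read the file from one of the listed locations and pass those bytes (or a streamed hash of them) to the validation step.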

Look forward to more exciting features in the next installment of the web archive crawl data download automation work cycle!

~Nicholas