Public Archive List and Web Archive Manifest (WAM)


Ilya Kreymer

May 23, 2017, 5:07:08 PM
to memen...@googlegroups.com
Hi,

I wanted to announce a new project that we've been working on: a public listing of available web archives in a machine-readable and human-readable format.

The listing is available at https://github.com/webrecorder/public-web-archives, and the format that describes it, which we've decided to call Web Archive Manifest (WAM), is also described here:


The actual listing is provided, one archive per file, at:

There is some overlap with the Memento archivelist.xml, but the intent is of course a bit different.

The idea is to list all known web archives, and then add info about the APIs that they support.

The three APIs we've included currently are memento, cdx (CDX server API), and wayback (for de facto Wayback Machine-style access).

For the wayback API, it includes URLs for different replay types; "rewritten" and "raw" can both be specified, and more can be added as needed.
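
To make the shape of such an entry a bit more concrete, here is a rough Python sketch of how a client might model one archive's API section and pick a replay URL by type. Everything in it is an illustrative assumption rather than the actual WAM schema: the key names (`apis`, `timegate`, `query`, `replay`), the URL templates, and the `archive.example.org` host are all made up for the example. The repository has the real format.

```python
# Hypothetical, simplified model of one archive entry.
# The real WAM keys and URL templates may differ.
EXAMPLE_ARCHIVE = {
    "name": "Example Archive",
    "apis": {
        "memento": {"timegate": "https://archive.example.org/timegate/"},
        "cdx": {"query": "https://archive.example.org/cdx"},
        "wayback": {
            "replay": {
                "rewritten": "https://archive.example.org/web/{timestamp}/{url}",
                "raw": "https://archive.example.org/web/{timestamp}id_/{url}",
            }
        },
    },
}


def supported_apis(archive):
    """Return the sorted names of the APIs an archive entry declares."""
    return sorted(archive.get("apis", {}))


def replay_url(archive, url, timestamp, replay_type="rewritten"):
    """Fill in the replay URL template for one replay type.

    Returns None if the archive has no wayback API or no such replay type.
    """
    wayback = archive.get("apis", {}).get("wayback", {})
    template = wayback.get("replay", {}).get(replay_type)
    if template is None:
        return None
    return template.format(timestamp=timestamp, url=url)
```

A client could call `supported_apis(EXAMPLE_ARCHIVE)` to decide how to query an archive, then `replay_url(EXAMPLE_ARCHIVE, "http://example.com/", "20170523170708", "raw")` to build a link to the unmodified capture. Keeping replay types in a plain mapping means new types can be added without changing the lookup code.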

It is also possible to define web archives that have multiple collections, and list them (if there is a finite number).
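
As a rough illustration of the multi-collection case, the Python sketch below expands a per-collection URL template over a finite list of collections. The keys `collections`, `id`, and `replay_template`, the collection names, and the `archive.example.org` URLs are hypothetical and chosen only for this example; they are not taken from the WAM format itself.

```python
# Hypothetical entry for an archive with a finite set of collections.
# Key names and URLs are illustrative assumptions, not the WAM schema.
MULTI_COLLECTION_ARCHIVE = {
    "name": "Example Multi-Collection Archive",
    "collections": [
        {"id": "news", "name": "News Sites"},
        {"id": "gov", "name": "Government Sites"},
    ],
    "replay_template": "https://archive.example.org/{collection}/{timestamp}/{url}",
}


def collection_replay_urls(archive, url, timestamp):
    """Map each listed collection id to its replay URL for the given capture."""
    template = archive["replay_template"]
    return {
        coll["id"]: template.format(
            collection=coll["id"], timestamp=timestamp, url=url
        )
        for coll in archive.get("collections", [])
    }
```

Calling `collection_replay_urls(MULTI_COLLECTION_ARCHIVE, "http://example.com/", "2017")` yields one URL per collection, which only works because the collections are enumerated up front.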

There is currently a single YAML file per archive, although nothing in the format limits it to that.

A possible future goal is for web archives to serve their own WAM file, advertising which APIs they support.

The format has gone through some iteration on our end, and we are happy to release it for broader feedback.

Let us know if you have any feedback, especially via issues on GitHub.

Thanks,
Ilya