Hello PCDM Community,
I’m writing to let you all know that the Webrecorder team is interested in creating a PCDM model and profile for web archive data. (Webrecorder, hosted at https://webrecorder.io/, is a free and open-source tool for creating high-fidelity web archives of any web page.) Our goal is to facilitate the preservation of web archive objects in Fedora and, in the longer term, potentially to expose web archive data to other Fedora-based repositories.
I’ve chatted with Nick Ruest a bit about this project, and we’ve presented the idea on the Fedora tech call, but we also wanted to share this goal with the official PCDM list and gauge the community’s interest in this work. We’ve already experimented with writing WARC files to Fedora and providing replay access directly from Fedora-stored WARCs, with promising results, but there is not yet a linked data model for the entirety of web archive data and metadata.
The intent is to describe all objects associated with web archives: files (WARCs, indexes), crawls/recordings, and collections of crawls/recordings, as well as entry point URLs such as seeds or bookmarks. While our initial use case will be Webrecorder, we would like the data model to apply to any web archiving use case. It may also be possible to describe individual URLs stored in web archives and to enable them to be linked from other objects.
The first phase of this effort will be to fully represent, via PCDM, all web archive objects that Webrecorder will manage on its own. In a later phase, we hope this effort can lead to a way to share web archives stored in Fedora with other repositories, for example by providing UI/discovery-level links from Islandora or Samvera-based repositories to specific web archive bookmarks.
I wanted to ask if anyone here would be interested in being more involved with this process.
We’d like to officially start on this project early next year, but we wanted to reach out to the group now in case anyone has questions, concerns, or suggestions. We’d also be happy to help organize a call to discuss this further if there is interest. Thanks for your consideration, and we look forward to collaborating with the PCDM community!
Thank you,
Ilya
Webrecorder Lead Developer, Rhizome