Resend: Memento Support


ghost archive

Sep 11, 2021, 12:47:34 PM
to memen...@googlegroups.com
Resending as my previous email went to spam for some reason.
----
Hi all,

We are now in the process of adding Memento support. We have quite a few archived pages, so we wanted to make them available through Memento. We already have TimeMap support (mostly).

Example of TimeMap support here: ghostarchive dot org/search?term=https://www.youtube.com/watch?v=7GhO2lncvBg&memento=timemap

Timegate support coming soon.

One question I do have concerns "transformations": some of our pages are transformed from their original presentation, whether through inlining or through page extraction.

For example, videos are transformed into an easier to use/parse page like this one: ghostarchive dot org/varchive/7GhO2lncvBg

While the presentation is different, the content is the same. I wonder how that fits into the Memento protocol, if it fits in at all.


ghost archive

Sep 11, 2021, 12:50:44 PM
to memen...@googlegroups.com
test

Nelson, Michael L.

Nov 7, 2021, 7:52:20 PM
to memen...@googlegroups.com, ghosta...@ghostarchive.org
Hi ghostarchive,

Sorry for the delay in answering; things are still kind of crazy.

Thanks for working on Memento support.

Looking at:

$ curl -iLk "https://ghostarchive.org/search?term=https://www.youtube.com/watch?v=7GhO2lncvBg&memento=timemap"
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Mon, 08 Nov 2021 00:16:00 GMT
Content-Type: application/link-format; charset=utf-8
Content-Length: 284
Connection: keep-alive
Cache-control: no-store
Access-Control-Allow-Origin: *
ETag: W/"11c-98nCDxHjEnpuRDWoG2phWCGfAuE"
X-Cache-Status: MISS

<https://www.youtube.com/watch?v=7GhO2lncvBg>; rel="original",
<https://ghostarchive.org/search?term=https://www.youtube.com/watch?v=7GhO2lncvBg&memento=timemap>; rel="timegate",
<https://ghostarchive.org/varchive/7GhO2lncvBg>; rel="memento"; datetime="Mon, 06 Sep 2021 19:44:44 GMT",


In the second link above, rel="timegate" should be rel="timemap", since that URI is the TimeMap itself.
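
To illustrate the corrected relation type, here is a minimal Python sketch of serializing a TimeMap in link-format. The function name `build_timemap` and its parameters are hypothetical and not taken from the Ghostarchive code base; the URIs and capture time are the ones in the response above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_timemap(original, timemap_uri, mementos):
    """Serialize a TimeMap as application/link-format text.

    `mementos` is a list of (memento_uri, capture_datetime) pairs; each
    datetime must be timezone-aware and in UTC.
    """
    links = [
        f'<{original}>; rel="original"',
        # The TimeMap's own URI gets rel="timemap", not rel="timegate".
        f'<{timemap_uri}>; rel="timemap"',
    ]
    # List mementos oldest first.
    for uri, captured_at in sorted(mementos, key=lambda m: m[1]):
        stamp = format_datetime(captured_at, usegmt=True)
        links.append(f'<{uri}>; rel="memento"; datetime="{stamp}"')
    return ",\n".join(links) + "\n"


timemap = build_timemap(
    "https://www.youtube.com/watch?v=7GhO2lncvBg",
    "https://ghostarchive.org/search?term=https://www.youtube.com/watch?v=7GhO2lncvBg&memento=timemap",
    [("https://ghostarchive.org/varchive/7GhO2lncvBg",
      datetime(2021, 9, 6, 19, 44, 44, tzinfo=timezone.utc))],
)
print(timemap)
```

Joining the entries with ",\n" also avoids a trailing comma after the last link.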

Also, the TimeMap endpoint should send back a real 404 (not a soft 404, i.e. a 200 with an error page) when the page is not archived:

$ curl -iLk "https://ghostarchive.org/search?term=www.odu.edu&memento=timemap"
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Mon, 08 Nov 2021 00:10:58 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 1360
Connection: keep-alive
Cache-control: no-store
Access-Control-Allow-Origin: *
ETag: W/"550-uTefGlcAx69i2EuDb1U1G8W8JQo"
X-Cache-Status: MISS

<!DOCTYPE html><html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"><meta http-equiv="Content-type" content="text/html;charset=UTF-8"><link rel="stylesheet" href="/ghostarchive.css" type="text/css"><link rel="icon" href="/favicon.ico" type="image/x-icon"><link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"><link rel="manifest" href="/site.webmanifest"><title>Ghostarchive search</title></head><body bgcolor="#FFFFFF"><table id="mid" width="100%" cellspacing="0" cellpadding="0"><tr><td id="left" valign="top"><ul><li><a href="/"><img src="/gafall.png" width="100%" height="100%"></a></li><br><br></ul><ul><li><a href="/index.html">Home</a></li><br><ul></ul><li><a href="/about.html">About Ghostarchive</a></li><br><ul></ul><li><a href="https://forms.gle/jKF1pmkCmcdTfNv19">Archived webpage broken? Questions? Send (anonymous) feedback/contact </a></li></ul></td><td id="body" valign="top"> <h2><meta charset="utf-8">Archives for <meta charset="utf-8">www.odu.edu
</h2><p>Page 0 out of 0</p> <br><h3>No archives for that site.</h3><form action="/archive" method="POST"><input id="archive" name="archive" required="" type="hidden" value="www.odu.edu"><input value="Archive it now?" type="submit"></form></td></tr></table></body></html>

I just archived that page, but I guess the index had not been updated yet:

https://ghostarchive.org/archive/mXi9j?kreymer=true
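
As a rough sketch of the 404 behavior described above, the handler below returns a status code, headers, and body for a TimeMap request. The function `timemap_response` and the shape of its `mementos` argument are assumptions made for illustration; how the real index lookup works is not known from this thread.

```python
def timemap_response(original, timemap_uri, mementos):
    """Return (status, headers, body) for a TimeMap request.

    `mementos` is a list of (memento_uri, http_date_string) pairs, as a
    hypothetical index lookup might return them.
    """
    if not mementos:
        # Nothing archived: answer with a real 404, not a 200 HTML page.
        headers = {"Content-Type": "text/plain; charset=utf-8"}
        return 404, headers, f"No mementos found for {original}\n"

    links = [
        f'<{original}>; rel="original"',
        f'<{timemap_uri}>; rel="timemap"',
    ]
    links += [
        f'<{uri}>; rel="memento"; datetime="{stamp}"'
        for uri, stamp in mementos
    ]
    headers = {"Content-Type": "application/link-format; charset=utf-8"}
    return 200, headers, ",\n".join(links) + "\n"


missing = timemap_response(
    "www.odu.edu",
    "https://ghostarchive.org/search?term=www.odu.edu&memento=timemap",
    [],
)
print(missing[0])
```

Clients that aggregate several archives can then rely on the status code alone instead of parsing an HTML "No archives" page.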

Arguably more important than TimeMap or TimeGate support is getting the mementos to self-identify:

$ curl -ILk https://ghostarchive.org/varchive/7GhO2lncvBg
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Mon, 08 Nov 2021 00:24:32 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 5264
Connection: keep-alive
Cache-control: no-store
Access-Control-Allow-Origin: *
Link: <https://ghostarchive.org/oembed?url=https://ghostarchive.org/varchive/7GhO2lncvBg&format=json>; rel="alternate"; type="application/json+oembed"; title="Video embed info"
Link: <https://ghostarchive.org/oembed?url=https://ghostarchive.org/varchive/7GhO2lncvBg&format=xml>; rel="alternate"; type="application/xml+oembed"; title="Video embed info"
ETag: W/"1490-sw9y461Mr140bHyn9whN4uqV15Y"
X-Cache-Status: MISS

should have a Memento-Datetime header, like:

Memento-Datetime: Thu, 28 Oct 2021 06:47:59 GMT

and it should have a Link header to restate the URI-R:

Link: <https://www.youtube.com/watch?v=7GhO2lncvBg>; rel="original"

That way tools will know it's a memento when they encounter it. See:

https://github.com/oduwsdl/Memento-aware-Browser
& the first presentation at:
https://github.com/oduwsdl/2021-research-expo/
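
A small Python sketch of the two self-identifying headers suggested above, plus the kind of check a client might run. The helper names `memento_headers` and `looks_like_memento` are hypothetical, and the capture time is simply the example value from the Memento-Datetime line above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(original, captured_at):
    """Build the response headers that mark a page as a memento.

    `captured_at` must be a timezone-aware UTC datetime.
    """
    return [
        ("Memento-Datetime", format_datetime(captured_at, usegmt=True)),
        # Restate the URI-R so tools can find the original resource.
        ("Link", f'<{original}>; rel="original"'),
    ]


def looks_like_memento(headers):
    """Client-side check: is a Memento-Datetime header present?"""
    return any(name.lower() == "memento-datetime" for name, _ in headers)


headers = memento_headers(
    "https://www.youtube.com/watch?v=7GhO2lncvBg",
    datetime(2021, 10, 28, 6, 47, 59, tzinfo=timezone.utc),
)
for name, value in headers:
    print(f"{name}: {value}")
```

These would be sent alongside the existing oEmbed Link headers; HTTP allows several Link headers in one response.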

Looking at:

https://ghostarchive.org/archive/mXi9j?kreymer=true

it looks like you're using https://replayweb.page; it *should* have Memento support, but I don't know enough about it to say. If it can't set HTTP response headers, at the very least it should be able to set HTML meta http-equiv="..."
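
If response headers really cannot be set, the http-equiv fallback could be generated along these lines. This is only a sketch: the helper name `memento_meta_tag` is hypothetical, and whether a given tool honors the meta element is an assumption that would need checking.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from html import escape


def memento_meta_tag(captured_at):
    """Render a meta http-equiv element carrying the capture datetime.

    `captured_at` must be a timezone-aware UTC datetime.
    """
    stamp = format_datetime(captured_at, usegmt=True)
    return f'<meta http-equiv="Memento-Datetime" content="{escape(stamp, quote=True)}">'


tag = memento_meta_tag(datetime(2021, 10, 28, 6, 47, 59, tzinfo=timezone.utc))
print(tag)
```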

As for mixing images (screenshots) with HTML, the Memento protocol doesn't say anything about it. My personal recommendation is to just mix images & HTML at this point; you can always split them out later. There are some proposals to specify the difference (MIME type is probably not sufficient), but nothing has been definitively chosen.

Please let me know any questions you have -- I promise I won't wait so long next time ;-)

regards,

Michael


----
Michael L. Nelson m...@cs.odu.edu https://twitter.com/phonedude_mln
Web Sciences and Digital Libraries Research Group https://twitter.com/WebSciDL
Department of Computer Science, Old Dominion University, Norfolk VA 23529
Virginia Modeling, Analysis, and Simulation Center, 1030 University Blvd, Suffolk, VA 23435
+1 757 683-6393 +1 757 683-4900 (f) +1 757 570-7376 (c)


