Problem with memento configuration


Erik Hetzner

Apr 24, 2012, 7:34:18 PM
to archive-access-discuss, memento-dev, Abhishek Salve
Hi,

We are experimenting with the Memento configuration for Open Source
Wayback, but are experiencing some difficulties.

We have built and installed the latest wayback from svn (r3625). We
have made minimal changes to the config, changing only hostnames and
ports, and enabling a CDX collection. (See cleaned results of diff -r
below). Wayback is working fine. We can visit, e.g.,
http://XXX.cdlib.org:8090/wayback/*/http://www.abc.ca.gov

Unfortunately, Memento keeps redirecting us to the current date, and
then to the version closest to that (in the case below, 3 Feb 2012).

As you can see by the Link headers in the last response, a closer
version to the requested Accept-Datetime does exist.

Does anybody have an idea of what is happening here? Let me know if
there is any other information that would help with this.

Thank you!

best, Erik

Here is an example of a session, from Firebug, using Mementofox:

GET /memento/http://www.abc.ca.gov/ HTTP/1.1
Host: XXX.cdlib.org:8090
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT

HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Location: http://XXX.cdlib.org:8090/memento/20120424231906/http://www.abc.ca.gov/
Content-Length: 0
Date: Tue, 24 Apr 2012 23:19:06 GMT

GET /memento/20120424231906/http://www.abc.ca.gov/ HTTP/1.1
Host: XXX.cdlib.org:8090
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT

HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Location: http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/
Content-Length: 0
Date: Tue, 24 Apr 2012 23:19:06 GMT

GET /memento/20120203005345/http://www.abc.ca.gov/ HTTP/1.1
Host: XXX.cdlib.org:8090
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Memento-Datetime: Fri, 03 Feb 2012 00:53:45 GMT
Link: ;rel="timebundle", ;rel="original", ;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", ;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 GMT", ;rel="prev memento"; datetime="Mon, 27 Jun 2011 01:01:27 GMT" , ;rel="timemap"; type="application/link-format",;rel="timegate"
X-Archive-Guessed-Charset: cp1252
X-Archive-Orig-Connection: close
X-Archive-Orig-Content-Length: 30168
X-Archive-Orig-Content-Type: text/html
X-Archive-Orig-X-Powered-By: ASP.NET
X-Archive-Orig-Server: Microsoft-IIS/6.0
X-Archive-Orig-Date: Fri, 03 Feb 2012 00:53:45 GMT
Content-Type: text/html;charset=cp1252
Content-Length: 30835
Date: Tue, 24 Apr 2012 23:19:06 GMT
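
The first redirect in the session above can be checked mechanically. What follows is a minimal Python sketch, not part of Wayback, that converts the 14-digit timestamp in a replay URL to a datetime and compares it with the response's Date header and with the requested Accept-Datetime. The helper name `wayback_timestamp` is made up for illustration; the values are copied from the session.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def wayback_timestamp(url):
    """Return the 14-digit timestamp in a replay URL as a UTC datetime."""
    match = re.search(r"/(\d{14})/", url)
    if match is None:
        raise ValueError("no 14-digit timestamp in " + url)
    parsed = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)

# Values copied from the first 302 response and the request headers.
first_hop = wayback_timestamp(
    "http://XXX.cdlib.org:8090/memento/20120424231906/http://www.abc.ca.gov/"
)
response_date = parsedate_to_datetime("Tue, 24 Apr 2012 23:19:06 GMT")
requested = parsedate_to_datetime("Sun, 04 Feb 2007 12:00:00 GMT")

# True when the redirect target is simply "now" rather than the requested date.
redirected_to_now = first_hop == response_date
# Rough gap, in years, between the redirect target and the Accept-Datetime.
years_off = (first_hop - requested).days / 365.25
print(redirected_to_now, round(years_off, 1))
```

If the first value printed is `True`, the server built the redirect from the request time and did not use the Accept-Datetime header, which matches the symptom described.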

diff -r wayback-1.7.1/WEB-INF/CDXCollection.xml wayback-1.7.1-ours//WEB-INF/CDXCollection.xml
32,37c32,33
< <bean class="org.archive.wayback.resourcestore.LocationDBResourceStore">
< <property name="db">
< <bean class="org.archive.wayback.resourcestore.locationdb.FlatFileResourceFileLocationDB">
< <property name="path" value="${wayback.basedir}/path-index.txt" />
< </bean>
< </property>
---
> <bean class="org.archive.wayback.resourcestore.SimpleResourceStore">
> <property name="prefix" value="http://XXX.cdlib.org:YYYYY/arcs/"/>
50c46
< <property name="path" value="${wayback.basedir}/cdx-index/index.cdx" />
---
> <property name="path" value="/was/wayback.public.index/everything.cdx"/>
diff -r wayback-1.7.1/WEB-INF/wayback.xml wayback-1.7.1-ours//WEB-INF/wayback.xml
18c18
< wayback.urlprefix=http://localhost.archive.org:8080/wayback/
---
> wayback.urlprefix=http://XXX.cdlib.org:8090/wayback/
72d71
< <import resource="BDBCollection.xml"/>
73a73,74
> <import resource="BDBCollection.xml"/>
> -->
74a76
> <!--
148c150
< <property name="matchPort" value="8080" />
---
> <property name="matchPort" value="8090" />
152c154
< <bean name="8080:wayback" class="org.archive.wayback.webapp.AccessPoint">
---
> <bean name="8090:wayback" class="org.archive.wayback.webapp.AccessPoint">
176d177
< <property name="collection" ref="localbdbcollection" />
178c179
< <property name="collection" ref="localcdxcollection" />
---
> <property name="collection" ref="localbdbcollection" />
179a181
> <property name="collection" ref="localcdxcollection" />
227d228
< <!--
229,231c230,232
< <bean name="8080:memento" parent="8080:wayback">
< <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/" />
< <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" />
---
> <bean name="8090:memento" parent="8090:wayback">
> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/" />
> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" />
234c235
< <prop key="aggregationPrefix">http://localhost.archive.org:8080/list/</prop>
---
> <prop key="aggregationPrefix">http://XXX.cdlib.org:8090/list/</prop>
247c248
< <property name="replayURIPrefix" value="http://localhost.archive.org:8080/memento/"/>
---
> <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/memento/"/>
264,267c265,268
< <bean name="8080:list" parent="8080:memento">
< <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/" />
< <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" />
< <property name="staticPrefix" value="http://localhost.archive.org:8080/list/" />
---
> <bean name="8090:list" parent="8090:memento">
> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/" />
> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" />
> <property name="staticPrefix" value="http://XXX.cdlib.org:8090/list/" />
270c271
< <prop key="Prefix">http://localhost.archive.org:8080/memento/</prop>
---
> <prop key="Prefix">http://XXX.cdlib.org:8090/memento/</prop>
283c284
< <property name="replayURIPrefix" value="http://memento.localhost.archive.org:8080/list/"/>
---
> <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/list/"/>
287d287
< -->

Herbert Van de Sompel

Apr 24, 2012, 10:20:40 PM
to memen...@googlegroups.com, erik.h...@ucop.edu, archive-access-discuss, memento-dev, Abhishek Salve
Erik,

I will discuss in detail with my team, tomorrow. A few observations:

- I am not sure what the status of Memento compliance is in the latest Wayback. I know work has been going on at IA to revise the approach towards Memento compliance (closer integration between regular Wayback operation and Memento operation, e.g. Mementos with the same URI for both access points). But, again, I am not sure what the status of that work is. Next week, several of us will be at IIPC and that would be a good occasion to talk in some detail about the status.

- In your session below, I assume (I am not sure) /memento/http://www.abc.ca.gov/ is a TimeGate for http://www.abc.ca.gov/. A TimeGate that is approached without an Accept-Datetime value is supposed to redirect to the most recent Memento. Now, I see that the request has an Accept-Datetime. If this is indeed a TimeGate, as suggested by the redirect to the most recent Memento, it seems that the TimeGate doesn't "see" the Accept-Datetime. Then again, if this is a TimeGate, it doesn't behave like one in other regards: there is no HTTP Link header, and a TimeGate should provide all kinds of HTTP Links.

- I have no immediate explanation for the second redirect, except that - maybe - that could just be an internal Wayback redirect that doesn't have to do with Memento. But, I'm guessing.

- Another clear indication that something is really wrong can be seen in the response of the final Memento: the Link header contains a lot of want-to-be HTTP links but not a single URI. Not a lot of linking going on there ;-)
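
The last observation can be turned into a quick check. The following is a rough Python sketch, not a full Link header parser, that reports which link-values lack a `<URI>` target. The function name is hypothetical, and the splitting rule (a comma followed by `<` or `;` starts a new link-value) is a simplifying assumption that would fail on URIs containing such sequences.

```python
import re

def links_without_target(link_header):
    """Return the rel values of link-values that lack a <URI> target."""
    missing = []
    # Split on commas that start a new link-value: one followed by "<"
    # (well-formed) or ";" (target missing). Commas inside quoted datetime
    # values are followed by a space and a digit, so they are not split.
    for value in re.split(r",\s*(?=[<;])", link_header):
        value = value.strip()
        if not value:
            continue
        if not re.match(r"<[^>]+>", value):
            rel = re.search(r'rel="([^"]*)"', value)
            missing.append(rel.group(1) if rel else "?")
    return missing

# Link header copied from the final 200 response in Erik's session.
broken = (
    ';rel="timebundle", ;rel="original", ;rel="last memento"; '
    'datetime="Fri, 03 Feb 2012 00:53:45 GMT", ;rel="first memento"; '
    'datetime="Wed, 04 Feb 2009 07:23:40 GMT", ;rel="prev memento"; '
    'datetime="Mon, 27 Jun 2011 01:01:27 GMT" , ;rel="timemap"; '
    'type="application/link-format",;rel="timegate"'
)
missing = links_without_target(broken)
print(missing)
```

A header in which every link-value starts with a bracketed URI yields an empty list.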

Again, I will discuss in detail with my team, tomorrow. And hopefully get back to you with a better interpretation. And, next week at IIPC, we can discuss with colleagues from IA. Will you be there?

Thanks a lot for your continued interest in Memento!

Cheers

Herbert

Sent from my iPad

> Sent from my free software system <http://fsf.org/>.

Ahmed AlSum

Apr 25, 2012, 1:40:01 AM
to memen...@googlegroups.com, Herbert Van de Sompel, erik.h...@ucop.edu, archive-access-discuss, Abhishek Salve
Hello Erik,

In addition to the detailed explanation from Herbert, I would like to add two things based on my experience helping to fix a Memento problem with the global Wayback Machine.

Coding inconsistency: two months ago, the Wayback Memento code was inconsistent; some classes were missing, so it behaved strangely (though not in the same way as yours). IA has fixed it and it should be working now. Also, IA has moved the open source Wayback to GitHub (https://github.com/internetarchive/wayback). Could you send me the link to the repository that you got the code from? I'm sorry, I couldn't find revision r3625.

Configuration: If the code is up-to-date, then we may have a configuration problem. I believe your attached configuration is correct. I have the same configuration on my machine and it works fine.

So if you can confirm the code version, I will try to verify the consistency of that version.

Thanks for your interest in memento.

Best regards,
Ahmed AlSum

Internet Archive, Software Engineer Intern
Old Dominion University, PhD Student

Balakireva, Lyudmila L

Apr 25, 2012, 11:43:25 AM
to memen...@googlegroups.com, Herbert Van de Sompel, erik.h...@ucop.edu, archive-access-discuss, Abhishek Salve
I downloaded version 1.6 of Wayback.
One correction: the TimeGate is expected at http://[myarchive]/memento/timegate/[url]

[ludab@megalodon ~]$ more my.txt
HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=8C88E2D937C605D65390B0347E183D3D; Path=/
Vary: negotiate,accept-datetime
0409233226/http://dans.knaw.nl/>;rel="last memento"; datetime="Mon, 09 Apr 2012 23:32:26 GMT", <http://lanlproto.santafe.edu:8080/memento/20120328164435/http://dans.knaw.nl/>;rel="first memento"; datetime="Wed, 28 Mar 2012 16:44:35 GMT", <http://lanlproto.santafe.edu:8080/memento/20120409233006/http://dans.knaw.nl/>;rel="prev memento"; datetime="Mon, 09 Apr 2012 23:30:06 GMT" , <http://lanlproto.santafe.edu:8080/list/timemap/link/http://dans.knaw.nl/>;rel="timemap"; type="application/link-format"
Content-Type: text/html
Content-Length: 0
Date: Wed, 25 Apr 2012 15:05:10 GMT

curl -D my.txt -H "Accept-Datetime: Wed, 28 Mar 2012 16:44:36 GMT" http://lanlproto.santafe.edu:8080/memento/timegate/http://dans.knaw.nl/
[ludab@megalodon ~]$ more my.txt
HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=767A080EF3BB8B6E384541324C504BDA; Path=/
Vary: negotiate,accept-datetime
0328164435/http://dans.knaw.nl/>;rel="first memento"; datetime="Wed, 28 Mar 2012 16:44:35 GMT", <http://lanlproto.santafe.edu:8080/memento/20120409233226/http://dans.knaw.nl/>;rel="last memento"; datetime="Mon, 09 Apr 2012 23:32:26 GMT", <http://lanlproto.santafe.edu:8080/memento/20120328165056/http://dans.knaw.nl/>;rel="next "; datetime="Wed, 28 Mar 2012 16:50:56 GMT" , <http://lanlproto.santafe.edu:8080/list/timemap/link/http://dans.knaw.nl/>;rel="timemap"; type="application/link-format"
Content-Type: text/html
Content-Length: 0
Date: Wed, 25 Apr 2012 15:14:34 GMT
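
For anyone scripting the same request, here is a small Python sketch that builds the TimeGate URL and the Accept-Datetime header without sending anything. The `/memento/timegate/` path is taken from the note above about Wayback 1.6 and may differ in other versions or configurations; `timegate_request` is an illustrative name, not a Wayback or Memento API.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

def timegate_request(archive_base, original_url, when=None):
    """Build (but do not send) a TimeGate request for original_url."""
    # Assumed URL layout: <archive>/memento/timegate/<original URL>
    url = archive_base.rstrip("/") + "/memento/timegate/" + original_url
    request = Request(url)
    if when is not None:
        # Accept-Datetime uses the HTTP date format, expressed in GMT.
        request.add_header("Accept-Datetime", format_datetime(when, usegmt=True))
    return request

req = timegate_request(
    "http://lanlproto.santafe.edu:8080",
    "http://dans.knaw.nl/",
    datetime(2012, 3, 28, 16, 44, 36, tzinfo=timezone.utc),
)
print(req.full_url)
print(req.header_items())
```

Sending the request is left out on purpose: `urllib.request.urlopen` follows redirects by default, so inspecting the 302 and its Link header needs either curl as above or a custom redirect handler.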



From: memen...@googlegroups.com [memen...@googlegroups.com] on behalf of Ahmed AlSum [aal...@cs.odu.edu]
Sent: Tuesday, April 24, 2012 11:40 PM
To: memen...@googlegroups.com
Cc: Herbert Van de Sompel; erik.h...@ucop.edu; archive-access-discuss; Abhishek Salve
Subject: Re: Problem with memento configuration

Erik Hetzner

Apr 25, 2012, 2:35:23 PM
to memen...@googlegroups.com, Herbert Van de Sompel, archive-access-discuss, Abhishek Salve, Balakireva, Lyudmila L
At Wed, 25 Apr 2012 15:43:25 +0000,
Balakireva, Lyudmila L wrote:
>
> I downloaded version 1.6 of wayback .
> I have to correct that timegate is expected at http://[myarchive]/memento/timegate/[url]

Hi Lyudmila,

Thanks so much! This seems to be exactly the issue I was having. I
didn’t realize the timegate was located under /memento/timegate. Now
when I try to request it works for me (see below).

I am having some problems with the Firefox plugin, but it looks like
we’ve got Memento set up right now.

Thanks again!

best, Erik

GET /memento/timegate/http://www.abc.ca.gov/ HTTP/1.1
User-Agent: curl/7.21.6 (x86_64-pc-linux-gnu) libcurl/7.21.6 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 librtmp/2.3
Host: XXX.cdlib.org:8090
Accept: */*
Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT

HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Vary: negotiate,accept-datetime
Link: <http://XXX.cdlib.org:8090/list/timebundle/http://www.abc.ca.gov/>;rel="timebundle", <http://www.abc.ca.gov/>;rel="original", <http://XXX.cdlib.org:8090/memento/20090204072340/http://www.abc.ca.gov/>;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 GMT", <http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/>;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", <http://XXX.cdlib.org:8090/memento/20101208015408/http://www.abc.ca.gov/>;rel="next "; datetime="Wed, 08 Dec 2010 01:54:08 GMT" , <http://XXX.cdlib.org:8090/list/timemap/link/http://www.abc.ca.gov/>;rel="timemap"; type="application/link-format"
Location: http://XXX.cdlib.org:8090/memento/20090204072340/http://www.abc.ca.gov/
Content-Type: text/html
Content-Length: 0
Date: Wed, 25 Apr 2012 18:28:23 GMT
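
The Location header in this response is consistent with a "closest capture" rule. The sketch below illustrates that rule in Python under stated assumptions: it is not Wayback's actual selection code, the function name is invented, and the list holds only the four capture times that appear in Link headers in this thread (the 2011 timestamp is derived from a `datetime` attribute, since its URI was missing). The archive may hold more captures.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def select_memento(timestamps, accept_datetime=None):
    """Pick a 14-digit capture timestamp the way a TimeGate is expected to."""
    captures = sorted(timestamps)
    if accept_datetime is None:
        # No Accept-Datetime: redirect to the most recent Memento.
        return captures[-1]
    target = parsedate_to_datetime(accept_datetime)

    def distance(ts):
        captured = datetime.strptime(ts, "%Y%m%d%H%M%S")
        return abs(captured.replace(tzinfo=timezone.utc) - target)

    # With Accept-Datetime: the capture closest in time to the request.
    return min(captures, key=distance)

captures = ["20090204072340", "20101208015408", "20110627010127", "20120203005345"]
chosen = select_memento(captures, "Sun, 04 Feb 2007 12:00:00 GMT")
latest = select_memento(captures)
print(chosen, latest)
```

Because the requested date precedes every capture, the closest one is the first Memento, which is the timestamp in the Location header above.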

Erik Hetzner

Apr 25, 2012, 2:42:14 PM
to memen...@googlegroups.com, Herbert Van de Sompel, archive-access-discuss, Abhishek Salve, Ahmed AlSum
At Tue, 24 Apr 2012 22:40:01 -0700,
Ahmed AlSum wrote:
>
> Hello Eric,
>
> Additional to the details explanation from Herbert, I would like to add
> two things based on my experience in helping in fixing a memento problem
> with global wayback machine.
>
> Coding Inconsistency: two months ago, the wayback memento code has
> inconsistency, some classes were missing so it has a strange behavior
> (not similar to yours). IA has fixed it and it should be working now.
> Also, IA has moved wayback open source to GitHub
> (https://github.com/internetarchive/wayback ). Could you send me the
> link of the repository that you grab the code from? I'm sorry I couldn't
> find r3625 version.
>
> Configuration: If the code is up-to-date, then we may have a
> configuration problem. I believe your attached configuration is correct.
> I have the same configuration on my machine and it works fine.
>
> So if you can confirm the code version and I will try to make sure of
> the consistency of this version.

Hi Ahmed,

Thanks for the pointer. I hadn’t realized that Wayback development had
moved. I will be checking out the new code. I was using the svn repo
linked on the archive access page.

It looks like the issue was me using the wrong timegate URL. Whoops!
Everything seems to be working as expected now.

Thanks for your help! I hope to be able to report a working Memento
installation at CDL soon.

best, Erik

Herbert van de Sompel

Apr 25, 2012, 4:29:42 PM
to memen...@googlegroups.com, Erik Hetzner, archive-access-discuss, Abhishek Salve, Balakireva, Lyudmila L
On Wed, Apr 25, 2012 at 12:35 PM, Erik Hetzner <erik.h...@ucop.edu> wrote:
> I am having some problems with the Firefox plugin, but it looks like
> we’ve got Memento set up right now.
>

Any info on this would be very welcome too.

Thanks

Herbert

> Thanks again!
>
> best, Erik
>
>  GET /memento/timegate/http://www.abc.ca.gov/ HTTP/1.1
>  User-Agent: curl/7.21.6 (x86_64-pc-linux-gnu) libcurl/7.21.6 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 librtmp/2.3
>  Host: XXX.cdlib.org:8090
>  Accept: */*
>  Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT
>
>  HTTP/1.1 302 Moved Temporarily
>  Server: Apache-Coyote/1.1
>  Vary: negotiate,accept-datetime
>  Link: <http://XXX.cdlib.org:8090/list/timebundle/http://www.abc.ca.gov/>;rel="timebundle", <http://www.abc.ca.gov/>;rel="original", <http://XXX.cdlib.org:8090/memento/20090204072340/http://www.abc.ca.gov/>;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 GMT", <http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/>;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", <http://XXX.cdlib.org:8090/memento/20101208015408/http://www.abc.ca.gov/>;rel="next "; datetime="Wed, 08 Dec 2010 01:54:08 GMT" , <http://XXX.cdlib.org:8090/list/timemap/link/http://www.abc.ca.gov/>;rel="timemap"; type="application/link-format"
>  Location: http://XXX.cdlib.org:8090/memento/20090204072340/http://www.abc.ca.gov/
>  Content-Type: text/html
>  Content-Length: 0
>  Date: Wed, 25 Apr 2012 18:28:23 GMT
>
> Sent from my free software system <http://fsf.org/>.
>



--
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
