Accessing archive.org


Alan Poulter

Jan 10, 2009, 5:19:43 PM

Is anyone else having problems accessing archive.org? If you are please
can you email me rather than reply here.

Alan Poulter
al...@poulter.demon.co.uk

Peter Ceresole

Jan 10, 2009, 5:31:59 PM
Alan Poulter <l...@poulter.demon.co.uk> wrote:

No problems.

Mailed as well, although I shouldn't...
--
Peter

James Coupe

Jan 11, 2009, 1:05:01 AM
Peter Ceresole <pe...@cara.demon.co.uk> wrote:
>Alan Poulter <l...@poulter.demon.co.uk> wrote:
>
>> Is anyone else having problems accessing archive.org? If you are please
>> can you email me rather than reply here.
>
>No problems.
>
>Mailed as well, although I shouldn't...

What links do you get when you go to
http://web.archive.org/web/*/http://www.demon.net ?

I get links to, for example,
http://iwfwebfilter.thus.net/web/20080113145506/http://www.demon.net/index.html
rather than the proper link to
http://web.archive.org/web/20080113145506/http://www.demon.net/index.html
which itself is full of broken iwfwebfilter.thus.net URLs.

--
James Coupe
PGP Key: 0x5D623D5D YOU ARE IN ERROR.
EBD690ECD7A1FB457CA2 NO-ONE IS SCREAMING.
13D7E668C3695D623D5D THANK YOU FOR YOUR COOPERATION.

Peter Ceresole

Jan 11, 2009, 3:57:37 AM
James Coupe <ja...@zephyr.org.uk> wrote:

> What links do you get when you go to
> http://web.archive.org/web/*/http://www.demon.net ?

I don't really know. I get Waybackmachine, with 864 results; urls.

But I've never tried this before and don't know what they're for.

Last link is Jan 13 2008. 'The requested URL
/web/20080113145506/http://www.demon.net/index.html was not found on
this server.'
--
Peter

James Coupe

Jan 11, 2009, 5:01:43 AM
Peter Ceresole <pe...@cara.demon.co.uk> wrote:
>James Coupe <ja...@zephyr.org.uk> wrote:
>
>> What links do you get when you go to
>> http://web.archive.org/web/*/http://www.demon.net ?
>
>I don't really know. I get Waybackmachine, with 864 results; urls.
>
>But I've never tried this before and don't know what they're for.

They're links to automatically archived copies of websites, recording
changes over time.

>Last link is Jan 13 2008. 'The requested URL
>/web/20080113145506/http://www.demon.net/index.html was not found on
>this server.'

Look at the URL in your address bar:

http://iwfwebfilter.thus.net/web/20080113145506/http://www.demon.net/index.html
       ^^^^^^^^^^^^^^^^^^^^^

i.e. THUS is intercepting access to archive.org for some reason,
presumably because some archived website has had an IWF warning at some
point in its history, or something.

Without THUS's fiddling, you would go to:

http://web.archive.org/web/20080113145506/http://www.demon.net/index.html

Which works.
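
A minimal sketch of undoing that substitution, assuming the only change is the hostname swap visible in the two URLs above. The function name restore_archive_host is hypothetical; the hostnames are the ones quoted in this thread.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames as they appear in the URLs quoted in this thread.
FILTER_HOST = "iwfwebfilter.thus.net"
ARCHIVE_HOST = "web.archive.org"


def restore_archive_host(url: str) -> str:
    """Swap the filter hostname back to web.archive.org.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != FILTER_HOST:
        return url
    # Only the host changes; the path (timestamp plus original URL) is kept.
    return urlunsplit(parts._replace(netloc=ARCHIVE_HOST))


broken = (
    "http://iwfwebfilter.thus.net/web/20080113145506/"
    "http://www.demon.net/index.html"
)
print(restore_archive_host(broken))
```

This only repairs a link that has already been mangled; it does nothing about links inside the fetched page, which would need the same treatment.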

Andrew Holme

Jan 11, 2009, 5:54:46 AM

"James Coupe" <ja...@zephyr.org.uk> wrote in message
news:ikfcRiEN...@gratiano.zephyr.org.uk...

> Peter Ceresole <pe...@cara.demon.co.uk> wrote:
>>Alan Poulter <l...@poulter.demon.co.uk> wrote:
>>
>>> Is anyone else having problems accessing archive.org? If you are please
>>> can you email me rather than reply here.
>>
>>No problems.
>>
>>Mailed as well, although I shouldn't...
>
> What links do you get when you go to
> http://web.archive.org/web/*/http://www.demon.net ?
>
> I get links to, for example,
> http://iwfwebfilter.thus.net/web/20080113145506/http://www.demon.net/index.html
> rather than the proper link to
> http://web.archive.org/web/20080113145506/http://www.demon.net/index.html
> which itself is full of broken iwfwebfilter.thus.net URLs.

I'm seeing the same thing.

It looks like Demon are messing with our HTTP connections, replacing the
text "web.archive.org" with "iwfwebfilter.thus.net"


Andrew Holme

Jan 11, 2009, 6:05:45 AM

"Andrew Holme" <a...@nospam.co.uk> wrote in message
news:Cdkal.39001$t64....@newsfe17.ams2...

Googling for iwfwebfilter.thus.net is interesting.

Paul Terry

Jan 11, 2009, 6:11:26 AM
In message <4D9q4yKH...@gratiano.zephyr.org.uk>, James Coupe
<ja...@zephyr.org.uk> writes

>i.e. THUS is intercepting access to archive.org for some reason,

Oh joy ... yet more "unintended collateral damage" from the IWF.

Didn't they actually learn ANYTHING from their Wikipedia disaster just
before Christmas?

--
Paul Terry

Paul Terry

Jan 11, 2009, 6:15:08 AM
In message <Cdkal.39001$t64....@newsfe17.ams2>, Andrew Holme
<a...@nospam.co.uk> writes

>It looks like Demon are messing with our HTTP connections, replacing the
>text "web.archive.org" with "iwfwebfilter.thus.net"

http://www.demon.net/helpdesk/technicallibrary/faq/iwf_web_filter/

--
Paul Terry

Andy

Jan 11, 2009, 9:04:55 AM
In message <r7zbO7B8SdaJFAp$@musonix.demon.co.uk>, Paul Terry
<ne...@nospam.demon.co.uk> wrote
I tried that, and as soon as the site was displayed my PC crashed,
followed by auto-reboot.

That's clever - how did they do that?
--
Andy Taylor [Editor, Austrian Philatelic Society].
Visit <URL:http://www.austrianphilately.com>

Alan Poulter

Jan 13, 2009, 2:59:58 AM

Paul Terry <ne...@nospam.demon.co.uk> wrote in
news:wbLdGhBe...@musonix.demon.co.uk:

The irony is that www.iwf.org.uk and www.thus.net, those
known havens of child porn, are both blocked via Archive.org ;-)
I have emailed TheRegister the story but no interest so far :-(

Alan

Wm...

Jan 13, 2009, 3:46:14 AM
Tue, 13 Jan 2009 01:59:58
<Xns9B925146386Bal...@216.196.109.144> demon.service Alan
Poulter <al...@poulter.demon.co.uk>

I get the irony, what I am not sure about is why archive.org is
important. It could be I misunderstand the importance of your message,
I remain puzzled as to why archive.org should be the measure.

The IWF have done a number of embarrassing things that they probably
don't want everyone to know about. thus, on the other hand is a company
that should expect itself to be examined by investors, etc.

Are you joining thus and the IWF together in a significant way?

My understanding is that they are separate.

--
Wm...
Reply-To: address valid for at least 7 days

Alan Poulter

Jan 13, 2009, 4:51:35 AM

"Wm..." <tcn...@blackhole.do-not-spam.me.uk> wrote in
news:veg9u2FWTFbJFwzv@[127.0.0.1]:

> Tue, 13 Jan 2009 01:59:58
> <Xns9B925146386Bal...@216.196.109.144> demon.service Alan
> Poulter <al...@poulter.demon.co.uk>
>
>>
>>Paul Terry <ne...@nospam.demon.co.uk> wrote in
>>news:wbLdGhBe...@musonix.demon.co.uk:
>>
>>> In message <4D9q4yKH...@gratiano.zephyr.org.uk>, James Coupe
>>> <ja...@zephyr.org.uk> writes
>>>
>>>>i.e. THUS is intercepting access to archive.org for some reason,
>>>
>>> Oh joy ... yet more "unintended collateral damage" from the IWF.
>>>
>>> Didn't they actually learn ANYTHING from their Wikipedia disaster just
>>> before Christmas?
>>
>>The irony is that www.iwf.org.uk and www.thus.net, those
>>known havens of child porn, are both blocked via Archive.org ;-)
>>I have emailed TheRegister the story but no interest so far :-(
>
> I get the irony, what I am not sure about is why archive.org is
> important. It could be I misunderstand the importance of your message,
> I remain puzzled as to why archive.org should be the measure.

Archive.org I find to be a very useful site and I am not alone. According
to alexa.com it is ranked 408th in the world for traffic. It spiders sites
according to set policies in order to preserve them and it will remove
sites if notified. Therefore it is extremely unlikely to be a child porn
haven or have anything that the IWF should worry about.


> The IWF have done a number of embarrassing things that they probably
> don't want everyone to know about. thus, on the other hand is a company
> that should expect itself to be examined by investors, etc.
>
> Are you joining thus and the IWF together in a significant way?
>
> My understanding is that they are separate.

That is my understanding as well. Since I know other UK ISPs have
not blocked Archive.org (I can access it via my mobile ISP and from work)
then I can only assume someone at Thus has been stupid enough to use Thus's
IWF filter to block it. It makes you wonder how much coordination there is
between the IWF and UK ISPs. Are other ISPs operating rogue blocks under
IWF auspices? Are some ISPs not blocking sites they should?

Alan

Wm...

Jan 13, 2009, 6:35:48 AM
Tue, 13 Jan 2009 03:51:35
<Xns9B926432D6C66al...@216.196.109.144> demon.service
Alan Poulter <al...@poulter.demon.co.uk>

>That is my understanding as well. Since I know other UK ISPs have
>not blocked Archive.org (I can access it via my mobile ISP and from work)
>then I can only assume someone at Thus has been stupid enough to use Thus's
>IWF filter to block it.

That is a leap of imagination I cannot make. demon have, in the past,
been over zealous about IWF matters but I don't think they really want
to stop people seeing what they had to say. Could it be archive.org is in
error?

> It makes you wonder how much coordination there is
>between the IWF and UK ISPs. Are other ISPs operating rogue blocks under
>IWF auspices? Are some ISPs not blocking sites they should?

Hmmmn. Have you considered paranoia? I am not suggesting you are
paranoid so much as suggesting archive.org may have a mental block.

Jack Campin - bogus address

Jan 13, 2009, 12:17:18 PM
> i.e. THUS is intercepting access to archive.org for some reason,

I can access the archive.org homepage. What's being blocked?

==== j a c k at c a m p i n . m e . u k === <http://www.campin.me.uk> ====
Jack Campin, 11 Third St, Newtongrange EH22 4PU, Scotland == mob 07800 739 557
CD-ROMs and free stuff: Scottish music, food intolerance, and Mac logic fonts

Paul Terry

Jan 13, 2009, 1:07:10 PM
In message <bogus-1B953E....@news.albasani.net>, Jack Campin
- bogus address <bo...@purr.demon.co.uk> writes

>I can access the archive.org homepage. What's being blocked?

In the slot at the top of the page (labelled Wayback Machine) enter:

http://www.demon.net and click "take me back"

You will see a list of the archived front pages of Demon's website.
Click on any of the links, and you will see that they have been rendered
useless because Demon's iwfwebfilter has wrecked the URL by adding the
following:

http://iwfwebfilter.thus.net/web/ ..........

Repeat for any URL you like (e.g. bbc.co.uk)

The site has been rendered unusable to Demon customers for weeks, but
nobody seems to be bothered to fix it.

Personally, I don't think it's Web Archive's fault that Demon haven't
got their IWF filters properly sorted.
--
Paul Terry

Paul Terry

Jan 13, 2009, 1:20:32 PM
In message <$8gZmyGUyHbJFwzr@[127.0.0.1]>, Wm...
<tcn...@blackhole.do-not-spam.me.uk> writes

>Could it be archive.org is in error?

I'm not sure how. Demon's IWF web filter is rewriting every link on
their site - I don't think Demon intend to block every WWW site that has
ever been archived by the company, but that's what they have been doing
for quite some weeks.

Try going to http://www.archive.org/index.php and then entering
www.bbc.co.uk in the "Wayback Machine". Click on any of the archived
links, and instead of seeing the BBC's frontpage you will see "Not
found" because Demon's web filter has added:

http://iwfwebfilter.thus.net/web/ ....

to the front of the URL.

We complained to the IWF today, and they say they are not responsible:
"I can confirm that there is no entry for webarchive.org ... on the IWF
URL list or reported to the IWF and therefore I am unable to take your
complaint further".

I guess I could complain to archive.org, but they are almost certain to
say that it's not their fault that Demon is rewriting the URL (without
the authority of the IWF it would now seem).

--
Paul Terry

rothers

Jan 13, 2009, 2:24:33 PM
On Tue, 13 Jan 2009 18:20:32 +0000, Paul Terry <ne...@nospam.demon.co.uk> wrote:

>In message <$8gZmyGUyHbJFwzr@[127.0.0.1]>, Wm...
><tcn...@blackhole.do-not-spam.me.uk> writes
>
>>Could it be archive.org is in error?
>
>I'm not sure how. Demon's IWF web filter is rewriting every link on
>their site - I don't think Demon intend to block every WWW site that has
>ever been archived by the company, but that's what they have been doing
>for quite some weeks.
>
>Try going to http://www.archive.org/index.php and then entering
>www.bbc.co.uk in the "Wayback Machine". Click on any of the archived
>links, and instead of seeing the BBC's frontpage you will see "Not
>found" because Demon's web filter has added:
>
>http://iwfwebfilter.thus.net/web/ ....

Doesn't do that for me, all looks fine, whose DNS are you using?

Paul Terry

Jan 13, 2009, 2:40:06 PM
In message <biqpm4ht3mu7eehgd...@4ax.com>, rothers
<ne...@rothers.demon.co.uk> writes

>Doesn't do that for me, all looks fine,

So what exactly do you see?

> whose DNS are you using?

Demon's.

Same results using three different browsers.
--
Paul Terry

Peter Ceresole

Jan 13, 2009, 3:05:37 PM
Paul Terry <ne...@nospam.demon.co.uk> wrote:

> I guess I could complain to archive.org, but they are almost certain to
> say that it's not their fault that Demon is rewriting the URL (without
> the authority of the IWF it would now seem).

I tried (using Demon's DNS as always) news.bbc.co.uk and it works fine-
all the web pages I tried are accessible. But an attempt to get plain
bbc.co.uk fails, with the iwf redirection.

So it's not so simple. Demon is rewriting selectively (which they should
do, of course) but it looks like the grounds for rewriting are broken,
by any standards.

As I understand from the IWF page on Demon's site, the onus to get a
site unblocked is on the site owner. So it's up to Demon to contact the
IWF and ask what the hell is going on?
--
Peter

Wm...

Jan 13, 2009, 2:57:08 PM
Tue, 13 Jan 2009 18:20:32 <FpRwq$IwtNb...@musonix.demon.co.uk>
demon.service Paul Terry <ne...@nospam.demon.co.uk>

>In message <$8gZmyGUyHbJFwzr@[127.0.0.1]>, Wm...
><tcn...@blackhole.do-not-spam.me.uk> writes
>
>>Could it be archive.org is in error?
>
>I'm not sure how. Demon's IWF web filter is rewriting every link on
>their site - I don't think Demon intend to block every WWW site that
>has ever been archived by the company, but that's what they have been
>doing for quite some weeks.

I feel ill. I mean physically sick. I didn't want to believe this.

>Try going to http://www.archive.org/index.php and then entering
>www.bbc.co.uk in the "Wayback Machine". Click on any of the archived
>links, and instead of seeing the BBC's frontpage you will see "Not
>found" because Demon's web filter has added:
>
>http://iwfwebfilter.thus.net/web/ ....
>
>to the front of the URL.

I see what you say, Paul. I am horrified.

>We complained to the IWF today, and they say they are not responsible:
>"I can confirm that there is no entry for webarchive.org ... on the IWF
>URL list or reported to the IWF and therefore I am unable to take your
>complaint further".
>
>I guess I could complain to archive.org, but they are almost certain to
>say that it's not their fault that Demon is rewriting the URL (without
>the authority of the IWF it would now seem).

What is a person meant to say at this point?

If the bbc is bad what recourse do we have? Yes I can still access the
bbc website but I am very unhappy demon is abusing the IWF (I am not
currently a fan of theirs) in order to (presumably) prevent me seeing
something the bbc has produced.

I think this goes beyond Daily Mail puritanism.

Fix this very soon, demon

Rex M F Smith

Jan 13, 2009, 3:14:53 PM
In message <FpRwq$IwtNb...@musonix.demon.co.uk>, Paul Terry
<ne...@nospam.demon.co.uk> writes

>Try going to http://www.archive.org/index.php and then entering
>www.bbc.co.uk in the "Wayback Machine". Click on any of the archived
>links, and instead of seeing the BBC's frontpage you will see "Not
>found" because Demon's web filter has added:

>http://iwfwebfilter.thus.net/web/ ....
>to the front of the URL.

Indeed ... :-(
--
Rex M F Smith

Andy

Jan 13, 2009, 3:50:37 PM
In message <y3wqkuJUIPbJFwy6@[127.0.0.1]>, Wm...
<tcn...@blackhole.do-not-spam.me.uk> wrote
[

>>Try going to http://www.archive.org/index.php and then entering
>>www.bbc.co.uk in the "Wayback Machine". Click on any of the archived
>>links, and instead of seeing the BBC's frontpage you will see "Not
>>found" because Demon's web filter has added:
>>
>>http://iwfwebfilter.thus.net/web/ ....
>>
>>to the front of the URL.
>
>I see what you say, Paul. I am horrified.

I'm not greatly impressed either.

There's a further oddity - the URL-as-displayed doesn't work if I remove
the http://iwfwebfilter.thus.net/web/ bit, as it then has a numerical
string that looks like the date you're trying to go back to *followed
by* the BBC front page URL.

Ie, it looks as if someone has not only added a prefix, but also
inverted the original URL.
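
The working links quoted earlier in the thread suggest this is the normal Wayback Machine layout rather than an inversion: /web/, then a 14-digit capture time, then the original URL. A minimal sketch that splits such a link, assuming that layout holds (split_wayback_url is a hypothetical helper name, not part of any archive.org tooling):

```python
import re
from datetime import datetime

# Assumed layout: http://<host>/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_LINK = re.compile(r"^https?://[^/]+/web/([0-9]{14})/(.+)$")


def split_wayback_url(url: str) -> tuple[datetime, str]:
    """Return (capture time, original URL) for an archived-page link."""
    match = WAYBACK_LINK.match(url)
    if match is None:
        raise ValueError(f"not a timestamped archive link: {url!r}")
    stamp, original = match.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


captured, original = split_wayback_url(
    "http://web.archive.org/web/20080113145506/http://www.demon.net/index.html"
)
print(captured.isoformat(), original)
```

So stripping the http://iwfwebfilter.thus.net/web/ prefix leaves only the timestamp and the original URL, with no archive host in front of them, which is why the remainder does not load on its own.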

Andy

Jan 13, 2009, 3:57:05 PM
In message <1ithz78.12iiu3q18s0zooN%pe...@cara.demon.co.uk>, Peter
Ceresole <pe...@cara.demon.co.uk> wrote

>
>I tried (using Demon's DNS as always) news.bbc.co.uk and it works fine-
>all the web pages I tried are accessible. But an attempt to get plain
>bbc.co.uk fails, with the iwf redirection.

Here, www.bbc.co.uk works and gets me "all the news that's fit to read",
or some of it anyway.

If I try bbc.co.uk then my browser (IE7) turns it into www.bbc.co.uk.

news.bbc.co.uk gets me a different BBC News page; possibly it's the UK
news while the www variant is world news.

John Hall

Jan 13, 2009, 4:08:56 PM
In article <r1pFgUB9...@gehena.demon.co.uk>,

Yes, that's what I see too. Is it a cock-up, or has someone at Demon
decided that, as one or more of the archived pages at the site contain
paedophilic images (assuming that to be the case), the site itself is
therefore verboten and /all/ the links on the site must be made
inaccessible?
--
John Hall
"It is a very sad thing that nowadays there is so little useless
information."
Oscar Wilde (1854-1900)

Andy

Jan 13, 2009, 4:16:47 PM
In message <y3wqkuJUIPbJFwy6@[127.0.0.1]>, Wm...
<tcn...@blackhole.do-not-spam.me.uk> wrote
[
>>Try going to http://www.archive.org/index.php and then entering
>>www.bbc.co.uk in the "Wayback Machine".
(or any website - including those hosted by Demon!!)

Then, choose an archived link, and hover your mouse over it. Your
browser may well display the text of the link; if so write down the very
long number (eg 19991023132451).

At the top of the browser you should be seeing the URL that's on screen.
It will resemble

web.archive.org/web/*/http://your.desired.website.

Select the * and replace it with the long number. Click 'Go'.

Works here :)
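
The same steps as a short sketch, assuming the listing URL contains the literal /web/*/ segment shown above and the number is the 14-digit one from the link. The name wayback_snapshot_url is hypothetical, and www.bbc.co.uk is just the example site used in this thread.

```python
def wayback_snapshot_url(listing_url: str, timestamp: str) -> str:
    """Replace the '*' in a Wayback listing URL with a capture timestamp."""
    if not (timestamp.isascii() and timestamp.isdigit() and len(timestamp) == 14):
        raise ValueError("timestamp should be 14 digits, e.g. 19991023132451")
    marker = "/web/*/"
    if marker not in listing_url:
        raise ValueError("listing URL has no /web/*/ segment")
    # Replace only the first occurrence, leaving the target URL untouched.
    return listing_url.replace(marker, f"/web/{timestamp}/", 1)


listing = "http://web.archive.org/web/*/http://www.bbc.co.uk"
print(wayback_snapshot_url(listing, "19991023132451"))
```

Typing the address by hand like this sidesteps the rewritten links on the listing page, since the browser never follows one of them.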

Les

Jan 13, 2009, 4:51:30 PM

No problems for me either viewing the BBC's archive although I'm
getting not found on most pages dated 2001 and earlier, I'm using
Demon's standard DNS.
--
Les

Peter Ceresole

Jan 13, 2009, 5:03:40 PM
Andy <an...@kitzbuhel.demon.co.uk> wrote:

> Here, www.bbc.co.uk works and gets me "all the news that's fit to read",
> or some of it anyway.

Ah; that works here too.

> news.bbc.co.uk gets me a different BBC News page; possibly it's the UK
> news while the www variant is world news.

No; www.bbc.co.uk is the BBC's general home page, news.bbc.co.uk is the
home page for BBC News.
--
Peter

Les

Jan 13, 2009, 5:10:45 PM
In message <rK+LS0GO...@musonix.demon.co.uk>, Paul Terry
<ne...@nospam.demon.co.uk> writes

Seems to be a bit hit and miss here, when I first tried the BBC archives
they were ok after 2001 as I posted first. After trying a few other URLs
and getting the IWF filter notification I tried the BBC again and now
occasionally get the IWF message but mostly get the BBC pages ok.

A friend's web site, which I know has no dodgy content, brings up the IWF
filter every time.
--
Les

Wm...

Jan 13, 2009, 5:10:24 PM
Tue, 13 Jan 2009 21:08:56 <RU6ji2Ao...@jhall.demon.co.uk.invalid>
demon.service John Hall <nospam...@jhall.co.uk>

>In article <r1pFgUB9...@gehena.demon.co.uk>,
> Rex M F Smith <use...@gehena.demon.co.uk> writes:
>>In message <FpRwq$IwtNb...@musonix.demon.co.uk>, Paul Terry
>><ne...@nospam.demon.co.uk> writes
>>
>>>Try going to http://www.archive.org/index.php and then entering
>>>www.bbc.co.uk in the "Wayback Machine". Click on any of the archived
>>>links, and instead of seeing the BBC's frontpage you will see "Not
>>>found" because Demon's web filter has added:
>>
>>>http://iwfwebfilter.thus.net/web/ ....
>>>to the front of the URL.
>> Indeed ... :-(
>
>Yes, that's what I see too. Is it a cock-up, or has someone at Demon
>decided that, as one or more of the archived pages at the site contain
>paedophilic images (assuming that to be the case), the site itself is
>therefore verboten and /all/ the links on the site must be made
>inaccessible?

PaulT said the IWF weren't interfering. I am inclined to believe him on
the basis of "you have to trust someone once or else the whole thing
falls apart"

The likelihood of the bbc hosting images of child abuse is remote.

So, we are left with an idiot at demon.

Or, in plain words, a cock up.

Peter Grange

Jan 13, 2009, 5:37:26 PM