
Browser can get it but I can't ...


John M Chambers

Aug 17, 1999
While writing and using a number of programs (in both C and perl)
that grab files from web servers and do interesting things with
them (that aren't relevant to this question), it has come to my
attention that a number of sites have a curious anomaly: The
common browsers can fetch files that my programs can't. I've
tested them in a number of ways, including a direct telnet to
port 80, and found that telnet plus an HTTP GET command also
fails. I'm trying to learn what the browsers' tricks are that
successfully fetch these files.

One that just came up yesterday is the URL
http://www.downie65.freeserve.co.uk/abcfiles/
The owner of this directory told me about it and invited me
to scan the files there into my growing index, so I clearly
have his permission. Netscape and IE can read this directory.
But consider the following:

: telnet www.downie65.freeserve.co.uk 80
Trying 195.92.193.55...
Connected to webspace.pol.co.uk.
Escape character is '^]'.
GET /abcfiles/ HTTP/1.0

HTTP/1.1 404 Not Found
Date: Tue, 17 Aug 1999 13:53:51 GMT
Server: CnG Webspace Server - based on Apache (Linux)
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
The requested URL /abcfiles/ was not found on this server.<P>
</BODY></HTML>
Connection closed by foreign host.
:

This should have fetched the directory, but the server told
me (i.e., it told telnet) that /abcfiles/ doesn't exist. Well,
it exists when Netscape asks.

Any clues as to what's going on here? Is there something that
Netscape sends, probably before or after the GET, that tells
this and various other servers that the request is ok? Is
this documented somewhere?

Patrick Hardlentil

Aug 17, 1999
jc...@world.std.com (John M Chambers) writes:

> While writing and using a number of programs (in both C and perl)
> that grab files from web servers and do interesting things with
> them (that aren't relevant to this question), it has come to my
> attention that a number of sites have a curious anomaly: The
> common browsers can fetch files that my programs can't.
[...]
> GET /abcfiles/ HTTP/1.0

I had this problem for the same reason as you, and solved it by
sending the full URL, i.e. http://www.downie65.freeserve.co.uk/abcfiles/
The protocol doesn't seem to require this (unless I'm reading it
wrong :-), but there ya go...
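To make the difference concrete, here's a sketch of the two request lines (the host and path are the ones from the thread; the variable names are mine):

```python
# Origin-form request line, as in the telnet session above (this failed):
origin_form = "GET /abcfiles/ HTTP/1.0\r\n\r\n"

# Absolute-form request line (the workaround): the host name travels
# inside the request line itself, so the server can tell which of its
# sites is meant.
absolute_form = (
    "GET http://www.downie65.freeserve.co.uk/abcfiles/ HTTP/1.0\r\n\r\n"
)
```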

Cheers,
--
| Patrick ... Bring me your dogs ... Alas, they cannot hold water
| pat...@dogslobber.demon.co.uk http://www.dogslobber.demon.co.uk

Toby Speight

Aug 18, 1999
Patrick> Patrick Hardlentil
Patrick> <URL:mailto:pat...@dogslobber.demon.co.uk>

0> In <URL:news:m31zd2j...@dogslobber.demon.co.uk>, Patrick wrote:

Patrick> jc...@world.std.com (John M Chambers) writes:

>> While writing and using a number of programs (in both C and perl)
>> that grab files from web servers and do interesting things with
>> them (that aren't relevant to this question), it has come to my
>> attention that a number of sites have a curious anomaly: The common
>> browsers can fetch files that my programs can't.
>>
>> ...
>> GET /abcfiles/ HTTP/1.0

Patrick> I had this problem for the same reason as you, and solved it
Patrick> by sending the full URL, ie
Patrick> http://www.downie65.freeserve.co.uk/abcfiles/
Patrick> The protocol doesn't seem to require this (unless I'm reading it
Patrick> wrong :-), but there ya go...

You're using a different mechanism now - you're now asking the server
to act as a proxy for itself.

One common cause of this kind of failure is that the server uses
name-based virtual hosting, and needs a Host header to pick the
right site (note that it gave an HTTP/1.1 response).

Patrick Hardlentil

Aug 19, 1999
Toby Speight <Toby.S...@streapadair.freeserve.co.uk> writes:

> 0> In <URL:news:m31zd2j...@dogslobber.demon.co.uk>, Patrick wrote:
> Patrick> I had this problem for the same reason as you, and solved it
> Patrick> by sending the full URL, ie
>

> You're using a different mechanism now - you're now asking the server
> to act as a proxy for itself.
>
> One common cause for inability to retrieve is because the server is
> using name-based virtual hosting, and needs a Host header (note that
> it gave a HTTP/1.1 response).

Thanks for this - it's a useful tip (and I'm sorry for posting a
half-baked answer). But I'm also confused: RFC 2616 (section 5.1.2) says that
an absolute URI is required only for proxies, but must be accepted by
all servers, and implies that it will be *required* in future versions
of HTTP. So, am I really using a different mechanism? Is it likely to
be less efficient?

Whatever, I'll fix my programs with your Host header suggestion... one
day :-)
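For what it's worth, the two forms carry the same information, so a client can derive the Host header from an absolute URL. A hypothetical sketch (the function name is mine, not from the thread):

```python
from urllib.parse import urlsplit

def request_for(url):
    # Turn an absolute URL into an origin-form request plus Host
    # header -- the same information the absolute-form line carries.
    parts = urlsplit(url)
    path = parts.path or "/"
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {parts.hostname}\r\n"
        "\r\n"
    )
```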

Toby Speight

Aug 19, 1999
Patrick> Patrick Hardlentil
Patrick> <URL:mailto:pat...@dogslobber.demon.co.uk>

0> In <URL:news:m3zozni...@dogslobber.demon.co.uk>, Patrick wrote:

Patrick> But I'm also confused: RFC2616 (5.1.2) says that an
Patrick> absolute URI is required only for proxies, but must be
Patrick> accepted by all servers, and implies that it will be
Patrick> *required* in future versions of HTTP. So, am I really
Patrick> using a different mechanism? Is it likely to be less
Patrick> efficient?

I'd missed this. Sorry for the misinformation. (And, to answer your
questions, no and probably not - for newish servers, anyway).
