
How do you test for the existence of a file at a URL?


Mark Patterson

Jul 16, 2008, 8:28:04 PM
Hi

I hope someone can answer this quickly. I am using TIdHttp to read a
file from the web. Is there a way to test for the existence of a file
before downloading it without waiting for a timeout?

TIA
--
Mark Patterson
www.piedsoftware.com

yannis

Jul 17, 2008, 4:30:45 AM
Mark Patterson wrote:

> Hi
>
> I hope someone can answer this quickly. I am using TIdHttp to read a
> file from the web. Is there a way to test for the existence of a file
> before downloading it without waiting for a timeout?
>
> TIA

With HTTP, when a file is not found on the server, a 404 error should be
returned. A timeout means there is some other, unidentified problem,
such as a network failure or a busy server.

regards
Yannis
--
"Quotation confesses inferiority." -- Ralph Waldo Emerson

Marc Rohloff [TeamB]

Jul 17, 2008, 10:47:05 AM
On Thu, 17 Jul 2008 10:28:04 +1000, Mark Patterson wrote:

> I hope someone can answer this quickly. I am using TIdHttp to read a
> file from the web. Is there a way to test for the existence of a file
> before downloading it without waiting for a timeout?

You would want to use the 'Head' command instead of the 'Get' command,
but if your next step is to download it then I don't see the benefit
of issuing two requests to the web server.

--
Marc Rohloff [TeamB]
marc -at- marc rohloff -dot- com

Remy Lebeau (TeamB)

Jul 17, 2008, 3:24:37 PM

"Mark Patterson" <nos...@stopbots.com> wrote in message
news:487e...@newsgroups.borland.com...

> I am using TIdHttp to read a file from the web. Is there a way
> to test for the existence of a file before downloading it without
> waiting for a timeout?

Perform a Head() request, and then check the reply.

If you are going to download it anyway, though, then just Get() it normally.
The reply will tell you if the file existed or not.
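
The same HEAD-then-check flow, sketched in Python for illustration (the function names and the host/path arguments are placeholders; with TIdHTTP you would call Head() and then inspect ResponseCode instead):

```python
import http.client

def head_status(host: str, path: str) -> int:
    """Issue a HEAD request and return the status code without
    downloading the body. Mirrors calling TIdHTTP.Head() and then
    checking ResponseCode."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()

def exists(status: int) -> bool:
    """Any 2xx reply means the file is there; 404 (or anything
    else) means it is not usable."""
    return 200 <= status <= 299

# Example (requires network access):
#   if exists(head_status("example.com", "/data.csv")):
#       ...  # safe to issue the real GET
```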


Gambit


Remy Lebeau (TeamB)

Jul 17, 2008, 3:26:33 PM

"Marc Rohloff [TeamB]" <ma...@nospam.marcrohloff.com> wrote in message
news:1892gxro...@dlg.marcrohloff.com...

> if your next step is to download it then I don't see the
> benefit of issuing two requests to the web server.

There is one benefit, which web browsers use - to retrieve the filename and
prompt the user with it before actually downloading the data, in case the
user decides to cancel. With a 'Get' request, you would have to close the
connection in order to cancel the download.


Gambit


Marc Rohloff [TeamB]

Jul 17, 2008, 4:28:18 PM
On Thu, 17 Jul 2008 12:26:33 -0700, Remy Lebeau (TeamB) wrote:

> There is one benefit, which web browsers use - to retreive the filename and
> prompt the user with it before actually downloading the data, in case the
> user decides to cancel. With a 'Get' request, you would have to close the
> connection in order to cancel the download.

IE only issues a single GET request. I don't know if it caches the
download while waiting for the user to respond or if it just stops
receiving (judging by the response, it is doing the former).
Assuming most people intend to continue their download, it makes more
sense to do a GET and occasionally cancel than to almost always issue
two requests.

Remy Lebeau (TeamB)

Jul 17, 2008, 6:05:31 PM

"Marc Rohloff [TeamB]" <ma...@nospam.marcrohloff.com> wrote in message
news:trac8me3...@dlg.marcrohloff.com...

> IE only issues a single GET request.

That's not what I've seen it do in the past, but sure enough I just checked
it and it's not doing it anymore.

> I don't know if it caches the download while waiting for
> the user to respond or if it just stops receiving

The server has already started pushing the data to the browser in reply to
the GET, but the browser does not read it from the connection until after
the prompt is accepted by the user.

> Assuming most people intend to continue their download it makes
> more sense to do a GET and cancel occasionally than to almost
> always issue two requests.

I wasn't saying to always issue HEAD before GET. When an automated
download is needed, that doesn't make sense to do. But when the download
needs to interact with the user, it does make more sense to reduce
bandwidth.


Gambit


Mark Patterson

Jul 18, 2008, 12:03:52 AM

Thanks for that. I'm not sure what to look for.
Head is: procedure Head(AURL: string);
It calls DoRequest, which is a 70-line procedure with nothing that is
obviously the way to check.

Can you point me to the next step?

Marc Rohloff [TeamB]

Jul 18, 2008, 8:53:40 AM
On Fri, 18 Jul 2008 14:03:52 +1000, Mark Patterson wrote:

> Thanks for that. I'm not sure what to look for.
> Head is: procedure Head(AURL: string);
> It calls DoRequest, which is a 70-line procedure with nothing that is
> obviously the way to check.
>
> Can you point me to the next step?

You need to check the result code: 200 (or anything in the 2xx range)
is OK, and 404 means 'file not found'. You could probably treat anything
else as an error.
If you want more information, you could check the returned headers,
which would normally give you the file's MIME type, size, and last
modification time.
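
That status-code logic, sketched in Python rather than Delphi for illustration (the mapping is the same whichever library issues the request; the function name is made up):

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to a coarse verdict, per the
    advice above."""
    if 200 <= code <= 299:
        return "ok"         # 2xx: request succeeded
    if code == 404:
        return "not found"  # the file does not exist on the server
    return "error"          # treat everything else as an error

# classify_status(200) -> "ok"; classify_status(404) -> "not found"
```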

Marc Rohloff [TeamB]

Jul 18, 2008, 8:50:18 AM
On Thu, 17 Jul 2008 15:05:31 -0700, Remy Lebeau (TeamB) wrote:

> The server has already started pushing the data to the browser in reply to
> the GET, but the browser does not read it from the connection until after
> the prompt is accepted by the user.

I'm not sure about this; I would expect the server to time out at some
point if the client didn't read the data. I have also noticed that if
I leave the prompt up for a while, the progress dialog starts with
part, or sometimes all, of the file already downloaded.

Mark Patterson

Jul 20, 2008, 4:04:41 AM
Hi,

I've been trying out some of the ideas from my first post, but for the
case I'm dealing with, I still don't have a way of sorting out whether
the file I want is available or not.

Here is my code:

httpReader := TIdHTTP.Create(nil);
try
  httpReader.Head(edtURL.Text);
  addLog('Head resp code', IntToStr(httpReader.ResponseCode));
  addLog('Head resp text', httpReader.ResponseText);
  lstContents.Items.Text := httpReader.Get(edtURL.Text);
  addLog('Response code', IntToStr(httpReader.ResponseCode));
  addLog('Response text', httpReader.ResponseText);
  addLog('R/R text', httpReader.Response.ResponseText);
  addLog('Redirect count', IntToStr(httpReader.RedirectCount));
  addLog('URL', httpReader.URL.Path);
finally
  httpReader.Free;
  HourGlass(false);
end;

I'm always getting a code of 200. What the web site is doing is
generating a standard "Page not found" page.
See:
http://www.afrsmartinvestor.com.au/asxdata/20080523/SM_SECYG.csv

For most other Friday dates you get a CSV file several thousand lines long,
e.g.
http://www.afrsmartinvestor.com.au/asxdata/20080711/SM_SECYG.csv

I would like to be able to test if the file I'm after is present or not,
using the Indy components. Can anyone help?


--
Mark Patterson
www.piedsoftware.com

Remy Lebeau (TeamB)

Jul 21, 2008, 3:04:43 AM

"Mark Patterson" <nos...@stopbots.com> wrote in message
news:4882f199$1...@newsgroups.borland.com...

> Here is my code:
<snip>


> I'm always getting a code of 200.

Then the request is succeeding and returning back valid data, from the
server's perspective.

> What the web-site is doing is generating a standard
> "Page not found" page

The server is sending a valid HTML page back with a non-error ResponseCode.
You will have to parse the data to get the actual error message from it
separately. You should also be looking at the response's "Content-Type"
header to know what kind of data is actually being returned. If you expect
something to be specified for the requested file but don't get it, then you
know something went wrong. In this case, a valid .csv file is delivered
with a "Content-Type" of "application/octet-stream", whereas the error page
has a "Content-Type" of "text/html" instead.
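
A minimal sketch of that Content-Type check, in Python for illustration. The accepted types below assume, as observed on this particular server, that a real .csv arrives as "application/octet-stream" while the error page arrives as "text/html":

```python
def is_wanted_csv(content_type: str) -> bool:
    """Return True if the reply looks like the CSV file rather than
    an HTML "Page not found" page. The header value may carry a
    charset suffix, e.g. "text/html; charset=iso-8859-1", so only
    the media type before any ';' is compared."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("application/octet-stream", "text/csv")
```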


Gambit


Marc Rohloff [TeamB]

Jul 22, 2008, 10:40:27 AM
On Sun, 20 Jul 2008 18:04:41 +1000, Mark Patterson wrote:

> I'm always getting a code of 200. What the web-site is doing is
> generating a standard "Page not found" page,

In theory it should still return a 404 error code.

> http://www.afrsmartinvestor.com.au/asxdata/20080523/SM_SECYG.csv
The returned headers are:
HTTP/1.1 200 OK
Date: Tue, 22 Jul 2008 14:31:19 GMT
Server: Microsoft-IIS/6.0
P3P: policyref="http://f2.com.au/w3c/p3p.xml", CP="CAO DSP LAW CURa
ADMa DEVa TAIa PSAa PSDa IVAi IVDi OUR IND PHY ONL UNI PUR FIN COM NAV
INT DEM CNT PRE GOV"
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Cache-Control: private
Content-Type: text/html; charset=iso-8859-1
Content-Length: 20847

> http://www.afrsmartinvestor.com.au/asxdata/20080711/SM_SECYG.csv
Here the headers are:
HTTP/1.1 200 OK
Content-Length: 393469
Content-Type: application/octet-stream
Last-Modified: Fri, 11 Jul 2008 12:15:04 GMT
Accept-Ranges: bytes
ETag: "014cfbf4fe3c81:3da"
Server: Microsoft-IIS/6.0
P3P: policyref="http://f2.com.au/w3c/p3p.xml", CP="CAO DSP LAW CURa
ADMa DEVa TAIa PSAa PSDa IVAi IVDi OUR IND PHY ONL UNI PUR FIN COM NAV
INT DEM CNT PRE GOV"
X-Powered-By: ASP.NET
Date: Tue, 22 Jul 2008 14:33:21 GMT

I would look at the Content-Type header returned and decide based on
that.

Remy Lebeau (TeamB)

Jul 22, 2008, 1:43:56 PM

"Marc Rohloff [TeamB]" <ma...@nospam.marcrohloff.com> wrote in message
news:12004212oncpl$.dlg@dlg.marcrohloff.com...

> In theory it should still return a 404 error code.

In theory, yes. In practice, not always. Many servers return a 200 with an
HTML error page. Had those servers been implemented properly, they could
still return HTML error pages even when returning 404, though the browser
would decide whether to actually display it or not.


Gambit


Mark Patterson

Jul 24, 2008, 2:01:11 AM
Marc Rohloff [TeamB] wrote:
> On Sun, 20 Jul 2008 18:04:41 +1000, Mark Patterson wrote:
>
>> I'm always getting a code of 200. What the web-site is doing is
>> generating a standard "Page not found" page,
>
> In theory it should still return a 404 error code.
>
>> http://www.afrsmartinvestor.com.au/asxdata/20080523/SM_SECYG.csv
> The returned headers are:

etc

Yes, that looked useful. I'm not after an HTML file. How did you code that?

--
Mark Patterson
www.piedsoftware.com

Marc Rohloff [TeamB]

Jul 24, 2008, 8:53:40 AM
On Thu, 24 Jul 2008 16:01:11 +1000, Mark Patterson wrote:

> Yes, that looked useful. I'm not after an html file. How did you code that?

You can read TIdHTTP.Response.ContentType. It is safe to do this in
the OnHeadersAvailable event. There are other places you can check it,
but you need to be careful, since it is not valid at the earliest
stages of processing.

Mark Patterson

Jul 24, 2008, 9:44:57 PM
Great! That solved it.

Marc Rohloff [TeamB] wrote:
> You can read TIdHTTP.Response.ContentType. It is safe to do this in
> the OnHeadersAvailable event. There are other places you can check it,
> but you need to be careful, since it is not valid at the earliest
> stages of processing.
>
>


--
Mark Patterson
www.piedsoftware.com
