I hope someone can answer this quickly. I am using TIdHttp to read a
file from the web. Is there a way to test for the existence of a file
before downloading it without waiting for a timeout?
TIA
--
Mark Patterson
www.piedsoftware.com
> Hi
>
> I hope someone can answer this quickly. I am using TIdHttp to read a
> file from the web. Is there a way to test for the existence of a file
> before downloading it without waiting for a timeout?
>
> TIA
With HTTP, when a file is not found the server should return a 404
error. A timeout means there is some other, unidentified problem, such
as a network fault or a busy server.
regards
Yannis
--
"Quotation confesses inferiority." -- Ralph Waldo Emerson
> I hope someone can answer this quickly. I am using TIdHttp to read a
> file from the web. Is there a way to test for the existence of a file
> before downloading it without waiting for a timeout?
You would want to use the 'Head' command instead of the 'Get' command,
but if your next step is to download it then I don't see the benefit
of issuing two requests to the web server.
--
Marc Rohloff [TeamB]
marc -at- marc rohloff -dot- com
> I am using TIdHttp to read a file from the web. Is there a way
> to test for the existence of a file before downloading it without
> waiting for a timeout?
Perform a Head() request, and then check the reply.
If you are going to download it anyway, though, then just Get() it normally.
The reply will tell you if the file existed or not.
Gambit
> if your next step is to download it then I don't see the
> benefit of issuing two requests to the web server.
There is one benefit, which web browsers use - to retrieve the filename and
prompt the user with it before actually downloading the data, in case the
user decides to cancel. With a 'Get' request, you would have to close the
connection in order to cancel the download.
Gambit
> There is one benefit, which web browsers use - to retrieve the filename and
> prompt the user with it before actually downloading the data, in case the
> user decides to cancel. With a 'Get' request, you would have to close the
> connection in order to cancel the download.
IE only issues a single GET request. I don't know if it caches the
download while waiting for the user to respond or if it just stops
receiving (Judging by the response it is doing the former).
Assuming most people intend to continue their download it makes more
sense to do a GET and cancel occasionally than to almost always issue
two requests.
> IE only issues a single GET request.
That's not what I've seen it do in the past, but sure enough I just checked
it and it's not doing it anymore.
> I don't know if it caches the download while waiting for
> the user to respond or if it just stops receiving
The server has already started pushing the data to the browser in reply to
the GET, but the browser does not read it from the connection until after
the prompt is accepted by the user.
> Assuming most people intend to continue their download it makes
> more sense to do a GET and cancel occasionally than to almost
> always issue two requests.
I wasn't saying to always issue HEAD before GET. When an automated
download is needed, that doesn't make sense. But when the download
needs to interact with the user, issuing a HEAD first does make more
sense, because it reduces bandwidth.
Gambit
Thanks for that. I'm not sure what to look for.
Head is: procedure Head(AURL: string);
It calls DoRequest, which is a 70-line procedure with nothing in it that
is obviously the way to check.
Can you point me to the next step?
> Thanks for that. I'm not sure what to look for.
> Head is: procedure Head(AURL: string);
> It calls DoRequest, which is a 70-line procedure with nothing in it that
> is obviously the way to check.
>
> Can you point me to the next step?
You need to check the result code. 200 (or anything in the 2xx range)
is OK, 404 means 'file not found'. You could probably treat anything
else as an error.
If you want more information then you could check the returned headers
which would normally give you the file's mime type, size and last
modification time.
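As a rough illustration of the above, here is a minimal console sketch. HeadStatus is a hypothetical helper name, and the URL is taken from the command line. It relies on TIdHTTP raising EIdHTTPProtocolException for error statuses such as 404, so the status code is read back from ResponseCode in the handler.

```pascal
program HeadCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP;

// Hypothetical helper: issues a HEAD request and reports the status code
// plus a few headers. Returns -1 if no HTTP reply arrived at all.
function HeadStatus(const AURL: string): Integer;
var
  Http: TIdHTTP;
begin
  Http := TIdHTTP.Create(nil);
  try
    try
      Http.Head(AURL);
      Result := Http.ResponseCode;
      Writeln('Content-Type:   ', Http.Response.ContentType);
      Writeln('Content-Length: ', Http.Response.ContentLength);
      Writeln('Last-Modified:  ', DateTimeToStr(Http.Response.LastModified));
    except
      // Indy raises this for error statuses such as 404.
      on EIdHTTPProtocolException do
        Result := Http.ResponseCode;
      // Anything else (connection refused, timeout, ...) means no reply.
      on Exception do
        Result := -1;
    end;
  finally
    Http.Free;
  end;
end;

var
  Code: Integer;
begin
  Code := HeadStatus(ParamStr(1));
  if (Code >= 200) and (Code < 300) then
    Writeln('OK (', Code, ')')
  else if Code = 404 then
    Writeln('File not found')
  else
    Writeln('Error (', Code, ')');
end.
```

Only the headers are transferred, so this is cheap even for a large file.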
> The server has already started pushing the data to the browser in reply to
> the GET, but the browser does not read it from the connection until after
> the prompt is accepted by the user.
I'm not sure about this. I would expect the server to time out at some
point if the client didn't read the data. I have also noticed that if
I leave the prompt up for a while, the progress dialog starts with
part, or sometimes all, of the file already downloaded.
I've been trying out some of the ideas from my first post, but for the
case I'm dealing with, I still don't have a way of sorting out whether
the file I want is available or not.
Here is my code:
  httpReader := TIdHTTP.Create(nil);
  try
    // First ask for the headers only
    httpReader.Head(edtURL.Text);
    addLog('Head resp code', IntToStr(httpReader.ResponseCode));
    addLog('Head resp text', httpReader.ResponseText);
    // Then download the content itself
    lstContents.Items.Text := httpReader.Get(edtURL.Text);
    addLog('Response code', IntToStr(httpReader.ResponseCode));
    addLog('Response text', httpReader.ResponseText);
    addLog('R/R text', httpReader.Response.ResponseText);
    addLog('Redirect count', IntToStr(httpReader.RedirectCount));
    addLog('URL', httpReader.URL.Path);
  finally
    httpReader.Free;
    HourGlass(false);
  end;
I'm always getting a code of 200. What the web site is doing is
generating a standard "Page not found" page.
See:
http://www.afrsmartinvestor.com.au/asxdata/20080523/SM_SECYG.csv
For most other Friday dates you get a csv file of several thousand lines,
e.g.
http://www.afrsmartinvestor.com.au/asxdata/20080711/SM_SECYG.csv
I would like to be able to test whether the file I'm after is present or
not, using the Indy components. Can anyone help?
--
Mark Patterson
www.piedsoftware.com
> Here is my code:
<snip>
> I'm always getting a code of 200.
Then the request is succeeding and returning back valid data, from the
server's perspective.
> What the web-site is doing is generating a standard
> "Page not found" page
The server is sending a valid HTML page back with a non-error ResponseCode.
You will have to parse the data to get the actual error message from it
separately. You should also be looking at the response's "Content-Type"
header to know what kind of data is actually being returned. If you expect
a particular type for the requested file but don't get it, then you
know something went wrong. In this case, a valid .csv file is delivered
with a "Content-Type" of "application/octet-stream", whereas the error page
has a "Content-Type" of "text/html" instead.
Gambit
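A minimal sketch of that Content-Type test follows. IsHtmlContentType and CsvIsAvailable are hypothetical names, and it assumes this server answers HEAD with the same Content-Type it would send for GET, which should be verified against the real site.

```pascal
program SoftNotFound;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP;

// True when the header value names an HTML page. A trailing parameter such
// as '; charset=iso-8859-1' is tolerated because only the prefix is compared.
function IsHtmlContentType(const AContentType: string): Boolean;
begin
  Result := Pos('text/html', LowerCase(Trim(AContentType))) = 1;
end;

// Hypothetical helper: treats "200 plus an HTML page" as "file not present".
function CsvIsAvailable(const AURL: string): Boolean;
var
  Http: TIdHTTP;
begin
  Result := False;
  Http := TIdHTTP.Create(nil);
  try
    try
      Http.Head(AURL);
      Result := (Http.ResponseCode = 200) and
        not IsHtmlContentType(Http.Response.ContentType);
    except
      // A 404, a timeout or any other failure also counts as "not available".
      on Exception do
        Result := False;
    end;
  finally
    Http.Free;
  end;
end;

begin
  if CsvIsAvailable(ParamStr(1)) then
    Writeln('File is available')
  else
    Writeln('File is missing (or the server returned an error page)');
end.
```

Rejecting "text/html" rather than requiring "application/octet-stream" is a design choice: it keeps working if the server later starts labelling the file as "text/csv".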
> I'm always getting a code of 200. What the web-site is doing is
> generating a standard "Page not found" page,
In theory it should still return a 404 error code.
> http://www.afrsmartinvestor.com.au/asxdata/20080523/SM_SECYG.csv
The returned headers are:
HTTP/1.1 200 OK
Date: Tue, 22 Jul 2008 14:31:19 GMT
Server: Microsoft-IIS/6.0
P3P: policyref="http://f2.com.au/w3c/p3p.xml", CP="CAO DSP LAW CURa
ADMa DEVa TAIa PSAa PSDa IVAi IVDi OUR IND PHY ONL UNI PUR FIN COM NAV
INT DEM CNT PRE GOV"
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Cache-Control: private
Content-Type: text/html; charset=iso-8859-1
Content-Length: 20847
> http://www.afrsmartinvestor.com.au/asxdata/20080711/SM_SECYG.csv
Here the headers are:
HTTP/1.1 200 OK
Content-Length: 393469
Content-Type: application/octet-stream
Last-Modified: Fri, 11 Jul 2008 12:15:04 GMT
Accept-Ranges: bytes
ETag: "014cfbf4fe3c81:3da"
Server: Microsoft-IIS/6.0
P3P: policyref="http://f2.com.au/w3c/p3p.xml", CP="CAO DSP LAW CURa
ADMa DEVa TAIa PSAa PSDa IVAi IVDi OUR IND PHY ONL UNI PUR FIN COM NAV
INT DEM CNT PRE GOV"
X-Powered-By: ASP.NET
Date: Tue, 22 Jul 2008 14:33:21 GMT
I would look at the content-type header returned and decide based on
that.
> In theory it should still return a 404 error code.
In theory, yes. In practice, not always. Many servers return a 200 with an
HTML error page. Had those servers been implemented properly, they could
still return HTML error pages even when returning 404, though the browser
would decide whether to actually display it or not.
Gambit
Yes, that looked useful. I'm not after an html file. How did you code that?
--
Mark Patterson
www.piedsoftware.com
> Yes, that looked useful. I'm not after an html file. How did you code that?
You can read TIdHTTP.Response.ContentType. It is safe to do this in
the OnHeadersAvailable event. There are other places you can check it,
but you need to be careful since it is not valid at the earliest
stages of processing.
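Something along these lines might work for a single Get. This is only a sketch: TCsvFetcher is a hypothetical wrapper class, and the handler signature shown is the one I believe Indy 10 declares for OnHeadersAvailable, so check it against your installed IdHTTP.pas. Setting VContinue to False is meant to tell Indy not to go on and read the body.

```pascal
program HeadersEvent;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP, IdHeaderList;

type
  // Hypothetical wrapper; an event handler has to be a method of an object.
  TCsvFetcher = class
  private
    FGotHtml: Boolean;
    procedure HeadersAvailable(Sender: TObject; AHeaders: TIdHeaderList;
      var VContinue: Boolean);
  public
    function Fetch(const AURL: string; out AData: string): Boolean;
  end;

procedure TCsvFetcher.HeadersAvailable(Sender: TObject;
  AHeaders: TIdHeaderList; var VContinue: Boolean);
var
  ContentType: string;
begin
  // By the time this event fires the response headers have been parsed.
  ContentType := LowerCase((Sender as TIdHTTP).Response.ContentType);
  FGotHtml := Pos('text/html', ContentType) = 1;
  // Do not bother reading the body of an HTML error page.
  VContinue := not FGotHtml;
end;

// Returns True and fills AData when something other than HTML came back.
// HTTP errors such as 404 still surface as Indy exceptions.
function TCsvFetcher.Fetch(const AURL: string; out AData: string): Boolean;
var
  Http: TIdHTTP;
begin
  FGotHtml := False;
  Http := TIdHTTP.Create(nil);
  try
    Http.OnHeadersAvailable := HeadersAvailable;
    AData := Http.Get(AURL);
    Result := not FGotHtml;
  finally
    Http.Free;
  end;
end;

var
  Fetcher: TCsvFetcher;
  Data: string;
begin
  Fetcher := TCsvFetcher.Create;
  try
    if Fetcher.Fetch(ParamStr(1), Data) then
      Writeln('Received ', Length(Data), ' characters')
    else
      Writeln('Server sent an HTML page instead of the file');
  finally
    Fetcher.Free;
  end;
end.
```

Compared with a separate Head, this needs only one request, at the cost of wiring up an event handler.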