I'm trying to receive content from a web server that can serve gzipped pages
(I would like the data to be gzipped by the server before it is served, as
the data is large text).
What options do I need to set on idhttp so that it can receive
gzipped data?
I have set the following:
idhttp.AcceptEncoding := 'gzip' <-- tell the web server that I can receive
gzipped pages.
idhttp.ContentEncoding := 'gzip'
but still my memo shows binary data - I'm getting the data like this:
memo1.Lines.Add(idhttp.Get('http://myhost.com/zlib_compressed_page.php'));
Gamin.
Bas
"dvorak" <dvorak.m_at_ifrance_dot_com> wrote in message
news:3e8e...@newsgroups.borland.com...
> but still my memo shows binary data- im getting data like this :
From how I read your post, you are expecting TIdHTTP to do the
decompression. As far as I know it will not do that itself.
johannes
Correct.
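Since TIdHTTP hands back the body exactly as the server sent it, one way to keep the compressed bytes intact is to fetch into a stream rather than into a string, and decode afterwards. This is only a minimal sketch: it assumes Indy 9, where the headers live under Request and Response, and it reuses the placeholder URL from the question. The program name is made up, and newer Indy versions may treat Accept-Encoding differently.

```pascal
program FetchGzipRaw;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP;

var
  Http: TIdHTTP;
  Body: TMemoryStream;
begin
  Http := TIdHTTP.Create(nil);
  Body := TMemoryStream.Create;
  try
    // Accept-Encoding tells the server what the client can read.
    // Content-Encoding describes a body that is being sent, so it is not
    // set on a plain GET request here.
    Http.Request.AcceptEncoding := 'gzip';
    // Placeholder URL taken from the question.
    Http.Get('http://myhost.com/zlib_compressed_page.php', Body);
    Body.Position := 0;
    WriteLn('Content-Encoding: ', Http.Response.ContentEncoding);
    WriteLn('Bytes received:   ', Body.Size);
    // Body now holds the raw, still compressed bytes, ready to be decoded.
  finally
    Body.Free;
    Http.Free;
  end;
end.
```

Reading into a TMemoryStream avoids pushing binary data through a string and a memo before it has been decompressed.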
--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"
uses zlib;

procedure DecompressStream(inpStream, outStream: TStream);
var
  InpBuf, OutBuf: Pointer;
  OutBytes, sz: Integer;
begin
  InpBuf := nil;
  OutBuf := nil;
  sz := inpStream.Size - inpStream.Position;
  if sz > 0 then
  try
    GetMem(InpBuf, sz);
    inpStream.Read(InpBuf^, sz);
    DecompressBuf(InpBuf, sz, 0, OutBuf, OutBytes);
    outStream.Write(OutBuf^, OutBytes);
  finally
    if InpBuf <> nil then FreeMem(InpBuf);
    if OutBuf <> nil then FreeMem(OutBuf);
  end;
  outStream.Position := 0;
end;

procedure CheckAndDecompress(var inpStream: TMemoryStream);
var
  must_decompress: Boolean;
  zip_header: array[1..8] of Char;
  tmpStream: TMemoryStream;
begin
  inpStream.Read(zip_header, 8);
  must_decompress := zip_header = #31'<'#8#0#0#0#0#0;
  if not must_decompress then
  begin
    inpStream.Position := 0;
    Exit;
  end;
  // else: position of inpStream must stay at position 8 (length of lz header)
  tmpStream := TMemoryStream.Create;
  tmpStream.Position := 0;
  DecompressStream(inpStream, tmpStream);
  inpStream.Free;
  inpStream := tmpStream;
end;
Alternatively, I have an HTTP component that can identify and
decompress the stream transparently. If you are interested,
have a look here: http://www.badfan.com/delphi
Regards,
Kyriacos
> uses zlib;
>
> procedure DecompressStream(inpStream,outStream: TStream);
-- skip functions --
Thank you for the functions. I tried them, and I think the header of the
gzipped file is different. The content of the page is simply '123', which is
gzipped. Check out what I got:
started....
WWWConnect::Connect("192.168.0.2","80")\n
source port: 3280\r\n
REQUEST: **************\n
GET /pima/test.php HTTP/1.1\r\n
Accept: */*\r\n
Accept-Language: en-us\r\n
Accept-Encoding: gzip, deflate\r\n
Host: 192.168.0.2\r\n
\r\n
RESPONSE: **************\n
HTTP/1.1 200 OK\r\n
Date: Mon, 07 Apr 2003 08:33:06 GMT\r\n
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) mod_ssl/2.8.12 OpenSSL/0.9.6b
DAV/1.0.2 PHP/4.3.1 mod_perl/1.24_01\r\n
X-Powered-By: PHP/4.3.1\r\n
Content-Encoding: gzip\r\n
Vary: Accept-Encoding\r\n
Connection: close\r\n
Transfer-Encoding: chunked\r\n
Content-Type: text/html\r\n
\r\n
\x01F<\x008\x000\x000\x000\x000\x000\x000\x003242\x006\x000\x000\x000ÿÿ\x003
\x000ÒcH^\x003\x000\x000\x000
WWWConnect::Close("192.168.0.2","80")\n
closed source port: 3280\r\n
finished.
Thx for the help
D
[snip]
After watching this now for a while I decided to take a quick look at the
gzip format.
GZIP is essentially a wrapper around deflate-compressed data (the same
compression ZLIB uses), which adds an extra header. You will have to strip
that off first, and check the CRC afterwards.
johannes
must_decompress := zip_header = #31#139#8#0#0#0#0#0;
Anyway if that does not work, just delete the lines that do the check.
if not must_decompress then begin
inpStream.position := 0;
exit;
end;
"dvorak" <dvorak.m_at_ifrance_dot_com> wrote in message
news:3e91...@newsgroups.borland.com...
> The problem was that I accidentally included ASCII char 139 in my message
> and it was converted to something similar ('>').The correct line should
> be.
>
> must_decompress := zip_header = #31#139#8#0#0#0#0#0;
>
I spotted the typo and corrected it, but I'm still getting an error:
  inpStream.Read(zip_header, 8);
  must_decompress := zip_header = #31#139#8#0#0#0#0#0;
  if not must_decompress then begin
    inpStream.Position := 0;
    Exit;
  end;
  // else: position of inpStream must stay at position 8 (length of lz header)
  tmpStream := TMemoryStream.Create;
  tmpStream.Position := 0;
  // at this point Dcheck in Zlib raises an exception saying that the data is bad
  DecompressStream(inpStream, tmpStream);
  inpStream.Free;
  inpStream := tmpStream;
This could have something to do with the CRC (as Johannes pointed out). Does
the CRC need to be that of the contents (which are being gzipped), or of the
gzip header plus the contents?
What are your suggestions?
Thank you and regards
D
> inpStream.Read(zip_header,8);
zip_header <> gzip_header, and what you have here:
> must_decompress := zip_header = #31#139#8#0#0#0#0#0;
is surely not a gzip header.
gzip files start at least with "GZ".
johannes
> > inpStream.Read(zip_header,8);
>
> zip_header <> gzip_header, and what you have here:
>
> > must_decompress := zip_header = #31#139#8#0#0#0#0#0;
>
> is surely not a gzip header.
> gzip files start at least with "GZ".
Maybe I'm confused (well, I guess without a doubt I'm confused). But what does
it mean when it says Content-Encoding: gzip in an HTTP response?
D
> Maybe I'm confused (well, I guess without a doubt I'm confused). But what does
> it mean when it says Content-Encoding: gzip in an HTTP response?
It means that the file was compressed with gzip before sending it to you,
just as if you had compressed it with the gzip program yourself. Actually,
some web servers store the gzipped version of files and gunzip them only when
necessary (because most clients nowadays can deal with gzip).
johannes
If I understand you right, you mean that the header is not a valid gzip
header.
But if I save the stream to a file, WinZip, WinRAR and gzip (on Linux) can
all decompress the data successfully. So is the header a gzip header?
I still can't get Delphi to decompress this stream, though. :-(
D
> If i understand you right, you mean that the header is not a valid gzip
> header.
No. I said that the stuff the code was looking for was not a gzip header,
but you should be looking for a gzip header _instead_.
johannes
Just another question: if the header is bad, then how is IE displaying the
web page (the server is sending Content-Encoding: gzip, and I also printed the
Accept-Encoding header to double-check)? And how is gunzip able to decompress
the data successfully?
Sincere thanks to you for at least answering.
D
> That puts me in a dilemma. I checked the header being sent, and it is exactly
> that: #31#139#8#0#0#0#0#0. Now is that the correct gzip header or not? That
> I don't know. I have been in the :-{ state for the last 2-3 days.
Ouch! I was wrong, my apologies! It turns out I was thinking of bzip (which
has a BZ header), and for some reason I figured gzip has a GZ header. That's
wrong.
Now, you should take a look at this:
http://www.faqs.org/rfcs/rfc1952.html
It turns out that the 4th byte is a flags byte, followed by the modification
time etc., and after that there can still be a variable-length part of the
header before the deflated data.
I hope with the information from the RFC you can solve your problem.
johannes
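The header layout from RFC 1952 can be walked with plain stream reads. The fixed part is 10 bytes (ID1, ID2, CM, FLG, a 4-byte MTIME, XFL, OS), so an 8-byte check like the one earlier in the thread leaves two header bytes in front of the data. What follows is a minimal sketch, not a tested library routine: the names SkipGzipHeader and SkipZeroTerminated are made up for illustration, and reading XLEN straight into a Word assumes a little-endian CPU.

```pascal
program GzipHeaderDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // FLG bits defined by RFC 1952
  FHCRC    = $02;
  FEXTRA   = $04;
  FNAME    = $08;
  FCOMMENT = $10;

// Skips a zero-terminated field; returns False if the stream ends first.
function SkipZeroTerminated(S: TStream): Boolean;
var
  B: Byte;
begin
  Result := False;
  while S.Read(B, 1) = 1 do
    if B = 0 then
    begin
      Result := True;
      Exit;
    end;
end;

// Leaves S positioned at the first byte of the deflate data and returns True,
// or returns False if S does not start with a well-formed gzip header.
function SkipGzipHeader(S: TStream): Boolean;
var
  Hdr: array[0..9] of Byte;   // ID1 ID2 CM FLG MTIME(4) XFL OS
  XLen: Word;
begin
  Result := False;
  if S.Read(Hdr, SizeOf(Hdr)) <> SizeOf(Hdr) then Exit;
  if (Hdr[0] <> $1F) or (Hdr[1] <> $8B) or (Hdr[2] <> 8) then Exit;
  if (Hdr[3] and FEXTRA) <> 0 then
  begin
    if S.Read(XLen, 2) <> 2 then Exit;          // little-endian length
    if S.Size - S.Position < XLen then Exit;
    S.Position := S.Position + XLen;
  end;
  if ((Hdr[3] and FNAME) <> 0) and not SkipZeroTerminated(S) then Exit;
  if ((Hdr[3] and FCOMMENT) <> 0) and not SkipZeroTerminated(S) then Exit;
  if (Hdr[3] and FHCRC) <> 0 then
  begin
    if S.Size - S.Position < 2 then Exit;
    S.Position := S.Position + 2;               // 16-bit header CRC
  end;
  Result := True;
end;

const
  // The ten header bytes from the captured response, plus one data byte.
  Sample: array[0..10] of Byte =
    ($1F, $8B, $08, $00, $00, $00, $00, $00, $00, $03, $33);
var
  M: TMemoryStream;
begin
  M := TMemoryStream.Create;
  try
    M.Write(Sample, SizeOf(Sample));
    M.Position := 0;
    if SkipGzipHeader(M) then
      WriteLn('deflate data starts at offset ', M.Position)
    else
      WriteLn('not a gzip stream');
  finally
    M.Free;
  end;
end.
```

With FLG = 0, as in the sample, the data starts at offset 10. The bytes after the header are raw deflate data; whether a given zlib wrapper function accepts them without a zlib header is a separate question.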
Are you using Borland's zlib library or another one? I remember having
problems with other libraries. Can you put that sample page on a public IP
so I can have a look?
By the way, do you compress the content yourself on the server?
Yes, I'm using the zlib library from Borland; it is
source/rtl/common/zlib.pas
> Can you put that sample page on a public ip to have a
> look?
http://209.61.155.108/test.php (the content is '123')
> By the way, do you compress the content by yourself on the server?
>
PHP offers two ways of transparently compressing output (apart from mod_gzip,
which does not work directly with mod_ssl, so I'm not going for it): the
zlib.output_compression directive, and calling ob_start("ob_gzhandler") {ob =
output buffering} inside the script. Only one of the two should be enabled,
otherwise the resulting page gets compressed twice. At present I'm using
"ob_gzhandler".
Thx
D
The fact that you have a "Content-Encoding: gzip" header is not sufficient.
That is, if the content is not actually compressed, IE will just display the
uncompressed output normally.
Here is a sample PHP script you could try to use.
<?
$use_compression = 1; // decide if you want to compress the PHP output or not
ob_start();
print "123"; // print your contents here
// footer
$str = ob_get_contents();
ob_end_clean();
if ($use_compression) {
    $str = gzcompress($str);
    Header("Content-Encoding: gzip");
    print "\x1f\x8b\x08\x00\x00\x00\x00\x00";
}
print $str; // print the results (compressed or not)
?>
I have a working example here: http://212.31.100.206/test_gz.php
"dvorak" <dvorak.m_at_ifrance_dot_com> wrote in message
news:3e91...@newsgroups.borland.com...
> Here is a sample php script you could try to use.
>
> <?
> $use_compression = 1; //decide if you want to compress the php output or
> not
Thank you, this works. I'm still confused about why simply using
ob_start("ob_gzhandler") does not work, but that belongs in the PHP forums.
Thank you again, a great relief for me.
D
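One possible explanation, offered as an assumption rather than anything confirmed in this thread: gzcompress() emits zlib-format data (RFC 1950), which is what DecompressBuf in Borland's zlib.pas appears to expect. ob_gzhandler, by contrast, sends a real gzip stream, whose payload is raw deflate data after a 10-byte header. A small sketch that guesses the wrapper from the first two bytes can show which case you have; the names SniffWrapper and TWrapper are hypothetical.

```pascal
program SniffDeflateWrapper;
{$APPTYPE CONSOLE}

type
  TWrapper = (wwTooShort, wwGzip, wwZlib, wwUnknown);

// Guesses the container format from the first two bytes only.
function SniffWrapper(const Data: array of Byte): TWrapper;
begin
  if Length(Data) < 2 then
    Result := wwTooShort
  else if (Data[0] = $1F) and (Data[1] = $8B) then
    Result := wwGzip           // RFC 1952 magic number
  else if ((Data[0] and $0F) = 8) and ((Data[0] shr 4) <= 7) and
          ((Data[0] * 256 + Data[1]) mod 31 = 0) then
    Result := wwZlib           // RFC 1950: CM = 8, CINFO <= 7, FCHECK ok
  else
    Result := wwUnknown;       // raw deflate, or something else entirely
end;

const
  Names: array[TWrapper] of string = ('too short', 'gzip', 'zlib', 'unknown');
begin
  WriteLn(Names[SniffWrapper([$1F, $8B, $08, $00])]);  // gzip
  WriteLn(Names[SniffWrapper([$78, $9C, $33, $34])]);  // zlib
  WriteLn(Names[SniffWrapper([$00, $03, $33, $32])]);  // unknown
end.
```

The third sample is what sits at offset 8 of the captured ob_gzhandler response: the last two gzip header bytes followed by deflate data, which does not look like a zlib stream.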
> Maybe im confused (well, i guess without a doubt im confused). But what does
> it mean when it says content-encoding : gzip in an HTTP response.
The content is the same as if the server were giving you a file
that is compressed with gzip.
Some servers even store the compressed files on disk nowadays, because it's
cheaper to uncompress for a few clients than to compress for a lot of
clients (most clients support gzip compression).
So, once you have the file you have two options:
a) store it and run gzip -d over it
or
b) read the header, then inflate the deflate data with zlib, then check the
CRC that is stored after the compressed data
johannes
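For option b, RFC 1952 puts the check value in the last eight bytes of a gzip stream: a CRC-32 of the uncompressed data followed by its length, both little-endian. Below is a minimal sketch of the CRC side only, using the page content '123' from this thread as input; the names InitCrcTable and Crc32 are made up for illustration.

```pascal
program GzipTrailerDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  CrcTable: array[0..255] of LongWord;

// Builds the lookup table for the reflected CRC-32 polynomial $EDB88320.
procedure InitCrcTable;
var
  N, K: Integer;
  C: LongWord;
begin
  for N := 0 to 255 do
  begin
    C := N;
    for K := 0 to 7 do
      if (C and 1) <> 0 then
        C := $EDB88320 xor (C shr 1)
      else
        C := C shr 1;
    CrcTable[N] := C;
  end;
end;

// CRC-32 of the UNCOMPRESSED bytes, as stored in the gzip trailer.
function Crc32(const Data: array of Byte): LongWord;
var
  I: Integer;
begin
  Result := $FFFFFFFF;
  for I := 0 to High(Data) do
    Result := CrcTable[(Result xor Data[I]) and $FF] xor (Result shr 8);
  Result := not Result;
end;

begin
  InitCrcTable;
  // '1', '2', '3' is the test page content mentioned in the thread.
  WriteLn(IntToHex(Crc32([$31, $32, $33]), 8));
end.
```

This should print 884863D2. Stored little-endian that is D2 63 48 88, which appears to line up with the bytes shown as ÒcH near the end of the captured response earlier in the thread.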