compress/gzip: getting the uncompressed file size


Vasiliy Tolstov

Jan 19, 2015, 9:54:23 AM
to golang-nuts
Is it possible to get the uncompressed file size with compress/gzip?


--
Vasiliy Tolstov,
e-mail: v.to...@selfip.ru
jabber: va...@selfip.ru

Michael Gehring

Jan 19, 2015, 10:45:09 AM
to Vasiliy Tolstov, golang-nuts
On Mon, Jan 19, 2015 at 06:53:51PM +0400, Vasiliy Tolstov wrote:
> Is it possible to get the uncompressed file size with compress/gzip?

compress/gzip doesn't expose it, but the last 4 bytes in a gzip
file/stream contain the uncompressed size (little-endian).

Nick Craig-Wood

Jan 19, 2015, 11:27:10 AM
to Michael Gehring, Vasiliy Tolstov, golang-nuts
Except that is only good for files up to 4GB :-(

I had to do exactly this recently, but since I had to read the gzipped
file anyway I used a TeeReader to decompress it and count the size as I
went along.

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

Vasiliy Tolstov

Jan 19, 2015, 4:40:59 PM
to Michael Gehring, golang-nuts
2015-01-19 18:44 GMT+03:00 Michael Gehring <m...@ebfe.org>:
> compress/gzip doesn't expose it, but the last 4 bytes in a gzip
> file/stream contain the uncompressed size (little-endian).


If I get the file via HTTP, as I understand it I need a ReaderAt, but the
net/http package does not provide that interface for the response Body.
What is the best way to get this data over HTTP? Or do I need to make a
Range request manually?

Brad Fitzpatrick

Jan 19, 2015, 4:51:29 PM
to Vasiliy Tolstov, Michael Gehring, golang-nuts
On Mon, Jan 19, 2015 at 1:40 PM, Vasiliy Tolstov <v.to...@selfip.ru> wrote:
> 2015-01-19 18:44 GMT+03:00 Michael Gehring <m...@ebfe.org>:
>> compress/gzip doesn't expose it, but the last 4 bytes in a gzip
>> file/stream contain the uncompressed size (little-endian).
>
>
> If I get the file via HTTP, as I understand it I need a ReaderAt, but the
> net/http package does not provide that interface for the response Body.
> What is the best way to get this data over HTTP? Or do I need to make a
> Range request manually?

Depends how big the resource is, and whether you'll want the rest of it later. If it's small, I'd just slurp it all into memory. If it's huge, yes, you can do a Range request to get the final 4 bytes or whatever.

Vasiliy Tolstov

Jan 19, 2015, 4:52:43 PM
to Brad Fitzpatrick, Michael Gehring, golang-nuts
2015-01-20 0:51 GMT+03:00 Brad Fitzpatrick <brad...@golang.org>:
> Depends how big the resource is, and whether you'll want the rest of it
> later. If it's small, I'd just slurp it all into memory. If it's huge, yes,
> you can do a Range request to get the final 4 bytes or whatever.


OK. Does the Go team have plans to make the response body satisfy the
ReaderAt interface and translate reads into Range requests automatically?

Brad Fitzpatrick

Jan 19, 2015, 4:58:33 PM
to Vasiliy Tolstov, Michael Gehring, golang-nuts
On Mon, Jan 19, 2015 at 1:52 PM, Vasiliy Tolstov <v.to...@selfip.ru> wrote:
> 2015-01-20 0:51 GMT+03:00 Brad Fitzpatrick <brad...@golang.org>:
>> Depends how big the resource is, and whether you'll want the rest of it
>> later. If it's small, I'd just slurp it all into memory. If it's huge, yes,
>> you can do a Range request to get the final 4 bytes or whatever.
>
>
> OK. Does the Go team have plans to make the response body satisfy the
> ReaderAt interface and translate reads into Range requests automatically?

No. That's probably too specific for the standard library.

I did a bunch of that sort of stuff for dl.google.com.  See http://talks.golang.org/2013/oscon-dl.slide#51 and that whole slide deck.

ReaderAt access patterns are often dumb enough that you'll want to do some minimal amount of caching on your side, like maybe 256KB aligned chunks minimum, and always hold on to the last few, just in case the caller reads it again right afterwards. (like the archive/zip reader reading the final few bytes, and then backing up to read the TOC, which is likely in the same 256KB, so you might as well do it all in one HTTP request if they're going to be relatively high-latency)





Vasiliy Tolstov

Jan 20, 2015, 5:38:55 AM
to Brad Fitzpatrick, Michael Gehring, golang-nuts
2015-01-20 0:58 GMT+03:00 Brad Fitzpatrick <brad...@golang.org>:
> No. That's probably too specific for the standard library.
>
> I did a bunch of that sort of stuff for dl.google.com. See
> http://talks.golang.org/2013/oscon-dl.slide#51 and that whole slide deck.
>
> ReaderAt access patterns are often dumb enough that you'll want to do some
> minimal amount of caching on your side, like maybe 256KB aligned chunks
> minimum, and always hold on to the last few, just in case the caller reads
> it again right afterwards. (like the archive/zip reader reading the final
> few bytes, and then backing up to read the TOC, which is likely in the same
> 256KB, so you might as well do it all in one HTTP request if they're going
> to be relatively high-latency)


Thanks.

Vasiliy Tolstov

Jan 20, 2015, 5:44:07 AM
to Vasiliy Tolstov, golang-nuts
2015-01-19 17:53 GMT+03:00 Vasiliy Tolstov <v.to...@selfip.ru>:
> Is it possible to get the uncompressed file size with compress/gzip?


Another question: can the compress/gzip compressor use more than one CPU?

Matt Harden

Jan 21, 2015, 9:13:28 PM
to Vasiliy Tolstov, golang-nuts
On Tue Jan 20 2015 at 4:44:05 AM Vasiliy Tolstov <v.to...@selfip.ru> wrote:
> 2015-01-19 17:53 GMT+03:00 Vasiliy Tolstov <v.to...@selfip.ru>:
>> Is it possible to get the uncompressed file size with compress/gzip?
>
>
> Another question: can the compress/gzip compressor use more than one CPU?

No, but see this previous post -
https://groups.google.com/d/msg/golang-nuts/5DxiA0qxahI/BXW8BqaGfNAJ

Vasiliy Tolstov

Jan 22, 2015, 2:09:52 AM
to Matt Harden, golang-nuts
2015-01-22 5:13 GMT+03:00 Matt Harden <matt....@gmail.com>:
> No, but see this previous post -
> https://groups.google.com/d/msg/golang-nuts/5DxiA0qxahI/BXW8BqaGfNAJ


Thanks, but I'm happy with pgzip by klauspost: https://github.com/klauspost/pgzip#Header