I have come back to a project I started a couple of months ago that
depends on multi-part gzip compression[1], and have got to the stage of
integrating the data handler with the compression back end.
I seem to be having trouble reading members after the first one,
although the same case passes in the tests for the blocked gzip
compressor.
I suspect the problem is in how I finish members: I write a complete
gzip stream for each member[2]. From RFC1952 this looks to me like it
should be correct, but the failures in my package occur at member
boundaries, which suggests it is not.
The relevant sections of RFC1952[3] are below.
Can anyone see what I am doing wrong?
thanks
Dan
[1] https://groups.google.com/d/topic/golang-nuts/VFfzYiI2rDc
[2] http://code.google.com/p/biogo/source/browse/bgzf/bgzf.go?repo=bam#108
[3] http://www.ietf.org/rfc/rfc1952.txt
2.2. File format
A gzip file consists of a series of "members" (compressed data
sets). The format of each member is specified in the following
section. The members simply appear one after another in the file,
with no additional information before, between, or after them.
2.3. Member format
Each member has the following structure:
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+

(if FLG.FEXTRA set)

+---+---+=================================+
| XLEN  |...XLEN bytes of "extra field"...| (more-->)
+---+---+=================================+

(if FLG.FNAME set)

+=========================================+
|...original file name, zero-terminated...| (more-->)
+=========================================+

(if FLG.FCOMMENT set)

+===================================+
|...file comment, zero-terminated...| (more-->)
+===================================+

(if FLG.FHCRC set)

+---+---+
| CRC16 |
+---+---+

+=======================+
|...compressed blocks...| (more-->)
+=======================+

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
|     CRC32     |     ISIZE     |
+---+---+---+---+---+---+---+---+
2.3.1.1. Extra field
If the FLG.FEXTRA bit is set, an "extra field" is present in
the header, with total length XLEN bytes. It consists of a
series of subfields, each of the form:
+---+---+---+---+==================================+
|SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
+---+---+---+---+==================================+
SI1 and SI2 provide a subfield ID, typically two ASCII letters
with some mnemonic value. Jean-Loup Gailly <gz...@prep.ai.mit.edu>
is maintaining a registry of subfield IDs; please send him any
subfield ID you wish to use. Subfield IDs with SI2 = 0 are reserved
for future use. The following IDs are currently defined:
SI1         SI2         Data
----------  ----------  ----
0x41 ('A')  0x70 ('P')  Apollo file type information
LEN gives the length of the subfield data, excluding the 4
initial bytes.
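The subfield layout can be walked with a short loop. In Go's compress/gzip the raw XLEN bytes are exposed as Header.Extra on both the Writer and the Reader. This sketch splits them into subfields; parseExtra, subfield, and the 'X','Y' subfield ID are hypothetical, chosen only for illustration:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"errors"
	"fmt"
)

// subfield (hypothetical) is one SI1/SI2/LEN/data entry of an extra field.
type subfield struct {
	SI1, SI2 byte
	Data     []byte
}

// parseExtra splits the XLEN bytes of an extra field into subfields.
// LEN is little-endian and excludes the 4-byte SI1/SI2/LEN prefix.
func parseExtra(extra []byte) ([]subfield, error) {
	var out []subfield
	for len(extra) > 0 {
		if len(extra) < 4 {
			return nil, errors.New("truncated subfield header")
		}
		n := int(binary.LittleEndian.Uint16(extra[2:4]))
		if len(extra) < 4+n {
			return nil, errors.New("truncated subfield data")
		}
		out = append(out, subfield{SI1: extra[0], SI2: extra[1], Data: extra[4 : 4+n]})
		extra = extra[4+n:]
	}
	return out, nil
}

func main() {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	// Hypothetical subfield 'X','Y' with LEN = 2 and two data bytes.
	zw.Extra = []byte{'X', 'Y', 2, 0, 0xab, 0xcd}
	if err := zw.Close(); err != nil { // Close writes the header even with no data
		panic(err)
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		panic(err)
	}
	sfs, err := parseExtra(zr.Extra)
	if err != nil {
		panic(err)
	}
	for _, sf := range sfs {
		fmt.Printf("%c%c %x\n", sf.SI1, sf.SI2, sf.Data) // XY abcd
	}
}
```

For BGZF specifically, the SAM/BAM specification defines a 'B','C' subfield with a 2-byte payload (BSIZE, the total block size minus one), which a loop like this would surface alongside any other subfields.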