Exception in thread "main" java.util.zip.ZipException: Not in GZIP format

takoua...@gmail.com

Oct 23, 2014, 4:52:02 AM

Good morning,

I'm trying to compress/decompress data in my program using GZIP streams. When I use the charset "ISO-8859-1", everything works well, but when I change the charset to "UTF-8", I get the error message "Exception in thread "main" java.util.zip.ZipException: Not in GZIP format".
This is my code:

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipTest {

    public static String compress(String str) throws IOException {
        if (str == null || str.length() == 0) {
            return str;
        }
        System.out.println("String length : " + str.length());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(str.getBytes());
        gzip.close();
        String outStr = out.toString("UTF-8");
        System.out.println("Output String length : " + outStr.length());
        System.out.println("Output : " + outStr);
        return outStr;
    }

    public static String decompress(String str) throws IOException {
        if (str == null || str.length() == 0) {
            return str;
        }
        System.out.println("Input String length : " + str.length());
        GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes("UTF-8")));
        BufferedReader bf = new BufferedReader(new InputStreamReader(gis, "UTF-8"));
        String outStr = "";
        String line;
        while ((line = bf.readLine()) != null) {
            outStr += line;
        }
        System.out.println("Output String length : " + outStr.length());
        return outStr;
    }

    public static void main(String[] args) throws IOException {
        String string = "my data";
        System.out.println("after compress:");
        String compressed = compress(string);
        System.out.println(compressed);
        System.out.println("after decompress:");
        String decomp = decompress(compressed);
        System.out.println(decomp);
    }
}


Can anyone help me find a solution?

Martin Gregorie

Oct 23, 2014, 7:25:26 AM

On Thu, 23 Oct 2014 01:51:51 -0700, takouarnauld wrote:

> Good morning,
>
> I'm trying to compress/decompress data in my program using GZIP
> streams. When I use the charset "ISO-8859-1", everything works well,
> but when I change the charset to "UTF-8", I get the error message
> "Exception in thread "main" java.util.zip.ZipException: Not in GZIP
> format".
>
I'd say that whether the ZIP round trip succeeds depends on the textual
content of the UTF-8 document you're trying to decompress: ZIP encoding
uses characters as delimiters, etc., that are not used by ECMA-94
encodings but *are* valid UTF-8 encodings. As long as the document uses
only characters that can be represented in ASCII or ECMA-94 (whose
specification includes ISO-8859-1/2/3/4), it should work just fine.

However, ECMA-94 leaves gaps (0x00-0x1F [control characters], 0x7F
[control character] and 0x80-0x9F). UTF-8 is the same as ASCII/ECMA-94
over the range 0x00-0x7F but uses the whole of the range 0x80-0xFF except
for 0xC0, 0xC1 and 0xF5-0xFF, which are all invalid, so it's entirely
possible for valid UTF-8 text to include standard ZIP delimiters and other
control characters.
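The UTF-8 byte ranges above can be made concrete. The sketch below uses a hypothetical helper, `role`, that classifies a single byte by the role it may play in well-formed UTF-8 (ranges as in RFC 3629). Applied to the start of a GZIP stream (0x1F 0x8B 0x08 are fixed by RFC 1952; the flag byte 0x00 is what GZIPOutputStream writes), it shows that the second magic byte, 0x8B, is a continuation byte and therefore cannot begin a UTF-8 sequence.

```java
public class Utf8ByteRoles {

    // Classifies one byte by the role it can play in well-formed UTF-8.
    // Simplification: ignores the extra second-byte limits after E0, ED, F0 and F4.
    static String role(int b) {
        b &= 0xFF;                                   // accept signed Java bytes too
        if (b <= 0x7F) return "ASCII";
        if (b <= 0xBF) return "continuation";        // 0x80-0xBF never start a sequence
        if (b <= 0xC1) return "invalid";             // 0xC0, 0xC1 would be overlong leads
        if (b <= 0xDF) return "lead of 2-byte sequence";
        if (b <= 0xEF) return "lead of 3-byte sequence";
        if (b <= 0xF4) return "lead of 4-byte sequence";
        return "invalid";                            // 0xF5-0xFF never appear in UTF-8
    }

    public static void main(String[] args) {
        // First four bytes of a GZIP stream written by GZIPOutputStream.
        int[] header = {0x1F, 0x8B, 0x08, 0x00};
        for (int b : header) {
            System.out.printf("%02X: %s%n", b, role(b));
        }
    }
}
```

Running it prints `1F: ASCII`, `8B: continuation`, `08: ASCII` and `00: ASCII`, one per line.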

As it happens, the ZIP authors have thought of this and there's a
workaround you've missed. See:

http://www.java2s.com/Questions_And_Answers/Java-File/Zip/encoding.htm

Point 4 says exactly what you need to do.

GIYF (Google is your friend): this popped straight out of an IXQuick
query for "encode UTF8 ZIP file".


--
martin@gregorie.org | Martin Gregorie | Essex, UK

Steven Simpson

Oct 23, 2014, 9:45:15 AM

On 23/10/14 09:51, takoua...@gmail.com wrote:
> I'm trying to compress/decompress data in my program using GZIP streams. When I use the charset "ISO-8859-1", everything works well, but when I change the charset to "UTF-8", I get the error message "Exception in thread "main" java.util.zip.ZipException: Not in GZIP format".
> This is my code:

> ByteArrayOutputStream out = new ByteArrayOutputStream();
...
> String outStr = out.toString("UTF-8");

I printed out the bytes from out.toByteArray() and outStr.getBytes("UTF-8"):

Bytes from gzip:
0000: 1F 8B 08 00 00 00 00 00 00 00 CB AD 54 48 49 2C
0010: 49 04 00 29 AD 56 16 07 00 00 00
Bytes as string:
0000: 1F EF BF BD 08 00 00 00 00 00 00 00 CB AD 54 48
0010: 49 2C 49 04 00 29 EF BF BD 56 16 07 00 00 00

The interpretation of the bytes from gzip as UTF-8 fails because 8B 08
isn't a valid UTF-8 sequence. The conversion seems to deal with this by
replacing 8B with EF BF BD, which is the UTF-8 encoding for U+FFFD
(REPLACEMENT CHARACTER). It does the same with AD later.
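Steven's experiment can be reproduced with a short program. This is a minimal sketch assuming Java 8+ (try-with-resources, StandardCharsets, UncheckedIOException); the class and method names are hypothetical. It also suggests why ISO-8859-1 appeared to work: that charset maps each of the 256 byte values to its own char, so decoding and re-encoding gives back the original bytes, whereas UTF-8 decoding substitutes U+FFFD for malformed sequences. The exact GZIP bytes printed may differ slightly from the dump above depending on the JDK version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class GzipStringRoundTrip {

    // Compresses the UTF-8 bytes of s and returns the raw GZIP bytes.
    static byte[] gzip(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Not expected for an in-memory stream; GZIPOutputStream just declares it.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // closing gz has written the trailer
    }

    // Decodes bytes to a String and encodes them back with the same charset,
    // mirroring out.toString(...) in compress() and str.getBytes(...) in decompress().
    static byte[] viaString(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] raw = gzip("my data");
        byte[] utf8 = viaString(raw, StandardCharsets.UTF_8);
        byte[] latin1 = viaString(raw, StandardCharsets.ISO_8859_1);
        System.out.println("gzip bytes:        " + hex(raw));
        System.out.println("via UTF-8 String:  " + hex(utf8));
        System.out.println("via ISO-8859-1:    " + hex(latin1));
        System.out.println("UTF-8 intact?      " + Arrays.equals(raw, utf8));   // false
        System.out.println("ISO-8859-1 intact? " + Arrays.equals(raw, latin1)); // true
    }
}
```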


--
ss at comp dot lancs dot ac dot uk

Mike Amling

Oct 23, 2014, 10:30:11 AM

On 10/23/14 3:51 AM, takoua...@gmail.com wrote:
> Good Morning,
>
> I'm trying to compress/decompress data in my program using GZIP streams. When I use the charset "ISO-8859-1", everything works well, but when I change the charset to "UTF-8", I get the error message "Exception in thread "main" java.util.zip.ZipException: Not in GZIP format".
> This is my code:
>
> public static String compress(String str) throws IOException {
> if (str == null || str.length() == 0) {
> return str;
> }
> System.out.println("String length : " + str.length());
> ByteArrayOutputStream out = new ByteArrayOutputStream();
> GZIPOutputStream gzip = new GZIPOutputStream(out);
> gzip.write(str.getBytes());
> gzip.close();
> String outStr = out.toString("UTF-8");

You have no reason to believe that the byte array output from GZIP is
the UTF-8 encoding of a String. The Javadoc for
ByteArrayOutputStream.toString(String charsetName)
(http://docs.oracle.com/javase/8/docs/api/java/io/ByteArrayOutputStream.html#toString-java.lang.String-)
says that
"This method always replaces malformed-input and unmappable-character
sequences with this charset's default replacement string". After that
replacement, you can never recover the original contents of the byte
array from the resultant String.
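The silent replacement described above can be contrasted with a strict decoder. The lenient conversions (`ByteArrayOutputStream.toString(String)`, `new String(byte[], Charset)`) substitute U+FFFD without complaint, whereas a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws a `CharacterCodingException` instead. A minimal sketch, with a hypothetical helper name, that checks whether a byte array is well-formed UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {

    // Returns true only if data is well-formed UTF-8; never substitutes U+FFFD.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data)); // throws on the first bad sequence
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] gzipStart = {0x1F, (byte) 0x8B, 0x08, 0x00}; // start of a GZIP stream
        byte[] text = "my data".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(gzipStart)); // false
        System.out.println(isValidUtf8(text));      // true
    }
}
```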

--Mike Amling
SWYgeW91IHdhbnQgdG8gcmVwcmVzZW50IGFuIGFyYml0cmFyeSBieXRlIGFycmF5IGFzIGEgU3Ry
aW5nLCB1c2UgaGV4IG9yIGJhc2UgNjQu
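The signature above decodes to a suggestion to use hex or Base64 when an arbitrary byte array has to travel as a String. A minimal sketch of that approach, assuming Java 8+ for `java.util.Base64` and keeping the method names from the original post (the class name is hypothetical): the compressed bytes are Base64-encoded rather than decoded as UTF-8, and the text itself is explicitly encoded as UTF-8 at both ends instead of relying on the platform default used by `str.getBytes()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBase64 {

    // Gzips the UTF-8 bytes of str and returns the result as Base64 text.
    public static String compress(String str) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(str.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Reverses compress(): Base64-decode, gunzip, then decode the bytes as UTF-8.
    public static String decompress(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        try (InputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gis.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "my data, caf\u00e9";
        String compressed = compress(original);
        System.out.println(compressed);
        System.out.println(decompress(compressed).equals(original)); // true
    }
}
```

Reading the decompressed bytes directly, rather than through `readLine()`, also keeps any line terminators that the original `decompress()` would silently drop. Hex would work equally well; Base64 is simply more compact.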