On Thu, 23 Oct 2014 01:51:51 -0700, takouarnauld wrote:
> Good Morning,
>
> I'm trying in my program to Compress/Decompress data using GZIP streams
> and when using the charset "ISO-8859-1", everything works well but
> when changing the charset to "UTF-8", I'm getting the error message
> "Exception in thread "main" java.util.zip.ZipException: Not in GZIP
> format".
>
I'd say that whether decompression succeeds depends on the content of the
UTF-8 data that you're trying to decompress, and that the ZIP encoding
uses byte values as delimiters, etc. that are not used by ECMA-94
encodings but that *are* valid in UTF-8 encodings. As long as the doc
uses only characters that can be represented by ASCII or ECMA-94, which
includes ISO-8859-1/2/3/4 as part of its specification, then it should
work just fine.
However, ECMA-94 leaves gaps (0x00-0x1F [control characters], 0x7F
[control character] and 0x80-0x9F). UTF-8 is the same as ASCII/ECMA-94
over the range 0x00-0x7F but uses the whole of the range 0x80-0xFF except
for 0xC0, 0xC1 and 0xF5-0xFF, which are all invalid, so it's entirely
possible for valid UTF-8 text to include standard ZIP delimiters and other
control characters.
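
To see where the two charsets part company, here is a minimal sketch. It
assumes (your code isn't shown, so this is a guess) that the compressed
bytes get turned into a String and back somewhere along the way. The class
and method names are made up for illustration. ISO-8859-1 maps every byte
value to a character, so the GZIP bytes survive the trip; UTF-8 decoding
replaces byte sequences it considers malformed, and the second byte of the
GZIP header (0x8B) is one of them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not taken from the original program.
class GzipCharsetRoundTrip {

    // Compress raw bytes with GZIP.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Push the compressed bytes through a String and back again.
    static byte[] viaString(byte[] compressed, Charset cs) {
        return new String(compressed, cs).getBytes(cs);
    }

    // True if the compressed bytes come back unchanged for this charset.
    static boolean survives(Charset cs) {
        byte[] compressed = gzip("hello".getBytes(StandardCharsets.UTF_8));
        return Arrays.equals(compressed, viaString(compressed, cs));
    }

    // True if the bytes can still be opened and read as a GZIP stream.
    static boolean canGunzip(byte[] data) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            while (in.read(chunk) != -1) {
                // Drain the stream; only success or failure matters here.
            }
            return true;
        } catch (IOException e) {
            // ZipException ("Not in GZIP format") is an IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1 survives: "
            + survives(StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 survives:      "
            + survives(StandardCharsets.UTF_8));
    }
}
```

If the UTF-8 case reports a mismatch on your system too, then the String
conversion of the compressed data is the thing to look at.
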
As it happens, the ZIP authors have thought of this and there's a
workaround you've missed. See:
http://www.java2s.com/Questions_And_Answers/Java-File/Zip/encoding.htm
Point 4 says exactly what you need to do.
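
I haven't reproduced the linked page here, so the following is not a
restatement of its point 4, just one common way to stay out of trouble.
It is a sketch under these assumptions: the charset is applied only to the
text itself, the compressed data stays a byte[], and if it really must
travel as a String it goes through Base64 (java.util.Base64, Java 8 or
later). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not taken from the original program.
class GzipText {

    // Encode the text as UTF-8, then compress; the result stays binary.
    static byte[] compress(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decompress first, and only then decode the bytes as UTF-8 text.
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Base64 gives a String form of the compressed bytes that is safe to
    // store or send as text.
    static String compressToBase64(String text) {
        return Base64.getEncoder().encodeToString(compress(text));
    }

    static String decompressFromBase64(String base64) {
        return decompress(Base64.getDecoder().decode(base64));
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo \u65e5\u672c";
        String restored = decompressFromBase64(compressToBase64(original));
        System.out.println(original.equals(restored));
    }
}
```
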
GIYF - this popped straight out of an IXQuick query for "encode UTF8 ZIP
file"
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |