Look closely at the docs for writeUTF and you will find that it
also writes a 2-byte binary length indicator at the front. I guess
this is the problem. I suggest that you use an OutputStreamWriter
instead, like this:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
OutputStreamWriter out = new OutputStreamWriter(baos);
out.write(text_input);
out.flush(); // the writer buffers, so flush before reading baos
Steve
The U in UTF stands for 'Unicode', so you want to convert Unicode to Unicode.
> ByteArrayOutputStream out = new ByteArrayOutputStream();
> DataOutputStream dataOut = new DataOutputStream(out);
> dataOut.writeUTF(text_input);
The first problem here is that writeUTF() does /NOT/ write UTF-8. It's an
incredibly, unbelievably, stupidly, misleadingly-named method. What it does is
write a two-byte length (as Steve has already mentioned; it counts the encoded
bytes, not the characters) followed by some bytes that represent the string in
a format that is (conceptually) related to, but not interchangeable with,
UTF-8. The Java docs call that format "modified UTF-8".
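To make the difference concrete, here is a small sketch that dumps the bytes produced by writeUTF() next to the bytes of real UTF-8. The class and method names (WriteUtfDemo, viaWriteUTF, viaUtf8, hex) are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
class WriteUtfDemo {
    // Bytes produced by DataOutputStream.writeUTF():
    // a 2-byte length prefix followed by "modified UTF-8".
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Standard UTF-8 bytes, with no prefix.
    static byte[] viaUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";          // EURO SIGN
        String nul = "\0";               // U+0000
        String emoji = "\uD83D\uDE00";   // U+1F600 as a surrogate pair

        System.out.println(hex(viaWriteUTF(euro)));  // 00 03 E2 82 AC
        System.out.println(hex(viaUtf8(euro)));      // E2 82 AC

        System.out.println(hex(viaWriteUTF(nul)));   // 00 02 C0 80
        System.out.println(hex(viaUtf8(nul)));       // 00

        System.out.println(hex(viaWriteUTF(emoji))); // 00 06 ED A0 BD ED B8 80
        System.out.println(hex(viaUtf8(emoji)));     // F0 9F 98 80
    }
}
```

For most text the payload after the prefix looks like UTF-8, but the prefix itself, the two-byte encoding of U+0000, and the six-byte encoding of characters outside the BMP are all things a strict UTF-8 parser is entitled to reject.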
UTF-8 is a way of taking a stream/string of Unicode characters (and Java
Strings can be viewed as such, although the correspondence is not as close as
it looks), and representing them as bytes in a binary stream or similar. In
Java that conversion is ultimately provided by a "charset", specifically the
one named "UTF-8". Probably the easiest way for you to use that would be
either to ask your String for its UTF-8 bytes:
aString.getBytes("UTF-8");
or to use an OutputStreamWriter constructed with a 'charsetname' of "UTF-8".
-- chris
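A minimal sketch of the two routes just described, side by side. The class and method names here are invented for the example; StandardCharsets.UTF_8 assumes Java 7 or later (on older JDKs the string name "UTF-8" does the same job but forces you to handle UnsupportedEncodingException).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original thread.
class Utf8Encoding {
    // Route 1: ask the String for its UTF-8 bytes directly.
    static byte[] encodeWithGetBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Route 2: push the String through a writer with an explicit charset.
    static byte[] encodeWithWriter(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the writer flushes its internal buffer into baos.
        try (Writer out = new OutputStreamWriter(baos, StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        String text = "Gr\u00FC\u00DFe, \u20AC 5";
        byte[] a = encodeWithGetBytes(text);
        byte[] b = encodeWithWriter(text);
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Both routes give the same bytes. getBytes() is the shorter one when the whole text is already in memory; the writer is more useful when the output is going straight to a file or socket.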
Since UTF-8 was explicitly requested, that should be:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
OutputStreamWriter out = new OutputStreamWriter(baos, "UTF-8");
out.write(text_input);
out.flush(); // the writer buffers, so flush before calling baos.toByteArray()
--
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
Thanks to your code-snippet and with the getEncoding()-method of the
OutputStreamWriter I found out that the encoding that is apparently
being used inside the JTextPane is "Cp1252".
ByteArrayOutputStream baos = new ByteArrayOutputStream();
OutputStreamWriter out = new OutputStreamWriter(baos);
out.write(input_string);
String encoding = out.getEncoding();
Now I have two - maybe stupid - questions:
1) How is that possible if the Sun documentation about Documents (used
in JTextPanes) reads as follows:
"To support internationalization, the Swing text model uses unicode
characters..." ???
2) How do I get a String out of the OutputStreamWriter as there is no
getText() method available?
Thanks for any help!
Peter
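OutputStreamWriter by default uses the *platform* default encoding, which has
nothing to do with Swing. getEncoding() only tells you what that particular
writer will produce; the String you got from the JTextPane is Unicode either
way.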
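A quick way to see this for yourself, as a sketch with made-up class and method names: the encoding reported by a writer depends only on how the writer was constructed, not on where the text came from. Note that getEncoding() may return a historical name such as "Cp1252" or "UTF8" rather than the canonical charset name, and that what the default turns out to be depends on the JVM and platform (recent JDKs default to UTF-8).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
class WriterEncodingDemo {
    // Encoding name reported by a writer built without an explicit charset.
    // The writer wraps an in-memory stream, so there is nothing to close.
    static String defaultWriterEncoding() {
        return new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding();
    }

    // Encoding name reported by a writer built with UTF-8 requested explicitly.
    static String utf8WriterEncoding() {
        return new OutputStreamWriter(new ByteArrayOutputStream(),
                StandardCharsets.UTF_8).getEncoding();
    }

    public static void main(String[] args) {
        // The first two lines vary by machine; the third names UTF-8.
        System.out.println("platform default charset: " + Charset.defaultCharset());
        System.out.println("default writer encoding:  " + defaultWriterEncoding());
        System.out.println("explicit writer encoding: " + utf8WriterEncoding());
    }
}
```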
> 2) how do I get a String out of the OutputStreamWriter as there is no
> getText() method available?
If you want a String back, decoding the bytes would just give you the original
input_string again. What you presumably need is the bytes, so I recommend you
use input_string.getBytes("UTF-8") instead.
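For completeness, a small sketch of the round trip: encode to UTF-8 bytes, then decode those bytes back into a String. The names RoundTripDemo, toUtf8 and fromUtf8 are invented for the example, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
class RoundTripDemo {
    // String -> UTF-8 bytes (what you would hand to the database).
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes -> String (only needed when reading bytes back in).
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "\u00E4\u00F6\u00FC \u20AC";
        String back = fromUtf8(toUtf8(input));
        System.out.println(input.equals(back)); // true
    }
}
```

The decoded String is equal to the input, which is why there is no getText() to look for: the interesting product of the conversion is the byte array, not another String.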
I would like to convert Unicode text (coming from a Swing JTextPane -
I think that is Unicode by default!?) to UTF-8. I tried the code
below, but the XML database I am using still complains about
wrong characters (error message: "Invalid byte 2 of 3-byte UTF-8
sequence").
ByteArrayOutputStream out = new ByteArrayOutputStream();
DataOutputStream dataOut = new DataOutputStream(out);
dataOut.writeUTF(text_input);