I have a problem with XML. As far as I know, XML allows umlauts and other
special characters such as the sharp s (ß, ö, etc.). However, when I write
such characters into an XML document, I get this exception:
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1addac)
was found in the element content of the document.
at MyXML.deserialize(MyXML.java:204)
at TestClass.main(TestClass.java:80)
Although the exception occurs when deserializing the document, the error
must be in the serializing part, because IE5 and NS6 cannot read the file
either: both show "?" in place of the umlaut, even though the plain
text file correctly contains the umlaut string.
I don't know why this happens. I'm using XML4J 3.1.0 (IBM alphaWorks), which
is based on Apache Xerces. The encoding of the file is "UTF-8" (the default;
UTF-16 doesn't seem to be supported). Serialization is done by this code,
where d is the document:
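// Imports this snippet needs (Xerces serializer classes)
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;

// d is the org.w3c.dom.Document to serialize
OutputFormat format = new OutputFormat(d);
format.setLineSeparator("\n");
Writer out = new StringWriter();
XMLSerializer serial = new XMLSerializer(out, format);
try {
    serial.asDOMSerializer();
    serial.serialize(d);
} catch (IOException ex) {
    throw new XMLException("An error occurred while serializing the document, "
        + "original exception was:\n" + ex);
}
// The result is a String of chars; no byte encoding has been applied yet
String s = out.toString();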
Please help as soon as possible. Any comments are appreciated, and thanks in
advance.
Greetings
Messi
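The serialization code in the question produces a String, and a String has no byte encoding of its own. If that String is later written to disk with a platform default charset such as ISO-8859-1 (an assumption, since the thread does not show the file-writing step), the bytes will not match the "UTF-8" declaration. A minimal sketch of serializing straight to UTF-8 bytes follows. It uses the JDK's javax.xml.transform API instead of the XML4J/Xerces XMLSerializer from the question, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of XML4J or Xerces.
final class Utf8SerializeSketch {

    // Serializes the DOM directly to bytes, so the declared encoding
    // and the actual byte encoding are chosen in one place.
    static byte[] toUtf8Bytes(Document d) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(d), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a tiny document whose root element contains a single "ö".
    static Document sampleDocument() {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = d.createElement("text");
            root.setTextContent("\u00F6"); // ö
            d.appendChild(root);
            return d;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8Bytes(sampleDocument());
        System.out.println(bytes.length + " bytes written as UTF-8");
    }
}
```

If a String is needed in between, the same effect can be had by writing it through an OutputStreamWriter that is explicitly constructed with the charset named in the XML declaration.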
It should work with "ISO-8859-1".
hth
Henrik
--
What can be said at all can be said clearly.
Whereof one cannot speak, thereof one must be silent.
-- L. Wittgenstein
>Although the exception occurs when de-serializing the document, the error
>is in the serializing part, because e.g. IE5 and NS6 cannot read the file,
>and the umlaut is replaced by "?" when viewing in NS6 and IE5, the plain
>text file correctly contains the umlaut-string.
ISO-8859-1: ö -> 0xF6
UTF-8: ö -> 0xC3 0xB6
Use <?xml version='1.0' encoding="iso-8859-1"?> and you will see that it
works. Your files are not properly UTF-8 encoded.
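The byte values can be checked directly. The sketch below encodes "ö" (U+00F6) in both charsets and prints the resulting bytes in hex; the class name and the hex helper are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for comparing the two encodings of "ö".
final class UmlautBytes {

    // Formats bytes as space-separated hex values such as "0xC3 0xB6".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String oe = "\u00F6"; // ö, written as an escape so the source file encoding does not matter
        System.out.println("ISO-8859-1: " + hex(oe.getBytes(StandardCharsets.ISO_8859_1))); // 0xF6
        System.out.println("UTF-8:      " + hex(oe.getBytes(StandardCharsets.UTF_8)));      // 0xC3 0xB6
    }
}
```

A lone 0xF6 byte is not a valid UTF-8 sequence, which is why a parser that trusts a UTF-8 declaration rejects or misreads a file that was actually saved as ISO-8859-1.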
--
Björn Höhrmann ^ mailto:bjo...@hoehrmann.de ^ http://www.bjoernsworld.de
am Badedeich 7 ° Telefon: +49(0)4667/981ASK ° http://bjoern.hoehrmann.de
25899 Dagebüll # PGP Pub. KeyID: 0xA4357E78 # http://learn.to/quote [!]e
"It may be those who do most, dream most." -- Stephen Leacock
cu
Messi
"Henrik Motakef" <henrik....@ruhr-uni-bochum.de> wrote in message
news:pani19...@adorno.iaw.ruhr-uni-bochum.de...
Wo denn das?
> Sollte doch auch mit UTF-8 möglich sein, oder? Schließlich ist das unicode!
Wenn dein Dokument UTF-8-codiert ist, kannst du auch UTF-8 als
Codierung angeben. Bei einem ISO-8859-1-codierten Dokument empfiehlt
sich ISO-8859-1.
<translation_for_non-krauts quality="low">
> I don't like using ISO-8859-1... I read somewhere that it is "discouraged".
Who told you that?
> It should be possible with UTF-8, shouldn't it? After all, it's Unicode!
If your document is encoded in UTF-8, you can declare UTF-8 as the
encoding. For an ISO-8859-1-encoded document, ISO-8859-1 is the
recommended declaration.
</translation_for_non-krauts>
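The point that the declaration has to match the real bytes can be tried out with a small sketch. It uses the JDK's built-in parser rather than XML4J, and the class and method names are made up for illustration. It encodes the same small document with a chosen charset and reports whether the parser accepts it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class: declared encoding versus actual byte encoding.
final class DeclaredEncodingCheck {

    // Encodes the XML text with the given charset and tries to parse the bytes.
    // Returns false if the parser rejects the input.
    static boolean parses(String xml, Charset actual) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(actual)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // malformed byte sequence or other fatal parse error
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a one-element document containing "ö" with the given declaration.
    static String doc(String declared) {
        return "<?xml version=\"1.0\" encoding=\"" + declared + "\"?><a>\u00F6</a>";
    }

    public static void main(String[] args) {
        // Declaration says UTF-8 but the bytes are ISO-8859-1: rejected.
        System.out.println(parses(doc("UTF-8"), StandardCharsets.ISO_8859_1));
        // Declaration and bytes agree: accepted in both cases.
        System.out.println(parses(doc("ISO-8859-1"), StandardCharsets.ISO_8859_1));
        System.out.println(parses(doc("UTF-8"), StandardCharsets.UTF_8));
    }
}
```

Either fix is fine: change the declaration to match the bytes, or change the bytes to match the declaration. The parser may also log the fatal error to stderr for the mismatched case.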
Try this header:
<?xml version="1.0" encoding="ISO-8859-1"?>
geronimo