encoding again


PaulZH

Sep 23, 2010, 8:12:42 AM
to TopBraid Suite Users
Maybe we have been here before.

In SPARQLMotion, I run an XSLT transform.

The intermediate result taken from the console contains accented chars
as expected.
e.g. "één of meer"

However after saving this as an XML file to the file system, this
becomes:
één of meer

Running the same xslt outside TBC-ME with Saxon and saving to the file
system works as expected.


Paul
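
Editor's note: a minimal sketch of how this kind of corruption typically arises, assuming the file is written as UTF-8 and then read back with a single-byte legacy code page. The choice of cp1252 is an assumption for illustration; the thread does not say which encoding the reading tool applied.

```python
# Sample string taken from the post above.
text = "één of meer"

# Writing side: the characters are serialized as UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Reading side (assumption): the bytes are decoded with a single-byte
# legacy code page instead of UTF-8, so each two-byte "é" becomes two
# unrelated characters.
garbled = utf8_bytes.decode("cp1252")

# Decoding with the encoding that was actually used restores the text.
restored = utf8_bytes.decode("utf-8")

# ascii() keeps the output printable on any console encoding.
print(ascii(garbled))
print(ascii(restored))
```

The bytes on disk are identical in both cases; only the decoder's assumption differs.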

Jeremy Carroll

Sep 23, 2010, 10:07:00 PM
to topbrai...@googlegroups.com
Hi Paul

I checked some source code and noticed the following subtlety which I
think is correct, but may be underdocumented.

If using sml:ExportToTextFile, the platform default encoding is used.
If using sml:ExportToXMLFile, UTF-8 is used.

Thus I would expect problems if your script uses the former, which is not
intended for XML.

If this doesn't help please send me or Scott a sample script.

thanks

Jeremy
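
Editor's note: the difference Jeremy describes can be sketched outside TopBraid by writing the same markup twice, once with a legacy encoding and once with UTF-8. This is an analogy in Python, not the SPARQLMotion implementation; cp1252 stands in for "platform default" as an assumption, and `write_and_read_bytes` is a hypothetical helper.

```python
import os
import tempfile

text = "één of meer"


def write_and_read_bytes(encoding):
    """Write a small XML fragment with the given encoding and return the raw bytes."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "out.xml")
        with open(path, "w", encoding=encoding) as handle:
            handle.write("<p>" + text + "</p>")
        with open(path, "rb") as handle:
            return handle.read()


# Stand-in for a text export that uses a legacy platform default (assumption).
legacy_bytes = write_and_read_bytes("cp1252")

# Stand-in for an XML export that always uses UTF-8.
utf8_bytes = write_and_read_bytes("utf-8")

print(legacy_bytes)
print(utf8_bytes)
```

The two files hold different bytes for the same characters, so a consumer that assumes one encoding will misread the other.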

PaulZH

Sep 24, 2010, 10:39:34 AM
to TopBraid Suite Users
Hi Jeremy,

I have been doing some more investigation myself.

The difference between the 'direct' saxon version and the indirect one
in SPARQLMotion is that the former
starts with the XML declaration
<?xml version="1.0" encoding="UTF-8"?>
and the SPARQLMotion one doesn't.

Opening the XHTML results in Eclipse using Open with Webbrowser leads
to:
- direct: OK
- indirect without XML declaration: NOK

Opening the files outside Eclipse with Firefox with character encoding
auto-detect:
- direct OK
- indirect OK

Paul

Jeremy Carroll

Sep 24, 2010, 4:51:16 PM
to topbrai...@googlegroups.com
On 9/24/2010 7:39 AM, PaulZH wrote:
> The difference between the 'direct' saxon version and the indirect one
> in SPARQLMotion is that the former
> starts with the XML declaration
> <?xml version="1.0" encoding="UTF-8"?>
> and the SPARQLMotion one doesn't.
>

Oh - how painful!

It looks like perhaps we should have an option to either include or
exclude the XML declaration.

I am clear that omitting the declaration is the correct default.

If one wishes to embed the XML inside some other XML doc, then the
declaration is an error.

The XML spec is clear that the default encoding is UTF-8 and the
declaration is not required, thus, according to the specs, we are
conformant, and perhaps you should be complaining about the next tool in
the chain that is reading it with some other encoding.

TopBraid Suite is committed to working in UTF-8 as much as possible.
(e.g. we do not, and do not intend to, give an option for a different
encoding for XML output).

My preferences are:
1) continue with current default
2) not have an option

If you insist that you really *need* the option, we could put it in,
but to me it seems to complicate TopBraid Suite unnecessarily, and will
add to the misconception that the string

<?xml version="1.0" encoding="UTF-8"?>

is meaningful or required.

Jeremy
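
Editor's note: the point about the XML spec can be checked with any conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in (an assumption; it is not the parser used by TopBraid or Eclipse). With no declaration and no byte order mark, the bytes are treated as UTF-8, and a Latin-1 encoded document without a declaration is rejected rather than silently misread.

```python
import xml.etree.ElementTree as ET

text = "één of meer"

# UTF-8 bytes with no XML declaration: the parser falls back to UTF-8.
no_decl = ("<p>" + text + "</p>").encode("utf-8")
parsed_no_decl = ET.fromstring(no_decl).text

# The same document with an explicit declaration parses identically.
with_decl = b'<?xml version="1.0" encoding="UTF-8"?>' + no_decl
parsed_with_decl = ET.fromstring(with_decl).text

# Latin-1 bytes with no declaration are not valid UTF-8, so parsing fails.
latin1_no_decl = ("<p>" + text + "</p>").encode("latin-1")
try:
    ET.fromstring(latin1_no_decl)
    latin1_failed = False
except ET.ParseError:
    latin1_failed = True

print(parsed_no_decl == parsed_with_decl, latin1_failed)
```

A tool that guesses a different encoding for undeclared XML is therefore departing from the spec, which matches the behaviour seen in the Eclipse browser.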


PaulZH

Sep 25, 2010, 9:16:43 AM
to TopBraid Suite Users
Jeremy,

It can be solved rather easily.
By adding
<meta http-equiv="Content-Type" content="application/xhtml+xml;
charset=utf-8"/>
in the XHTML output, the built-in browser in Eclipse does the correct
thing.


Paul
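
Editor's note: Paul adds the element in his XSLT. As a rough illustration of the resulting output only, the sketch below inserts the same `meta` element into a tiny XHTML document with Python's `xml.etree.ElementTree`. This is a swapped-in technique, not Paul's stylesheet, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without a namespace prefix.
ET.register_namespace("", XHTML_NS)

# Hypothetical XHTML output of a transform, without any charset hint.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>demo</title></head>"
    "<body><p>één of meer</p></body></html>"
)

# Build the meta element quoted in the post and put it first in <head>.
head = doc.find("{%s}head" % XHTML_NS)
meta = ET.Element(
    "{%s}meta" % XHTML_NS,
    {
        "http-equiv": "Content-Type",
        "content": "application/xhtml+xml; charset=utf-8",
    },
)
head.insert(0, meta)

# Serialize as UTF-8 bytes; no XML declaration is emitted for UTF-8.
output = ET.tostring(doc, encoding="utf-8")
print(output.decode("utf-8").encode("ascii", "backslashreplace"))
```

The serialized bytes carry no XML declaration, yet an HTML-oriented consumer can still pick up the charset from the `meta` element.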

Jeremy Carroll

Sep 25, 2010, 1:32:40 PM
to topbrai...@googlegroups.com
On 9/25/2010 6:16 AM, PaulZH wrote:
> Jeremy,
>
> It can be solved rather easily.
> By adding
> <meta http-equiv="Content-Type" content="application/xhtml+xml;
> charset=utf-8"/>
> in the XHTML output, the built-in browser in Eclipse does the correct
> thing.

That's in your code (the XSLT transform), not ours.

The underlying problem is that the original HTML guys did not use UTF-8,
and this was corrected with XML.
In our XML support we should (IMO) follow the XML spec, not the HTML spec.
If you are using our XML support to generate XHTML then yes, you may need
that part.
The best we could do is issue a warning if the output appears to be
XHTML and this is missing ... but we probably need more infrastructure
for enabling and disabling warnings before we go down that route.

Jeremy
