Why does XslCompiledTransform always output UTF-16?

Big Daddy

Jan 20, 2009, 3:30:49 PM
I am doing a transform with the XslCompiledTransform class in C#. My
XSLT starts out like this:

<?xml version="1.0" encoding="iso-8859-1" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />

In the <xsl:output> element, I am specifying that I want the output in
UTF-8, but the output of the transform always starts off like this:

<?xml version="1.0" encoding="utf-16"?>

Why does XslCompiledTransform always output UTF-16? Does it matter?
I am not that familiar with UTF encoding, so maybe it doesn't matter.
I am going to be converting the output of the XML transform (which is
a string) into a zip file (byte array) and sending it over a network,
so I want it to be as small as possible.

thanks in advance,
John

Martin Honnen

Jan 21, 2009, 7:05:27 AM
Big Daddy wrote:
> I am doing a transform with the XslCompiledTransform class in C#. My
> XSLT starts out like this:
>
> <?xml version="1.0" encoding="iso-8859-1" ?>
> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
>
> In the <xsl:output> element, I am specifying that I want the output in
> UTF-8, but the output of the transform always starts off like this:
>
> <?xml version="1.0" encoding="utf-16"?>
>
> Why does XslCompiledTransform always output UTF-16?


It doesn't. Transform to a file or stream and the encoding setting in
the xsl:output is honoured.
I suspect you are transforming to a StringWriter; .NET strings are always
UTF-16 encoded, so the XML declaration says utf-16 in that case.

If you still have problems then show us your code using
XslCompiledTransform. If you use overloads of the Transform method like
http://msdn.microsoft.com/en-us/library/ms163431.aspx
or
http://msdn.microsoft.com/en-us/library/ms163437.aspx
then the transformation result written to a file or stream will be
encoded according to the encoding of xsl:output.
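
A minimal sketch of the difference, assuming a trivial stylesheet and
input document that are made up for illustration (they are not the
poster's files). It runs the same transform once into a StringWriter and
once into a MemoryStream so the two XML declarations can be compared:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class EncodingDemo
{
    // Hypothetical stylesheet and input, invented for this sketch.
    const string Stylesheet =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output encoding=\"UTF-8\" indent=\"yes\" />" +
        "<xsl:template match=\"/\"><result><xsl:value-of select=\"/doc\" /></result></xsl:template>" +
        "</xsl:stylesheet>";
    const string Input = "<doc>hello</doc>";

    static XslCompiledTransform Load()
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(reader);
        }
        return xslt;
    }

    // Transform to a TextWriter: the result is a .NET string, so the
    // declaration reports the writer's encoding rather than xsl:output's.
    public static string ToStringWriter()
    {
        var writer = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(Input)))
        {
            Load().Transform(input, null, writer);
        }
        return writer.ToString();
    }

    // Transform to a Stream: the bytes are encoded as xsl:output requests.
    public static byte[] ToBytes()
    {
        using (var stream = new MemoryStream())
        using (XmlReader input = XmlReader.Create(new StringReader(Input)))
        {
            Load().Transform(input, null, stream);
            return stream.ToArray();
        }
    }

    static void Main()
    {
        Console.WriteLine(ToStringWriter());

        // StreamReader decodes the bytes and skips any byte order mark.
        using (var reader = new StreamReader(new MemoryStream(ToBytes())))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```

If the end goal is a byte array, the array returned by ToBytes() can be
passed on directly without going through a string at all.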


--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/

Big Daddy

Jan 23, 2009, 1:32:40 PM
On Jan 21, 5:05 am, Martin Honnen <mahotr...@yahoo.de> wrote:
> It doesn't. Transform to a file or stream and the encoding setting in
> the xsl:output is honoured.
> I suspect you are transforming to a StringWriter; .NET strings are always
> UTF-16 encoded, so the XML declaration says utf-16 in that case.
>
> If you still have problems then show us your code using
> XslCompiledTransform. If you use overloads of the Transform method like
> http://msdn.microsoft.com/en-us/library/ms163431.aspx
> or
> http://msdn.microsoft.com/en-us/library/ms163437.aspx
> then the transformation result written to a file or stream will be
> encoded according to the encoding of xsl:output.
>

You are correct that I was transforming directly to a StringWriter.
When I changed it to output to a MemoryStream and then to a string, it
was UTF-8. But since I am outputting it to a string eventually anyway,
maybe it doesn't matter. If I use the MemoryStream so that it's
UTF-8 and then to a string, would it be any smaller than if I do it to
a StringWriter and then to a string? Either way, after it's a string,
I zip it into a byte array.

thanks
John

Martin Honnen

Jan 24, 2009, 6:43:02 AM
Big Daddy wrote:

> You are correct that I was transforming directly to a StringWriter.
> When I changed it to output to a MemoryStream and then to a string, it
> was UTF-8. But since I am outputting it to a string eventually anyway,
> maybe it doesn't matter. If I use the MemoryStream so that it's
> UTF-8 and then to a string, would it be any smaller than if I do it to
> a StringWriter and then to a string? Either way, after it's a string,
> I zip it into a byte array.

If you want a byte array then transforming to a MemoryStream and doing
ToArray() on that MemoryStream seems the best approach.

As for strings: strings in the .NET Framework are sequences of UTF-16
code units, so while you can of course change the encoding named in the
XML declaration, you do not change the encoding of the string itself.
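
To make the size question concrete, here is a small sketch, assuming a
made-up sample document. The declaration inside a string is just text;
the number of bytes is only decided when the string is encoded:

```csharp
using System;
using System.Text;

static class ByteCountDemo
{
    // Bytes needed to encode the text as UTF-8 (no byte order mark).
    public static int Utf8Size(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Bytes needed to encode the text as UTF-16 (no byte order mark).
    public static int Utf16Size(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    static void Main()
    {
        // Hypothetical transform result; what the declaration says has
        // no effect on how the string itself is stored in memory.
        string xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><result>hello</result>";

        Console.WriteLine("UTF-8 bytes:  " + Utf8Size(xml));
        Console.WriteLine("UTF-16 bytes: " + Utf16Size(xml));
    }
}
```

For ASCII-only text like this sample, the UTF-16 byte count is exactly
twice the UTF-8 count. How much of that difference survives compression
depends on the data, so it is worth measuring on real output.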
