Disabling XML character escaping for to_xml


Nate Wiger

Oct 14, 2008, 5:00:55 PM
to Ruby on Rails: Talk
Currently, it appears to_xml will automatically escape any entities
into their corresponding &XXX representation. There's a piece in the
documentation that says "If $KCODE is set to u and encoding set to
UTF8, then escaping will NOT be performed."

Unfortunately, this doesn't appear to be the case. Even after
following the docs and ensuring that default_charset is indeed UTF-8
(actually the default for Rails nowadays), we still get encoded
characters in to_xml output.

Since our client is UTF-8 aware, we need to pass thru the UTF-8 data
intact. The only way we've found to do this is thru the following
horrible monkey-patch:

  module Builder
    class XmlBase
      def _escape(text)
        text
      end
    end
  end

What's the proper way to do this?

Thanks,
Nate

pru...@gmail.com

Oct 16, 2008, 1:11:39 PM
to Ruby on Rails: Talk
I had the same issue, but eventually putting

$KCODE='UTF8'

in my config/environment.rb solved the issue.

Greetings,

Wouter

pru...@gmail.com

Oct 16, 2008, 3:23:30 PM
to Ruby on Rails: Talk
Just deployed to a production server, but it doesn't work there,
although the rails version is the same. Maybe it's the ruby version
(1.8.7 locally and 1.8.6 on the server)

BobiJo

Oct 21, 2008, 5:28:22 AM
to Ruby on Rails: Talk
I have the same issue.
$KCODE='UTF8' is the default, but I set it anyway in environment.rb.
This didn't solve my problem. I applied the patch and it worked.
It's not the ideal solution, but it gets the job done :)
I've tried the multibyte chars thing and it didn't work either.

May the source be with you

mcginniwa

Nov 11, 2008, 5:09:02 PM
to Ruby on Rails: Talk
Any word on if this is fixed in Edge/Rails 2.2?

Cheers,
Walter

mcginniwa

Nov 11, 2008, 5:16:50 PM
to Ruby on Rails: Talk
Actually, the monkey patch solution sort of sucks. It turns off ALL
escaping, not just the conversion of UTF-8 characters to entities.

So this is fine:

<dc:description>māori</dc:description>

but this is not:

<dc:description><p>āēīōū</p>
<p>&nbsp;</p></dc:description>

The HTML tags SHOULD be escaped, while the Unicode characters
shouldn't be. My workaround will simply be to strip out the embedded
HTML, but this is a problem that people should be aware of when using the
monkey patch.

Cheers,
Walter

Frederick Cheung

Nov 11, 2008, 7:20:23 PM
to Ruby on Rails: Talk


On Nov 11, 10:16 pm, mcginniwa <walter.mcgin...@gmail.com> wrote:
> The html tags SHOULD be escaped, while the unicode characters
> shouldn't be.  My work around will simply be to strip out the embedded
> HTML, but this is a problem that people should be aware of when using the
> monkey patch.
>
Many moons ago I overrode the String#to_xs method that Builder adds so
that it escapes just the vitals (i.e. & < > ' ") instead of all the
extra stuff it does.
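
For anyone who wants to try that approach, here is a minimal sketch of what such an override might look like. It is not Fred's actual code. The constant name XML_VITALS is made up for illustration, and the sketch assumes the Builder version in use routes its escaping through String#to_xs (some versions pass an argument to it, so the method accepts and ignores any arguments). It would need to be loaded after Builder, for example from an initializer, so that it replaces Builder's definition rather than being replaced by it.

```ruby
# Hypothetical sketch: escape only the five XML-significant characters and
# leave everything else (including multibyte UTF-8 characters) untouched.
XML_VITALS = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  "'" => '&apos;',
  '"' => '&quot;'
}.freeze

class String
  # Builder adds String#to_xs; redefining it after Builder is loaded
  # replaces Builder's version. Arguments are accepted and ignored because
  # some Builder versions call to_xs with one (assumption, see above).
  def to_xs(*_ignored)
    gsub(/[&<>'"]/) { |char| XML_VITALS[char] }
  end
end
```

With this in place, "<p>āēīōū</p>".to_xs gives "&lt;p&gt;āēīōū&lt;/p&gt;": the tags are escaped and the macronised vowels pass through as they are. Unlike the _escape monkey patch, markup embedded in a value is still escaped.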

Fred

> Cheers,
> Walter

Walter McGinnis

Nov 13, 2008, 11:16:59 PM
to rubyonra...@googlegroups.com
Yeah, I ended up doing that basically, but in some specific helpers.  My coworker refined it though using the htmlentities plugin.  You can see it here:


Long term we may do this for all the XML values, not just our dc:description element.  So it might move up to monkey patching Builder, or to some more general spot.

Cheers,
Walter