Chinese characters become &#3342 after "to_xml"

Datou

May 14, 2007, 1:25:28 AM
to Ruby on Rails: Talk
Why does this happen?

A browser can render &#xxxx; as Chinese characters, but I want the actual
UTF-8 Chinese characters in the XML output.

Please kindly help, thanks!

eden li

May 14, 2007, 11:24:09 PM
to Ruby on Rails: Talk
The escaping is built into Builder::XmlMarkup. You can run a transform on
the returned XML to unescape the characters that were encoded as numeric
entities. This assumes that every numeric entity emitted by #to_xml is a
Unicode code point.

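def unescape_numeric_entities(xml)
  # Replace each decimal entity (e.g. &#20013;) with its UTF-8 character;
  # if pack fails, leave the entity unchanged.
  xml.gsub(/&#(\d+);/) do |c|
    [$1.to_i].pack("U") rescue c
  end
end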

unescape_numeric_entities(record.to_xml)
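
A slightly more defensive variant, as a sketch only. The helper name unescape_non_ascii_entities is hypothetical (it is not part of Rails or Builder). It assumes the XML may also contain hexadecimal entities (&#x...;) and that ASCII-range entities such as &#38; should stay escaped so the markup remains well-formed.

```ruby
# Hypothetical helper, not part of Rails or Builder.
# Decodes decimal (&#20013;) and hexadecimal (&#x4E2D;) numeric entities,
# but only for code points outside ASCII, so entities like &#38; (&)
# and &#60; (<) are left escaped.
def unescape_non_ascii_entities(xml)
  xml.gsub(/&#(?:x([0-9a-fA-F]+)|(\d+));/) do |entity|
    codepoint = $1 ? $1.to_i(16) : $2.to_i
    if codepoint < 0x80
      entity # keep ASCII-range entities as they are
    else
      begin
        [codepoint].pack("U") # encode the code point as UTF-8
      rescue RangeError, ArgumentError
        entity # leave anything pack cannot encode untouched
      end
    end
  end
end

puts unescape_non_ascii_entities("<name>&#20013;&#25991;</name>")
```

With a model it would be called the same way as above, for example unescape_non_ascii_entities(record.to_xml).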

Datou

May 15, 2007, 3:44:02 AM
to Ruby on Rails: Talk
Thank you very much!