REXML UTF-16 trouble


iga

Aug 14, 2008, 5:31:12 PM
to Ruby on Rails: Talk
I've run into a problem parsing XML with REXML. It looks like it comes
down to UTF-16 encoding and a bug in REXML.

I'm still running OS X 10.4.11 and Ruby 1.8.6 (using Locomotive). I even
tried upgrading to the latest version of REXML, 3.1.7.3, with no luck. It
still fails, only now with the following error:

Iconv::InvalidCharacter: ">"

Is this old news to everyone? If so, is there a known solution for
this?

I can't be the only person who needs to parse UTF-16 XML in Rails.

Thanks in advance.

Frederick Cheung

Aug 14, 2008, 5:58:41 PM
to rubyonra...@googlegroups.com
Random thought - the XML file claims to be utf 16, but is it really?


iga

Aug 14, 2008, 6:09:34 PM
to Ruby on Rails: Talk
Good question. How would I know?

If I stash the result straight into a variable and do a "puts" I get
the following…

<?xml version="1.0" encoding="utf-16"?>
<CallStatus><Code>InvalidPassword</Code>
<Success>false</Success>
<Message>Invalid password</Message>
</CallStatus>

Again, REXML chokes on this, but if I change utf-16 to utf-8, no
problem.
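
For anyone following along, here is a minimal sketch of that "change utf-16 to utf-8" step done in code rather than by hand. It only makes sense if the bytes really are UTF-8/ASCII and just the label is wrong. `relabel_declaration` is a hypothetical helper name, not something REXML provides.

```ruby
# Hypothetical helper (not part of REXML): rewrite the encoding label in the
# XML declaration. Only appropriate when the bytes are really UTF-8/ASCII
# and the declaration is simply mislabeled.
def relabel_declaration(xml, new_encoding = "UTF-8")
  xml.sub(/\A(<\?xml[^>]*?encoding=)(["'])[^"']*\2/) do
    "#{$1}#{$2}#{new_encoding}#{$2}"
  end
end

response = %(<?xml version="1.0" encoding="utf-16"?>\n) +
           %(<CallStatus><Code>InvalidPassword</Code></CallStatus>)

fixed = relabel_declaration(response)
puts fixed.lines.first
# prints: <?xml version="1.0" encoding="UTF-8"?>
```

The relabeled string can then be handed to REXML::Document.new as usual. A string without an XML declaration passes through unchanged.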


iga

Aug 14, 2008, 11:29:26 PM
to Ruby on Rails: Talk
I have an interim kludge for now. I'm chopping the leading piece (what I
took to be the BOM, though it is really the 39-character XML declaration)
off of the XML and then sending the rest through REXML. It's working
like a charm now. It's not pretty, but it'll do until I figure out if
this is part of a bigger problem.

# The XML declaration is always the same, so slice the first 39
# characters off the front and parse what is left.
declaration = login_result.slice(0, 39)
login_results = login_result.sub(declaration, '')

So this…
<?xml version="1.0" encoding="utf-16"?>
<CallStatus><Code>InvalidPassword</Code>
<Success>false</Success>
<Message>Invalid password</Message>
</CallStatus>

Becomes this…
<CallStatus><Code>InvalidPassword</Code>
<Success>false</Success>
<Message>Invalid password</Message>
</CallStatus>

And now I parse it as XML.



Frederick Cheung

Aug 15, 2008, 5:35:30 AM
to rubyonra...@googlegroups.com

On 14 Aug 2008, at 23:09, iga wrote:

>
> Good question. How would I know?

I'd ogle the bytes via unpack or something like that.

Fred
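
A minimal sketch of what "ogling the bytes via unpack" could look like. The `looks_like_utf16?` helper is hypothetical and only checks the first few bytes, so treat it as a quick heuristic rather than a real encoding detector. Genuine UTF-16 either starts with a byte order mark (FF FE or FE FF) or has a zero byte next to every ASCII character.

```ruby
# Hypothetical helper: guess from the first bytes whether data is really
# UTF-16. A heuristic only; it assumes the document starts with "<?".
def looks_like_utf16?(data)
  bytes = data.unpack("C4") # first four bytes as integers
  # A byte order mark is the clearest sign.
  return true if bytes[0, 2] == [0xFF, 0xFE] || bytes[0, 2] == [0xFE, 0xFF]
  # No BOM: "<?" in UTF-16 has a zero byte beside each ASCII byte.
  bytes == [0x3C, 0x00, 0x3F, 0x00] || bytes == [0x00, 0x3C, 0x00, 0x3F]
end

sample = %(<?xml version="1.0" encoding="utf-16"?><CallStatus/>)

# Dump the first eight bytes as hex to eyeball them.
puts sample.unpack("C8").map { |b| format("%02x", b) }.join(" ")
# prints: 3c 3f 78 6d 6c 20 76 65

puts looks_like_utf16?(sample)
# prints: false
```

If the dump shows plain ASCII bytes with no zeros in between, as it does here, the response is not UTF-16 no matter what its declaration says, which would explain why a converter chokes on it.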

iga

Aug 15, 2008, 9:54:50 AM
to Ruby on Rails: Talk
Fred,

Thanks, I will give that a try -- eventually, I need this to work
without the workaround.

+bm


Naren Salem

Nov 8, 2010, 10:01:03 AM
to rubyonra...@googlegroups.com
Was this ever solved? How do I load a UTF-16 XML file into REXML?

Thanks!
Naren

--
Posted via http://www.ruby-forum.com/.
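
The thread never recorded a fix. One possible approach, sketched here under assumptions rather than offered as a confirmed answer, is to transcode genuinely UTF-16 data to UTF-8 yourself and relabel the declaration before REXML sees it. This assumes Ruby 1.9 or later for String#encode (on 1.8 the conversion step would have to go through Iconv instead). `utf16_xml_to_utf8` is a hypothetical helper, and treating BOM-less input as little-endian is an assumption.

```ruby
require "rexml/document"

# Hypothetical helper: convert genuinely UTF-16 XML bytes into a UTF-8
# string with a matching declaration. Assumes little-endian when no BOM
# is present.
def utf16_xml_to_utf8(raw)
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  big_endian = bytes.start_with?("\xFE\xFF".b)
  # Drop the byte order mark, if any, so it does not end up in the text.
  bytes = bytes[2..-1] if bytes.start_with?("\xFF\xFE".b, "\xFE\xFF".b)
  source = big_endian ? Encoding::UTF_16BE : Encoding::UTF_16LE
  utf8 = bytes.force_encoding(source).encode(Encoding::UTF_8)
  # Make the declaration agree with the new encoding.
  utf8.sub(/\A(<\?xml[^>]*?encoding=)(["'])[^"']*\2/) { "#{$1}#{$2}UTF-8#{$2}" }
end

# Build a real UTF-16LE sample with a BOM to stand in for a file's contents.
xml = %(<?xml version="1.0" encoding="utf-16"?>) +
      %(<CallStatus><Code>InvalidPassword</Code></CallStatus>)
raw = "\xFF\xFE".b + xml.encode("UTF-16LE").b

doc = REXML::Document.new(utf16_xml_to_utf8(raw))
puts doc.root.elements["Code"].text
# prints: InvalidPassword
```

When loading from disk, the file should be read in binary mode so the bytes reach the helper untouched. If the data turns out not to be UTF-16 at all, the encode call raises an error, which is a useful signal that the declaration is wrong rather than the parser.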
