Newbie Character encoding problem with XML::Parser

Paul Hovnanian P.E.

Mar 12, 2010, 11:26:12 PM
Well, not really a newbie.

I resurrected an old tool I had written a year or so ago that parses XML
documents (using XML::Parser) and displays some structure data in
various Tk widgets (including a Tk::Text window).

Back when I wrote it, it worked just fine (on Perl 5.6.1, XML::Parser
ver 2.31, libexpat 0.1.0). Fine being defined as XML::Parser and the
underlying expat lib not messing with character entities. So a dash,
encoded as &#150; in the source document, would be displayed as &#150;
in the Tk::Text widget and eventually saved as &#150;.

So now I move my program to a new platform (Perl 5.8.0, XML::Parser ver
2.31, libexpat 1.5.0). Now it (I've verified that it's either XML::Parser
or expat) is rewriting the character entities to something else. &#150; is
being rewritten to x96, which appears in Tk as Â-1 <- weird characters.

I can trap the output of the Tk widgets and translate them back to
character entities. But I'd rather find a way to stop it. Any way to
make the new Perl/expat operate like the old one? The XML::Parser::Expat
options 'NoExpand' and 'ProtocolEncoding' don't seem to affect this
behavior.
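
For reference, the trap-and-translate workaround I have in mind looks
roughly like this. It is only a minimal sketch: entities_from_text is a
name made up for illustration, and it assumes the widget contents have
already been fetched into an ordinary Perl string (with Tk::Text that
would presumably be something like $widget->get('1.0', 'end')). Every
character above 0x7F is written back out as a decimal character
reference.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Parser or Tk): convert every
# non-ASCII character in a string back into a decimal character
# reference such as &#150;.
sub entities_from_text {
    my ($text) = @_;
    # ord() gives the code point of the matched character; the /e flag
    # evaluates the replacement as Perl code, /g applies it everywhere.
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Stand-in for text read back from the widget: the dash arrives as
# character 0x96 after parsing.
my $from_widget = "pages 10\x{96}12";
print entities_from_text($from_widget), "\n";   # prints: pages 10&#150;12
```

This does not touch plain ASCII, so a literal & in the text would still
need its own handling before saving.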

Any ideas?
--
Paul Hovnanian mailto:Pa...@Hovnanian.com
------------------------------------------------------------------
Ask not for whom the <CONTROL-G> tolls.
