XML Parser returning '?' instead of &#8217; in chars node

David Saunders

Nov 18, 2013, 10:24:35 PM

to intersystems...@googlegroups.com

I am pulling data from an HTML document that I downloaded from the internet. It works great except when it comes across a Unicode character. So as I pulled the HTML content of the web pages, I went through it one character at a time and replaced every character with a code greater than 127 with &#nnn; in the file.
So my file has this:

<span class="style-b">and he said: "I do not know. Am I my brother&#8217;s guardian?"</span>
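For readers who want to reproduce the escaping step described above, here is a minimal sketch in Python rather than Caché ObjectScript. The function name `escape_non_ascii` is hypothetical and not from the original post; it only assumes the rule stated there, that every character with a code above 127 becomes a decimal reference of the form &#nnn; (for example, U+2019 becomes &#8217;).

```python
def escape_non_ascii(text: str) -> str:
    """Replace each character whose code point is above 127 with a
    decimal numeric character reference (&#nnn;), leaving ASCII as-is."""
    return "".join(
        ch if ord(ch) <= 127 else f"&#{ord(ch)};"
        for ch in text
    )


# \u2019 is the right single quotation mark used in the sample sentence.
line = "Am I my brother\u2019s guardian?"
escaped = escape_non_ascii(line)
print(escaped)
```

A file escaped this way is pure ASCII, so any non-ASCII character that shows up later has to come from the parser decoding the reference, not from the file itself.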

However, when I read the file via %XML.TextReader, where this is the value of a "chars" node, it breaks the sentence into two pieces:
and he said: "I do not know. Am I my brother

and:
?s guardian?"

Why is it breaking up the text, and why is it converting &#8217; into a question mark? If anything, it should be converting it to $C(8217). That would work, or it could leave it the way it is. Is there something I can do to address it? I read through the documentation on the text reader and it has some material about validation, but I really didn't understand it.

I appreciate any help you can give me.

Oh, I am using Cache 2013.1.1 on a Windows XP machine.
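On the question of the text arriving in two pieces: SAX-style parsers are allowed to deliver the text of one element in several chunks, and a character reference is a common place for such a break. Whether that is what happens inside %XML.TextReader here is an assumption, not something confirmed in this thread. The sketch below uses Python's standard `xml.sax` module (not Caché) to show the usual workaround of joining consecutive chunks; the names `TextCollector` and `read_text` are hypothetical.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects every text chunk the parser reports, in order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # May be called several times for the text of a single element.
        self.chunks.append(content)


def read_text(xml_source: str) -> str:
    """Parse the document and return all character data joined together."""
    handler = TextCollector()
    xml.sax.parseString(xml_source.encode("utf-8"), handler)
    return "".join(handler.chunks)


sample = (
    '<span class="style-b">and he said: "I do not know. '
    'Am I my brother&#8217;s guardian?"</span>'
)
text = read_text(sample)
print(text)
```

The same idea would apply to a reader loop in Caché: append the value of each consecutive "chars" node to one string before using it. That only addresses the split; it does not explain the question mark.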

DAiMor

Nov 22, 2013, 7:16:50 AM

to intersystems...@googlegroups.com

Is your Caché installation Unicode or 8-bit?

USER>w $zcvt("I do not know. Am I my brother&#8217;s guardian?","I","HTML")

I do not know. Am I my brother’s guardian?

USER>w $zv

Cache for Windows (x86-64) 2014.1 (Build 516U) Wed Oct 16 2013 19:11:07 EDT

On Tuesday, November 19, 2013 at 7:24:35 AM UTC+4, David Saunders wrote:
