XML Parser returning '?' instead of ’ in chars node


David Saunders

Nov 18, 2013, 10:24:35 PM
to intersystems...@googlegroups.com
I am pulling data from an HTML document that I downloaded from the internet. It works great except when it comes across a Unicode character. So as I pulled the HTML content of the web pages, I ran through it one character at a time and replaced every character with a code above 127 with a numeric reference of the form &#nnn; in the file.
So my file contains this:

<span class="style-b">and he said: "I do not know. Am I my brother&#8217;s guardian?"</span>

However, when I read the file with %XML.TextReader, where this text is the value of a "chars" node, the reader breaks the sentence into two pieces:
and he said: "I do not know. Am I my brother

and:
?s guardian?"

Why is it breaking up the text, and why is it converting &#8217; into a question mark? If anything, it should convert it to $C(8217). That would work, or it could leave the text as it is. Is there something I can do to address this? I read through the documentation on the text reader, and it has some material about validation, but I really didn't understand it.

I appreciate any help you can give me.

Oh, I am using Cache 2013.1.1 on a Windows XP machine.
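
For readers following along, the preprocessing step described above (replacing every character above 127 with a decimal &#nnn; reference) can be sketched roughly as follows. This is a Python illustration, not the Caché code actually used in the post; the function name `escape_non_ascii` is hypothetical.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each character above code point 127 with a decimal &#nnn; reference."""
    return "".join(ch if ord(ch) <= 127 else f"&#{ord(ch)};" for ch in text)


# The right single quotation mark (U+2019, decimal 8217) is the character in question.
line = "Am I my brother\u2019s guardian?"
print(escape_non_ascii(line))
```

ASCII text passes through unchanged, so only characters such as the curly apostrophe are rewritten.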

DAiMor

Nov 22, 2013, 7:16:50 AM
to intersystems...@googlegroups.com

Is your Cache installation Unicode or 8-bit?

USER>w $zcvt("I do not know. Am I my brother&#8217;s guardian?","I","HTML")
I do not know. Am I my brother’s guardian?
USER>w $zv
Cache for Windows (x86-64) 2014.1 (Build 516U) Wed Oct 16 2013 19:11:07 EDT
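
To illustrate why the Unicode versus 8-bit question matters, here is a small Python analogy. It is only a sketch under assumptions: Latin-1 stands in for whatever 8-bit character set an 8-bit installation might use, the helper names are hypothetical, and the thread does not confirm that this is the cause of the '?' in the original post.

```python
import html


def decode_refs(text: str) -> str:
    """Decode numeric character references such as &#8217; into real characters."""
    return html.unescape(text)


def to_8bit(text: str) -> bytes:
    """Force text into an 8-bit character set, substituting '?' where it cannot fit."""
    return text.encode("latin-1", errors="replace")


decoded = decode_refs("brother&#8217;s guardian?")
print(ord(decoded[7]))   # 8217, the code point that $C(8217) would produce
print(to_8bit(decoded))  # b'brother?s guardian?'
```

Code point 8217 does not fit in a single byte, so an 8-bit target has nothing to store for it except a substitute character.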

On Tuesday, November 19, 2013, at 7:24:35 UTC+4, David Saunders wrote: