zed scripsit:
> To clarify, the input string is &#39;, and the parser normally decodes this
> to an apostrophe in tests.
>
> Is there any other environment variable or difference in underlying
> libraries that would make it *not* decode the &#39; to '?
No. TagSoup always decodes numeric character references, as well as any
of the thousand-odd named character entity references that it understands.
The only references that are left alone are named ones that are not
understood, such as "&#xyz;" or "&##32;".
On the output side, the <, >, &, ", and ' characters are re-encoded
using the built-in character references when required. If the output
encoding is not a UTF, non-ASCII characters are encoded as hex numeric
character references. So TagSoup will never generate "&#39;".
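
As a rough illustration of the output-side rule described above, here is a minimal sketch. It is not TagSoup's own code: the class EscapeSketch and the method escape are hypothetical names, and it assumes "built-in character references" means the five XML predefined entities. It also escapes the five characters unconditionally, whereas the rule above only applies "when required".

```java
public class EscapeSketch {
    // Hypothetical helper, NOT part of TagSoup: mimics the output rule
    // described in the message under the assumptions stated above.
    static String escape(String text, boolean utfOutput) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                // Simplification: always escape, not only "when required".
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp > 0x7F && !utfOutput) {
                        // Non-ASCII in a non-UTF encoding: hex numeric reference.
                        out.append("&#x").append(Integer.toHexString(cp)).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("it's", true));        // it&apos;s
        System.out.println(escape("caf\u00e9", false));  // caf&#xe9;
    }
}
```

Under these assumptions an apostrophe only ever comes out as a named reference, and decimal numeric references never appear in the output at all.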
--
Business before pleasure, if not too bloomering long before.
--Nicholas van Rijn
John Cowan <co...@ccil.org>
http://www.ccil.org/~cowan