Creating structwriter text content with XML character entity references

Chimezie Ogbuji

Sep 2, 2012, 5:59:00 PM
to akar...@googlegroups.com
I'm using structwriter to produce XML in which some elements have text child nodes containing XML character references. In particular, the original text includes UTF-8-encoded Sanskrit characters, and I want the XML to be transformed, by XSLT, into XHTML that renders the Sanskrit characters properly in the browser.

It seems (from the "Builtin HTML/XML escaping via ASCII encoding" section in [1]) that the best way to go about this would be to convert each of these problematic characters into its corresponding XML character reference via:

>>> problematicUnicode.encode('ascii', 'xmlcharrefreplace')
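
For example, here is a minimal sketch of what that call produces. The variable name `sample` and the one-word input are made up for illustration; the result is a byte string in which every non-ASCII character has become a decimal character reference:

```python
# Hypothetical sample: one word from the text, containing U+015B (s with acute).
sample = u"Avalokite\u015bvara"

# Characters outside ASCII are replaced by decimal character references.
encoded = sample.encode('ascii', 'xmlcharrefreplace')
print(encoded)
```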

I now want to add this text as the text child of an element using structwriter, but it looks like when I do this:

w = structwriter(indent=u"yes", stream=..stream..)
w.feed(
    ROOT( 
        .. snip ..
        E(u'Text' , sanskritText.encode('ascii', 'xmlcharrefreplace'))
        .. snip ..
    )
)

and then serialize the stream, the XML character references are themselves escaped. That is, I get:

<Text>When Avalokite&amp;#347;vara Bodhisattva was practicing the profound Praj&amp;#241;&amp;#257;p&amp;#257;ramit&amp;#257;, he illuminated the Five Skandhas and saw that they were all empty, and crossed over all suffering and affliction.</Text>
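
My guess at what is happening: the string I pass in already contains a literal '&', and the serializer escapes it like any other ampersand in text content. The sketch below reproduces the effect with the standard library's xml.sax.saxutils.escape as a stand-in. I am not claiming structwriter calls this function internally, and the `sample` string is made up:

```python
from xml.sax.saxutils import escape

# Hypothetical sample text that already contains a character reference.
sample = u"Avalokite\u015bvara"
pre_encoded = sample.encode('ascii', 'xmlcharrefreplace').decode('ascii')

# A text-content escaper treats the '&' of the reference as plain data,
# so the reference itself ends up escaped.
double_escaped = escape(pre_encoded)
print(double_escaped)
```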

Am I going about this the right way? If not, how can I achieve what I want with structwriter: to add text content, via E(.., .. text with XML char entity refs.. ), so that the character references are not escaped when the stream is serialized? I could certainly replace every '&amp;' with '&' in the resulting string, but I'm guessing there is a more elegant way to do this.
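
For comparison, this is the behaviour I am after, sketched with the standard library's xml.etree.ElementTree rather than structwriter. There the Unicode text is passed in unmodified and the serializer emits character references when asked for ASCII output. Whether structwriter has an equivalent output-encoding option is an assumption I have not verified; the element name and sample text are just illustrative:

```python
import xml.etree.ElementTree as ET

# Pass the original Unicode text, not a pre-encoded string.
elem = ET.Element('Text')
elem.text = u"Avalokite\u015bvara"

# Requesting an ASCII encoding makes the serializer emit character
# references for anything it cannot represent directly.
serialized = ET.tostring(elem, encoding='us-ascii')
print(serialized)
```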

Thanks
