Judging from the "Builtin HTML/XML escaping via ASCII encoding" section in [1], the best approach seems to be converting each of these problematic characters into its corresponding XML character reference via:
>>> problematicUnicode.encode('ascii', 'xmlcharrefreplace')
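To make the effect concrete, here is a minimal sketch of what that call returns. The sample string and the variable names are just placeholders I made up for illustration, and I am assuming Python 3 here, where the result is a bytes object rather than a str:

```python
# Hypothetical sample text; any string with non-ASCII characters behaves the same way.
sample = u"Avalokite\u015bvara Praj\u00f1\u0101p\u0101ramit\u0101"

# Every character outside ASCII is replaced by a decimal character reference (&#NNN;).
encoded = sample.encode('ascii', 'xmlcharrefreplace')
print(encoded)
```

The output contains only ASCII, with each accented letter written as a numeric reference such as &#347; for U+015B.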
I now want to add this text as the text child of an element using structwriter, but it looks like when I do this:
w = structwriter(indent=u"yes", stream=..stream..)
w.feed(
    ROOT(
        .. snip ..
        E(u'Text', sanskritText.encode('ascii', 'xmlcharrefreplace'))
        .. snip ..
    )
)
and then serialize the stream, the XML character references are themselves escaped. That is, I get:
<Text>When Avalokite&amp;#347;vara Bodhisattva was practicing the profound Praj&amp;#241;&amp;#257;p&amp;#257;ramit&amp;#257;, he illuminated the Five Skandhas and saw that they were all empty, and crossed over all suffering and affliction.</Text>
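The same effect can be reproduced outside structwriter. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in, so it only illustrates the general behaviour of an XML serializer and is not a statement about how structwriter works internally. The sample string and variable names are hypothetical, and Python 3 is assumed:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text containing one non-ASCII character (U+015B).
sample = u"Avalokite\u015bvara"

# Case 1: pre-encode to character references, then pass the result as element text.
# The serializer sees a literal '&' and escapes it, so the reference is doubly escaped.
pre_encoded = sample.encode('ascii', 'xmlcharrefreplace').decode('ascii')
elem = ET.Element('Text')
elem.text = pre_encoded
double_escaped = ET.tostring(elem, encoding='us-ascii')

# Case 2: pass the Unicode text unchanged and ask for ASCII output.
# ElementTree then emits the character reference itself.
elem2 = ET.Element('Text')
elem2.text = sample
direct = ET.tostring(elem2, encoding='us-ascii')

print(double_escaped)
print(direct)
```

With ElementTree, at least, the pre-encoding step is what causes the stray escaped ampersand, since the serializer treats the '&' of each reference as ordinary text.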
Am I going about this the right way? If not, how can I achieve what I want with structwriter, namely to add text content via E(.., .. text with XML character references ..) so that the references are not escaped when the stream is serialized? I can certainly replace every '&amp;' with '&' in the resulting string, but I'm guessing there is a more elegant way to do this.
Thanks