Inconsistency in handling of newline characters

bkl...@rksystems.com

Feb 25, 2011, 11:16:52 AM
to python-...@googlegroups.com
Does it surprise anyone else that both minidom and lxml (possibly using the same parser under the covers) don't treat "\r\n" and "&#13;&#10;" as equivalent?  Based on http://www.w3.org/TR/REC-xml/#sec-line-ends, I would have expected to get back "\n" as the value of the text node in both cases.  That's not what happens, however.  If the sequence appears in the parser's input as a literal "\r\n", what comes out is "\n" (as I expected), but if it's serialized as "&#13;&#10;", it comes out as "\r\n"!  This seems like a flaw in either the parser(s) or the spec.  If the spec (which I don't claim to have fully understood) really says these two representations of the two-character sequence should be treated differently, I haven't been able to find any rationale for why line-ending normalization wouldn't operate on the characters represented by either serialization.  Any clues?
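Here is a minimal reproduction with minidom, as a sketch only. It assumes the second serialization is the pair of numeric character references `&#13;&#10;`; the element name `a` and the helper `text_of` are made up for illustration.

```python
from xml.dom import minidom


def text_of(xml_bytes):
    """Parse a one-element document and return all text under its root.

    Hypothetical helper for this reproduction. Text nodes are joined in
    case the parser delivers the character data in more than one piece.
    """
    doc = minidom.parseString(xml_bytes)
    return "".join(node.data for node in doc.documentElement.childNodes)


# Literal carriage return + line feed bytes in the parser's input.
literal = text_of(b"<a>\r\n</a>")

# The same two characters written as numeric character references
# (assumed to be the form the question is about).
referenced = text_of(b"<a>&#13;&#10;</a>")

print(repr(literal))     # '\n'
print(repr(referenced))  # '\r\n'
```

Passing bytes rather than a str keeps the literal "\r\n" intact all the way to the parser, so any normalization seen in the output is the parser's doing.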