XML Namespace handling


Russ Amos

Mar 11, 2014, 3:40:09 PM
to golan...@googlegroups.com
I recently hit an issue with encoding/xml that I am trying to understand. I see this type of issue has come up on golang-nuts@ a few times, but most people hitting it seem to be using struct un/marshaling, whereas I am using the de/encoders.

My use case is that I need to read in XML files and write them back out again, changing a very small part and leaving the rest untouched. I have something that works on simple examples, but I am hitting problems around XML namespaces. It seems that an element with an "xmlns" attribute is parsed correctly by xml.Decoder but mangled by xml.Encoder on the way out.

I have reduced my issue to a playground example: http://play.golang.org/p/yo0f10Pqus.

PTAL and let me know if I am misusing encoding/xml, if this is a known issue, etc.

Sean Russell

Mar 13, 2014, 7:34:08 AM
to golan...@googlegroups.com
Hi!

On Tuesday, March 11, 2014 3:40:09 PM UTC-4, Russ Amos wrote:
...

My use case is that I need to read in XML files and write them back out again, changing a very small part and leaving the rest untouched. I have something that works on simple examples, but I am hitting problems around XML namespaces. It seems that an element with an "xmlns" attribute is parsed correctly by xml.Decoder but mangled by xml.Encoder on the way out.

I have reduced my issue to a playground example: http://play.golang.org/p/yo0f10Pqus.
...

One little pedantic thing to start: the test itself, comparing string representations of the XML, is not a valid test.  In particular, the XML specification states that the order of attributes is not significant.  This means that these two documents are equivalent:

<a b="1" c="2"/>
<a c="2" b="1"/>
 
So comparing their string representations tells you nothing about the correctness of the encoder.
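A comparison that ignores attribute order could look roughly like the sketch below. It is only an illustration, not what the playground example does: the helper names (tokens, sameAttrs, equalIgnoringAttrOrder) are made up here, and it assumes that whitespace and comments still have to match exactly.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"reflect"
	"strings"
)

// tokens decodes doc and returns a copy of every token in it.
func tokens(doc string) ([]xml.Token, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var out []xml.Token
	for {
		t, err := d.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		// The decoder reuses its buffers, so keep a private copy.
		out = append(out, xml.CopyToken(t))
	}
}

// sameAttrs reports whether a and b hold the same name/value pairs,
// regardless of the order in which they appear.
func sameAttrs(a, b []xml.Attr) bool {
	if len(a) != len(b) {
		return false
	}
	ma := make(map[xml.Name]string, len(a))
	for _, at := range a {
		ma[at.Name] = at.Value
	}
	mb := make(map[xml.Name]string, len(b))
	for _, at := range b {
		mb[at.Name] = at.Value
	}
	return reflect.DeepEqual(ma, mb)
}

// equalIgnoringAttrOrder compares two documents token by token and
// treats the attributes of each start tag as an unordered set.
func equalIgnoringAttrOrder(x, y string) (bool, error) {
	tx, err := tokens(x)
	if err != nil {
		return false, err
	}
	ty, err := tokens(y)
	if err != nil {
		return false, err
	}
	if len(tx) != len(ty) {
		return false, nil
	}
	for i := range tx {
		if a, ok := tx[i].(xml.StartElement); ok {
			b, ok := ty[i].(xml.StartElement)
			if !ok || a.Name != b.Name || !sameAttrs(a.Attr, b.Attr) {
				return false, nil
			}
			continue
		}
		// All other token kinds must match exactly in this sketch.
		if !reflect.DeepEqual(tx[i], ty[i]) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	same, err := equalIgnoringAttrOrder(`<a b="1" c="2"/>`, `<a c="2" b="1"/>`)
	fmt.Println(same, err)
}
```

The two documents above compare as equal here, while a plain string comparison would report a difference.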

However, the encoder does do something strange with namespaces, and as a result the documents it generates are not equivalent to the input.  In particular, decoding a document and re-encoding the resulting tokens is not a faithful round trip.  The real problem is that attributes get duplicated; this violates the XML specification's well-formedness constraint that no attribute name may appear more than once in the same start tag.  This is a bug in the encoder, so I've filed issue 7535.
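For anyone who wants to see this locally, a token-level round trip can be sketched as below. This is my own minimal reconstruction, not the code from the playground link; the helper name roundTrip and the sample document with the urn:example namespace are assumptions made for illustration. No expected output is shown for the namespaced case, since what the encoder emits may depend on the Go version.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// roundTrip feeds every token read by xml.Decoder straight into
// xml.Encoder and returns whatever the encoder wrote.
func roundTrip(in string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(in))
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		if err := enc.EncodeToken(tok); err != nil {
			return "", err
		}
	}
	// The encoder buffers its output, so flush before reading buf.
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Hypothetical input with a default namespace declaration.
	in := `<a xmlns="urn:example"><b>text</b></a>`
	out, err := roundTrip(in)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println("in: ", in)
	fmt.Println("out:", out)
	// More than one xmlns declaration on an element that started
	// with a single one points at the duplication described above.
	fmt.Println("xmlns declarations in output:", strings.Count(out, ` xmlns="`))
}
```

A document without namespaces, such as `<a b="1"><c>hi</c></a>`, should come back unchanged, which matches the report that simple examples work.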

--- SER