encoding/xml: namespace support


chris.j...@gmail.com

May 29, 2012, 10:39:43 PM
to golan...@googlegroups.com
I'm looking at how to solve issue 3526, the lack of namespace support for XML attributes. I have a very simple, 4-line fix that solves the case that's currently bothering me, but the real problem is a bit larger. According to http://www.w3.org/TR/REC-xml-names/, each element should have its own [prefix -> URI] map, which inherits from the map of its containing element. The root map, which all others inherit from, comes prepopulated with two entries: one for the xml prefix and one for xmlns itself. Implementing this in Go seems straightforward, except for backward compatibility:

In the simplest case, you're dealing with an entire document, so the Encoder or Decoder has a complete hierarchy of namespace maps for the corresponding stack of elements. But in other cases, you're marshaling or unmarshaling in the middle of an XML document, and you need to preload entries into that map. I'd like to modify NewDecoder and NewEncoder to take a new parameter, a map[string]URL or map[string]string. Obviously that would break backward compatibility, so I'm guessing the next best option would be adding accessors to Decoder and Encoder to allow those maps to be modified.

Does this seem like a reasonable approach?

Chris

Kyle Lemons

May 30, 2012, 11:41:57 AM
to chris.j...@gmail.com, golan...@googlegroups.com
I would do the simple thing first.

On Tue, May 29, 2012 at 7:39 PM, <chris.j...@gmail.com> wrote:
> In the simplest case, you're dealing with an entire document, so the Encoder or Decoder has a complete hierarchy of namespace maps for the corresponding stack of elements. But in other cases, you're marshaling or unmarshaling in the middle of an XML document, and you need to preload entries into that map. I'd like to modify NewDecoder and NewEncoder to take a new parameter, a map[string]URL or map[string]string. Obviously that would break backward compatibility, so I'm guessing the next best option would be adding accessors to Decoder and Encoder to allow those maps to be modified.

Exposing the namespaces via the API is probably not the best way to do it.  The real fix probably involves a stack of namespace mappings and helpers to "read through" and "write onto" them, but I think all of that can live inside the xml package and only be exposed through xml.Name fields/tags.
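The "read through" and "write onto" helpers could look roughly like this sketch. nsStack and its methods are hypothetical names for an unexported helper, not something encoding/xml exposes; the bottom frame carries the two bindings that the Namespaces in XML recommendation fixes for the xml and xmlns prefixes.

```go
package main

import "fmt"

// nsStack (hypothetical) keeps one frame of prefix->URI bindings per open
// element, innermost last.
type nsStack struct {
	frames []map[string]string
}

// newNsStack returns a stack whose bottom frame holds the predefined prefixes.
func newNsStack() *nsStack {
	return &nsStack{frames: []map[string]string{{
		"xml":   "http://www.w3.org/XML/1998/namespace",
		"xmlns": "http://www.w3.org/2000/xmlns/",
	}}}
}

// push opens a scope for a new element.
func (s *nsStack) push() { s.frames = append(s.frames, map[string]string{}) }

// pop closes the innermost scope; the predefined frame is never popped.
func (s *nsStack) pop() {
	if len(s.frames) > 1 {
		s.frames = s.frames[:len(s.frames)-1]
	}
}

// bind "writes onto" the innermost scope only.
func (s *nsStack) bind(prefix, uri string) { s.frames[len(s.frames)-1][prefix] = uri }

// lookup "reads through" the scopes from innermost to outermost.
func (s *nsStack) lookup(prefix string) (string, bool) {
	for i := len(s.frames) - 1; i >= 0; i-- {
		if uri, ok := s.frames[i][prefix]; ok {
			return uri, true
		}
	}
	return "", false
}

func main() {
	s := newNsStack()
	s.push() // <root xmlns:h="http://www.w3.org/TR/html4/">
	s.bind("h", "http://www.w3.org/TR/html4/")
	s.push() // <inner xmlns:h="urn:other">
	s.bind("h", "urn:other")
	inner, _ := s.lookup("h")
	s.pop() // </inner>
	outer, _ := s.lookup("h")
	fmt.Println(inner, outer)
}
```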

Kyle Lemons

Jun 3, 2012, 11:48:24 PM
to chris.j...@gmail.com, golan...@googlegroups.com
FWIW, I think you should try to find a nice way to manage it from the struct tags instead of adding even more "yuck" fields.  You might be able to get away with something like `xml:"short=http://full/namespace/path"` and (for when a parent element will have already defined that) `xml:"short="`, in which case you would just have to pass a map down recursively containing the URIs and shortnames.  This handles the cases I can think of (including where the tag itself and its attributes have different namespaces), but I am exponentially less likely to touch anything XML the more namespaces are involved, so I may simply be naive.

On Sat, Jun 2, 2012 at 8:50 PM, <chris.j...@gmail.com> wrote:
On Wednesday, May 30, 2012 9:41:57 AM UTC-6, Kyle Lemons wrote:
> I would do the simple thing first.

I sent a patch for the simple (not entirely correct) thing, as you saw. I won't be offended if it's rejected; I'm more concerned about the "correct" solution. To that end, here's a proposed API and example tests. A new field would be recognized in any marshalable struct: a field named XMLNs, of type []xml.Ns, where:

// An Ns represents an xmlns prefix-to-namespace mapping.
type Ns struct {
    Prefix, Uri string
}

The example from http://www.w3schools.com/xml/xml_namespaces.asp involves HTML and XML intermixed, with an HTML table and a custom "furniture table" element in the same document. Here are illustrative struct types and test cases. Note that these test cases use values for that field rather than tags on the field; I assume we'd want to support both.

type NsRoot struct {
    XMLName Name `xml:"root"`
    XMLNs   []Ns
    HTable  HtmlTable `xml:"http://www.w3.org/TR/html4/ table"`
    FTable  FurnTable `xml:"http://www.w3schools.com/furniture table"`
}

type HtmlTable struct {
    XMLName Name `xml:"http://www.w3.org/TR/html4/ table"`
    XMLNs   []Ns
    Rows    []HtmlTr `xml:"http://www.w3.org/TR/html4/ tr"`
}

type HtmlTr struct {
    XMLName Name `xml:"http://www.w3.org/TR/html4/ tr"`
    Td      []string `xml:"http://www.w3.org/TR/html4/ td"`
}

type FurnTable struct {
    XMLName Name `xml:"http://www.w3schools.com/furniture table"`
    XMLNs   []Ns
    Name    string
    Width   int
    Length  int
}


    {
        ExpectXML: `<root xmlns:h="http://www.w3.org/TR/html4/"` +
            ` xmlns:f="http://www.w3schools.com/furniture">` +
            `<h:table>` +
            `<h:tr>` +
            `<h:td>Apples</h:td>` +
            `<h:td>Bananas</h:td>` +
            `</h:tr>` +
            `</h:table>` +
            `<f:table>` +
            `<f:name>African Coffee Table</f:name>` +
            `<f:width>80</f:width>` +
            `<f:length>120</f:length>` +
            `</f:table>` +
            `</root>`,
        Value: &NsRoot{
            XMLNs: []Ns{
                {Prefix: "h", Uri: "http://www.w3.org/TR/html4/"},
                {Prefix: "f", Uri: "http://www.w3schools.com/furniture"},
            },
            HTable: HtmlTable{Rows: []HtmlTr{{Td: []string{"Apples", "Bananas"}}}},
            FTable: FurnTable{Name: "African Coffee Table", Width: 80, Length: 120},
        },
    },
    {
        ExpectXML: `<root>` +
            `<h:table xmlns:h="http://www.w3.org/TR/html4/">` +
            `<h:tr>` +
            `<h:td>Apples</h:td>` +
            `<h:td>Bananas</h:td>` +
            `</h:tr>` +
            `</h:table>` +
            `<f:table xmlns:f="http://www.w3schools.com/furniture">` +
            `<f:name>African Coffee Table</f:name>` +
            `<f:width>80</f:width>` +
            `<f:length>120</f:length>` +
            `</f:table>` +
            `</root>`,
        Value: &NsRoot{
            HTable: HtmlTable{
                XMLNs: []Ns{{Prefix: "h", Uri: "http://www.w3.org/TR/html4/"}},
                Rows:  []HtmlTr{{Td: []string{"Apples", "Bananas"}}},
            },
            FTable: FurnTable{
                XMLNs:  []Ns{{Prefix: "f", Uri: "http://www.w3schools.com/furniture"}},
                Name:   "African Coffee Table",
                Width:  80,
                Length: 120,
            },
        },
    },
    {
        ExpectXML: `<root>` +
            `<table xmlns="http://www.w3.org/TR/html4/">` +
            `<tr>` +
            `<td>Apples</td>` +
            `<td>Bananas</td>` +
            `</tr>` +
            `</table>` +
            `<table xmlns="http://www.w3schools.com/furniture">` +
            `<name>African Coffee Table</name>` +
            `<width>80</width>` +
            `<length>120</length>` +
            `</table>` +
            `</root>`,
        Value: &NsRoot{
            HTable: HtmlTable{Rows: []HtmlTr{{Td: []string{"Apples", "Bananas"}}}},
            FTable: FurnTable{Name: "African Coffee Table", Width: 80, Length: 120},
        },
    },


Nigel Tao

Nov 25, 2012, 10:44:54 PM
to chris.j...@gmail.com, golang-nuts
On Mon, Nov 26, 2012 at 2:04 PM, <chris.j...@gmail.com> wrote:
> 1. We can't add a new Prefix field to xml.Name, because it breaks source
> compatibility. Programs written with xml.Name{"space", "local"} rather than
> xml.Name{Space: "space", Local: "local"} will break.

Adding a Prefix field isn't necessarily forbidden.
http://golang.org/doc/go1compat.html says "For the addition of
features in later point releases, it may be necessary to add fields to
exported structs in the API". The "go vet" tool looks for unkeyed
struct literals (those that omit field names) exactly for this reason.
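The point about keyed literals can be illustrated with a stand-in type. nameWithPrefix is hypothetical (it is not how xml.Name is defined); the sketch only shows why a keyed literal survives an appended field while an unkeyed one would not.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// nameWithPrefix is a hypothetical stand-in for xml.Name after appending a
// Prefix field; it is not the real encoding/xml type.
type nameWithPrefix struct {
	Space, Local string
	Prefix       string
}

// Keyed literals name each field, so they keep compiling when a field is
// appended, and the new field takes its zero value. An unkeyed literal such
// as nameWithPrefix{"space", "local"} would stop compiling (too few values),
// which is why go vet flags unkeyed literals of struct types from other
// packages, such as xml.Name{"space", "local"}.
func keyed() (xml.Name, nameWithPrefix) {
	return xml.Name{Space: "space", Local: "local"},
		nameWithPrefix{Space: "space", Local: "local"}
}

func main() {
	n, p := keyed()
	fmt.Printf("%+v %+v\n", n, p)
}
```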

Chris Jones

Nov 26, 2012, 11:18:54 AM
to Nigel Tao, chris.j...@gmail.com, golang-nuts
Oh. Well, that's encouraging. I'll have to think about this a bit.

Chris

Chris Jones

Nov 29, 2012, 12:42:18 AM
to Nigel Tao, chris.j...@gmail.com, golang-nuts
On 11/25/2012 8:44 PM, Nigel Tao wrote:
I think my original assertion was wrong: Adding a Prefix field to
xml.Name won't address the other problems. What we really need is a way
to provide a namespace context to the Marshaler and Unmarshaler. This
could be done by adding an argument to Encoder.Encode() and
Decoder.Decode() (or, since Go has no method overloading, more precisely
by adding new methods alongside them that each take an extra parameter).
It could also be done by exposing
a public field on Encoder and Decoder which would contain the namespace
context. The private context object in my patch is very nearly the right
thing; it should have the current namespace, and a map of
prefix:namespace which the caller could modify.

Adding a context field to Encoder and Decoder should be a
straightforward change to my patch. I'm curious if others feel it's a
good direction also.

Chris

Joel Reymont

Feb 25, 2013, 11:43:29 AM
to golan...@googlegroups.com, Nigel Tao, chris.j...@gmail.com, ch...@cjones.org
I see that the code review was closed at


and that the issue was 'accepted'


I don't see the patch in the go source tree, though.

What is the status of the patch?

Chris Jones

Feb 25, 2013, 11:56:04 AM
to Joel Reymont, golan...@googlegroups.com, Nigel Tao, chris.j...@gmail.com
The CL moved to https://codereview.appspot.com/6868044 (due to my confusion with Go's Hg extension). As far as I know, it's still under review. I will happily accept suggestions, bug reports, and test cases.

Chris