Marshal XML Mixed Content


Scott

Oct 11, 2016, 1:19:53 AM
to golang-nuts
I'm trying to Marshal XML where an element has mixed content: https://www.w3.org/TR/REC-xml/#sec-mixed-content

I tried just using []interface{} but if I put in just a string, Marshal surrounds each string with the name of the slice:
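
The snippet originally linked here is not preserved. As a rough sketch of the behaviour being described, assuming hypothetical types `Root`, `E1` and `E2` (none of these names come from the original post), bare strings in an `[]interface{}` field end up wrapped in an element named after the field:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// E1 and E2 are hypothetical child elements with fixed tag names.
type E1 struct {
	XMLName xml.Name `xml:"element1"`
	Text    string   `xml:",chardata"`
}

type E2 struct {
	XMLName xml.Name `xml:"element2"`
	Text    string   `xml:",chardata"`
}

// Root is a hypothetical container; Items mixes elements and bare strings.
type Root struct {
	XMLName xml.Name `xml:"root"`
	Items   []interface{}
}

// naiveMixed marshals the slice as-is, without any custom marshaler.
func naiveMixed() (string, error) {
	doc := Root{Items: []interface{}{
		E1{Text: "foo"}, "hello",
		E2{Text: "bar"}, "world",
	}}
	out, err := xml.Marshal(doc)
	return string(out), err
}

func main() {
	s, err := naiveMixed()
	if err != nil {
		panic(err)
	}
	// Each bare string comes out as <Items>...</Items> rather than as text.
	fmt.Println(s)
}
```

The struct values keep their own tag names because of their `XMLName` fields; only the plain strings fall back to the field name `Items`.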


I'm trying to get the output to be:
  <root>
      <element1>foo</element1>
      hello
      <element2>bar</element2>
      world
  </root>

Any ideas?
Thanks,
Scott

Konstantin Khomoutov

Oct 11, 2016, 7:49:34 AM
to Scott, golang-nuts
Yes. "\nhello\n" and "\nworld\n" are what is called "character data"
in XML parlance. So if you go with the standard approach to marshaling
data to XML -- via a struct type with properly annotated fields --
you should annotate the fields for your character data chunks with the
",chardata" modifiers.

See the docs on encoding/xml.Marshal function for more info.

Here's a working example:

--------8<--------
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// E maps onto <root>. The ",chardata" fields are written as plain
// text, in field order, between the child elements.
type E struct {
	XMLName xml.Name `xml:"root"`
	E1      string   `xml:"element1"`
	A       string   `xml:",chardata"`
	E2      string   `xml:"element2"`
	Z       string   `xml:",chardata"`
}

func main() {
	e := E{
		E1: "foo",
		A:  "hello",
		E2: "bar",
		Z:  "world",
	}

	var b bytes.Buffer
	enc := xml.NewEncoder(&b)
	enc.Indent("", "\t")

	err := enc.Encode(&e)
	if err != nil {
		panic(err)
	}
	err = enc.Flush()
	if err != nil {
		panic(err)
	}

	fmt.Println(b.String())
}
--------8<--------

Playground link: <https://play.golang.org/p/pWaYmOT675>

Please note that whitespace in XML is insignificant only where *you*
decide it is (that is, the XML spec attaches no inherent semantics to
it). So be aware that in your example the character data chunks are
not "hello" and "world" but rather

LF SP SP SP SP "hello" LF

and

LF SP SP SP SP "world" LF

respectively (provided the line breaks are bare LFs).

So if you really want those bits of whitespace to be present in the
resulting XML document, you have to make sure you embed them in the
fields annotated with ",chardata".
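
To illustrate the whitespace point, here is a small sketch, assuming a hypothetical `Padded` type and helper `rootCharData` (neither is from the thread). It puts the LF and spaces into the ",chardata" fields, marshals the value, and then reads the document back with a decoder to show which character data chunks a parser actually sees under the root element:

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

// Padded is a hypothetical struct whose character-data fields carry
// their own surrounding whitespace.
type Padded struct {
	XMLName xml.Name `xml:"root"`
	E1      string   `xml:"element1"`
	A       string   `xml:",chardata"`
	E2      string   `xml:"element2"`
	Z       string   `xml:",chardata"`
}

// rootCharData marshals v and returns the character data chunks found
// directly inside the root element, as a decoder reports them.
func rootCharData(v interface{}) ([]string, error) {
	out, err := xml.Marshal(v)
	if err != nil {
		return nil, err
	}
	dec := xml.NewDecoder(bytes.NewReader(out))
	var chunks []string
	depth := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return chunks, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			depth++
		case xml.EndElement:
			depth--
		case xml.CharData:
			if depth == 1 { // text that belongs to <root> itself
				chunks = append(chunks, string(t))
			}
		}
	}
}

func main() {
	chunks, err := rootCharData(Padded{
		E1: "foo", A: "\n    hello\n",
		E2: "bar", Z: "\n    world\n",
	})
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("%q\n", c)
	}
}
```

The check goes through a decoder rather than comparing raw bytes because, as far as I can tell, the encoder may write the line feeds in such fields as character references (`&#xA;`), which a parser turns back into LF.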

Hope this helps.

sm...@brillig.org

Oct 11, 2016, 1:00:51 PM
to golang-nuts, gr8w...@gmail.com
Sorry, I should have been more clear.  The reason I used a slice was because I need an arbitrary number of elements. So I can't just use a static struct with the chardata tags.

sm...@brillig.org

Oct 11, 2016, 1:13:03 PM
to golang-nuts, gr8w...@gmail.com, sm...@brillig.org
I figured out a solution:

https://play.golang.org/p/sWR1hAumYh

I created a type that wraps string and implements the xml.Marshaler interface.  Then I just cast my strings to that type.
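
The approach described above might look roughly like the following sketch. This is not the code behind the playground link; the names `Text`, `Root`, `E1`, `E2` and `mixedContent` are assumptions made for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Text is a hypothetical string wrapper that is written as bare
// character data instead of as an element.
type Text string

// MarshalXML ignores start, so no element is opened around the text.
func (t Text) MarshalXML(e *xml.Encoder, start xml.StartElement) error {
	return e.EncodeToken(xml.CharData(t))
}

// E1 and E2 are hypothetical child elements with fixed tag names.
type E1 struct {
	XMLName xml.Name `xml:"element1"`
	Value   string   `xml:",chardata"`
}

type E2 struct {
	XMLName xml.Name `xml:"element2"`
	Value   string   `xml:",chardata"`
}

// Root holds an arbitrary number of elements and Text values.
type Root struct {
	XMLName xml.Name `xml:"root"`
	Items   []interface{}
}

// mixedContent marshals a document that interleaves elements and text.
func mixedContent() (string, error) {
	doc := Root{Items: []interface{}{
		E1{Value: "foo"}, Text("hello"),
		E2{Value: "bar"}, Text("world"),
	}}
	out, err := xml.Marshal(doc)
	return string(out), err
}

func main() {
	s, err := mixedContent()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Converting each string with `Text(...)` is the only change needed at the call site; the slice can still hold any number of items in any order.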

Hey go team, any chance this could be made the default behavior for xml.CharData types so a single cast would do the job?

Konstantin Khomoutov

Oct 11, 2016, 1:33:21 PM
to sm...@brillig.org, golang-nuts, gr8w...@gmail.com
On Tue, 11 Oct 2016 10:00:33 -0700 (PDT)
sm...@brillig.org wrote:

[...]
> > > I'm trying to Marshal XML where an element has mixed content:
> > > https://www.w3.org/TR/REC-xml/#sec-mixed-content
> > >
> > > I tried just using []interface{} but if I put in just a string,
> > > Marshal surrounds each string with the name of the slice:
[...]
> > Yes. "\nhello\n" and "\nworld\n" are what is called "character
> > data" in XML parlance. So if you go with the standard approach to
> > marshaling data to XML -- via a struct type with properly annotated
> > fields -- you should annotate the fields for your character data
> > chunks with the ",chardata" modifiers.
> >
> > See the docs on encoding/xml.Marshal function for more info.
[...]
> Sorry, I should have been more clear. The reason I used a slice was
> because I need an arbitrary number of elements. So I can't just use a
> static struct with the chardata tags.

OK, doable with custom marshaler code:

----------------8<----------------
package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

// Elements holds mixed content: plain strings are written as character
// data, anything else is encoded as a regular element.
type Elements []interface{}

// MarshalXML ignores start, so the slice itself gets no wrapping element.
func (es Elements) MarshalXML(e *xml.Encoder, start xml.StartElement) error {
	for _, v := range es {
		if s, ok := v.(string); ok {
			if err := e.EncodeToken(xml.CharData(s)); err != nil {
				return err
			}
			continue
		}
		if err := e.Encode(v); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	type Root struct {
		XMLName  xml.Name `xml:"root"`
		Elements Elements
	}

	type E1 struct {
		XMLName xml.Name `xml:"element1"`
		Source  string   `xml:",chardata"`
	}

	type E2 struct {
		XMLName xml.Name `xml:"element2"`
		Source  string   `xml:",chardata"`
	}

	var doc = &Root{
		Elements: Elements{
			&E1{Source: "foo"}, "hello",
			&E2{Source: "bar"}, "world"},
	}

	output, err := xml.MarshalIndent(doc, " ", " ")
	if err != nil {
		fmt.Printf("error: %v\n", err)
		os.Exit(1) // nothing useful to print after a failure
	}

	os.Stdout.Write(output)
}
----------------8<----------------

Playground link: <https://play.golang.org/p/twSkrZIVY9>

Konstantin Khomoutov

Oct 11, 2016, 1:33:58 PM
to sm...@brillig.org, golang-nuts, gr8w...@gmail.com
On Tue, 11 Oct 2016 10:12:50 -0700 (PDT)
sm...@brillig.org wrote:

[...]
> >> > I'm trying to Marshal XML where an element has mixed content:
> >> > https://www.w3.org/TR/REC-xml/#sec-mixed-content
> >> >
> >> > I tried just using []interface{} but if I put in just a string,
> >> > Marshal surrounds each string with the name of the slice:
[...]
> >> Yes. "\nhello\n" and "\nworld\n" are what is called "character
> >> data" in XML parlance. So if you go with the standard approach to
> >> marshaling data to XML -- via a struct type with properly
> >> annotated fields -- you should annotate the fields for your
> >> character data chunks with the ",chardata" modifiers.
[...]
> > Sorry, I should have been more clear. The reason I used a slice
> > was because I need an arbitrary number of elements. So I can't just
> > use a static struct with the chardata tags.
> >
> I figured out a solution:
>
> https://play.golang.org/p/sWR1hAumYh
>
> I created a type that wraps string and implements the xml.Marshaler
> interface. Then I just cast my strings to that type.

BTW, you could also just write your Elements field as a literal, as in
your original example: <https://play.golang.org/p/6hQsPUycum>