XML unmarshaller doesn't handle whitespace


Per Persson

Oct 2, 2017, 6:10:19 AM
to golang-nuts
Probably it's best to ask here before I try to contact the Go developers.

I had a file with space-padded numbers (NumberOfPoints="   266703") and the XML unmarshaller couldn't handle the spaces.

Booleans are trimmed <https://github.com/golang/go/blob/master/src/encoding/xml/read.go#L647>, so why not numbers?
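A minimal sketch isolating the behavior in question, using only strconv and strings. It does not call encoding/xml, and the helper names parseRaw and parseTrimmed are made up for illustration: strconv.ParseInt rejects a value with surrounding spaces, while trimming first makes it parse.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRaw (hypothetical helper) parses the value exactly as given.
// strconv.ParseInt does not skip surrounding white space.
func parseRaw(s string) (int64, error) {
	return strconv.ParseInt(s, 10, 64)
}

// parseTrimmed (hypothetical helper) strips leading and trailing white
// space first, similar to the trimming the post says booleans get.
func parseTrimmed(s string) (int64, error) {
	return strconv.ParseInt(strings.TrimSpace(s), 10, 64)
}

func main() {
	const padded = "   266703" // as in NumberOfPoints="   266703"

	_, err := parseRaw(padded)
	fmt.Println("raw:", err)

	v, err := parseTrimmed(padded)
	fmt.Println("trimmed:", v, err)
}
```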

Konstantin Khomoutov

Oct 5, 2017, 3:22:24 AM
to Per Persson, golang-nuts
> Booleans are trimmed
> <https://github.com/golang/go/blob/master/src/encoding/xml/read.go#L647>,
> so why not numbers?

I'd take this question to be rather philosophical.
Whether the string " 266703" represents a valid integer is
debatable, in my opinion. For instance, would the
string "\v\v\v\n\n\n\t\t266703\n\n\n\v\v\v\v\t\x20\t" represent a valid
integer as well?
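To make the question concrete: strings.TrimSpace uses Unicode's notion of white space (unicode.IsSpace), which covers \v, \n, \t and the plain space, so a trim-then-parse approach would happily accept that string as well. Only leading and trailing white space is removed; an interior space still fails. trimAndParse is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// trimAndParse (hypothetical) strips every leading and trailing Unicode
// white space character, then parses the rest as a base-10 int64.
func trimAndParse(s string) (int64, error) {
	return strconv.ParseInt(strings.TrimSpace(s), 10, 64)
}

func main() {
	inputs := []string{
		" 266703",
		"\v\v\v\n\n\n\t\t266703\n\n\n\v\v\v\v\t\x20\t",
		"266 703", // interior space is not trimmed, so this still fails
	}
	for _, in := range inputs {
		v, err := trimAndParse(in)
		fmt.Printf("%q -> %d, %v\n", in, v, err)
	}
}
```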

I don't know why the decoder from encoding/xml is more lax regarding
booleans. Maybe that's because the set of strings strconv.ParseBool()
accepts as valid booleans is pretty lax in itself, or maybe the space
trimming was simply added semi-automatically by the programmer when
that code was written, without giving it much thought. ;-)
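For reference, strconv.ParseBool is documented to accept exactly 1, t, T, TRUE, true, True, 0, f, F, FALSE, false and False. It does not trim white space on its own, so " true" is rejected unless the caller trims it first. A quick check (the accepts helper is only for illustration):

```go
package main

import (
	"fmt"
	"strconv"
)

// accepts (illustrative helper) reports whether strconv.ParseBool
// accepts s as a boolean.
func accepts(s string) bool {
	_, err := strconv.ParseBool(s)
	return err == nil
}

func main() {
	candidates := []string{
		"1", "t", "T", "TRUE", "true", "True",
		"0", "f", "F", "FALSE", "false", "False",
		"yes", "tRUE", " true", // none of these are accepted
	}
	for _, s := range candidates {
		fmt.Printf("%-8q accepted: %v\n", s, accepts(s))
	}
}
```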

All in all, you could exploit the fact that the XML decoder from
encoding/xml checks whether the type of the variable it is about to
unmarshal textual data into implements the encoding.TextUnmarshaler
interface [2]; if it does, that type's UnmarshalText() method is used
to parse the textual data.

So one way to go in your case is to define a dedicated type for
unmarshaling those "whitespace integers" from your XML data and make
that type implement encoding.TextUnmarshaler:

----------------8<----------------
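package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"strconv"
)

// xmlInt is an int64 that tolerates leading and trailing white space
// when unmarshaled from XML text.
type xmlInt int64

// UnmarshalText implements encoding.TextUnmarshaler: it trims the
// surrounding white space and parses the rest as a base-10 integer.
func (xi *xmlInt) UnmarshalText(b []byte) error {
	v, err := strconv.ParseInt(string(bytes.TrimSpace(b)), 10, 64)
	if err != nil {
		return err
	}
	*xi = xmlInt(v)
	return nil
}

type data struct {
	NumberOfPoints xmlInt `xml:",attr"`
}

const s = `<data NumberOfPoints=" 266703"/>`

func main() {
	var d data
	err := xml.Unmarshal([]byte(s), &d)
	if err != nil {
		panic(err)
	}
	fmt.Println(d)
}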
----------------8<----------------

Playground link: [1].
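The decoder consults encoding.TextUnmarshaler for element character data as well as for attributes, so the same xmlInt type should also cope with padded element content. A sketch under that assumption; the Count element and the decode helper are hypothetical, added only for this illustration:

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"strconv"
)

// xmlInt is the same white-space-tolerant integer type as above.
type xmlInt int64

func (xi *xmlInt) UnmarshalText(b []byte) error {
	v, err := strconv.ParseInt(string(bytes.TrimSpace(b)), 10, 64)
	if err != nil {
		return err
	}
	*xi = xmlInt(v)
	return nil
}

// data mixes an attribute and a child element, both of type xmlInt.
// The Count element is hypothetical, used only for this illustration.
type data struct {
	NumberOfPoints xmlInt `xml:",attr"`
	Count          xmlInt `xml:"Count"`
}

// decode (hypothetical helper) unmarshals s into a data value.
func decode(s string) (data, error) {
	var d data
	err := xml.Unmarshal([]byte(s), &d)
	return d, err
}

func main() {
	d, err := decode(`<data NumberOfPoints="   266703"><Count>
		42
	</Count></data>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(d.NumberOfPoints, d.Count)
}
```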

1. https://play.golang.org/p/Euy8Sag88P
2. https://golang.org/pkg/encoding/#TextUnmarshaler

Per Persson

Oct 5, 2017, 3:48:54 AM
to golang-nuts
Thanks for a good answer and a good suggestion of a solution!

Perhaps I should take the question to the developers to have them either give a technical answer or discuss a change.

Konstantin Khomoutov

Oct 5, 2017, 4:22:51 AM
to Per Persson, golang-nuts
On Thu, Oct 05, 2017 at 12:48:53AM -0700, Per Persson wrote:

> > Probably it's best to ask here before I try to contact the Go developers.
> >
> > I had a file with space padded numbers (NumberOfPoints=" 266703") and
> > the XML unmarshaller couldn't handle the spaces.
> >
> > Booleans are trimmed
> > <https://github.com/golang/go/blob/master/src/encoding/xml/read.go#L647>,
> > so why not numbers?

> Thanks for a good answer and a good suggestion of a solution!

You're welcome!

> Perhaps I should take the question to the developers to have them either
> give a technical answer or discuss a change.

I'd recommend just filing an issue in the issue tracker [1] and
soliciting a discussion on the matter.

1. https://github.com/golang/go/issues
