Confusing whitespace handling in tags


Sam Salisbury

Mar 2, 2015, 7:27:03 AM
to golan...@googlegroups.com
A strange issue confused me and a fellow developer for some time. It appears that if two field tags are separated by a tab character, the second one is completely ignored.


Both type definitions look visually identical; however, the first contains spaces between the "json" and "mytag" tags, whereas the second contains a tab character.
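A minimal sketch of the behaviour described, not the snippet from the original post. The key "mytag" comes from the post, but the tag values are made up, and the tab is written as an explicit `\t` escape so that the difference is visible:

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical tags for illustration. Only the separator between the
// two key:"value" pairs differs: a space in the first, a tab in the second.
const (
	spaceSeparated = reflect.StructTag("json:\"name\" mytag:\"value\"")
	tabSeparated   = reflect.StructTag("json:\"name\"\tmytag:\"value\"")
)

func main() {
	fmt.Printf("%q\n", spaceSeparated.Get("mytag")) // "value"
	fmt.Printf("%q\n", tabSeparated.Get("mytag"))   // "": the pair after the tab is not found
	fmt.Printf("%q\n", tabSeparated.Get("json"))    // "name": the pair before the tab still parses
}
```

In a struct definition the tags would normally be raw string literals, where the space and the tab can look the same on screen.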

Reading the reflect docs, this behaviour is possibly correct (only spaces are mentioned in http://golang.org/pkg/reflect/#StructTag), but it took us a while to find, and this is the first time Go has caught me out over whitespace differences, so it was doubly unexpected to me.

Is it possible to modify the StructTag.Get(string) method to support tab-separated tags? Alternatively, could the build be made to fail in the case that two tags are separated by tabs?

Thanks,
Sam

Jan Mercl

Mar 2, 2015, 7:53:50 AM
to Sam Salisbury, golan...@googlegroups.com
On Mon, Mar 2, 2015 at 1:27 PM Sam Salisbury <samsal...@gmail.com> wrote:

> Is it possible to modify the StructTag.Get(string) method to support tab-separated tags?

The language supports any string as a struct tag.

> Alternatively, could the build be made to fail in the case that 2 tags are
> separated by tabs?

No. That would mean rejecting spec-conforming code. Even worse, there's probably someone's existing code that would suddenly fail to compile, despite the Go 1 compatibility promise.

The space-separated (not spaces-separated) convention of the reflect package and the language spec for struct tags are two different things. The former applies only if you want to use the utility function StructTag.Get. Otherwise, you're free to parse the struct tag value in any other way you like.

-j

Luna Duclos

Mar 2, 2015, 8:05:00 AM
to Jan Mercl, Sam Salisbury, golang-nuts
Perhaps it'd be a good idea to change StructTag.Get to accept tabs as well as spaces? I don't see this breaking any existing code.


Rob Pike

Mar 2, 2015, 8:49:27 AM
to Luna Duclos, Jan Mercl, Sam Salisbury, golang-nuts
Go vet will verify struct tags match the accepted format.

-rob

Sam Salisbury

Mar 2, 2015, 10:21:57 AM
to Jan Mercl, golan...@googlegroups.com
I definitely agree with your second point. The first, though, is about the convention in the reflect package, so it's not a language issue but a convention issue.

Surely treating any white space between conventional tags as equivalent is sensible: when using the reflect package, a tab is less likely to mean "please ignore everything after this special white space" and more likely to mean "I am white space separating two conventional tags", no?

Sam Salisbury

Mar 2, 2015, 10:26:15 AM
to Luna Duclos, Jan Mercl, golang-nuts
I would agree, this behaviour would seem less surprising to me than being sensitive to invisible character differences.

Conventional tags look like code, so treating them like code when parsing in a conventional way would seem to be more intuitive.

:)

Sam Salisbury

Mar 2, 2015, 12:47:14 PM
to Rob Pike, Luna Duclos, Jan Mercl, golang-nuts
Ah, well that's handy, thanks!

Do you think modifying the conventional parser StructTag.Get(string) would be likely to break existing code?

(sorry about the duped message Rob, I hit reply instead of reply-all last time)

Rob Pike

Mar 2, 2015, 1:12:40 PM
to Sam Salisbury, Luna Duclos, Jan Mercl, golang-nuts
I think things are fine as is, a simple clear spec and a tool to check.

-rob

Sam Salisbury

Mar 2, 2015, 3:32:10 PM
to Rob Pike, Luna Duclos, Jan Mercl, golang-nuts
It definitely seems like a surprising source of silent bugs for newbies using libraries that depend on field tags and reflect.

I would be happy to send a patch for this; would it be considered?

Looking at the code, I don't think the performance consequences would be significant: http://golang.org/src/reflect/type.go?s=21335:21356#L758


Rob Pike

Mar 2, 2015, 4:35:14 PM
to Sam Salisbury, Luna Duclos, Jan Mercl, golang-nuts
I still think things are fine as is.

-rob

Dmitri Shuralyov

Mar 2, 2015, 5:05:03 PM
to golan...@googlegroups.com, samsal...@gmail.com, luna....@palmstonegames.com, 0xj...@gmail.com
It's better not to support tabs. Right now things are simple and clear:

> By convention, tag strings are a concatenation of
> optionally space-separated key:"value" pairs.

If you add tab support, the description becomes muddier: "optionally separated by either space or tab". Then which should I use to separate the pairs, spaces or tabs? Some people will use spaces, others will use tabs, and some will mix tabs and spaces. Also, why tabs only and not other Unicode whitespace?

"space-separated" is nice and simple. I agree it's best to keep it that way.

Sam Salisbury

Mar 2, 2015, 6:35:35 PM
to Dmitri Shuralyov, golan...@googlegroups.com, Luna Duclos, Jan Mercl, Rob Pike
In fact, "white space separated" isn't such a muddy concept. The spec already has a clear definition of white space in Go source code:

"White space, formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns (U+000D), and newlines (U+000A), is ignored except as it separates tokens that would otherwise combine into a single token." (from http://golang.org/ref/spec#Tokens)

It would seem highly consistent to apply the same rule to conventional tag parsing, and more efficient than including all Unicode whitespace characters. In the mind of the Go target audience, conventional tags do look like tokens. The payoff would be that fewer new gophers write difficult-to-detect bugs, leading to a less frustrating experience.

I'm happy to coordinate submitting a patch for both StructTag.Get and go vet, and any other core tools that need to be aware of the rules, if a consensus is reached.
