> Hi,
> It would be nice if the compiler supported sources with a BOM.
Go's source is in UTF-8. BOM is not valid UTF-8.
-rob
The UTF-8 BOM is EF BB BF. From the Unicode 5 Standard, section 16.8:
"In UTF-8, the BOM corresponds to the byte sequence <EF BB BF>. Although
there are never any questions of byte order with UTF-8 text, this
sequence can serve as signature for UTF-8 encoded text where the
character set is unmarked. As with a BOM in UTF-16, this sequence of
bytes will be extremely rare at the beginning of text files in other
character encodings."
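Checking for the signature described above comes down to a prefix test on the first three bytes. Here is a minimal Go sketch. hasUTF8BOM is a hypothetical helper name and is not part of any library.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the byte sequence <EF BB BF> quoted from the Unicode Standard.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// hasUTF8BOM reports whether data starts with the UTF-8 signature.
// Hypothetical helper, for illustration only.
func hasUTF8BOM(data []byte) bool {
	return bytes.HasPrefix(data, utf8BOM)
}

func main() {
	fmt.Println(hasUTF8BOM([]byte("\xef\xbb\xbfpackage main"))) // true
	fmt.Println(hasUTF8BOM([]byte("package main")))             // false
}
```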
From the Unicode FAQ (http://unicode.org/faq/utf_bom.html#bom5, my
emphasis):
"Q: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)?
If yes, then can I still assume the remaining UTF-8 bytes are in
big-endian order?
A: **Yes, UTF-8 can contain a BOM**. However, it makes no difference as
to the endianness of the byte stream. UTF-8 always has the same byte
order. An initial BOM is only used as a signature -- an indication that
an otherwise unmarked text file is in UTF-8."
--
Gordon Tisher
http://balafon.net
Go source code is not an otherwise unmarked text file.
It is a file named *.go, and all Go files must be UTF-8.
There is no need for the BOM.
Frankly, it's a bizarre convention to litter otherwise
ordinary files with byte order marks when the encoding
used in the file has only one byte order.
Russ
Some editors are used to edit more than just .go files, and use the BOM
to both distinguish Unicode in general from other encodings, and between
UTF-8, UTF-16 and UTF-32. Given that they do so, it would be nice for
Go to just silently ignore the BOM, thus enabling people to edit
different kinds of files in their favorite editor without constantly
tweaking their settings.
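The detection such editors perform can be sketched as a table of leading byte sequences. This minimal Go sketch rests on a few assumptions. sniffBOM is a hypothetical name. It only recognizes the five common Unicode marks. It also cannot tell a UTF-16LE file that begins with U+0000 from a UTF-32LE one.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffBOM returns the encoding signalled by a leading byte order mark
// and the length of that mark, or ("", 0) if there is none.
// Hypothetical helper, for illustration only.
func sniffBOM(data []byte) (string, int) {
	// Order matters: the UTF-32LE mark FF FE 00 00 begins with the
	// UTF-16LE mark FF FE, so the longer marks are checked first.
	boms := []struct {
		name string
		mark []byte
	}{
		{"UTF-32BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
		{"UTF-32LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
		{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
		{"UTF-16BE", []byte{0xFE, 0xFF}},
		{"UTF-16LE", []byte{0xFF, 0xFE}},
	}
	for _, b := range boms {
		if bytes.HasPrefix(data, b.mark) {
			return b.name, len(b.mark)
		}
	}
	return "", 0
}

func main() {
	for _, in := range [][]byte{
		{0xEF, 0xBB, 0xBF, 'h', 'i'},
		{0xFF, 0xFE, 'h', 0x00},
		{0xFF, 0xFE, 0x00, 0x00},
		[]byte("plain"),
	} {
		enc, n := sniffBOM(in)
		fmt.Printf("%q %d\n", enc, n)
	}
}
```

The four-byte UTF-32 marks are checked before the two-byte UTF-16 ones because FF FE is a prefix of FF FE 00 00.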
I think it's a really bad idea to let them in. The Windows compiler might have to, because Windows doesn't get UTF-8 right at all, but it would be a mistake to enable them everywhere.
-rob
Another approach is to put this before the compile stage of your build process:
http://www.ueber.net/who/mjl/projects/bomstrip/
Andrew
Here's a Go version of bomstrip you could use:
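package main

import (
	"bytes"
	"io"
	"log"
	"os"
)

// bom is the UTF-8 encoding of U+FEFF.
var bom = []byte{0xEF, 0xBB, 0xBF}

func main() {
	// Read the first (up to) three bytes. io.ReadFull keeps reading until
	// the buffer is full, so a short read cannot hide a BOM.
	b := make([]byte, len(bom))
	n, err := io.ReadFull(os.Stdin, b)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		log.Fatal(err)
	}
	// Pass those bytes through unless they are exactly the BOM.
	if !bytes.Equal(b[:n], bom) {
		if _, err := os.Stdout.Write(b[:n]); err != nil {
			log.Fatal(err)
		}
	}
	// Copy the rest of the input unchanged.
	if _, err := io.Copy(os.Stdout, os.Stdin); err != nil {
		log.Fatal(err)
	}
}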
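When the file contents are already in memory, the same filtering reduces to bytes.TrimPrefix from the standard library. Here is a minimal sketch. stripBOM is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
)

// stripBOM removes one leading UTF-8 BOM, if present. All other bytes,
// including any BOM appearing later in the data, are left untouched.
func stripBOM(data []byte) []byte {
	return bytes.TrimPrefix(data, []byte("\xef\xbb\xbf"))
}

func main() {
	fmt.Printf("%q\n", stripBOM([]byte("\xef\xbb\xbfpackage main"))) // "package main"
	fmt.Printf("%q\n", stripBOM([]byte("package main")))             // "package main"
}
```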
> I think it's a really bad idea to let them in. The Windows compiler might have to, because Windows doesn't get UTF-8 right at all, but it would be a mistake to enable them everywhere.
Any text editor that adds a BOM to UTF-8 files is broken; if that is a
problem, report it to the author of that text editor.
That the Unicode folks took a perfectly sane UTF-8 standard and
decided to allow an abomination like the BOM shows that one can trust
standards bodies to always, always fuck everything up.
uriel
By a strange coincidence I have in my other hand a report about
the Jena Turtle parser's non-support for a BOM, Turtle having
mandatory UTF-8 encoding. That user says:
> The data files are coming from my software which is all written
> in .Net and when outputting in UTF-8 the default behaviour of .Net
> is to include the BOM at the start of the file.
So it may not be as easy as reporting it to the author of "the" text
editor ...
Chris
--
Chris "No BOM today. BOM tomorrow?" Ivanova Dollin
I'm speechless...
> So it may not be as easy as reporting it to the author of "the" text
> editor ...
Seems like the only decent solution will be to have somebody drop a
'physics package' on Redmond.
uriel
P.S.: I miss boyd :(
>>> The data files are coming from my software which is all written
>>> in .Net and when outputting in UTF-8 the default behaviour of .Net
>>> is to include the BOM at the start of the file.
>
> I'm speechless...
Well, just because that user says it doesn't mean it's true.
Maybe it's just their local configuration. Maybe they've
misinterpreted something. Maybe they mailed in from the
Weirdzo universe.
Chris
--
Chris "allusive" Dollin
Or they are processing a source UTF-8 file that already has a BOM in it.
According to [1], .Net tries to hide BOMs as much as it can: the string with a BOM
is equal to the same string without the BOM, and the BOM does not appear
in the debugger or normal console output...
[1] http://chriscant.phdcc.com/2010/02/systemstring-hidden-utf8-bom.html