"Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark
(BOM) in UTF-8 encoded files is known to cause problems for some text
editors and older browsers. You may want to consider avoiding its use
until it is better supported."
The only way I could solve the problem was using notepad++ which has
an option to explicitly save the file without the BOM. Is there a way
to do the same thing in Vim? Maybe even to display this BOM?
Thanks,
Carlo
:set bomb?
Do ':set nobomb' before saving to remove a BOM.
--
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit
That message is outdated. The BOM is supported in all Unicode encodings
including UTF-8 by all "reasonably recent" browsers. It is also part of
the HTML standard. Some text editors (such as Notepad, I think) choke on
it, but the answer to that is to use a better editor, such as Vim or
even WordPad, which know about the BOM and handle it correctly, even in
UTF-8.
For some other kinds of text files (most source files and shell scripts,
for instance), it is better to save the file without a BOM, but for
most "web" formats including HTML, CSS, and, I think, XML, XHTML, etc.,
a BOM is no problem and can even be a help (e.g. in case the web server
sets the charset incorrectly or not at all in its Content-Type header).
>
> The only way I could solve the problem was using notepad++ which has
> an option to explicitly save the file without the BOM. Is there a way
> to do the same thing in Vim? Maybe even to display this BOM?
>
> Thanks,
> Carlo
>
To save the file without a BOM:
:setlocal nobomb
:w
To ask Vim if there is a BOM:
:setlocal bomb?
The answer is bomb for "BOM present" or nobomb for "BOM absent".
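
For anyone who wants to do the same check outside Vim, here is a minimal
Python sketch of what ':setlocal bomb?' and ':setlocal nobomb' followed by
':w' amount to for a UTF-8 file. The function names are made up for
illustration, and this is not a description of how Vim does it internally.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 BOM (bytes EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; unchanged if none."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


page = codecs.BOM_UTF8 + b"<?php echo 'hi'; ?>\n"
cleaned = strip_utf8_bom(page)
```

To apply it to a real file, read the file in binary mode ('rb'), pass the
bytes through strip_utf8_bom() and write the result back in binary mode.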
Note that regardless of the state of the 'bomb' option, a BOM can only
exist if the 'fileencoding' is one of UTF-8, UTF-16 (or its UCS-2
subset) or UTF-32 (aka UCS-4), any of them (other than UTF-8 for which
endianness is not relevant) in any endianness. For other 'fileencoding'
values the 'bomb' option is irrelevant.
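
As a rough illustration of which byte sequences are involved, the following
Python sketch guesses the encoding family from a leading BOM, using the
constants in the standard codecs module. The helper name detect_bom is an
assumption made for this example. The UTF-32 signatures are tested first
because the UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16
little-endian BOM (FF FE).

```python
import codecs
from typing import Optional

# Longest signatures first, so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

This is only a guess based on the first bytes: a UTF-16 little-endian file
whose first character after the BOM is U+0000 would look like UTF-32 here.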
To display the presence or absence of the BOM on the status line:
see http://vim.wikia.com/wiki/Show_fileencoding_and_bomb_in_the_status_line
Best regards,
Tony.
--
George Orwell was an optimist.
:w ++bin
should also work IIRC.
regards,
Christian
> That message is outdated. The BOM is supported in all Unicode encodings
> including UTF-8 by all "reasonably recent" browsers. It is also part of the
> HTML standard.
Well, with the BOM the whole layout of the website appeared broken in
Internet Explorer 7. No problem with Firefox. Still, it seems this is not
an issue to underestimate.
> For some other kinds of text files (most source files and shell scripts, for
> instance), it is better to save the file without a BOM, but for most "web"
> formats including HTML, CSS, and, I think, XML, XHTML, etc., a BOM is no
> problem and can even be a help (e.g. in case the web server sets the charset
> incorrectly or not at all in its Content-Type header).
It was a PHP file, so maybe that's the problem.
> To save the file without a BOM:
>
> :setlocal nobomb
> :w
>
> To ask Vim if there is a BOM:
>
> :setlocal bomb?
>
> The answer is bomb for "BOM present" or nobomb for "BOM absent".
>
>
> To display the presence or absence of the BOM on the status line:
>
> see
> http://vim.wikia.com/wiki/Show_fileencoding_and_bomb_in_the_status_line
Thanks for all the info and the commands. Very useful.
BOM is a standard for UCS2 or UTF-16, not for UTF-8.
A BOM in UTF-8 will cause problems for most programs which expect text
streams. gcc is a good example; most GNU CLI utilities will reject
UTF-8 with a BOM.
And, W3C validator will of course complain about it...
According to the Unicode FAQ,
http://www.unicode.org/faq//utf_bom.html#bom4 (two successive FAQ
questions) a BOM can be used in UTF-8 as well as in UTF-16 or UTF-32;
but since UTF-8 doesn't have endianness variants, with UTF-8 it
specifies encoding only, not endianness. BTW, "good" editors (including
at least Vim and WordPad, possibly others) handle the BOM correctly,
even in UTF-8. In fact, in my experience WordPad won't read UTF-8 text
correctly _unless_ there is a BOM.
However (about your next paragraph), when UTF-8 is fed "transparently"
to a program which expects ASCII, and in particular to any program which
expects #! at the start of a file, the BOM should not be used (see the
2nd FAQ question linked above, and also
http://www.unicode.org/faq//utf_bom.html#bom10 "How should I deal with
BOMs?", point 3).
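
To make the "transparent" case concrete, here is a small Python sketch. The
key=value line is invented for the example. A program that decodes the bytes
as plain UTF-8 keeps the BOM as the character U+FEFF glued to the first
token, while Python's "utf-8-sig" codec removes a leading BOM if there is one
and behaves like "utf-8" otherwise.

```python
import codecs

raw = codecs.BOM_UTF8 + b"name=value\n"  # a UTF-8 file saved with a BOM

# Decoded naively, the BOM becomes part of the first key.
naive_key = raw.decode("utf-8").split("=")[0]

# The BOM-aware codec drops the signature before the text is seen.
aware_key = raw.decode("utf-8-sig").split("=")[0]

# The same codec also accepts input that has no BOM at all.
plain_key = b"name=value\n".decode("utf-8-sig").split("=")[0]
```

Here naive_key is not equal to "name" even though it may look identical on
screen, which is the kind of failure a BOM-unaware program runs into.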
>
> A BOM in UTF-8 will cause problems for most programs which expect text
> streams. gcc is a good example; most GNU CLI utilities will reject
> UTF-8 with a BOM.
I explicitly mentioned in the part you snipped that for some other kinds
of text than HTML or CSS (such as, I said, source files and shell
scripts) it is better to save the file without a BOM.
>
> And, W3C validator will of course complain about it...
>
...with a warning, not an error; and Tidy won't.
Best regards,
Tony.
--
"My weight is perfect for my height -- which varies"
The best time to use a BOM is when you know your target. I work on a MacBook,
which has UTF-8 as its default. When I'm working with Objective-C that will be
compiled using LLVM, there is no problem using a BOM (which is a good thing,
since the encoding can be easily recognized). But when I'm working with Java,
doing something for the Android platform, I use ISO-8859-1 because the Google
guys defined the 'encoding' argument of the 'javac' compiler as 'ASCII' in an
Ant XML file somewhere.
I also know that PHP doesn't handle the BOM well, so I decided to work with
PHP in ISO-8859-1 as well. But my e-mails are all HTML formatted using UTF-8
with a BOM (edited in Vim), and they are always displayed in Firefox, Safari
or Chrome with no problems.
I believe that the problem with the major browsers has to do with user
configuration. You can let the browser detect the character set of a page,
or configure it to assume one based on where you are (a Western country,
say, or another part of the world). The latter causes no problems as long
as you don't open pages from other countries. These days it is preferable
to let the browser handle the encoding itself.
Regards.
Yeah, the idea is to know what your file will be used with.
Recently I discovered that when feeding a local *.txt file to SeaMonkey
(or, I suppose, Firefox), it will try to read it as Latin1 unless there
is a BOM. I'm not sure if that depends on my Appearance preferences. Of
course, for a *.txt on my local disk there is no metadata (no HTTP
headers etc.) to tell the MIME type and the encoding to the browser. For
the MIME type, *.txt means text/plain but it could be any charset.
This means that when I want to display (and possibly print) multilingual
text (let's say, who knows? maybe a *.txt file in French with some
Russian and some Hebrew in it), something Gecko (the display engine used
by Firefox, Thunderbird and SeaMonkey) does better than gvim, I'll have
to record it with a BOM.
OTOH any file starting with #! MUST, as has already been said, be
recorded with no BOM because the shebang is only looked for in the first
two bytes of the file (which would be part of the BOM if there were one).
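
A minimal Python sketch of that two-byte check, assuming the loader simply
compares the first two bytes of the file with "#!" (the function name is
hypothetical, and a real kernel does more than this):

```python
import codecs


def starts_with_shebang(data: bytes) -> bool:
    """Compare only the first two bytes with '#!', as described above."""
    return data[:2] == b"#!"


script = b"#!/bin/sh\necho hello\n"
script_with_bom = codecs.BOM_UTF8 + script

ok = starts_with_shebang(script)
broken = starts_with_shebang(script_with_bom)
first_two = script_with_bom[:2]  # EF BB, the start of the 3-byte UTF-8 BOM
```

With the BOM in front, the first two bytes are EF BB instead of 23 21, so the
check fails even though the text looks the same in an editor.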
Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
156. You forget your friend's name but not her e-mail address.