9.3 Warning: Document encoding mismatch


Steve Piercy

Nov 4, 2009, 10:04:00 PM
to BBEdit Talk
Since upgrading to 9.3, when I attempt to save a file as UTF-8 (with
BOM) that contains the following:

<meta http-equiv="content-type" content="text/html; charset=utf-8">

I get the following error:

=======
Document encoding mismatch

This document contains data which describes its encoding as Unicode
(UTF-8, no BOM), but the encoding has been set to Unicode (UTF-8).

Saving this document as-is will likely cause unexpected display of its
contents and may prevent BBEdit from opening it in the future.
=======
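The warning compares the encoding a document declares for itself with the encoding selected in BBEdit. The Python check below is only a rough illustration, not BBEdit's code. It assumes the declaration comes from an HTML `<meta>` tag and that a declared `utf-8` means UTF-8 without a BOM. The regex and the function names `declared_charset` and `encoding_mismatch` are hypothetical.

```python
import codecs
import re
from typing import Optional

# Hypothetical pattern: finds "charset=..." inside the first <meta> tag.
META_CHARSET = re.compile(rb"<meta[^>]+charset=([A-Za-z0-9_-]+)", re.IGNORECASE)

def declared_charset(data: bytes) -> Optional[str]:
    """Return the lowercased charset named in a <meta ... charset=...> tag, or None."""
    match = META_CHARSET.search(data)
    return match.group(1).decode("ascii").lower() if match else None

def encoding_mismatch(data: bytes) -> bool:
    """True when the page declares utf-8 but its bytes start with a UTF-8 BOM.

    Assumption: a declared "utf-8" is taken to mean UTF-8 *without* a BOM,
    which is how the 9.3 warning appears to read it.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    return declared_charset(data) == "utf-8" and has_bom
```

Under these assumptions, the same `<meta>` line triggers the check only when the file is saved with a BOM.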

How do I make this warning stop popping up on every file save?

Rich Siegel

Nov 4, 2009, 10:08:28 PM
to bbe...@googlegroups.com
On 11/4/09 at 10:04 PM, steve.pi...@gmail.com (Steve
Piercy) wrote:

>How do I make this warning stop popping up on every file save?

Change the encoding to "UTF-8, No BOM" as the warning suggests.
You've been writing your documents out with a BOM this whole
time (which is not typically a good idea).

R.
--
Rich Siegel Bare Bones Software, Inc.
<sie...@barebones.com> <http://www.barebones.com/>

Someday I'll look back on all this and laugh... until they
sedate me.

Steve Piercy

Nov 4, 2009, 10:23:29 PM
to BBEdit Talk
On Nov 4, 7:08 pm, Rich Siegel <sie...@barebones.com> wrote:
> On 11/4/09 at 10:04 PM, steve.piercy....@gmail.com (Steve Piercy) wrote:
> >How do I make this warning stop popping up on every file save?
>
> Change the encoding to "UTF-8, No BOM" as the warning suggests.
> You've been writing your documents out with a BOM this whole
> time (which is not typically a good idea).

In a general context, yes, I would agree that it is not a good idea.
However, this is a Lasso file, and Lasso recognizes a BOM if present, so
in this specific context it is necessary. Here is what Lasso does:

"Lasso uses the standard Unicode byte order mark to determine if a
Lasso page is encoded in UTF-8. If no byte order mark is present then
the Lasso page will be assumed to be encoded using the Macintosh (or
Mac-Roman) character set on Mac OS X or the Latin-1 (or ISO 8859-1)
character set on Windows or Linux."
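The rule quoted above can be sketched in Python. This illustrates the documented behaviour only; it is not Lasso's implementation. The function name `decode_lasso_source` and its `platform` parameter are hypothetical, and `"darwin"` (Python's `sys.platform` value on Mac OS X) stands in for the Mac server case.

```python
import codecs
import sys

def decode_lasso_source(data: bytes, platform: str = sys.platform) -> str:
    """Decode page source following the rule quoted from the Lasso docs.

    With a UTF-8 BOM: decode as UTF-8 and drop the BOM.
    Without one: assume Mac Roman on Mac OS X ("darwin"),
    otherwise Latin-1 (ISO 8859-1) on Windows or Linux.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    fallback = "mac_roman" if platform == "darwin" else "latin-1"
    return data.decode(fallback)
```

Decoding the UTF-8 bytes for "é" without a BOM shows why the BOM matters here: they come out as "√©" under the Mac Roman fallback and "Ã©" under the Latin-1 one, rather than "é".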

In order to stop the warning, currently my only option is to remove
this line:

<meta http-equiv="content-type" content="text/html; charset=utf-8">

...and maybe I could, but I am not sure of the implications of doing
so for web pages.

--steve

Rich Siegel

Nov 4, 2009, 10:33:21 PM
to bbe...@googlegroups.com
On 11/4/09 at 10:23 PM, steve.pi...@gmail.com (Steve
Piercy) wrote:

>In a general context, yes, I would agree that it is not a good idea.
>However it is a Lasso file, and Lasso recognizes a BOM if present, so
>in this specific context it is necessary.

One does not follow from the other. My reading of this page
<http://www.lassosoft.com/Documentation/TotW/index.lasso?8902>
leads me to conclude that the BOM is optional and that Lasso
will do the right thing with the character set declaration that
you have in place.

Steve Piercy

Nov 4, 2009, 11:13:11 PM
to BBEdit Talk
On Nov 4, 7:33 pm, Rich Siegel <sie...@barebones.com> wrote:
> On 11/4/09 at 10:23 PM, steve.piercy....@gmail.com (Steve Piercy) wrote:
> >In a general context, yes, I would agree that it is not a good idea.
> >However it is a Lasso file, and Lasso recognizes a BOM if present, so
> >in this specific context it is necessary.
>
> One does not follow from the other. My reading of this page
> <http://www.lassosoft.com/Documentation/TotW/index.lasso?8902>
> leads me to conclude that the BOM is optional and that Lasso
> will do the right thing with the character set declaration that
> you have in place.

I found a workaround:
http://old.nabble.com/Re%3A-html-entities%2C-mysql-and-lasso-p22476273.html

<meta http-equiv="content-type" content="text/html; [content_encoding]">

So I'll just replace that in any legacy file I touch in BBEdit going
forward. Nonetheless, it's annoying to get the warning and not be able
to shut it off.

--steve

Johan Solve

Nov 5, 2009, 2:45:16 AM
to bbe...@googlegroups.com
At 22.33 -0500 2009-11-04, Rich Siegel wrote:
>On 11/4/09 at 10:23 PM, steve.pi...@gmail.com (Steve
>Piercy) wrote:
>
> >In a general context, yes, I would agree that it is not a good idea.
> >However it is a Lasso file, and Lasso recognizes a BOM if present, so
> >in this specific context it is necessary.
>
>One does not follow from the other. My reading of this page
><http://www.lassosoft.com/Documentation/TotW/index.lasso?8902>
>leads me to conclude that the BOM is optional and that Lasso
>will do the right thing with the character set declaration that
>you have in place.

Depends on what you think "the right thing" is. Without a BOM, Lasso assumes the file to have MacRoman encoding when reading the source file on a Mac server (for backwards compatibility reasons) or Latin-1 when reading the file on other server platforms. In this context, this is not the right thing. So in reality the BOM is required when using Lasso to process files saved with UTF-8 encoding.

Note that this has nothing to do with the encoding the file is actually served to a browser with. Lasso is Unicode-native and uses Unicode everywhere internally. The BOM is needed for Lasso to properly read source files saved with UTF-8 encoding. But Lasso can still serve the resulting output with a different encoding, and will indicate it properly with the HTTP Content-Type header. The meta http-equiv header is redundant and unneeded (by the way, is there a way to turn off adding the meta http-equiv header?).

With Lasso, there is no connection between how the file has been saved and how it is served because everything becomes Unicode in between.


I understand that in many cases a BOM can cause unwanted side effects (for static HTML and PHP*, for example), but Lasso does in fact depend on the BOM.

* http://bugs.php.net/bug.php?id=22108
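Johan's point that the saved encoding and the served encoding are independent can be illustrated with a small Python sketch. It is not Lasso code: the function `serve` is hypothetical, and it assumes the source is UTF-8 (optionally with a BOM) and that characters the output charset cannot represent are written as numeric character references.

```python
def serve(source: bytes, out_charset: str):
    """Read UTF-8 page source and emit it in a (possibly different) charset.

    Returns (header, body). The browser learns the output charset from the
    Content-Type header, regardless of how the source file was saved.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    text = source.decode("utf-8-sig")
    # Characters missing from out_charset become &#NNNN; references.
    body = text.encode(out_charset, errors="xmlcharrefreplace")
    header = f"Content-Type: text/html; charset={out_charset}"
    return header, body
```

For example, with `out_charset="iso-8859-1"`, source text "café ☃" is served as the Latin-1 bytes for "café " followed by `&#9731;`, together with a matching Content-Type header.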

--
Johan Sölve [FSA Member, Lasso Partner]
Web Application/Lasso/FileMaker Developer
MONTANIA SOFTWARE & SOLUTIONS
http://www.montania.se mailto:jo...@montania.se
(spam-safe email address, replace '-' with 'a')

Johan Solve

Nov 5, 2009, 3:11:39 AM
to bbe...@googlegroups.com
At 08.45 +0100 2009-11-05, Johan Solve wrote:
>So in reality the BOM is required when using Lasso to process files saved with UTF-8 encoding.

I should add that Lasso consumes the BOM when reading the file, so there is no BOM in the output.
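Python's `utf-8-sig` codec behaves in a similar way to what Johan describes for Lasso. It consumes a leading BOM when decoding, so text written back out with plain `utf-8` carries no BOM. A minimal demonstration in Python (not Lasso):

```python
import codecs

source = codecs.BOM_UTF8 + b"<p>Hello</p>"

# "utf-8-sig" consumes the BOM while decoding...
text = source.decode("utf-8-sig")
# ...so re-encoding with plain "utf-8" yields output without a BOM.
output = text.encode("utf-8")

print(text)                                # <p>Hello</p>
print(output.startswith(codecs.BOM_UTF8))  # False
```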

Steve Piercy

Nov 9, 2009, 4:33:35 AM
to BBEdit Talk
On Nov 5, 12:11 am, Johan Solve <inbox...@solve.se> wrote:
> At 08.45 +0100 2009-11-05, Johan Solve wrote:
>
> >So in reality the BOM is required when using Lasso to process files saved with UTF-8 encoding.
>
> I should add that Lasso consumes the BOM when reading the file, so there is no BOM in the output.

This warning is really annoying, despite the workaround. I've been
using BBEdit for about 15 years without this new warning in 9.3, and
the world has not come to an end. Why must it be persistent?

All I want is a little box to tick: "Don't warn me again. I know what
I'm doing." Or instructions to hack the BBEdit plist. Or a tiny pony.
Pretty please?

--steve

Steve Piercy

Nov 9, 2009, 4:43:25 AM
to BBEdit Talk
On Nov 4, 7:33 pm, Rich Siegel <sie...@barebones.com> wrote:
> On 11/4/09 at 10:23 PM, steve.piercy....@gmail.com (Steve Piercy) wrote:
> >In a general context, yes, I would agree that it is not a good idea.
> >However it is a Lasso file, and Lasso recognizes a BOM if present, so
> >in this specific context it is necessary.
>
> One does not follow from the other. My reading of this page
> <http://www.lassosoft.com/Documentation/TotW/index.lasso?8902>
> leads me to conclude that the BOM is optional and that Lasso
> will do the right thing with the character set declaration that
> you have in place.

From the LassoTalk list, another developer who understands the BOM and
UTF-8 character set in HTML documents far better than I do had the
following to say:

You might want to point out that their original assumption
(specifying UTF-8 as the character set precludes the use of a BOM
for HTML documents) is incorrect. Or at least I couldn't find
any document that prevents its use. Some reading, if interested:

Discusses display problems, but doesn't say it's illegal to use
BOM:
http://www.w3.org/International/tests/results/results-utf8-signature

Says BOM is valid for UTF-8 data streams:
http://unicode.org/faq/utf_bom.html#BOM

This RFC says that even if the protocol disallows a BOM, the BOM
must then be interpreted as a zero-width char, so it's still
allowed!
http://tools.ietf.org/html/rfc3629#section-6

If BareBones knows of a RFC or similar authoritative spec that
disallows the use of a BOM on HTML documents encoded as UTF-8,
I'd be interested in the reference.
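The zero-width interpretation mentioned above is easy to observe. A decoder that does not treat EF BB BF as a signature passes it through as the character U+FEFF (ZERO WIDTH NO-BREAK SPACE). A minimal Python illustration:

```python
import codecs
import unicodedata

data = codecs.BOM_UTF8 + b"<html>"

# A plain UTF-8 decoder does not strip the signature:
# the text starts with the character U+FEFF.
text = data.decode("utf-8")
print(hex(ord(text[0])))          # 0xfeff
print(unicodedata.name(text[0]))  # ZERO WIDTH NO-BREAK SPACE
print(text[1:])                   # <html>
```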

I hope I'm not flogging this pony too hard and that it is still
kicking.

--steve

BeeRich

Nov 10, 2009, 2:58:33 PM
to BBEdit Talk
I keep getting the following warning:

"This document contains data which describes its encoding as Western
(Mac OS Roman), but the encoding has been set to Western (Mac OS
Roman).

Saving this document as-is will likely cause unexpected display of its
contents and may prevent BBEdit from opening it in the future. "

Um, that's what I want it to do. It's warning me of the data and the
encoding...as being the same. Did I miss something?

Moosehair.

Rich Siegel

Nov 10, 2009, 6:01:46 PM
to bbe...@googlegroups.com
On 11/10/09 at 2:58 PM, bee...@gmail.com (BeeRich) wrote:

>I keep getting the following warning:
>
>"This document contains data which describes its encoding as Western
>(Mac OS Roman), but the encoding has been set to Western (Mac OS
>Roman).

This sounds like good fodder for support. Please send a
compressed file which exhibits this behavior and we'll have a look.

Thanks,

stratboy

Nov 23, 2009, 10:48:27 AM
to BBEdit Talk
OK, I came here from this post of mine:

http://groups.google.com/group/bbedit/browse_thread/thread/c23a3d6be244a7c8/

I've got this problem too. BOM or no BOM is a simple question for me:
as an XHTML programmer, I need plain UTF-8, since the No BOM encoding
makes a lot of characters display wrong (unless I use the old
entities). So frankly I don't care whether it's better to use the No BOM
encoding. For me, it's quite evident that good old plain UTF-8 is
better. That's why I want it back the way it was before 9.3 :)

Bye

Rich Siegel

Nov 23, 2009, 12:31:06 PM
to bbe...@googlegroups.com
On 11/23/09 at 10:48 AM, em...@reghellin.com (stratboy) wrote:

>OK, I came here from this post of mine:
>
>http://groups.google.com/group/bbedit/browse_thread/thread/c23a3d6be244a7c8/
>
>I've got this problem too. BOM or no BOM is a simple question for me:
>as an XHTML programmer, I need plain UTF-8, since the No BOM encoding
>makes a lot of characters display wrong (unless I use the old
>entities). So frankly I don't care whether it's better to use the No BOM
>encoding. For me, it's quite evident that good old plain UTF-8 is
>better.

I think you're confused. :-)

The "utf-8" character set declaration that you use in HTML/XHTML
*is* UTF-8 with no BOM. (I said this before, but perhaps did not
express it clearly.) So you *should* make sure that the text
encoding popup (in the status bar at the bottom of each
document) is set to "Unicode (UTF-8, No BOM)".

We know that the nomenclature is confusing. In a future version,
the encoding "Unicode (UTF-8)" will get renamed to make clear
that it includes a BOM: "Unicode (UTF-8, with BOM)". Then,
today's "Unicode (UTF-8, No BOM)" will be renamed to simply
"Unicode (UTF-8)". That way, the simplest notation ("UTF-8")
works for both your character set declarations *and* for the
presentation in the UI, which will hopefully be less confusing.
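In byte terms, the two BBEdit 9.3 encodings Rich describes differ only in the three-byte prefix EF BB BF. The Python sketch below shows this. The dictionary mapping BBEdit's menu names to Python codec names, and the helper `save_bytes`, are hypothetical illustrations, not anything BBEdit exposes.

```python
import codecs

# Hypothetical mapping from BBEdit 9.3 menu names to Python codec names.
BBEDIT_ENCODINGS = {
    "Unicode (UTF-8, No BOM)": "utf-8",  # what charset=utf-8 refers to
    "Unicode (UTF-8)": "utf-8-sig",      # 9.3 name; writes a leading BOM
}

def save_bytes(text: str, menu_name: str) -> bytes:
    """Encode text the way a file saved under menu_name would be written."""
    return text.encode(BBEDIT_ENCODINGS[menu_name])

page = '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
print(save_bytes(page, "Unicode (UTF-8, No BOM)")[:3])  # b'<me'
print(save_bytes(page, "Unicode (UTF-8)")[:3])          # b'\xef\xbb\xbf'
```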