
B.O.M.


luciendenarend

Feb 17, 2007, 1:36:02 PM
Thanks for the tip, Tina. I've been changing back all my special characters for two weeks now, and I'm sick of it. I have a site of more than 1500 pages (denarend.com), and finding the abracadabra all over my pages time and again is frustrating. I wish I could find the right way to do it, but unchecking the BOM option is a good solution for the moment.
Lucien

Christoph Schneegans

Feb 17, 2007, 6:33:44 PM
"Lucien den Arend" wrote:

> I've been changing back all my special characters for two weeks now,
> and am sick of it.

You can almost always automate that task through VBA macros.

Charax

Feb 18, 2007, 4:58:53 AM
"Christoph Schneegans" <Chri...@Schneegans.de> wrote in message
news:er86t8...@news.christoph.schneegans.de...

This problem troubled me also. I originally file-copied my FP site to a
different folder, then opened it in EW. I executed a find and replace to
universally add an HTML 4.01 DOCTYPE statement, only to subsequently find
many pages with corrupted characters above the 256 basic ANSI range.

But when I (re-)imported the offending pages into EW, overwriting the
corrupted pages, and then applied the DOCTYPE, all was OK.

I cannot reliably reproduce the problem; it seems to be hit or miss.

Cheers,

Charax

Christoph Schneegans

Feb 18, 2007, 6:57:10 AM
"Charax" wrote:

> I originally file-copied my FP site to a different folder, then opened
> it in EW. I executed a find and replace to universally add an HTML
> 4.01 DOCTYPE statement, only to subsequently find many pages with
> corrupted characters above the 256 basic ANSI range.

A BOM is not corruption; it has been part of the Unicode standard for ten
years. Besides that, if your files have an explicit encoding declaration such as

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />,

this will not happen. If your files don't have an explicit encoding
declaration, they're already corrupted.
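
To make the BOM point concrete: the UTF-8 byte order mark is the three bytes EF BB BF at the start of a file, and it only looks like garbage when software reads those bytes as Windows-1252 or ISO-8859-1. The sketch below is a minimal illustration in Python, which is not used elsewhere in this thread; the helper name has_utf8_bom is hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


# "é" encoded as UTF-8 and prefixed with a BOM, as an editor might save it.
with_bom = codecs.BOM_UTF8 + "é".encode("utf-8")

print(has_utf8_bom(with_bom))        # the BOM is present
print(with_bom.decode("utf-8-sig"))  # "utf-8-sig" consumes the BOM, text is intact
print(with_bom.decode("cp1252"))     # BOM and text misread as Windows-1252
```

The last line is the kind of output that shows up on a page when a UTF-8 file with a BOM is interpreted as Windows-1252: the bytes are fine, only the interpretation is wrong.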

Charax

Feb 18, 2007, 4:38:19 PM
"Christoph Schneegans" <Chri...@Schneegans.de> wrote in message
news:er9if6...@news.christoph.schneegans.de...

> "Charax" wrote:
>
> A BOM is not corruption; it has been part of the Unicode standard for ten
> years. Besides that,
> if your files have an explicit encoding declaration such as
>
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"
> />,
>
> this will not happen. If your files don't have an explicit encoding
> declaration, they're already corrupted.

You misunderstood, and I wasn't clear. I'm saying that when I used find &
replace to mass redesignate file-copied win-1252 or iso-8859-1 pages as
HTML 4.01 / utf-8 and then opened them in EW, many characters above the
ANSI 256 range were corrupted. It doesn't seem to happen with imported
pages, and it cannot be reproduced consistently. Only file-copied pages
opened in EW after a find & replace of the encoding and doctype showed the
problem.

Charax

Christoph Schneegans

Feb 20, 2007, 7:51:34 PM
"Charax" wrote:

> I'm saying that when I used find & replace to mass redesignate
> file-copied win-1252 or iso-8859-1 pages as HTML 4.01 / utf-8 and
> then opened them in EW, many characters above the ANSI 256 range
> were corrupted.

I would _never_ use find & replace to change the encoding declaration. It's
not sufficient to replace the declaration; EW also needs to make sure that
the actual encoding is changed. There might be situations when this doesn't
happen. If you want to recode all files in a website, you should probably
use a VBA macro such as the one at
<http://google.com/groups?selm=ca88dq.16o.1%40news.christoph.schneegans.de>.

Note the line

pw.Document.DocumentHTML = pw.Document.DocumentHTML

which seems redundant. In fact, it is absolutely necessary. Without it, EW
would only change the encoding declaration, but not update the actual
encoding. You would end up with a declaration such as

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

while the file is still Windows-1252 encoded. The same problem might have
occurred during your replace operation.

Replacing the document type declaration is safe, however.
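
The mismatch described above can be reproduced outside EW. The following is a minimal sketch in Python (used only for illustration, not part of the macro); the function name recode and the sample page are hypothetical, and it assumes the original bytes really are Windows-1252.

```python
def recode(data: bytes, source: str = "cp1252", target: str = "utf-8") -> bytes:
    """Decode bytes using their real source encoding, then re-encode them."""
    return data.decode(source).encode(target)


page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
    "<p>café</p>"
)

# Find & replace changed the declaration, but the bytes are still Windows-1252.
stale = page.encode("cp1252")
try:
    stale.decode("utf-8")
    print("bytes happen to be valid utf-8")
except UnicodeDecodeError:
    print("declaration says utf-8, but the bytes are not utf-8")

# Recoding the bytes makes the declaration true.
fixed = recode(stale)
print(fixed.decode("utf-8") == page)
```

Changing only the text of the declaration corresponds to the stale case; the recode step is the part that a plain find & replace skips.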

Charax

Feb 20, 2007, 9:12:08 PM
"Christoph Schneegans" <Chri...@Schneegans.de> wrote in message
news:erg8j7...@news.christoph.schneegans.de...

> I would _never_ use find & replace to change the encoding declaration.
> It's not sufficient to replace the declaration; EW also needs to make
> sure that the actual encoding is changed. There might be situations when
> this doesn't happen. If you want to recode all files in a website, you
> should probably use a VBA macro such as the one at
> <http://google.com/groups?selm=ca88dq.16o.1%40news.christoph.schneegans.de>.
>
> Note the line
>
> pw.Document.DocumentHTML = pw.Document.DocumentHTML
>
> which seems redundant. In fact, it is absolutely necessary. Without it, EW
> would only change the encoding declaration, but not update the actual
> encoding. You would end up with a declaration such as
>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
>
> while the file is still Windows-1252 encoded. The same problem might
> have occurred during your replace operation.
>
> Replacing the document type declaration is safe, however.

Thanks very much. Better I find out late than never.... I still have a lot
of conversions to do.

An interesting bit of code, but I can't compile it; evidently I'm missing a
library reference. I get a compile error on the Dim statement for
PageWindowEx. Here's what I've got loaded:
Visual Basic for Applications
Microsoft Expression Web 12.0 Web Object Reference Library
OLE Automation
Microsoft Office 12.0 Object Library
Microsoft Expression Web 12.0 Page Object Reference Library
Encoding Converters Repository and basic converter engine wrappers

What ref am I missing?

Thanks,

Charax

Christoph Schneegans

Feb 21, 2007, 6:42:11 PM
"Charax" wrote:

>> <http://google.com/groups?selm=ca88dq.16o.1%40news.christoph.schneegans.de>.
>
> An interesting bit of code but I can't compile, evidently missing a
> library ref.

Oops, that was a macro written for FrontPage. Here's the version adapted
for EW:

Sub SetEncodingDeclaration()

    Dim wf As WebFile
    Dim pw As PageWindow
    Dim meta As MetaElement

    For Each wf In ActiveWeb.AllFiles
        'Process .html and .htm files only (extension compared case-insensitively).
        If LCase(wf.Extension) <> "html" And LCase(wf.Extension) <> "htm" Then GoTo NextFile

        Set pw = wf.Edit(PageViewNoWindow)

        'Search for an existing encoding declaration.
        'Reset meta first so a match from a previous file cannot carry over.
        Set meta = Nothing
        For Each meta In pw.Document.all.tags("head")(0).all.tags("meta")
            If LCase(meta.httpEquiv) = "content-type" Then
                Exit For
            End If
        Next

        'No encoding declaration found, create a new one.
        If meta Is Nothing Then
            pw.Document.all.tags("head")(0).insertAdjacentHTML "AfterBegin", "<meta />"
            Set meta = pw.Document.all.tags("head")(0).all.tags("meta")(0)
        End If

        With meta
            .httpEquiv = "Content-Type"
            .content = "text/html"
            .Charset = "utf-8"
        End With

        'Looks redundant, but forces EW to update the actual encoding,
        'not just the declaration.
        pw.Document.documentHTML = pw.Document.documentHTML

        pw.Save
        pw.Close

NextFile:
    Next

End Sub

Be sure to back up your website before running this macro.
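
After a recoding run, one way to double-check the result outside EW is to scan the site folder for pages whose bytes are not valid UTF-8. This is a minimal sketch in Python, used only for illustration; the function name find_non_utf8 is hypothetical, and it assumes the site is available as ordinary files on disk.

```python
from pathlib import Path
from typing import List


def find_non_utf8(root: str) -> List[str]:
    """Return the .htm/.html files under root whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}:
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                # These bytes were probably never recoded from the old encoding.
                bad.append(str(path))
    return bad
```

Calling find_non_utf8 on the copy of the website and getting an empty list suggests that every page was actually recoded. A page containing only ASCII characters passes either way, since ASCII is valid UTF-8.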

Charax

Feb 22, 2007, 9:38:14 AM
"Christoph Schneegans" <Chri...@Schneegans.de> wrote in message
news:eriot...@news.christoph.schneegans.de...

> "Charax" wrote:
>
>>> <http://google.com/groups?selm=ca88dq.16o.1%40news.christoph.schneegans.de>.
>>
>> An interesting bit of code but I can't compile, evidently missing a
>> library ref.
>
> Oops, that was a macro written for FrontPage. Here's the version adapted
> for EW:

Many thanks, Christoph. That's very helpful!

I will also recode for single page use and put a button on the toolbar. It
will be a valuable and much used utility for me as I move everything to
utf-8.

Charax
