While researching strange IE behaviour on some pages of my
Django-powered site, I found that Django does not handle byte order
marks (BOMs) correctly.
For instance, I have the following templates (!UTF_8_BOM! = EF BB BF):
--- base.html
!UTF_8_BOM!<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01
Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
blah-blah-blah
---
and
--- page.html
!UTF_8_BOM!
{% extends "base.html" %}
blah-blah-blah
---
If I render page.html, the output contains TWO byte order marks:
!UTF_8_BOM!!UTF_8_BOM!<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01
Transitional//EN" ...
Of course, I can fix this by introducing custom middleware, but I think
the problem should be handled at the template loader level.
Any comments?
Sort of off-topic, but why are you using a BOM in UTF-8? ;)
--
"May the forces of evil become confused on the way to your house."
-- George Carlin
i experimented a little, and here's what i found:
the basic "problem" is that django simply treats templates as
byte-strings. to it, even the BOM is just a 3-byte-long piece of text.
so, for example, in the following case:
======base.txt==========
bbb
bbb
bbb
========================
======inherit.txt=======
xxx{% extends "base.txt" %}
========================
if you render inherit.txt, you get
========================
xxxbbb
bbb
bbb
========================
as you see, the text before the extends-tag is kept.
and, in your case that text is the BOM.
so technically django behaves correctly (well, let's say consistently :)
even if we could change it, what would you propose?
i think the best solution would be to:
- strip the BOM, and remember that there was a BOM
- at the end, when the final rendered template is emitted, add the BOM
if it was used in the input templates
but this would mean that we have to "unicode-interpret" the template,
and i don't think that will happen before django goes completely unicode.
so, imho your best course of action is to strip the BOM from your templates.
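
for illustration only, the strip-and-remember idea from the list above
could be sketched roughly like this. it is plain python, not django code:
strip_bom, render_with_single_bom and the render callable are hypothetical
names that stand in for whatever the template engine would actually do.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_bom(source):
    """Return (source_without_bom, had_bom) for a byte-string template."""
    if source.startswith(BOM):
        return source[len(BOM):], True
    return source, False


def render_with_single_bom(sources, render):
    """Strip the BOM from every input, render, then emit at most one BOM.

    `render` is a hypothetical callable standing in for the template
    engine: it takes the list of BOM-free sources and returns bytes.
    """
    stripped = []
    saw_bom = False
    for source in sources:
        body, had_bom = strip_bom(source)
        stripped.append(body)
        saw_bom = saw_bom or had_bom
    output = render(stripped)
    # Re-add a single BOM only if at least one input template carried one.
    return BOM + output if saw_bom else output
```

with a render function that simply joins its inputs, two templates that
each start with a BOM give output containing exactly one BOM, at the very
start.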
gabor
Well, I'm not ;)
But I had a few legacy templates created in Notepad, and I see that
Notepad inserts a BOM automatically.
I think the correct behavior for the template loader would be to check
the default encoding and, if it is UTF-8 (or another encoding that can
carry a BOM), remove the BOM.
That's how I've solved this problem for today. But I think there should
be a generic approach for handling this.
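
A minimal sketch of the loader-level check described above, assuming the
template is read from disk as raw bytes. This is not Django's loader API:
read_template_without_bom and the BOMS table are hypothetical names, and
only UTF-8 is listed (other encodings could be added the same way).

```python
import codecs

# Encodings whose BOM this sketch strips, keyed by Python's canonical
# codec name. The table is an assumption, not something Django provides.
BOMS = {
    "utf-8": codecs.BOM_UTF8,
}


def read_template_without_bom(path, encoding="utf-8"):
    """Read a template file as bytes and drop a leading BOM, if any."""
    with open(path, "rb") as template_file:
        data = template_file.read()
    # Normalise aliases such as "utf8" or "UTF-8" to the canonical name.
    canonical = codecs.lookup(encoding).name
    bom = BOMS.get(canonical)
    if bom is not None and data.startswith(bom):
        data = data[len(bom):]
    return data
```

Because the BOM is removed before the template source reaches the engine,
both base.html and page.html would arrive BOM-free, and no BOM would be
emitted at all unless something adds one back later.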