Feature request for newforms: HTML 4

James Bennett

Dec 4, 2006, 10:36:34 PM
to django-d...@googlegroups.com
So I've been poking around in the newforms code, and it appears that
the pre-defined widgets will be producing XHTML-style output.

Now, I'm pretty picky about my markup, and I'm certainly willing to go
to unusual lengths to get it just the way I want it, but it'd be
awfully nice if there were some way to get HTML-style output from
newforms without having to manually subclass all the widgets and
override their rendering to remove trailing slashes.

Unfortunately, I don't really have a good proposal for how to handle
this, except maybe to further break down the Widget API to include
'as_html' and 'as_xhtml'. Any ideas?
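
One way to picture the 'as_html' / 'as_xhtml' split is sketched below. This is not the newforms Widget API: `SketchTextInput` and its methods are hypothetical names, used only to illustrate one widget offering two serializations.

```python
from html import escape


class SketchTextInput:
    """Hypothetical stand-in for a widget; not the real newforms class."""

    def __init__(self, name, value=""):
        self.name = name
        self.value = value

    def _attrs(self):
        # Escape attribute values so quotes and angle brackets stay safe.
        return 'type="text" name="%s" value="%s"' % (
            escape(self.name, quote=True),
            escape(self.value, quote=True),
        )

    def as_xhtml(self):
        # XML empty-element syntax: trailing slash.
        return "<input %s />" % self._attrs()

    def as_html(self):
        # HTML 4.01: the empty element is written as a bare start tag.
        return "<input %s>" % self._attrs()


widget = SketchTextInput("mytext", "hello")
print(widget.as_xhtml())
print(widget.as_html())
```

The cost of this design is that every widget needs both methods, which is the duplication the message is worried about.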

--
"May the forces of evil become confused on the way to your house."
-- George Carlin

Antonio Cavedoni

Dec 5, 2006, 4:57:58 AM
to django-d...@googlegroups.com
On 12/5/06, James Bennett <ubern...@gmail.com> wrote:
> Now, I'm pretty picky about my markup, and I'm certainly willing to go
> to unusual lengths to get it just the way I want it, but it'd be
> awfully nice if there were some way to get HTML-style output from
> newforms without having to manually subclass all the widgets and
> override their rendering to remove trailing slashes.

+1 to this proposal. I found myself writing the code below, which is
quite scary but does the trick:

[[[
"""
Remove XHTML endings from tags to make them HTML 4.01 compliant

Usage:

{% load html4 %}

{% html4 %}
My long template with {{ variables }} and
{% block whatever %} blocks {% endblock %}
{% endhtml4 %}
"""

from django import template


def do_html4(parser, token):
    # Collect every node up to the matching {% endhtml4 %} tag.
    nodelist = parser.parse(('endhtml4',))
    parser.delete_first_token()
    return Html4Node(nodelist)


class Html4Node(template.Node):
    def __init__(self, nodelist):
        self.nodelist = nodelist

    def render(self, context):
        # Render the enclosed nodes, then turn ' />' endings into '>'.
        output = self.nodelist.render(context)
        return output.replace(' />', '>')


register = template.Library()
register.tag('html4', do_html4)
]]]
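
To make the behaviour of `Html4Node.render` concrete, here is the same string replacement applied outside a template. `strip_xhtml_endings` is a hypothetical helper name, not part of the posted tag.

```python
def strip_xhtml_endings(markup):
    # Same operation as Html4Node.render: a plain substring replacement.
    return markup.replace(" />", ">")


print(strip_xhtml_endings('<input type="text" name="q" />'))
print(strip_xhtml_endings("<p>one<br/>two</p>"))  # no space, so unchanged
```

Because this is a plain substring replacement, it only catches endings written with a space before the slash, and it rewrites any other " />" that happens to appear in the rendered output.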

Cheers.
--
Antonio

Ivan Sagalaev

Dec 5, 2006, 5:43:37 AM
to django-d...@googlegroups.com
James Bennett wrote:
> Now, I'm pretty picky about my markup, and I'm certainly willing to go
> to unusual lengths to get it just the way I want it, but it'd be
> awfully nice if there were some way to get HTML-style output from
> newforms without having to manually subclass all the widgets and
> override their rendering to remove trailing slashes.

The question is where to stop. Pickiness may lead further to having an
option to omit quotes around attribute values, have uppercase tag names,
omit end tags of <li> etc... This is all working HTML (even valid by DTD).

<rant mode=purist>
Since all these things happily work in browsers the only difference
between "/>" and the rest is that it is not DTD-valid HTML 4.01. However
from my purist point of view this is not a problem, because DTD
validation is effectively useless. The only user agent that does DTD
validation is W3C's validator itself. No real browser has ever treated
HTML as an SGML application or used a DTD for validation. In fact
what guys at WHAT WG[1] are doing now for HTML5 is specifying exactly
the syntax that browsers use for parsing HTML. And "/>" will be valid HTML5.

So from my point of view a real HTML purist would ignore HTML 4.01
validation altogether. :-)
</rant>

Ivan Sagalaev

Dec 5, 2006, 5:48:01 AM
to django-d...@googlegroups.com
Ivan Sagalaev wrote:
> <rant mode=purist>
> Since all these things happily work in browsers the only difference
> between "/>" and the rest is that it is not DTD-valid HTML 4.01.

In fact I'm wrong here... I just checked that W3C's validator doesn't
object to "<br />"s. This is valid HTML 4.01:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

<title>Test</title>

<p><br />


So even "invalidness" is not the point. What is, then?

Adrian Holovaty

Dec 5, 2006, 10:30:20 AM
to django-d...@googlegroups.com
On 12/5/06, Ivan Sagalaev <Man...@softwaremaniacs.org> wrote:
> In fact I'm wrong here... I just checked that W3C's validator doesn't
> object to "<br />"s. This is valid HTML 4.01:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
>
> <title>Test</title>
>
> <p><br />
>
> So even "invalidness" is not the point. What is, then?

If XHTML-style tags are valid in HTML 4 strict, then I don't see a
point in creating a separate output format for each widget. If you
want to be religious about whether there's a slash in your HTML tag,
clearly you care about it enough to have the (minimal) energy to write
a custom method on your Form. Or, write a custom Form subclass once
and subclass it for each form you use.
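
The "write a custom Form subclass once" idea can be sketched as follows. `SketchBaseForm` is a hypothetical stand-in so the example runs without Django; a real form class and its rendering methods may differ.

```python
class SketchBaseForm:
    """Hypothetical stand-in for a form class that renders XHTML-style tags."""

    def as_table(self):
        return '<tr><td><input type="text" name="q" /></td></tr>'


class HTML4Form(SketchBaseForm):
    """Written once; project forms subclass this instead of the base."""

    def as_table(self):
        # Reuse the parent's rendering and drop the XHTML endings.
        return super().as_table().replace(" />", ">")


class SearchForm(HTML4Form):
    pass


print(SearchForm().as_table())
```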

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com

Rob Hudson

Dec 5, 2006, 10:55:06 AM
to Django developers
James Bennett wrote:
> So I've been poking around in the newforms code, and it appears that
> the pre-defined widgets will be producing XHTML-style output.

I had the same thought but I wrote a quick test with an HTML 4 strict
doc-type, put an input tag in it like this: <input type="text"
name="mytext" />, and it was still valid.

I'd like to see agnostic HTML from Django too, but if it isn't
producing anything that breaks HTML 4, I'm happy.

One of my worries is that the W3C, from what I've read, is going to
re-invigorate HTML and work on HTML5 as well as XHTML2. If these two
diverge enough, Django may need support for both.

Somewhere I think I suggested the possibility of passing in a template
string to the forms on how they should render, but the only purpose of
this would be to remove the "/", and requires knowledge of the local
variables. Last I looked the forms stuff is pretty flexible with
adding attributes (i.e. class names) and things. So there's not a lot
of benefit doing that.

-Rob

Fredrik Lundh

Dec 5, 2006, 11:01:50 AM
to django-d...@googlegroups.com
James Bennett wrote:

> Unfortunately, I don't really have a good proposal for how to handle
> this, except maybe to further break down the Widget API to include
> 'as_html' and 'as_xhtml'. Any ideas?

build up the output using a light-weight DOM with a nice Python-level
syntax, and serialize it on the way out, using either a standard XHTML
serializer, or a user-provided alternative serializer.

(should I duck now?)

</F>
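
A rough illustration of the "build a tree, pick a serializer on the way out" idea, using the ElementTree module from today's Python standard library as a stand-in. Nothing here is newforms code, and the single `input` element is only an example.

```python
import xml.etree.ElementTree as ET

# Build the widget as a tiny element tree instead of a string.
field = ET.Element("input", {"type": "text", "name": "q"})

# Serialize the same tree two ways on the way out.
as_xhtml = ET.tostring(field, encoding="unicode", method="xml")
as_html = ET.tostring(field, encoding="unicode", method="html")

print(as_xhtml)  # empty-element syntax with a trailing slash
print(as_html)   # bare start tag, no trailing slash
```

The widget code never decides between the two syntaxes; only the serializer does.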

James Bennett

Dec 5, 2006, 11:09:40 AM
to django-d...@googlegroups.com
On 12/5/06, Adrian Holovaty <holo...@gmail.com> wrote:
> If XHTML-style tags are valid in HTML 4 strict, then I don't see a
> point in creating a separate output format for each widget.

They are valid but have a completely different meaning which browsers
don't interpret correctly; in HTML4, the closing slash is a form of
SGML SHORTTAG syntax, and '<br />' in HTML4 is meant to be interpreted
as a 'br' element followed by a literal greater-than sign.

WHAT-WG's HTML5 will do away with this and make the closing slash
semantically meaningless in HTML, but that's still a ways off in the
future.
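
The SHORTTAG reading described above can be illustrated with a small rewrite function. This is a simplified sketch, not an SGML parser: `sgml_reading` is a hypothetical name, and it only handles elements declared EMPTY in the HTML 4.01 DTD.

```python
import re

# Elements declared EMPTY in the HTML 4.01 DTD.
EMPTY_ELEMENTS = (
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
)

_NET_TAG = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(EMPTY_ELEMENTS), re.IGNORECASE
)


def sgml_reading(markup):
    """Rewrite '<br />' to the form a SHORTTAG-aware parser would see."""
    # The slash ends the start tag; the '>' is left over as character data.
    return _NET_TAG.sub(r"<\1\2>&gt;", markup)


print(sgml_reading("<p>one<br />two</p>"))
```

Under this reading the document gains a stray greater-than sign after every such tag, which is the difference in the parsed tree being pointed out.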

Fredrik Lundh

Dec 5, 2006, 11:19:31 AM
to django-d...@googlegroups.com
James Bennett wrote:

> They are valid but have a completely different meaning which

(most)

> browsers don't interpret correctly; in HTML4, the closing slash is a
> form of SGML SHORTTAG syntax, and '<br />' in HTML4 is meant to be
> interpreted as a 'br' element followed by a literal greater-than sign.

full details:

http://www.cs.tut.fi/~jkorpela/html/empty.html

</F>

James Bennett

Dec 5, 2006, 11:31:33 AM
to django-d...@googlegroups.com
On 12/5/06, Fredrik Lundh <fre...@pythonware.com> wrote:
> full details:
>
> http://www.cs.tut.fi/~jkorpela/html/empty.html

Also, there is a valid problem here; if I produce HTML4, and just say
"it validates, I don't care if it's correct", then my HTML will work
in browsers, but actual SGML parsers (which do exist and do get used
on occasion) will produce a different document tree when they parse my
HTML, because they'll read the SHORTTAG syntax correctly. I haven't
yet had a moment to verify that the standard Python sgmllib will do
this, but I know for a fact that nsgmls will.

Also, we're the framework for "perfectionists"; let's get this right ;)

James Bennett

Dec 5, 2006, 12:32:32 PM
to django-d...@googlegroups.com
On 12/5/06, Ivan Sagalaev <Man...@softwaremaniacs.org> wrote:
> The question is where to stop. Pickiness may lead further to having an
> option to omit quotes around attribute values, have uppercase tag names,
> omit end tags of <li> etc... This is all working HTML (even valid by DTD).

Yup. And in fact, I do that (quite deliberately).

But I'm not asking for that; mostly that's a matter of templating and
doing some deep hacking in markdown.py (at least in my case), and I'm
willing to put in that work.

I'm just asking for a simple way to get form inputs without trailing
slashes, because even though it's a nitpicky thing and maybe there
aren't very many people who actually care about the difference between
XML empty-element syntax and SGML SHORTTAG, there are potential issues
with things like SGML parsers, and (above all) it's just not "right"
:)

On IRC a moment ago, Jacob suggested an 'html4' template filter which
would just strip trailing slashes from empty tags; I'd be happy with
that (and willing to put in some time to implement it), provided we
advertise clearly that the Django forms system is going to produce
XHTML and that if you want HTML4 you'll need to use the filter.
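
A filter along those lines might look like the sketch below. The function name `html4` follows the suggestion, but the body is an assumption, and registering it with a template library is left out so the example runs without Django.

```python
import re

# Hypothetical filter body; only elements that are EMPTY in HTML 4.01 match.
_EMPTY_TAG = re.compile(
    r"<(area|base|basefont|br|col|frame|hr|img|input|isindex|link|meta|param)"
    r"\b([^<>]*?)\s*/>",
    re.IGNORECASE,
)


def html4(value):
    """Strip the trailing slash from empty elements only."""
    return _EMPTY_TAG.sub(r"<\1\2>", value)


print(html4('<p><img src="a.png" alt="" /><br/>text</p>'))
```

Restricting the pattern to empty elements means stray "/>" sequences on other tags are left alone, at the cost of keeping the element list in sync with the DTD.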

> Since all these things happily work in browsers the only difference
> between "/>" and the rest is that it is not DTD-valid HTML 4.01. However
> from my purist point of view this is not a problem, because DTD
> validation is effectively useless. The only user agent that does DTD
> validation is W3C's validator itself. No real browser has ever treated
> HTML as an SGML application or used a DTD for validation.

Steps to reproduce:

1. Put together an HTML4 document with valid DOCTYPE declaration and everything.
2. Drop a closing slash into a BR, IMG or INPUT element somewhere.
3. Run through nsgmls and look at how it gets parsed

There's a real-world difference there. You may say that nobody's ever
used a real SGML parser on HTML4, but I actually have (in fact, I once
ran into a situation where it was the only way to find a bug that the
standard W3C validator settings couldn't catch), and I know for a fact
that you get different output from an SGML parser than you do from a
web browser. That's an interoperability problem :)

> In fact
> what guys at WHAT WG[1] are doing now for HTML5 is specifying exactly
> the syntax that browsers use for parsing HTML. And "/>" will be valid HTML5.

Yup. I'm subscribed to their mailing list and I've been following the
discussion of that, and the proposal to allow 'xmlns' in HTML, with
some trepidation. These things are necessary now because for years
people have been doing stuff that was demonstrably incorrect but still
"worked". I don't want Django to become part of that crowd.

Antonio Cavedoni

Dec 5, 2006, 12:39:12 PM
to django-d...@googlegroups.com
On 12/5/06, James Bennett <ubern...@gmail.com> wrote:
> On IRC a moment ago, Jacob suggested an 'html4' template filter which
> would just strip trailing slashes from empty tags; I'd be happy with
> that (and willing to put in some time to implement it), provided we
> advertise clearly that the Django forms system is going to produce
> XHTML and that if you want HTML4 you'll need to use the filter.

I posted a tag earlier in this thread that does what you ask (it's
pretty trivial). Are my messages coming through, anyway?
--
Antonio

Fredrik Lundh

Dec 5, 2006, 12:44:48 PM
to django-d...@googlegroups.com
James Bennett wrote:

> There's a real-world difference there. You may say that nobody's ever
> used a real SGML parser on HTML4, but I actually have (in fact, I once
> ran into a situation where it was the only way to find a bug that the
> standard W3C validator settings couldn't catch), and I know for a fact
> that you get different output from an SGML parser than you do from a
> web browser. That's an interoperability problem :)

the Planet RSS aggregator used to use an SGML parser (sgmllib?) to
clean up embedded HTML, which caused rather interesting output when
people used crappy blog tools that inserted <br /> all over the
place.

</F>

Ivan Sagalaev

Dec 5, 2006, 1:48:02 PM
to django-d...@googlegroups.com
James Bennett wrote:
> Yup. And in fact, I do that (quite deliberately).

Nice to meet a like-minded person :-)

> I'm just asking for a simple way to get form inputs without trailing
> slashes

As you said, the problem is how to make it simple enough... What about a
middleware that, on seeing a 'text/html' content type, would htmlize the content?
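
A minimal sketch of such a middleware, modelled loosely on a `process_response`-style hook. `SketchResponse` and `HTML4Middleware` are hypothetical; a real response object exposes its content and headers differently.

```python
class SketchResponse:
    """Hypothetical stand-in for a framework response object."""

    def __init__(self, content, content_type):
        self.content = content
        self.headers = {"Content-Type": content_type}


class HTML4Middleware:
    """Hypothetical response middleware that htmlizes text/html output."""

    def process_response(self, request, response):
        content_type = response.headers.get("Content-Type", "")
        # Only touch HTML; leave feeds, JSON and real XHTML alone.
        if content_type.split(";")[0].strip() == "text/html":
            response.content = response.content.replace(" />", ">")
        return response


html = SketchResponse('<input name="q" />', "text/html; charset=utf-8")
feed = SketchResponse("<entry />", "application/atom+xml")
middleware = HTML4Middleware()
print(middleware.process_response(None, html).content)
print(middleware.process_response(None, feed).content)
```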

> There's a real-world difference there. You may say that nobody's ever
> used a real SGML parser on HTML4, but I actually have

In the near future I think The Right Thing would be to use a real HTML
parser for such things. There were many messages on WHATWG list from
people writing such tools in many languages including Python:
http://code.google.com/p/html5lib/

James Bennett

Dec 5, 2006, 2:25:48 PM
to django-d...@googlegroups.com
On 12/5/06, Ivan Sagalaev <Man...@softwaremaniacs.org> wrote:
> In the near future I think The Right Thing would be to use a real HTML
> parser for such things. There were many messages on WHATWG list from
> people writing such tools in many languages including Python:
> http://code.google.com/p/html5lib/

Well... define "near future" ;)

Whenever HTML5-the-specification is finished and
HTML5-the-cross-browser-implementation is available, then yeah,
that'll work. In the meantime, HTML4 with SGML tools is all I've got
available to me, and every once in a while that catches things the W3C
validator's default settings won't notice.

Ivan Sagalaev

Dec 5, 2006, 2:33:31 PM
to django-d...@googlegroups.com
James Bennett wrote:
> Well... define "near future" ;)

When the library is usable.

> Whenever HTML5-the-specification is finished and
> HTML5-the-cross-browser-implementation is available, then yeah,
> that'll work.

You don't have to wait for this because html5lib would work with
existing content which is pretty much the point of the whole spec.
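
To show how a browser-style HTML parser treats existing content with "<br />" in it, here is a sketch using `html.parser` from the Python standard library. It is a stand-in chosen so the example is self-contained; it is not html5lib and does not implement the full HTML5 parsing algorithm.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record parse events so the handling of '<br />' is visible."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for tags written with a trailing slash.
        self.events.append(("startend", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = EventLogger()
parser.feed("<p>one<br />two</p>")
parser.close()
print(parser.events)
```

The text events contain no stray ">" here, in contrast to the strict SGML reading of the same markup.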

Anne van Kesteren

Jan 10, 2007, 9:48:41 AM
to Django developers
Rob Hudson wrote:
> James Bennett wrote:
> > So I've been poking around in the newforms code, and it appears that
> > the pre-defined widgets will be producing XHTML-style output.
>
> I had the same thought but I wrote a quick test with an HTML 4 strict
> doc-type, put an input tag in it like this: <input type="text"
> name="mytext" />, and it was still valid.

That might be valid in the SGML sense, but it means something
completely different. See http://hixie.ch/advocacy/xhtml


> One of my worries is that the W3C, from what I've read, is going to
> re-invigorate HTML and work on HTML5 as well as XHTML2. If these two
> diverge enough, Django may need support for both.

Well, I suppose that depends on how many browser implementations XHTML2
will get...

Anne van Kesteren

Jan 10, 2007, 9:51:13 AM
to Django developers
James Bennett wrote:
> WHAT-WG's HTML5 will do away with this and make the closing slash
> semantically meaningless in HTML, but that's still a ways off in the
> future.

Well, it reflects what's been implemented for years in browsers.

James Bennett

Jan 10, 2007, 11:22:48 AM
to django-d...@googlegroups.com
On 1/10/07, Anne van Kesteren <annevan...@gmail.com> wrote:
> Well, it reflects what's been implemented for years in browsers.

Yes, but it's still not right ;)

When HTML5 finalizes, then I'll feel a little more comfortable
migrating to it and this won't be an issue anymore. For now, though,
I'm using HTML 4.01 (quite happily, I might add) and running into
annoyance with the fact that both Django's newforms and the old
manipulator system default to XHTML-style tags with no way to override
that.

I may have to just resort to that 'html4' template tag posted further
up in the thread...

Rob Hudson

Jan 10, 2007, 11:39:43 AM
to Django developers
James Bennett wrote:
> I'm using HTML 4.01 (quite happily, I might add) and running into
> annoyance with the fact that both Django's newforms and the old
> manipulator system default to XHTML-style tags with no way to override
> that.

James,

I'm in the same boat. I'm curious why you don't use XHTML?

For me, it's some of the exact reasons that Ian Hickson states, but I
was curious about others.
http://www.hixie.ch/advocacy/xhtml

-Rob

James Bennett

Jan 10, 2007, 12:06:23 PM
to django-d...@googlegroups.com
On 1/10/07, Rob Hudson <trebor...@gmail.com> wrote:
> I'm in the same boat. I'm curious why you don't use XHTML?

In no particular order:

1. I've done the content-negotiation thing before, and I don't really
want to go there again.
2. I don't have need of any XML-specific features, so I don't really
have a valid reason to dump something that's been working remarkably
well up until now.
3. HTML 4.01 lets me be more terse by omitting various tags and other
bits, which appeals to my minimalist side.
4. I just feel like being ornery sometimes.

Jacob Kaplan-Moss

Jan 10, 2007, 12:14:25 PM
to django-d...@googlegroups.com
On 1/10/07 11:06 AM, James Bennett wrote:

> 4. I just feel like being ornery sometimes.

Don't let him fool you -- this is actually reason #1 :)

Jacob

Ivan Sagalaev

Jan 10, 2007, 1:49:01 PM
to django-d...@googlegroups.com
James Bennett wrote:
> 1. I've done the content-negotiation thing before, and I don't really
> want to go there again.
> 2. I don't have need of any XML-specific features, so I don't really
> have a valid reason to dump something that's been working remarkably
> well up until now.
> 3. HTML 4.01 lets me be more terse by omitting various tags and other
> bits, which appeals to my minimalist side.
> 4. I just feel like being ornery sometimes.

Then I don't understand why you still insist on some artificial DTD
validity that doesn't matter to anyone except the tool that
checks for it. Ignoring "/>" IS a reality. Whether HTML5 approves it
next year or in ten years won't change a thing...

Michael Radziej

Jan 10, 2007, 2:09:22 PM
to django-d...@googlegroups.com
Ivan Sagalaev schrieb:

> Then I don't understand why you still insist on some artificial DTD
> validity that doesn't matter to anyone except the tool that
> checks for it. Ignoring "/>" IS a reality. Whether HTML5 approves it
> next year or in ten years won't change a thing...

For me, it's the difference between specified behaviour and an
accidental implementation detail. Maybe future clients go into a
kind of quirks mode when they see " />" and output a warning that
"this is not proper HTML4 and unsafe to consume"? And they would
even be right.

BTW, what keeps me from XHTML is simply that my javascript
library of choice (yui) doesn't support it in all components. I
feel that django leads one into a trap by rendering XHTML style,
and it's not a good idea. XHTML currently has a tendency to break
your web site when it grows.

Michael


--
noris network AG - Deutschherrnstraße 15-19 - D-90429 Nürnberg -
Tel +49-911-9352-0 - Fax +49-911-9352-100

http://www.noris.de - The IT-Outsourcing Company

Rob Hudson

Jan 11, 2007, 4:22:35 PM
to Django developers
Michael Radziej wrote:
> BTW, what keeps me from XHTML is simply that my javascript
> library of choice (yui) doesn't support it in all components.

Hmmm. Can you elaborate? We're using YUI for a few things as well and
I wasn't aware of this. (We can take this offline if it's preferable.)

> XHTML currently has a tendency to break your web site when it grows.

I'd like to hear more about this one too.

Michael Radziej

Jan 11, 2007, 6:01:34 PM
to django-d...@googlegroups.com
Hi Rob,

Rob Hudson schrieb:


> Michael Radziej wrote:
>> BTW, what keeps me from XHTML is simply that my javascript
>> library of choice (yui) doesn't support it in all components.
>
> Hmmm. Can you elaborate? We're using YUI for a few things as well and
> I wasn't aware of this. (We can take this offline if it's preferable.)

Menubars don't work in documents delivered as content type
application/xhtml+xml. There's already an entry in the bug database.

>
>> XHTML currently has a tendency to break your web site when it grows.
>
> I'd like to hear more about this one too.

Well, not XHTML by itself, of course. It's just that JavaScript and CSS
work differently with XHTML or HTML4. But since some well known crap
browsers don't work with XHTML, you are forced to deliver the content
either as application/xhtml+xml or text/html, based upon content type
negotiation.
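
The negotiation step mentioned here can be sketched as a function of the request's Accept header. `pick_content_type` is a hypothetical helper, and it is deliberately simplified: q-values and wildcards are ignored.

```python
def pick_content_type(accept_header):
    """Choose a media type for an XHTML document (simplified sketch)."""
    # Simplification: q-values and wildcards in the header are ignored.
    offered = [part.split(";")[0].strip() for part in accept_header.split(",")]
    if "application/xhtml+xml" in offered:
        return "application/xhtml+xml"
    return "text/html"


print(pick_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(pick_content_type("text/html, */*"))
```

Every page then has to behave correctly under both media types, which is where the JavaScript and CSS differences come in.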

And when your business buys or has some CSS, the probability is high
that the CSS designer has never heard about XHTML ...

Michael

Gary Wilson

Jan 11, 2007, 9:58:40 PM
to Django developers
James Bennett wrote:
> Unfortunately, I don't really have a good proposal for how to handle
> this, except maybe to further break down the Widget API to include
> 'as_html' and 'as_xhtml'. Any ideas?

I still think that all these "strategies" do not belong in the BaseForm
class. Using a strategy pattern, something like I've suggested before
[1], gives you the same flexibility of subclassing (and maybe more),
yet makes it easier to change existing code to use a new strategy (I
wouldn't have to go changing the classes my Forms inherit).

FormFormatter is to Form as Widget is to Field. We aren't adding
as_CheckboxSelectMultiple-like methods to the Field class, so why are
we doing it with Form?

If we want to be able to dynamically specify how a form gets rendered
in the template instead of in the code, then maybe we could introduce a
templatetag:
{% form myform as_xhtml %}
which might be a nice idea for Fields too:
{% field myform.myfield CheckboxSelectMultiple %}

Anyway, this could also probably help the people wanting to grab their
Widget HTML from templates instead of code [2].
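
The strategy-pattern idea, with the formatter as a separate object handed to the form, might be sketched like this. All class names are hypothetical and none of this is the newforms API.

```python
class SketchForm:
    """Hypothetical form holding field names and a pluggable formatter."""

    def __init__(self, field_names, formatter):
        self.field_names = field_names
        self.formatter = formatter

    def render(self):
        # The form delegates all markup decisions to its formatter.
        return self.formatter.format(self)


class XHTMLFormatter:
    def format(self, form):
        return "\n".join(
            '<p><input type="text" name="%s" /></p>' % name
            for name in form.field_names
        )


class HTML4Formatter:
    def format(self, form):
        # HTML 4.01 allows omitting the </p> end tag and the trailing slash.
        return "\n".join(
            '<p><input type="text" name="%s">' % name
            for name in form.field_names
        )


form = SketchForm(["name", "email"], XHTMLFormatter())
print(form.render())
form.formatter = HTML4Formatter()  # swap strategy, no subclass change
print(form.render())
```

Switching output style is then a matter of passing a different formatter, without changing which class the form inherits from.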

[1]
http://groups-beta.google.com/group/django-developers/browse_thread/thread/e3bcd07da81c3275/b13b6385d1b6696e#msg_b13b6385d1b6696e
[2]
http://groups-beta.google.com/group/django-developers/browse_thread/thread/b2ace4f7f69a73f6/8412579768316e9c

Afternoon

Jan 12, 2007, 6:07:31 AM
to Django developers
This seems a long way to go for the want of removing a few
forward-slashes.

XHTML has become the de facto standard for Django, which is great, but
the vast majority of pages are still HTML 4. So if there's to be one
standard it should be that.

ElGranAzul

Jan 14, 2007, 7:09:31 AM
to Django developers
I think that a possible fix for this problem, and for the presentation
problem (as_p, as_li, as_whichever_you_like), has been discussed on
this list (or on the users list, I don't remember): use templates
instead of hand-coding the (X)HTML in the source.

Each widget and form could have a template that could be overridden at
the project level (like the admin templates, for example), and at the
template level we could simply pass the name of the template we want to
use to a template filter, something like:

{{ form|template:"as_table.html" }} or
{{ form[name]|template:"text_input.html" }}

This gives every user the option to create or modify the presentation of
their form fields, and a way to refactor and slim down the code ;). It
also allows admins and web developers to create custom form field
presentations (AJAX, custom options, etc.) without needing any
programming knowledge.

With this, the default could be XHTML, but anyone who wants the same set
of templates in HTML could create it easily, or do it in XUL or whatever
format they want.

Only my thoughts ;)
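
A small sketch of the per-widget template idea, using `string.Template` from the standard library as a stand-in for a real template engine. The template sets and `render_widget` are hypothetical, and no escaping is done.

```python
from string import Template

# Hypothetical per-widget templates; a project could override either set.
XHTML_TEMPLATES = {"text_input": Template('<input type="text" name="$name" />')}
HTML4_TEMPLATES = {"text_input": Template('<input type="text" name="$name">')}


def render_widget(templates, widget, **context):
    """Look up the widget's template in the chosen set and fill it in."""
    return templates[widget].substitute(**context)


print(render_widget(XHTML_TEMPLATES, "text_input", name="q"))
print(render_widget(HTML4_TEMPLATES, "text_input", name="q"))
```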
