Should our default themes avoid the use of strict XHTML and content negotiation?


Sam Minnee

Mar 1, 2009, 4:17:07 PM
to SilverStripe Development
Hi all,

On this thread, a new user got into CSS trouble, and it looks like it
was due to the content negotiator:
http://www.silverstripe.org/template-questions/show/253456?start=0#post255151

In 2.3.0 we reconfigured the content negotiator to only operate when
you use the <?xml ?> header in your templates. However, our default
template uses this header and we are therefore forcing new users to
grapple with this potentially confusing feature.

Perhaps we should re-work our default template to use HTML 4.01
instead of XHTML with mime-type negotiation? That way, the CSS side
of things will be closer to what people have experienced elsewhere.

Note that I recommend HTML 4.01 over XHTML for the reasons listed in
the http://hixie.ch/advocacy/xhtml article. Although this came out a
few years ago, IE 8 *still* doesn't support XHTML.

Cam Spiers

Mar 1, 2009, 6:06:02 PM
to silverst...@googlegroups.com
It's quite interesting: a bug in IE6 causes an XML declaration to throw IE6 into quirks mode even when the doctype is not a quirks-mode one.
This causes issues, with IE6 using the old (IE5) box model. Therefore I think that the XML declaration should be left out by default (while still leaving the doctype as XHTML), even though it is supposed to be there for XHTML. As a note, without the XML declaration your document will still pass XHTML validation.

I am basing this information on my experience, so it might not be technically correct; sorry if not.

Cheers,
Cam

Sam Minnee

Mar 1, 2009, 6:19:07 PM
to SilverStripe Development
Hi Cam,

You might find the article that I linked to in my original post an
interesting read: http://hixie.ch/advocacy/xhtml

In short, if you don't set the mime-type of an XHTML document to
application/xhtml+xml, then the browser will interpret the XHTML file
as plain old HTML, and assume that things like using <br /> instead of
<br> are just tag-soup bugs. And since that's not really advancing
the cause of web standards, it's better just to send the browser an
HTML 4.01 file, since that's what the browser is going to treat your
file as anyway.

Browsers other than IE actually support the application/xhtml+xml
mime-type, and so you *can* send them XHTML. This has the dubious
benefit of showing you straight away if there are XML parsing errors
in your code. It's a bit of a tighter way of building your markup,
because it doesn't let you get away with leaving XML parse errors in.
But it trips up beginners and generally causes misery. It also means
that IE is going to have *different* mark-up sent to it.
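
The negotiation described above can be sketched roughly as follows. This is a minimal illustration in PHP, not SilverStripe's actual ContentNegotiator code: the function name choose_content_type is hypothetical, and it assumes that any mention of application/xhtml+xml in the Accept header with a non-zero q-value counts as support.

```php
<?php
// Hypothetical helper, not part of SilverStripe: pick the MIME type to send
// based on the browser's Accept header.
function choose_content_type(string $acceptHeader): string
{
    foreach (explode(',', $acceptHeader) as $entry) {
        $parts = array_map('trim', explode(';', $entry));
        if (strtolower($parts[0]) !== 'application/xhtml+xml') {
            continue;
        }
        // Look for an explicit quality value; q=0 means "not acceptable".
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q=([0-9.]+)$/i', $param, $m) && (float) $m[1] <= 0.0) {
                return 'text/html';
            }
        }
        return 'application/xhtml+xml';
    }
    // Browsers that do not list the XHTML type (such as IE) get plain HTML.
    return 'text/html';
}
```

A page would then send something like header('Content-Type: ' . choose_content_type($_SERVER['HTTP_ACCEPT'] ?? '')) before any output, which is also why IE ends up receiving different mark-up from other browsers.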

Will Rossiter

Mar 1, 2009, 6:55:19 PM
to silverst...@googlegroups.com
> But it trips up beginners and generally causes misery.

You can see the number of these issues on the forum / IRC logs. While
it encourages better practice from developers (fixing missing tags,
bad encoding, etc.), HTML 4.01 would be much more straightforward for
most SilverStripe developers. And anything that makes life easier is
good news.

Sending XHTML as text/html basically gives up a lot of the advantages
of XHTML without gaining a lot. HTML 4.01 contains everything that
XHTML 1.0 contains.

That article is a good read :)

Ingo Schommer

Mar 1, 2009, 7:36:35 PM
to silverst...@googlegroups.com
> Perhaps we should re-work our default template to use HTML 4.01
> instead of XHTML with mime-type negotiation?

What about other markup generated by SS, e.g. Form.ss,
all formfields etc. - we traditionally adhered to XHTML here.

Either way, we'd need some conversion to avoid producing
invalid markup for the given output type: existing users relying on
an XHTML doctype would need to convert the (new) HTML4 field templates to XHTML, and vice versa.

The ContentNegotiator currently doesn't fully convert markup HTML4->XHTML
(e.g. <select multiple> wouldn't translate into <select multiple="multiple">).
Self-closing tags just work for <img> and <br> - it would be quite
hard to detect an orphaned tag anyway given arbitrary nesting, right? 
So *if* we want to do any conversion which is turned on by default,
I think the ContentNegotiator class will need some attention.
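
To make the gap concrete, here is a rough regex-based sketch of the two conversions mentioned above: expanding minimized attributes and self-closing empty elements. The function name and the tag list are assumptions for illustration only. It is not the ContentNegotiator implementation, and regexes like these will miss edge cases (a ">" inside an attribute value, for instance), which is part of the point being made.

```php
<?php
// Hypothetical sketch, not the real ContentNegotiator: nudge HTML 4 markup
// towards XHTML by expanding minimized attributes and self-closing void tags.
function html4_to_xhtml(string $html): string
{
    $voidTags = ['br', 'hr', 'img', 'input', 'meta', 'link'];

    return preg_replace_callback(
        '#<([a-zA-Z][a-zA-Z0-9]*)((?:\s[^<>]*)?)>#',
        function (array $m) use ($voidTags): string {
            $tag = $m[1];
            $attrs = rtrim($m[2]);
            // Remember (and strip) an existing trailing slash, e.g. <br />.
            $alreadyClosed = substr($attrs, -1) === '/';
            if ($alreadyClosed) {
                $attrs = rtrim(substr($attrs, 0, -1));
            }
            // Walk the attribute list; an attribute without a value
            // (e.g. "multiple") becomes name="name".
            $attrs = preg_replace_callback(
                '#\s+([a-zA-Z_:][\w:.-]*)(\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s"\'=<>`]+))?#',
                function (array $a): string {
                    $value = $a[2] ?? '';
                    return ' ' . $a[1] . ($value !== '' ? $value : '="' . $a[1] . '"');
                },
                $attrs
            );
            $selfClose = $alreadyClosed || in_array(strtolower($tag), $voidTags, true);
            return '<' . $tag . $attrs . ($selfClose ? ' />' : '>');
        },
        $html
    );
}
```

Note that this only fixes tags that are empty by definition. A genuinely orphaned tag such as an unclosed <p> is left alone, because finding it needs a real parser rather than a pattern match.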

These kinds of problems have already been solved by libtidy (http://php.net/tidy),
meaning we wouldn't need error-prone self-written regexes in ContentNegotiator
to handle this. Downside: it's not packaged in the standard PHP distribution,
so it would have to be treated as an optional setting, which kinda
defeats the purpose...
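
For comparison, a minimal sketch of the libtidy route using PHP's tidy extension. The wrapper function tidy_to_xhtml is hypothetical, and the choice of options is an assumption. It returns null when the extension isn't installed, which is exactly the optional-dependency problem described above.

```php
<?php
// Hypothetical wrapper around PHP's optional tidy extension.
function tidy_to_xhtml(string $html): ?string
{
    if (!extension_loaded('tidy')) {
        return null; // libtidy is not available on this PHP install
    }
    $config = [
        'output-xhtml'   => true, // emit XHTML rather than HTML
        'show-body-only' => true, // return just the fragment, not a full page
    ];
    $result = tidy_repair_string($html, $config, 'utf8');
    return $result === false ? null : $result;
}
```

Calling code would need a fallback for the null case, for example leaving the markup untouched.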

-------
Ingo Schommer | Senior Developer
SilverStripe

Skype: chillu23

Sam Minnee

Mar 1, 2009, 9:06:57 PM
to SilverStripe Development
> What about other markup generated by SS, e.g. Form.ss,
> all formfields etc. - we traditionally adhered to XHTML here.

Unfortunately, it isn't feasible to come up with a markup that
consistently validates as both XHTML and HTML. "<br />" tags don't
validate in HTML. However, it would be feasible to write mark-up that
could be easily converted from one to the other.

Our standard form templates are going to need to be able to be
delivered as either XHTML or HTML, depending on where they are used.
There are a few ways of doing this:

1) Have separate XHTML and HTML templates.
2) Run rewriting rules on Form::forTemplate()
3) Have a conditional block in the template along the lines of <% if
IsXHTML %><br /><% else %><br><% end_if %>, or possibly specific
variables such as <br$SelfClose>. $SelfClose would evaluate to " /"
in XHTML mode, and "" in HTML mode.
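
Option (3)'s $SelfClose variable could behave roughly like this. The sketch uses a plain string substitution in place of the real SSViewer template engine, and the function names and the 'xhtml' / 'html' mode strings are made up for illustration.

```php
<?php
// Hypothetical: the value $SelfClose would take in each output mode.
function self_close(string $mode): string
{
    return $mode === 'xhtml' ? ' /' : '';
}

// Stand-in for template rendering: substitute $SelfClose in a snippet.
// Single quotes keep PHP from interpolating the placeholder itself.
function render_snippet(string $template, string $mode): string
{
    return str_replace('$SelfClose', self_close($mode), $template);
}
```

With this, render_snippet('<br$SelfClose>', 'xhtml') gives "<br />" and the same template in 'html' mode gives "<br>".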

Option (1) has a certain conceptual elegance - it is the option where
the content of your HTML output is closest to the template, there's no
behind the scenes processing to complicate the situation. However,
all the extra templates are a maintenance burden and for that reason
Ingo and I think it's unacceptable.

Option (2) would be straightforward in some ways - an XHTML to HTML
converter is pretty easy to make - however, it adds a piece of "behind
the scenes magic" that I would like to avoid. In particular

Option (3) has a lot of merit: you have a lower maintenance burden,
but there's also no behind the scenes magic that isn't clearly spelled
out in the template. The "<br$SelfClose>" is my favourite, because
it's concise. It would only need to be written by template developers
that are wanting to build dual-format templates, which I would expect
is a group of people that are getting deep into SS, so the fact that
it's an odd syntax that people would need to learn is acceptable IMO.

In order to do any of these, we would probably need a way of detecting
whether an XHTML page or an HTML page was being rendered. One way of
doing this would be to add a hook into SSViewer that would look at the
doctype of the main template (i.e., the one with an "<html>" tag), and
expose that information via a SSViewer::html_doctype_used() method.
We could also encourage people to use $Form.XHTML or $Form.HTML to
include forms in their templates; that way detection isn't necessary.
But it seems a little clumsy.
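
Doctype detection along these lines might look like the following. This is only a sketch: detect_doctype is a hypothetical stand-in for the proposed SSViewer::html_doctype_used(), and it simply inspects the first DOCTYPE declaration found in the markup it is given.

```php
<?php
// Hypothetical sketch: classify markup as 'xhtml' or 'html' by its doctype.
function detect_doctype(string $markup): ?string
{
    if (!preg_match('/<!DOCTYPE\s+html\b([^>]*)>/i', $markup, $m)) {
        return null; // no doctype found, e.g. a partial template
    }
    // XHTML doctypes carry a public identifier such as
    // "-//W3C//DTD XHTML 1.0 Strict//EN".
    return stripos($m[1], 'XHTML') !== false ? 'xhtml' : 'html';
}
```

The null case matters for templates like Form.ss, which have no doctype of their own and would have to take the answer from the main page template.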

Ingo Schommer

Mar 1, 2009, 10:19:02 PM
to silverst...@googlegroups.com
Hm, I don't like <br$SelfClose> - it's a lot of overhead in quite common
elements (<img>, <br>). This addition means that every developer has to
know about it, and change the way they write HTML. It will clash with
IDE autocompletion, you won't be able to preview/copypaste even the
simplest markup snippets without running it through SSViewer first, etc.
I can pretty much guarantee that only the most hardout SS devs
will use this notation, which leaves us with invalid markup in certain contexts,
and people having to subclass fields just to change templates for their context (HTML4 vs XHTML).
Also, this addresses only one aspect (self-closing tags), albeit a very common one.
It requires each dev to know the subtle differences between the markup
variations to know where to plug in these special helpers.

I think we should explicitly add an option to mysite/_config.php.
I see the main reason for confusion as the weird autodetection
("if page contains <?xml assume XHTML format...") rather than
the actual rewriting.

For example:
// Rewrites all parsed templates to be HTML 4 compliant.
// Set to 'xhtml' for XML/XHTML compatible output
ContentNegotiator::set_doctype('html4');
This way it would be a bit less "magic" because we're making
people aware of what's going on in a very prominent place,
and telling them how to change it.

As for our "standard format" in markup snippets and templates,
I'd opt for XHTML. Everybody knows how to write it,
it's virtually no overhead, and most importantly:
it's much easier to programmatically convert XHTML->HTML4 than HTML4->XHTML.
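
The asymmetry is easy to see in code: going from XHTML to HTML4 is mostly a matter of dropping the trailing slash on empty elements and collapsing attributes like multiple="multiple". A rough sketch, with a hypothetical function name and illustrative (not exhaustive) tag and attribute lists:

```php
<?php
// Hypothetical sketch: convert XHTML-style markup to HTML 4 style.
function xhtml_to_html4(string $xhtml): string
{
    // <br /> -> <br>, but only for elements that are empty by definition,
    // so something like <div /> is left alone.
    $html = preg_replace(
        '#<(br|hr|img|input|meta|link)\b([^<>]*?)\s*/>#i',
        '<$1$2>',
        $xhtml
    );
    // multiple="multiple" -> multiple, for known boolean attributes only.
    return preg_replace(
        '#\s(checked|selected|disabled|readonly|multiple)=(["\'])\1\2#i',
        ' $1',
        $html
    );
}
```

Because well-formed XHTML marks every empty element explicitly, two pattern replacements cover the common cases. The reverse direction has to work out which tags were never closed.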

The decision for the default *output* format can be separate
if we have ContentNegotiator enabled by default - it might
as well be HTML4.

Rewriting on Form::forTemplate() only - I think we will create even
more confusion if templates are *selectively* converted.

Mark Rickerby

Mar 1, 2009, 10:27:13 PM
to silverst...@googlegroups.com
I would favor the rewriting approach, ideally running a filter as late
as possible in the processing pipeline (i.e., just before the rendered
template gets cached or output to the response stream).

This would have a marginally higher overhead in terms of page
generation time, but it would keep futzing in templates to a minimum
and be a more reliable way of correcting mistakes/slippages.
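
A late-stage filter of this kind can be sketched with PHP's output buffering. The function name render_with_filter is hypothetical and the filter in the usage example is deliberately trivial. The point is only that the filter sees the finished markup once, after all templates have rendered.

```php
<?php
// Hypothetical: run $render, capture everything it echoes, then pass the
// complete output through $filter before it is cached or sent.
function render_with_filter(callable $render, callable $filter): string
{
    ob_start();
    try {
        $render();
    } finally {
        // Always close the buffer, even if rendering throws.
        $output = ob_get_clean();
    }
    return $filter($output);
}

// Usage: the "template" echoes XHTML-style markup, the filter rewrites it.
$page = render_with_filter(
    function (): void {
        echo 'line one<br />line two';
    },
    function (string $html): string {
        return str_replace('<br />', '<br>', $html);
    }
);
```

The extra cost is one pass over the final string per page, which is the marginal overhead mentioned above.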

I would also suggest using an HTML5 doctype, with HTML4 compatible
markup, and ignoring XHTML altogether in the default themes. Just ask
the question "what specific value does XHTML provide?". The answer is
unclear.

Regards,
Mark

Keri Henare

Mar 1, 2009, 10:35:24 PM
to silverst...@googlegroups.com
> Just ask
> the question "what specific value does XHTML provide?". The answer
> is...
Rules, lots of lovely rules :D

---------------------------------------------------
Keri Henare

[e] ke...@henare.co.nz
[m] 021 874 552
[w] www.kerihenare.com

Sam Minnee

Mar 1, 2009, 10:41:31 PM
to SilverStripe Development
> I would also suggest using an HTML5 doctype, with HTML4 compatible
> markup, and ignoring XHTML altogether in the default themes. Just ask
> the question "what specific value does XHTML provide?". The answer is
> unclear.

Yeah, XHTML is pretty useless and likely to wind up a childless,
barren fork of HTML's family tree. Although I don't know if we want
to be as extreme as saying "You can't make XHTML sites with
SilverStripe", it's probably acceptable to treat it as the 2nd class
citizen, if we have to choose.

Specifically, we make all our templates in HTML - form templates and
default themes - and transform that to XHTML when necessary. In order
to keep things simple, we use "safe HTML", ensuring that we match our
opening and closing tags, etc. This won't be bulletproof for XHTML -
developers will have to manually ensure that the HTML templates can be
easily converted to XHTML. However, we could put together some unit
tests to verify that, and HTML 4.01 & 5 should be everyone's primary
focus.

In terms of the roadmap, I think that we would want to make such a
change on a major release - i.e., 2.4.0. We would want to coincide it
with some kind of announcement explaining our decision to recommend
HTML 4.01 and/or HTML 5 over XHTML, while explaining that you should
still be able to use XHTML with SilverStripe.

Sam Minnee

Mar 1, 2009, 11:05:08 PM
to SilverStripe Development
Having spoken to Ingo about this, here are our thoughts:

* Choose a "recommended HTML doctype" for SilverStripe, most likely
either HTML 4.01 or HTML 5. John Resig has good things to say about
HTML 5's doctype: http://ejohn.org/blog/html5-doctype/

* Update the default theme, CMS, and support templates such as those
used to build the forms to use this doctype

* Replace the content negotiator with an optionally activated "HTML
cleanser", that will look at the final output's doctype, and make some
simple transformations to ensure that the document is valid. The main
focus would be the correction of self-closing tags, however, it may be
best to make use of a more full-featured system such as libtidy for
this. This would only be necessary if you were mixing doctypes - for
example by using an XHTML-based module with an HTML-based theme, or
vice versa.