
Re: XHTML 1.0 / 1.1 / 2.0

Message has been deleted

Jukka K. Korpela

Sep 12, 2005, 5:56:19 AM
Buford Early wrote:

> I thought that XHTML 1.1 was the follow-up to XHTML 1.0 and that
> XHTML 2.0 will someday be the follow-up to XHTML 1.1. Am I wrong?

Yes, you are.

XHTML 1.1 is an exercise in futility, a dead end, and both practically
and theoretically useless.

XHTML 2.0 is being designed to be incompatible with every previous HTML
version, though similar enough to confuse people into thinking
otherwise. If it is ever released, it will most probably be
advertised as a successor of XHTML 1.0, not of XHTML 1.1.

Henri Sivonen

Sep 12, 2005, 9:14:50 AM
In article <dg3jb6$3jn$1...@phys-news1.kolumbus.fi>,

"Jukka K. Korpela" <jkor...@cs.tut.fi> wrote:

> XHTML 1.1 is an exercise in futility, a dead end, and both practically
> and theoretically useless.

Out of curiosity, do you consider Ruby both practically and
theoretically useless or do you consider the modularization of the DTD
practically and theoretically useless?

To me it seems that Ruby has at least theoretical merit. As for the DTD,
I am a proponent of DTDlessness in languages built on top of XML.

> XHTML 2.0 is being designed to be incompatible with every previous HTML
> version, though similar enough to confuse people into thinking
> otherwise.

I think the label "exercise in futility" is particularly appropriate for
XHTML 2.0.

> If it will ever be released,

It will be interesting to see how they intend to come up with two
interoperable implementations. However, it seems unlikely that the HTML
WG would give up on its own initiative.

> it will most probably be
> advertized as a successor of XHTML 1.0, not of XHTML 1.1.

Well, the advertising does not need to be strictly reality-based, so
XHTML 2.0 could be advertised as anything. It is already advertised that
"more than 95% of browsers in use, can process new markup languages
without having to be updated".

Bonus XHTML 2.0 link:
http://hades.mn.aptest.com/cgi-bin/xhtml2-issues/DocType?id=7336

--
Henri Sivonen
hsiv...@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

Guy Macon

Sep 12, 2005, 12:57:51 PM


Henri Sivonen wrote:

The snippet [ xml:lang="en-US" ] caught my eye. A Google search
( http://www.google.com/search?q=xml%3Alang%3D%22en-US%22 ) shows
different pages using en-us, en-US and EN-US. Is this case
sensitive in XHTML, and if so, which is correct?


Andreas Prilop

Sep 12, 2005, 12:59:55 PM
On Mon, 12 Sep 2005, it was written:

> different pages using en-us, en-US and EN-US. Is this case
> sensitive in XHTML, and if so, which is correct?

Correct is only en-GB-King.

SCNR

Martin Honnen

Sep 12, 2005, 1:09:24 PM

Guy Macon wrote:

> The snippet [ xml:lang="en-US" ] caught my eye. A Google search
> ( http://www.google.com/search?q=xml%3Alang%3D%22en-US%22 ) shows
> different pages using en-us, en-US and EN-US. Is this case
> sensitive in XHTML, and if so, which is correct?

xml:lang is an attribute the XML specification itself defines:
<http://www.w3.org/TR/2004/REC-xml-20040204/#sec-lang-tag>
For possible values it refers to this RFC
<http://www.ietf.org/rfc/rfc3066.txt>
and that says
"All tags are to be treated as case insensitive; there exist
conventions for capitalization of some of them, but these should not
be taken to carry meaning. For instance, [ISO 3166] recommends that
country codes are capitalized (MN Mongolia), while [ISO 639]
recommends that language codes are written in lower case (mn
Mongolian)."
So whether you use en-us or en-US or EN-US does not matter in terms of
the semantics, but the conventions cited suggest en-US.
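
As a rough illustration of that case-insensitivity (an editorial sketch
in PHP; the helper name is made up, it is not anything from the XML spec
or RFC 3066), tags should simply be compared with a case-folding test:

<?php
// Sketch: RFC 3066 language tags compare case-insensitively,
// so en-us, en-US and EN-US all name the same tag.
function lang_matches($a, $b)
{
    return strcasecmp($a, $b) === 0;   // case-insensitive string compare
}

var_dump(lang_matches('en-US', 'en-us'));  // bool(true)
var_dump(lang_matches('en-US', 'EN-US'));  // bool(true)
var_dump(lang_matches('en-US', 'en-GB'));  // bool(false)
?>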

--

Martin Honnen
http://JavaScript.FAQTs.com/

Jukka K. Korpela

Sep 12, 2005, 6:15:30 PM
Henri Sivonen <hsiv...@iki.fi> wrote:

> In article <dg3jb6$3jn$1...@phys-news1.kolumbus.fi>,
> "Jukka K. Korpela" <jkor...@cs.tut.fi> wrote:
>
>> XHTML 1.1 is an exercise in futility, a dead end, and both practically
>> and theoretically useless.
>
> Out of curiosity, do you consider Ruby both practically and
> theoretically useless or do you consider the modularization of the DTD
> practically and theoretically useless?

I wasn't taking any position on those issues; they do not depend on
XHTML 1.1, which is just a pointless mix of "modularizing" XHTML 1.0,
throwing in the Ruby stuff, and making some silent changes, including
changes that make it impossible to use client-side image maps on current
browsers.

If I took a position on those matters, I'd probably say that Ruby looks OK,
except for its semantics, which is undefined. (Is it for East Asian
languages only, or is it generally for interlinear annotations? To say that
it's for both means really that it's not suitable for either purpose.)
And I'd say that modularization of XHTML is a misguided attempt at creating
order in tag soup. (By "tag soup", I mean tagwise thinking, not any
syntactic nuances or even the big picture of syntax. Invent tags, assign
meanings to them as you go, and shake well.)

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

mbstevens

Sep 12, 2005, 6:49:37 PM
Jukka K. Korpela wrote:
> Henri Sivonen <hsiv...@iki.fi> wrote:
>
>
>>In article <dg3jb6$3jn$1...@phys-news1.kolumbus.fi>,
>> "Jukka K. Korpela" <jkor...@cs.tut.fi> wrote:
> I'd probably say that Ruby looks OK,
> except for its semantics, which is undefined. (Is it for East Asian
> languages only, or is it generally for interlinear annotations? To say that
> it's for both means really that it's not suitable for either purpose.)

Just as clarification for people not familiar with what's going
on here --

The Ruby _markup_ for Asian languages being spoken of here is
explained at:
http://www.w3.org/TR/ruby/#what

The _language_ Ruby, coming out of Japan, is a scripting language
similar in some ways to Perl and Python, and is not the same
thing. You can find out about this at:
http://www.ruby-lang.org/en/20020101.html
--
mbstevens
http://www.mbstevens.com/

Jukka K. Korpela

Sep 13, 2005, 2:04:20 AM
mbstevens <NOXweb...@xmbstevensx.com> wrote:

> Jukka K. Korpela wrote:
>> Henri Sivonen <hsiv...@iki.fi> wrote:
>>
>>
>>> In article <dg3jb6$3jn$1...@phys-news1.kolumbus.fi>,
>>>  "Jukka K. Korpela" <jkor...@cs.tut.fi> wrote:
>> I'd probably say that Ruby looks OK, except for its semantics, which is
>> undefined. (Is it for East Asian languages only, or is it generally for
>> interlinear annotations? To say that it's for both means really that
>> it's not suitable for either purpose.)
>
> Just as clarification for people not familiar with what's going
> on here --
>
> The Ruby _markup_ for Asian languages being spoken of here is
> explained at:
> http://www.w3.org/TR/ruby/#what

The point in my question was whether Ruby markup is for (East) Asian
languages only. The Ruby specification, which you cite, does _not_ answer
this question appropriately. It describes Ruby vaguely, e.g.:

"Ruby is the term used for a run of text that is associated with another
run of text, referred to as the base text. Ruby text is used to provide a
short annotation of the associated base text. It is most often used to
provide a reading (pronunciation guide). Ruby annotations are used
frequently in Japan in many kinds of publications, including books and
magazines. Ruby is also used in China, especially in schoolbooks."

It goes on to explain the origin of the name "ruby" as denoting a font
size of 5.5 points, and later explains that ruby text normally appears
at about half the font size of the base text.

Ruby looks _very_ much like an attempt to cover certain notational and
typographic usage of Chinese and Japanese, presenting examples using Latin
letters as artificial illustrations only, and with little or no concern
about the real applicability of Ruby markup for purposes outside
traditional Chinese and Japanese usage of ruby text.

> The _language_ Ruby coming out of Japan, is a scripting language

I don't think there was much danger of confusion with that, but it's always
exciting to hear about new programming languages. :-)

Toby Inkster

Sep 13, 2005, 3:00:59 AM
Henri Sivonen wrote:

> Out of curiosity, do you consider Ruby both practically and
> theoretically useless or do you consider the modularization of the DTD
> practically and theoretically useless?

Or the ditching of frames and a transitional DTD?

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Toby Inkster

Sep 13, 2005, 3:08:25 AM
Buford Early wrote:

> "A common misconception is that XHTML 1.1 is the latest version
> of the XHTML series. And although it was released a bit more
> than a year later than the first version of XHTML 1.0, the second
> edition is actually newer. Furthermore, XHTML 1.1 is not really
> the follow-up of XHTML 1.0"

The second edition of XHTML 1.0 is not a new standard. It is just a
rewritten, clarified version of the old XHTML 1.0.

XHTML 1.1 is newer as standards go. From an author's point of view it is
almost identical to XHTML 1.0 Strict, but with ruby added, and a handful
of attributes dropped (*@lang, a@name, map@name). Unless you need ruby,
there's not an awful lot of argument in favour of using it. OTOH, unless
you need to support truly ancient browsers, or need to use client-side
image maps, there's not an awful lot of argument against it.

Spartanicus

Sep 13, 2005, 4:28:37 AM
Toby Inkster <usenet...@tobyinkster.co.uk> wrote:

>XHTML 1.1 is newer as standards go. From an author's point of view it is
>almost identical to XHTML 1.0 Strict, but with ruby added, and a handful
>of attributes dropped (*@lang, a@name, map@name). Unless you need ruby,
>there's not an awful lot of argument in favour of using it. OTOH, unless
>you need to support truly ancient browsers, or need to use client-side
>image maps, there's not an awful lot argument against it.

So far I've not seen you produce an argument that would support ignoring
w3c's guideline not to serve XHTML 1.1 as text/html.

--
Spartanicus

zuki...@webmail.co.za

Sep 13, 2005, 6:01:22 AM
You are not wrong; XHTML should be taken seriously.

Guy Macon

Sep 13, 2005, 6:42:55 AM


Spartanicus wrote:

(Warning: I am by no means an expert on this, but I seem to muddle
through OK; be sure to check the replies for corrections...)

I cut and pasted this chart from somewhere:

  Media Type                 text/html    application/  application/  text/xml
                                          xhtml+xml     xml
  HTML 4.01                  SHOULD       MUST NOT      MUST NOT      MUST NOT
  XHTML 1.0 (HTML Compat.)   MAY          SHOULD        MAY           MAY
  XHTML 1.0 (other)          SHOULD NOT   SHOULD        MAY           MAY
  XHTML Basic                SHOULD NOT   SHOULD        MAY           MAY
  XHTML 1.1                  SHOULD NOT   SHOULD        MAY           MAY

...so it seems to me that if you are willing to serve your pages as
application/xhtml+xml then there is no reason to avoid XHTML 1.1,
but if you decide to serve your pages as text/html then there is
no good reason to use anything except HTML 4.01.

I have been redoing my webpages, and I have decided to have two copies
of each page: one is HTML 4.01 strict served as text/html with a filename
of *.html, and the other is XHTML 1.1 served as application/xhtml+xml
with a filename of *.xhtml - all with appropriately labelled navigation
links to give the user a choice of versions.

As far as I can tell, if I am careful with my markup, the only
differences will be...

(.html)

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>


(.xhtml)

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">


(.htaccess)

DirectoryIndex index.html index.xhtml
AddType 'text/html; charset=US-ASCII' html
AddType 'application/xhtml+xml; charset=US-ASCII' xhtml

...and I already wrote a macro that makes the two versions.
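
A hypothetical sketch of what such a two-output macro could look like in
PHP (the shared body file and output names are made up, not the actual
macro; it assumes the body fragment is written carefully enough to be
valid in both languages):

<?php
// Sketch only: wrap one carefully written, XHTML-compatible body
// fragment in either prolog and write out both versions.
$body = file_get_contents('page-body.inc');   // <head>...</head><body>...</body>

$html = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\"\n"
      . "   \"http://www.w3.org/TR/html4/strict.dtd\">\n"
      . "<html>\n" . $body . "\n</html>\n";

$xhtml = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
       . "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\"\n"
       . "   \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
       . "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n" . $body . "\n</html>\n";

file_put_contents('index.html',  $html);
file_put_contents('index.xhtml', $xhtml);
?>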

Corrections/comments/ass-chewings welcome!

****************************************

(...added later...)
I just did some Googling to see if the above is totally stupid,
and I found the following in an example document:

Example of an XHTML 2.0 document

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css"
href="http://www.w3.org/MarkUp/style/xhtml2.css"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 2.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml2.dtd">
<html xmlns="http://www.w3.org/2002/06/xhtml2/" xml:lang="en"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2/
http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd"
>

Do I need any of the extra stuff above, or is my simpler version above
suitable for what I am doing? I do use CSS, but no tables, math, etc.


--
Guy Macon <http://www.guymacon.com/>

When it comes to web design, I am a pretty
good assembly language programmer... :)

Henri Sivonen

Sep 13, 2005, 7:11:48 AM
In article <11idb9f...@corp.supernews.com>,

Guy Macon <http://www.guymacon.com/> wrote:

> but if you decide to to serve your pages as text/html then there is
> no good reason to use anything except HTML 4.01.
>
> I have been redoing my webpages, and I have decided to have two copies
> of each page: one is HTML 4.01 strict served as text/html with a filename
> of *.html, and the other is XHTML 1.1 served as application/xhtml+xml
> with a filename of *.xhtml - all with appropriatly labelled navigation
> links to give the user a choice of versions.

Out of curiosity, what use case inspired you to expend the extra effort
to maintain two versions? (I have been making observations about the
subject matter for some time now, and I am always curious about motives
that I may have failed to consider myself.)

BTW, if you are referring to http://www.guymacon.com/ , the site shows
an Apache-generated directory listing.

> Example of an XHTML 2.0 document
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml-stylesheet type="text/css"
> href="http://www.w3.org/MarkUp/style/xhtml2.css"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 2.0//EN"
> "http://www.w3.org/MarkUp/DTD/xhtml2.dtd">
> <html xmlns="http://www.w3.org/2002/06/xhtml2/" xml:lang="en"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2/
> http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd"
> >
>
> Do I need any of the extra stuff above, or is my simpler version above
> suitable for what I am doing? I do use CSS, but no tables, math, etc.

That's from an XHTML 2.0 draft and, due to the draft status at the very
least, should not be deployed on the Web.

There's an awful lot of boilerplate cruft. The crux of the matter is

<html xmlns="http://www.w3.org/2002/06/xhtml2/" xml:lang="en">

The rest is some serious cruft. (And some people would categorize the
namespace declaration as cruft, too, along with the whole concept of
namespaces...)

See also
http://copia.ogbuji.net/blog/2005-08-10/Today_s_XM

The issue tracker page I already referred to provides some hints about
the world view of the HTML WG.

Spartanicus

Sep 13, 2005, 7:22:06 AM
Guy Macon <http://www.guymacon.com/> wrote:

>...so it seems to me that if you are willing to serve your pages as
>application/xhtml+xml then there is no reason to avoid XHTML 1.1,

IE and certain other browsers can't handle it, bot compatibility is
questionable, and you are disadvantaging users who use a Gecko based
browser since it cannot render XHTML served as such incrementally.

>I have been redoing my webpages, and I have decided to have two copies
>of each page: one is HTML 4.01 strict served as text/html with a filename
>of *.html, and the other is XHTML 1.1 served as application/xhtml+xml
>with a filename of *.xhtml - all with appropriatly labelled navigation
>links to give the user a choice of versions.

Users don't give a flying monkey for code versions, and you'd at least
need to request that SE bots do not index the xhtml pages. So what's the
point?

>As far as I can tell, if I am careful with my markup, the only
>differences will be...
>
>(.html)
>
><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
> <html>
>
>
>(.xhtml)
>
><?xml version="1.0" encoding="US-ASCII"?>
><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
><html xmlns="http://www.w3.org/1999/xhtml">

There are other differences such as mandatory attribute value quoting,
required closing of elements etc.

>Example of an XHTML 2.0 document

XHTML 2.0 is not finished, and is currently defined as not being
backward compatible.

--
Spartanicus

Guy Macon

Sep 13, 2005, 8:09:08 AM

Henri Sivonen wrote:

>BTW, if you are referring to http://www.guymacon.com/ , the site shows
>an Apache-generated directory listing.

I am in the middle of updating it, and I cleared everything out and
am redoing the entire structure (with 301 redirects so that the old
URLs still work, of course).

> Guy Macon <http://www.guymacon.com/> wrote:
>
>> but if you decide to to serve your pages as text/html then there is
>> no good reason to use anything except HTML 4.01.
>>
>> I have been redoing my webpages, and I have decided to have two copies
>> of each page: one is HTML 4.01 strict served as text/html with a filename
>> of *.html, and the other is XHTML 1.1 served as application/xhtml+xml
>> with a filename of *.xhtml - all with appropriatly labelled navigation
>> links to give the user a choice of versions.
>
>Out of curiosity, what use case inspired you to expend the extra effort
>the maintain two versions? (I have been making observations about the
>subject matter for some time now, and I am always curious about motives
>that I may have failed to consider myself.)

Mine is a special case.

My first motivation: I have a lot of engineers and engineering managers
who examine my markup for clues as to what kind of engineer I am.
Having a link to an XHTML 1.1 version is good PR for me.

My second motivation: I design products that use webpages as the user
interface. People love to be able to fire up the old browser to see
how the robotic assembly line is doing. This makes it important for
me to know about things like the difference between HTML and XHTML,
how to make pages work well with cellphone browsers, etc.

My third motivation: I am a geek and find playing with technology to
be relaxing.

Guy Macon

Sep 13, 2005, 8:18:42 AM


Spartanicus wrote:

>Users don't give a flying monkey for code versions,

*Your* users don't give a flying monkey for code versions. *My* users
comment on things like my decision to switch from XHTML-Basic to XHTML
1.0.

>There are other differences such as mandatory attribute value quoting,
>required closing of elements etc.

I am unaware of any difference that makes it so that I cannot write
a simple webpage that works in XHTML 2.0 and HTML 4.01 strict.

>XHTML 2.0 is not finished, and currently defined as being non backward
>compatible.

That's the whole point of having a *.html version and a *.xhtml version.
*My* users will want to know whether their browsers will work with both
- especially when the browser in question is one I wrote to be part of
a children's toy...

Henri Sivonen

Sep 13, 2005, 9:07:43 AM
In article
<l1ddi1tvopgk8hoe0...@news.spartanicus.utvinternet.ie>,
Spartanicus <inv...@invalid.invalid> wrote:

> you'd at least
> need to request that SE bots do not index the xhtml pages.

Are there SE bots that support application/xhtml+xml?

Spartanicus

Sep 13, 2005, 9:57:40 AM
Henri Sivonen <hsiv...@iki.fi> wrote:

>> you'd at least
>> need to request that SE bots do not index the xhtml pages.
>
>Are there SE bots that support application/xhtml+xml?

Hotbot/Ask Jeeves indexes pages served as such, although I suspect that
it parses them as text/html. Google doesn't appear to index them; I haven't
tried any others, but it seems likely that Hotbot isn't alone.

http://www.hotbot.com/default.asp?query=XHTML+1.1+Demo+This+XHTML+1.1+document+served+as+application%2Fxhtml%2Bxml+won%27t+be+opened+by+IE%2C+it+will+produce+a+dialog+box+asking+the+user+what+to+do+with+it
[first result]

--
Spartanicus

Andreas Prilop

Sep 13, 2005, 10:12:03 AM
On Tue, 13 Sep 2005, Henri Sivonen wrote:

> Are there SE bots that support application/xhtml+xml?

Do they follow the Content-Type header, in the first place?
Google, for example, does not:
http://ppewww.ph.gla.ac.uk/~flavell/www/content-type.html

Both Google's web search and Usenet interface ignore the
Content-Type header, including the charset parameter.

James Pickering

Sep 13, 2005, 10:20:27 AM

Guy Macon wrote:

> I have been redoing my webpages, and I have decided to have two copies
> of each page: one is HTML 4.01 strict served as text/html with a filename
> of *.html, and the other is XHTML 1.1 served as application/xhtml+xml
> with a filename of *.xhtml - all with appropriatly labelled navigation
> links to give the user a choice of versions.


Instead of two copies of each page, why not compose individual pages
and serve them via content negotiation?

--
James Pickering
------------------------------------
http://jp29.org/indexbak.php
XHTML 1.0 page served via
content-negotiation with test
results of MIME type serving
to various Browsers
------------------------------------

din...@codesmiths.com

Sep 13, 2005, 10:51:45 AM
Guy Macon wrote:

> Having a link to an XHTML 1.1 version is good PR for me.

Is it? Anyone who cares and understands 1.1 vs 1.0 is (IMHO) likely
to regard 1.0 as the better choice.

James Pickering

Sep 13, 2005, 10:59:16 AM
James Pickering wrote:
> Guy Macon wrote:
>
> > I have been redoing my webpages, and I have decided to have two copies
> > of each page: one is HTML 4.01 strict served as text/html with a filename
> > of *.html, and the other is XHTML 1.1 served as application/xhtml+xml
> > with a filename of *.xhtml - all with appropriatly labelled navigation
> > links to give the user a choice of versions.
>
>
> Instead of two copies of each page, why not compose individual pages
> and serve them via content negotiation?

BTW, Neil Crosby's page
http://www.workingwith.me.uk/articles/scripting/mimetypes/
offers PHP coding for serving pages as XHTML (any flavor) or HTML 4.01
via content negotiation.
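
The gist of such a script is a test on the Accept request header; a
minimal sketch of the idea (not Neil Crosby's actual code, and the
charset is only an example):

<?php
// Sketch: send the XHTML media type only to clients that explicitly
// claim to accept it; everyone else (including IE, whose Accept header
// does not list application/xhtml+xml) gets text/html.
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';

if (stristr($accept, 'application/xhtml+xml') !== false) {
    header('Content-Type: application/xhtml+xml; charset=UTF-8');
} else {
    header('Content-Type: text/html; charset=UTF-8');
}
?>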

Guy Macon

Sep 13, 2005, 11:52:51 AM


James Pickering wrote:
>
>Guy Macon wrote:
>
>> I have been redoing my webpages, and I have decided to have two copies
>> of each page: one is HTML 4.01 strict served as text/html with a filename
>> of *.html, and the other is XHTML 1.1 served as application/xhtml+xml
>> with a filename of *.xhtml - all with appropriatly labelled navigation
>> links to give the user a choice of versions.
>
>Instead of two copies of each page, why not compose individual pages
>and serve them via content negotiation?

I am not a big fan of content negotiation. Some folks like it, but
I don't trust all browsers to tell the truth about what they will and
will not accept and display.

See http://norman.walsh.name/2003/07/02/conneg and
http://wellformedweb.org/news/WebServicesAndContentNegotiation

Nick Kew

Sep 13, 2005, 11:33:07 AM

Yeah, but anyone with a clue on the technical level isn't a "PR" target.

--
Not me guv

Toby Inkster

Sep 13, 2005, 6:10:25 PM
Spartanicus wrote:

> So far I've not seen you produce an argument that would support ignoring
> w3c's guideline not to serve XHTML 1.1 as text/html.

Here are four...


1. Browser support.
-------------------

If you ignore the XHTML Media Types note, you will be able to
achieve far greater browser support for your XHTML 1.1 documents.

The note states its intention:

| It documents a set of recommendations to maximize the
| interoperability of XHTML documents with regard to
| Internet media types

It fails. It should be ignored.


2. Chronology.
--------------

26 January 2000 - XHTML 1.0 Recommendation. Appendix C says it's
OK to serve XHTML as text/html.

31 May 2001 - XHTML 1.1 Recommendation. In Appendix A, lists
"Changes from XHTML 1.0 Strict". Reversing Appendix C is not listed
as a change. Thus, from May 2001, it is OK to serve XHTML 1.1 as
text/html.

1 August 2002 - XHTML Media Types note says that XHTML 1.1 should
not be served as text/html.

So from June 2001 until July 2002, it was OK to serve XHTML 1.1 as
text/html? But suddenly, on 1 August 2002, it became frowned upon?


3. The Status of the Note.
--------------------------

The W3C's note that says you SHOULD NOT use text/html for XHTML
1.1 does not have "Internet Standard" status, nor even the weaker
"W3C Recommendation" status. It's not even a "Candidate
Recommendation". In fact, it specifically states:

| Publication of this Note by W3C indicates no endorsement
| by W3C or the W3C Team, or any W3C Members. [...] This
| document is not intended to be a normative specification.

If the W3C itself refuses to endorse it, why should I?


4. Simple Logic.
----------------

The following is a valid XHTML 1.0 Strict document:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example</title>
</head>
<body>
<p>Example</p>
</body>
</html>

The following is a valid XHTML 1.1 document:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<title>Example</title>
</head>
<body>
<p>Example</p>
</body>
</html>

Notice there are only a few bytes of difference between these
two examples. What's more, the differences are only in the DOCTYPE,
which is largely ignored by real user-agents, except for the
misguided practice of DOCTYPE-switching (but note also that all
browsers that employ that technique classify both the above
DOCTYPEs as "strict mode").

If the two files are *essentially* the same, why may one be served
as text/html, but the other never so?


Of the four arguments, the last is my favourite.

Toby Inkster

Sep 13, 2005, 6:18:44 PM
Jukka K. Korpela wrote:

> I don't think there was much danger of confusion with that, but it's always
> exciting to hear about new programming languages. :-)

There is certainly potential for such confusion. Ruby has been becoming
rather popular of late, in particular for CGI, but also to an extent in
GUI programming on UNIX.

Andreas Prilop

Sep 14, 2005, 9:19:52 AM
On Tue, 13 Sep 2005, Toby Inkster wrote:

> Spartanicus wrote:
>
>> So far I've not seen you produce an argument that would support ignoring

^^^ ^^^^^^^^


>> w3c's guideline not to serve XHTML 1.1 as text/html.

^^^
>
> Here are four...

... arguments in favour of or against what?
Three negations in one sentence are too much!

Alan J. Flavell

Sep 14, 2005, 10:09:11 AM

As I read it: the hon. Usenaut is trying to say that Appendix C of
XHTML/1.0 applies (or at least /had originally been intended to
apply/) also to future versions of XHTML in perpetuity, until the W3C
say it's explicitly countermanded.

My understanding, on the other foot, was that Appendix C of XHTML/1.0
was always intended to be an informative comment about a partially
nobbled form of XHTML/1.0 - and only XHTML/1.0 - which was capable of
being served as text/html and would fool most (i.e non-SGML-aware)
then-available browsers.

Appendix C, being an informative comment about XHTML/1.0, was not part
of its definition, and so it came as no surprise to me that it
wasn't mentioned in the list of changes (sc. of the definition of
XHTML) which appeared in /1.1. I don't believe this means it was
ever intended to apply to /1.1, nor indeed to later versions.

At any rate, whatever the count of angels on pinheads about the
history, I think the W3C's current stance is clear: the one and only
situation where they approve of XHTML being served temporarily as
text/html is XHTML/1.0-Appendix C. Any other usage is, at the least,
SHOULD NOT.

Not that I really care, since - for production purposes - I'm content
to stay with HTML/4.01 until XHTML is offering some genuinely
deployable benefits.

Spartanicus

Sep 14, 2005, 10:58:54 AM
Toby Inkster <usenet...@tobyinkster.co.uk> wrote:

>> So far I've not seen you produce an argument that would support ignoring
>> w3c's guideline not to serve XHTML 1.1 as text/html.
>

> If you ignore the XHTML Media Types note, you will be able to
> achieve far greater browser support for your XHTML 1.1 documents.

1) If client support is the aim then HTML is the answer.
2) The fact that a XHTML document served as text/html can probably be
rendered by HTML clients does not equate to support.
3) The purpose of porting HTML to XML is to enable parsing with an XML
parser. Serving XHTML as text/html causes it to be parsed with the HTML
parser, which then has to use its error correction mechanism to deal with
the invalid HTML that it is being fed.

> The note states its intention:
>
> | It documents a set of recommendations to maximize the
> | interoperability of XHTML documents with regard to
> | Internet media types
>
> It fails. It should be ignored.

It fails if you consider the quote to mean "compatibility with HTML
UAs", but that is not what is written. As I read it the quote refers to
providing guidelines to restrict the use of the various media types with
the various document formats. The resulting better match between the
advertised content type and the actual document type results in better
interoperability.

>2. Chronology.
>--------------
>
> 26 January 2000 - XHTML 1.0 Recommendation. Appendix C says it's
> OK to serve XHTML as text/html.

"5. Compatibility Issues

This section is normative.

Although there is no requirement for XHTML 1.0 documents to be
compatible with existing user agents, in practice this is easy to
accomplish. Guidelines for creating compatible documents can be found in
Appendix C."

Note that it says XHTML 1.0; Appendix C does not extend to all (then)
future versions of XHTML.

>3. The Status of the Note.
>--------------------------
>
> The W3C's note that says you SHOULD NOT use text/html for XHTML
> 1.1 does not have "Internet Standard" status, nor even the weaker
> "W3C Recommendation" status. It's not even a "Candidate
> Recommendation". In fact, it specifically states:
>
> | Publication of this Note by W3C indicates no endorsement
> | by W3C or the W3C Team, or any W3C Members. [...] This
> | document is not intended to be a normative specification.
>
> If the W3C itself refuses to endorse it, why should I?

I'm not sure if w3c has the authority to declare content types as
normative. If, as I suspect, they don't, the only way they can deal with
the issue is by issuing guidelines in the form of non-normative notes.
That should not be used to declare them irrelevant.

Browsers, yes, but a validator is no less a real UA.

> except for the
> misguided practice of DOCTYPE-switching (but note also that all
> browsers that employ that technique classify both the above
> DOCTYPEs as "strict mode").
>
> If the two files are *essentially* the same, why may one be served
> as text/html, but the other never so?

The differences between the two languages are not the issue; the question
remains: why ignore this particular w3c guideline?

--
Spartanicus

Stewart Gordon

Sep 14, 2005, 11:07:18 AM
Looks like my news server was faulty last time. Let's try again....

Jukka K. Korpela wrote:
> Buford Early wrote:
>
>> I thought that XHTML 1.1 was the follow-up to XHTML 1.0 and that
>> XHTML 2.0 will someday be the follow-up to XHTML 1.1. Am I wrong?
>
> Yes, you are.
>
> XHTML 1.1 is an exercise in futility, a dead end, and both practically
> and theoretically useless.
>
> XHTML 2.0 is being designed to be incompatible with every previous HTML
> version, though similar enough to confuse people into thinking
> otherwise. If it will ever be released, it will most probably be
> advertized as a successor of XHTML 1.0, not of XHTML 1.1.

I've just had a look at the W3C draft of XHTML 2.0. It has some things
in common with an HTML replacement I had envisioned:
- most of the design aims
- section elements
- navigation lists (my idea also included auto-generated TOCs that could
be rendered similarly)
- image elements having the alternative text as the content rather than
an attribute

It would appear that the src attribute on everything is designed to
support client-side include, something else that certainly should be
there. It would appear that the problem with adding it to HTML was
figuring out how to support graceful degradation. But making a fresh start
would make this issue irrelevant.

However, if it really is meant to be a fresh start, it would seem silly
that W3C has decided to call it XHTML, rather than making a fresh start
with the name, MIME type and filename extension while at it. (FTM why
is it application/xhtml+xml not text/xhtml?)

Just remembered something else I thought of for my HTML replacement: a
uniform mechanism for conditional inclusion based on what browser
features are enabled.

Stewart.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++@ a->--- UB@ P+ L E@ W++@ N+++ o K-@ w++@ O? M V? PS-
PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on
the 'group where everyone may benefit.

Guy Macon

Sep 14, 2005, 2:22:39 PM


Alan J. Flavell wrote:

>At any rate, whatever the count of angels on pinheads

Unlike most of the discussion here, angels on pinpoints
(points, not heads!) is a serious question...

http://www.straightdope.com/classics/a4_132.html
http://www.straightdope.com/columns/001110.html
http://www.nytimes.com/learning/students/scienceqa/archive/971111.html
http://www.phrases.org.uk/bulletin_board/13/messages/1512.html
http://www.improb.com/airchives/paperair/volume7/v7i3/angels-7-3.htm

I hope this helps.

:)

Toby Inkster

Sep 15, 2005, 1:37:45 PM
Spartanicus wrote:
> Toby Inkster <usenet...@tobyinkster.co.uk> wrote:
>
>>> So far I've not seen you produce an argument that would support ignoring
>>> w3c's guideline not to serve XHTML 1.1 as text/html.
>>
>> If you ignore the XHTML Media Types note, you will be able to
>> achieve far greater browser support for your XHTML 1.1 documents.
>
> 1) If client support is the aim then HTML is the answer.

I don't disagree there. But we're not talking about HTML, we're talking
about XHTML 1.1, and I'm assuming that the document being in XHTML 1.1 is
a given, and running from there...

> 2) The fact that a XHTML document served as text/html can probably be
> rendered by HTML clients does not equate to support.

Perhaps not, but being rendered is closer to being supported than not
being rendered is.

> 3) The purpose of porting HTML to XML is to enable parsing with an XML
> parser. Serving XHTML as text/html causes it to be parsed with the HTML
> parser who then has to use it's error correction mechanism to deal with
> the invalid HTML that it is being fed.

An XML parser should send "Accept: text/xml" or somesuch in its initial
HTTP request, so should get an XML-related Content-Type in return.

I would only advocate sending "Content-Type: text/html" to agents that
don't announce that they support XML via the Accept header.

> I'm not sure if w3c has the authority to declare content types as
> normative.

It doesn't. It's the realm of the IETF.

> If as I suspect they don't, the only way they can deal with
> the issue is by issuing guidelines in the form of non normative notes.

They can issue a recommendation though. The note doesn't even have that
status.

>> If the two files are *essentially* the same, why may one be served
>> as text/html, but the other never so?
>
> The differences between the two languages is not the issue, the question
> remains: why ignore this particular w3c guideline?

Because it's the only silly one.

Spartanicus

Sep 15, 2005, 3:47:07 PM
Toby Inkster <usenet...@tobyinkster.co.uk> wrote:

>>> If you ignore the XHTML Media Types note, you will be able to
>>> achieve far greater browser support for your XHTML 1.1 documents.
>>
>> 1) If client support is the aim then HTML is the answer.
>
>I don't disagree there. But we're not talking about HTML, we're talking
>about XHTML 1.1, and I'm assuming that the document being in XHTML 1.1 is
>a given, and running from there...

I can't imagine a situation where 1.1 is a given; it's a choice.

>> 2) The fact that a XHTML document served as text/html can probably be
>> rendered by HTML clients does not equate to support.
>
>Perhaps not, but being rendered is closer to being supported than not
>being rendered is.

Forgive the repetition, but if not being rendered is a worry, then HTML
is the solution.

>> 3) The purpose of porting HTML to XML is to enable parsing with an XML
>> parser. Serving XHTML as text/html causes it to be parsed with the HTML
>> parser who then has to use it's error correction mechanism to deal with
>> the invalid HTML that it is being fed.
>
>An XML parser should send "Accept: text/xml" or somesuch in its initial
>HTTP request, so should get an XML-related Content-Type in return.
>
>I would only advocate sending "Content-Type: text/html" to agents that
>don't announce that they support XML via the Accept header.

Content negotiation comes with its own risks and issues. We know about
IE's broken accept string, and we can work around it. But who's to say
that there aren't any other clients with incorrect accept values? There
is the server overhead to consider, and the potential cache issues.

If content negotiation is to be used at all, then it's a small step to
generate HTML from the XHTML and feed that to clients who want HTML.

Then there is Gecko: it's happy to accept XHTML served as such, but
that's doing the user a disservice since Gecko currently cannot render
XHTML incrementally.

>> I'm not sure if w3c has the authority to declare content types as
>> normative.
>
>It doesn't. It's the realm of the IETF.

I've lost track of who is in charge of that; I seem to remember that the
situation got messy, with some established types having been declared as
being under the patronage of the w3c.

--
Spartanicus

Toby Inkster

Sep 15, 2005, 6:20:29 PM
Stewart Gordon wrote:

> filename extension

What's one of them? And what has it got to do with the WWW?

Jan Roland Eriksson

Sep 15, 2005, 7:34:58 PM
On Thu, 15 Sep 2005 19:47:07 GMT, Spartanicus <inv...@invalid.invalid>
wrote:

>Toby Inkster <usenet...@tobyinkster.co.uk> wrote:

[quoting from previous source]

>>> I'm not sure if w3c has the authority to declare content types as
>>> normative.

>>It doesn't. It's the realm of the IETF.

Well, as I have understood it, there is an organization, IANA, that
handles, or at least registers, things around MIME types and stuff like
that.

>I've lost track of who is in charge of that, I seem to remember that the
>situation got messy, with some established types having been declared as
>being under the patronage of the w3c.

You mean like the situation of...

http://www.ietf.org/rfc/rfc2854.txt

...where W3C is trying, in an IETF RFC (Request For Comments), to "burn
the books" (like some old Chinese Emperor) of what was created before
the "divine rise" of W3C, to make history start with W3C.

There is one very specific standards track document available here...

http://www.ietf.org/rfc/rfc1866.txt

...that at a technical level specifies the absolute best documented HTML
standard we have ever had. Dan Connolly worked his butt off to gather
all comments he got and to edit out what became the final edition of
that RFC, AND managed to get it into a "standards track" status at
IETF[1].

It's really sad to see Dan's name in RFC2854...

W3C should be taken for what it is: a conglomerate of high-buck companies
that do not allow ordinary human beings as members and that will shell
out money to operate W3C for as long as it primarily serves their own
purposes at an acceptable level.

--
Rex


Henri Sivonen

Sep 16, 2005, 1:59:10 AM
In article <dg9eb6$fqd$1...@sun-cc204.lut.ac.uk>,
Stewart Gordon <smjg...@yahoo.com> wrote:

> (FTM why is it application/xhtml+xml not text/xhtml?)

+xml is the convention for MIME types for languages specified on top of
XML. The rules of text/* do not make sense for XML--hence application.

Stewart Gordon

Sep 16, 2005, 6:22:21 AM
Toby Inkster wrote:
> Stewart Gordon wrote:
>
>> filename extension
>
> What's one of them?

http://www.foldoc.org/?filename+extension

What do you call it?

> And what has it goe to do with the WWW?

At least two things:

- Thousands, if not millions, of web servers determine the MIME type
from the filename extension in the default configuration (or at least
the configuration that a hosting provider presents to its customers as
the default).

- Thousands, if not millions, of website developers test their pages
locally before uploading them to the WWW. Except in the case of those
who need to test server-side processing and so have a staging server,
the filename extension is the only way the browser has of determining
the MIME type.

Alan J. Flavell

Sep 16, 2005, 8:57:28 AM
On Fri, 16 Sep 2005, Stewart Gordon wrote:

> - Thousands, if not millions, of website developers test their pages
> locally before uploading them to the WWW.

I'd rate browsing local files as suboptimal [1]

> Except in the case of those who need to test server-side processing
> and so have a staging server, the filename extension is the only way
> the browser has of determining the MIME type.

I recommend that they, practically without exception, should run a
local web server (accessible only from 127.0.0.1 if they have no
reason to do otherwise) on their authoring platform, configured as
closely as possible to their production server (as to default
filename, as to media content type versus filename "extension", and so
on). Sooner or later they'd probably want PHP or CGI or SSI or
whatever, anyway, so it makes sense to start on the right foot.
They can also (taking Apache as a paradigm) develop their .htaccess
file in parallel.

And this has the side benefit, if the configuration is close enough to
the production server, of offering a decent chance of exposing any
mistakes with resulting HTTP headers - not only the obvious ones like
Content-type with its associated charset attribute, but also any
expires or cache-control headers that one wants to develop - I'd refer
you to Chris Pederick's web developer toolbar for Mozzi or Firefox for
a convenient way of viewing these while developing.

Installing win32 Apache on my laptop took only moments, to take just
one case in point.

good luck.

[1] By way of illustration, just one example. Wherever a natural web
author would code a local URL as e.g whatever/ , they are forced to
code e.g whatever/index.html , (replace "index.html" with what their
server uses as its default), otherwise their local testing fails.

Then they upload that, and the web is awash with URLs like
http://some.example/whatever/index.html , rather than the nicer
http://some.example/whatever/ - which, at least for some of us,
offends our sense of URL aesthetics - such internal details should
better not be exposed (and they might later change, e.g to index.shtml
or index.php, without any need for a visible external difference).

And coding "absolute local" URLs e.g /specimen/something tends cause
difficulties too.

Don't get me started on Win-based authors who code URLs with
backslashes. That too seems to be promoted by browsing files locally,
but will rightly result in errors if a proper server is used (and
browsed with a proper browser, I mean).

Andreas Prilop

Sep 16, 2005, 9:58:36 AM
On Fri, 16 Sep 2005, Alan J. Flavell wrote:

> Then they upload that, and the web is awash with URLs like
> http://some.example/whatever/index.html , rather than the nicer
> http://some.example/whatever/

I cannot resist ...
http://www.google.com/search?q=index.html
Note that Google lists most (all?) URLs without "index.html".
So we have an unnecessary duplication (if not triplication)
of URLs.

Stewart Gordon

Sep 16, 2005, 10:48:16 AM
Alan J. Flavell wrote:
> On Fri, 16 Sep 2005, Stewart Gordon wrote:
>
>> - Thousands, if not millions, of website developers test their pages
>> locally before uploading them to the WWW.
>
> I'd rate browsing local files as suboptimal [1]

And I expect plenty of hobbyists who don't do server-side programming
either don't know where to start with finding/installing a staging
server or rate the idea as overkill.

>> Except in the case of those who need to test server-side processing
>> and so have a staging server, the filename extension is the only way
>> the browser has of determining the MIME type.
>
> I recommend that they, practically without exception, should run a
> local web server (accessible only from 127.0.0.1 if they have no
> reason to do otherwise) on their authoring platform, configured as
> closely as possible to their production server (as to default
> filename, as to media content type versus filename "extension", and so
> on). Sooner or later they'd probably want PHP or CGI or SSI or
> whatever, anyway, so it makes sense to start on the right foot.
> They can also (taking Apache as a paradigm) develop their .htaccess
> file in parallel.

Is it possible/easy to get hold of a staging server for Windows that
emulates a Unix webserver, or vice versa? Stuff like case-sensitivity....

> And this has the side benefit, if the configuration is close enough to
> the production server, of offering a decent chance of exposing any
> mistakes with resulting HTTP headers - not only the obvious ones like
> Content-type with its associated charset attribute, but also any
> expires or cache-control headers that one wants to develop - I'd refer
> you to Chris Pederick's web developer toolbar for Mozzi or Firefox for
> a convenient way of viewing these while developing.

I hate cache control. All too often it tends to break the back button.

> Installing win32 Apache on my laptop took only moments, to take just
> one case in point.
>
> good luck.
>
> [1] By way of illustration, just one example. Wherever a natural web
> author would code a local URL as e.g whatever/ , they are forced to
> code e.g whatever/index.html , (replace "index.html" with what their
> server uses as its default), otherwise their local testing fails.

Yes, that's an issue that bugs many. However, that's different in that
it's straightforward for web browser developers to fix their products to
handle it.

Meanwhile, on those projects where I've grown out of linking to
index.html (yes, that was to help with local testing), I tweak the URL
bar after following such a link.

> Then they upload that, and the web is awash with URLs like
> http://some.example/whatever/index.html , rather than the nicer
> http://some.example/whatever/ - which, at least for some of us,
> offends our sense of URL aesthetics - such internal details should
> better not be exposed (and they might later change, e.g to index.shtml
> or index.php, without any need for a visible external difference).

You might later change non-index pages between .html, .shtml, .php,
.asp, etc. just the same.

> And coding "absolute local" URLs e.g /specimen/something tends cause
> difficulties too.
>
> Don't get me started on Win-based authors who code URLs with
> backslashes. That too seems to be promoted by browsing files locally,
> but will rightly result in errors if a proper server is used (and
> browsed with a proper browser, I mean).

I wasn't going to. Such people shouldn't be let loose on a webhost anyway.

Alan J. Flavell

Sep 16, 2005, 11:18:20 AM
On Fri, 16 Sep 2005, Stewart Gordon wrote:

> Alan J. Flavell wrote:
[...]


>
> Is it possible/easy to get hold of a staging server for Windows that
> emulates a Unix webserver, or vice versa?

see below re win32 Apache, close enough to the Apache-based servers
which most service providers seem to use. Certainly I've never had
any real problems relative to the linux-based production Apache server
which we run at our research group.

> Stuff like case-sensitivity....

It's close enough to real life for me. If you insist on having two
URLs that differ only in letter case, then you'd probably have
difficulties, I suppose. I never tried it. Best read the
win32-specific release notes if you want to know the sordid details.

[...]


> I hate cache control. All too often it tends to break the back
> button.

I didn't say you *have* to develop it. Just one of the many things
you /can/ do, once the local server is there. Nothing wrong with
Expires headers, and sensibly-chosen cache controls. I'm not talking
about crude cache-busting weapons!

I said:
> > Installing win32 Apache on my laptop took only moments, to take
> > just one case in point.

If you get Indigoperl, then you even get Apache and Perl bundled in
one convenient package. Personally, I already had Activestate Perl
installed, so I just installed the Win32 httpd download from Apache
HQ.

> > Then they upload that, and the web is awash with URLs like
> > http://some.example/whatever/index.html , rather than the nicer
> > http://some.example/whatever/ - which, at least for some of us,
> > offends our sense of URL aesthetics - such internal details should
> > better not be exposed (and they might later change, e.g to
> > index.shtml or index.php, without any need for a visible external
> > difference).
>
> You might later change non-index pages between .html, .shtml, .php,
> .asp, etc. just the same.

Indeed.

Which is why some folks promote the idea of activating MultiViews in
Apache. Even if there's only one "variant" (let's say example.html),
Apache will happily serve-out the URL "example" , and when you one day
change the .html to .php, the external URL won't need to change. Of
course MultiViews can do much more than that, if you have several
variants available (different languages, whatever), but that goes
beyond the present topic.

Confession: although I think that's a good idea, I've never done it
right across the server, although there are some places where it's
used to good effect.

hope that helps

Toby Inkster

Sep 16, 2005, 2:37:11 PM
Stewart Gordon wrote:
> Toby Inkster wrote:
>> Stewart Gordon wrote:
>>
>>> filename extension
>>
>> What's one of them?
>
> http://www.foldoc.org/?filename+extension
> What do you call it?

"Filename that happens to have a dot (which has no technical significance)
in it."

>> And what has it goe to do with the WWW?
>
> At least two things:

They may affect the way some people use their web servers, but they're of
no real consequence to the way the WWW works. Some people may choose to
end their XHTML2 file names with ".xhtml", others with ".xhtml2" others
with ".page", others with ".doc" and others with nothing at all. Still
others may not keep their XHTML2 in files, but rather use some other
mechanism for looking up and serving XHTML content.

Spartanicus

Sep 16, 2005, 3:23:28 PM
Toby Inkster <usenet...@tobyinkster.co.uk> wrote:

[file extensions]

>They may effect the way some people use their web servers, but they're of
>no real consequence to the way the WWW works.

But they may affect how web clients behave. Extension sniffing is a well
known phenomenon, and it's not just IE that does that.

A recent example (told here IIRC) was that some versions of IE render
XHTML served as application/xhtml+xml if the file used an .html
"extension".

--
Spartanicus

Nick Kew

Sep 16, 2005, 8:10:20 PM
Spartanicus wrote:
> Toby Inkster <usenet...@tobyinkster.co.uk> wrote:
>
> [file extensions]
>
>
>>They may effect the way some people use their web servers, but they're of
>>no real consequence to the way the WWW works.
>
>
> But they may effect how web clients behave. Extension sniffing is a well
> known phenomena, and it's not just IE that does that.

Ignoring Content-Type is well known to be a serious security risk,
and indeed was extensively documented as such in 1992. It is precisely
this that spawned the first generation of big 'net-borne windoze worms.
And it's windoze's continuing failure to implement this mandatory part
of several Internet specs (including HTTP) that turns virus protection
in software from the trivial task it should be to the nightmare it is
on windoze.

--
Nick Kew

Alan J. Flavell

Sep 17, 2005, 5:54:01 AM
On Fri, 16 Sep 2005, Toby Inkster wrote:

> "Filename that happens to have a dot (which has no technical
> significance) in it."
>
> >> And what has it goe to do with the WWW?

> Stewart Gordon wrote:

> > At least two things:
>
> They may effect the way some people use their web servers, but
> they're of no real consequence to the way the WWW works.

Correct. But it's worse than that.

Any influence which the filename extensions coming from the server
side may appear to have at the client side is a *BUG*, in terms of the
WWW interworking specifications, and - inasmuch as it just might
cause a web resource to be processed in a *wrong* context - such a bug
could well prove to be a security exposure, over and above the known
risks inherent in the web interworking specifications themselves.

> Some people may choose to end their XHTML2 file names with ".xhtml",
> others with ".xhtml2" others with ".page", others with ".doc" and
> others with nothing at all.

Indeed. I'm increasingly attracted by the idea of using Apache
Multiviews, even when there's only one document variant available, and
avoiding exposing the "filename extension" to the web.

Those users who need a filename extension when doing a "save As" of
some web resource, ought to be providing their /own/ extension, one
that makes sense in /their/ own context, irrespective of what's done
at the server. If their software does it for them, then "fine" - if
their software just blindly copies what's coming from the server, then
"not fine": sure, most of the time it may very well be OK, but it's a
potential exposure, which could be exploited by a malicious web site.

> Still others may not keep their XHTML2 in files, but rather use some
> other mechanism for looking up and serving XHTML content.

Certainly. (But I don't want to comment specifically on XHTML2 right
now...)

all the best

Spartanicus

Sep 17, 2005, 6:51:33 AM
Nick Kew <ni...@asgard.webthing.com> wrote:

>> [file extensions]
>>
>>
>>>They may effect the way some people use their web servers, but they're of
>>>no real consequence to the way the WWW works.
>>
>>
>> But they may effect how web clients behave. Extension sniffing is a well
>> known phenomena, and it's not just IE that does that.
>
>Ignoring Content-Type is well known to be a serious security risk,

Is it? Given that it's a doddle to serve malicious content with the
wrong Content-Type, it seems to me that the vulnerability of the client
determines the security risk.

--
Spartanicus

Tim

Sep 17, 2005, 8:38:47 AM
On Sat, 17 Sep 2005 10:54:01 +0100, Alan J. Flavell sent:

> Indeed. I'm increasingly attracted by the idea of using Apache
> Multiviews, even when there's only one document variant available, and
> avoiding exposing the "filename extension" to the web.

That's something I've done for a while, ostensibly so that I can make
an HTML document about something now, and later it might be written using
something else, but the URI will stay consistent.

However, I've discovered a problem with Apache 1.3 (which my host uses),
related to language: My pages are English, and described as such. If
someone browses using a browser configured only for some other language,
they don't get mine regardless, they get a 406 error. Apache 2 does allow
the webmaster to preselect the document you'll get regardless, when
there's no other obviously suitable choice.

I'd say this is a fault in two halves: The browser for not promoting the
concept of flagging more than one language when you configure it, and
the older webserver for not having fallback options.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please destroy some files yourself.

Alan J. Flavell

unread,
Sep 17, 2005, 10:06:00 AM9/17/05
to
On Sat, 17 Sep 2005, Tim wrote:

> On Sat, 17 Sep 2005 10:54:01 +0100, Alan J. Flavell sent:
>
> > Indeed. I'm increasingly attracted by the idea of using Apache
> > Multiviews, even when there's only one document variant available, and
> > avoiding exposing the "filename extension" to the web.

[...]

> However, I've discovered a problem with Apache 1.3 (which my host uses),
> related to language: My pages are English, and described as such. If
> someone browses using a browser configured only for some other language,
> they don't get mine regardless, they get a 406 error.

My solution for that is described under
http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html

I take it your current pages are named internally as foobar.html.en or
foobar.en.html ? Then symlink foobar.html.html to them, and it will
work. Something like this:

ln -s foobar.html.en foobar.html.html


> Apache 2 does allow
> the webmaster to preselect the document you'll get regardless, when
> there's no other obviously suitable choice.

Indeed, but, before we moved to 2, I needed a solution for 1.3



> I'd say this is a fault in two halves: The browser for not promoting the
> concept of flagging more than one language when you configure it,

Agreed. It's worse than that! For example, USA users get their
MeSsIE configured for them to say that they accept only en-US. So when
I honestly advertise my pages as en-GB, they get told there is no
acceptable language available for them. Sigh.

> and the older webserver for not having fallback options.

Nevertheless, the workaround is simple enough and it works (under
unix-like OSes, anyway - I'm not sure how that works out on win32
platforms, I didn't actually try it, knowing that "shortcuts" aren't
really a direct synonym of what in unix-like file systems we call a
symlink or soft link).

Richard Cornford

unread,
Sep 17, 2005, 10:24:36 AM9/17/05
to
Spartanicus wrote:
> Toby Inkster wrote:
<snip>

>>I would only advocate sending "Content-Type: text/html" to
>>agents that don't announce that they support XML via the
>>Accept header.

Don't current Opera versions announce that they support XHTML while
stating a preference for HTML? Shouldn't sensible content negotiation
only send XHTML in place of HTML when the UA expresses both support for
it _and_ a preference for receiving it?

> Content negotiation comes with its own risks and issues.
> We know about IE's broken accept string, and we can work
> around it. But who's to say that there aren't any other
> clients with incorrect accept values? There is the server
> overhead to consider, and the potential cache issues.

The notion that you can write static pages in Appendix C-style XHTML and
use content negotiation to decide which content-type header to send with
them really stands a very good chance of coming unstuck whenever there
is an intention to script those pages. A SCRIPT element in a static page
can only reference one URL but a browser receiving an HTML content type
will create an HTML DOM to be scripted and a browser receiving an XHTML
content-type will create an XHTML DOM to be scripted.

The single script file referenced from a static URL in the page will
struggle to accommodate both DOM types (as they need to be scripted very
differently). It is much less trouble (and probably about equivalent
work) to create two script resources, one for each DOM type, and serve
HTML DOM scripts to pages that have been served as HTML content types,
and XHTML DOM scripts to pages served as XHTML content types.
Unfortunately there is little reason to expect a browser's request for a
script to contain information that could be used to negotiate which
style of script to send (a script is a script, is a script, whichever
type of DOM is trying to load it). So serving scripts depending on which
content type you previously sent with the page that wants to load the
script becomes a problem.

Session tracking could be used; remembering which content-type was sent
with the page and then sending a particular script version when the
request came back for the script in the same session. Session tracking
by URL re-writing would impact on the client-side caching of the script
(and you would normally want scripts cached on the client if possible)
and cookie-based session tracking might result in intermediate caches
serving the wrong script type to some users.

And the alternative is:-

> If content negotiation is to be used at all, then it's a
> small step to generate HTML from the XHTML and feed that
> to clients who want HTML.

- in which you use an explicitly different script URL with each type of
mark-up. It can be served from a cache if available without any risk of
getting the wrong script for the DOM type being scripted, and without
requiring any additional effort to match script requests with previous
page requests.

But what has been done here? A requirement to provide essentially static
marked-up content with a script now involves dynamically generating (or
pre-processing two versions of) the pages, writing/testing two versions
of the same script, and including server-side scripts (or a considerably
more involved server configuration), with an inevitable increase in
server load, just to make it possible to serve the same content as HTML
and XHTML, with no perceivable difference to the user's experience of
the results. And that is assuming that all the added complexity works
the way it was intended 100% of the time.

So, we can content negotiate but what is the reward for all of that
extra trouble and expense?

Richard.


Spartanicus

unread,
Sep 17, 2005, 12:44:51 PM9/17/05
to
"Richard Cornford" <Ric...@litotes.demon.co.uk> wrote:

>Don't current Opera versions announce that they support XHTML while
>stating a preference for HTML?

<= 7.2x do IIRC, more recent versions don't.

>Shouldn't sensible content negotiation
>only send XHTML in place of HTML when the UA expresses both support for
>it _and_ a preference for receiving it?

It shouldn't matter theoretically; if practicality is an issue (as it
should be), then the whole XHTML exercise is imo at best pointless.

>The notion that you can write static pages in Appendix C-style XHTML and
>use content negotiation to decide which content-type header to send with
>them really stands a very good chance of coming unstuck whenever there
>is an intention to script those pages.

Indeed, I've got a link on one of my pages to your fine explanation of
the scripting issue a while back in alt.html.

>A SCRIPT element in a static page
>can only reference one URL but a browser receiving an HTML content type
>will create an HTML DOM to be scripted and a browser receiving an XHTML
>content-type will create an XHTML DOM to be scripted.
>
>The single script file referenced from a static URL in the page will
>struggle to accommodate both DOM types (as they need to be scripted very
>differently). It is much less trouble (and probably about equivalent
>work) to create two script resources, one for each DOM type, and serve
>HTML DOM scripts to pages that have been served as HTML content types,
>and XHTML DOM scripts to pages served as XHTML content types.
>Unfortunately there is little reason to expect a browser's request for a
>script to contain information that could be used to negotiate which
>style of script to send (a script is a script, is a script, whichever
>type of DOM is trying to load it). So serving scripts depending on which
>content type you previously sent with the page that wants to load the
>script becomes a problem.

I don't think you covered this specific issue in the alt.html post, I'll
add another link to this further explanation :)

>But what has been done here? A requirement to provide essentially static
>marked-up content with a script now involves dynamically generating (or
>pre-processing two versions of) the pages, writing/testing two versions
>of the same script and including server-side scripts (or a considerably
>more involved server configuration)(with an inevitable increase in
>server load) just to make it possible to serve the same content as HTML
>and XHTML, with no perceivable difference to the user's experience of
>the results. And that is assuming that all the added complexity works
>the way it was intended 100% of the time.
>
>So, we can content negotiate but what is the reward for all of that
>extra trouble and expense?

Indeed.

--
Spartanicus

Toby Inkster

unread,
Sep 18, 2005, 3:34:07 AM9/18/05
to
Richard Cornford wrote:

> Don't current Opera versions announce that they support XHTML while
> stating a preference for HTML?

Current ones: no.

> Shouldn't sensible content negotiation only send XHTML in place of HTML
> when the UA expresses both support for it _and_ a preference for
> receiving it?

It should definitely take browser preference into account, but not
necessarily treat it as a gold standard.

For example, say I'm running a site with multiple languages: English,
French and German. I create a new page for my site; my French translator
translates it into French; my German translator is on holiday, so I run
the page through an automatic translator to create a temporary poor German
translation, which will be replaced by a good translation at a later date.

If somebody visits my site using:

Accept-Language: de;q=1.0, en;q=0.9, it;q=0.1

Then it might be more sensible to send the English than the German. Apache
allows you to specify such things using a type-map file and the 'qs'
parameter.

URI: mypage.html

URI: mypage.en.html
Content-Language: en
Content-Type: text/html; qs=1.0

URI: mypage.fr.html
Content-Language: fr
Content-Type: text/html; qs=0.9

URI: mypage.de.html
Content-Language: de
Content-Type: text/html; qs=0.2

Apache would multiply each client q value by the corresponding qs value to give:

de: 1.0 * 0.2 = 0.2
en: 0.9 * 1.0 = 0.9
fr: 0.0 * 0.9 = 0.0
it: 0.1 * 0.0 = 0.0

and serve up English.
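
For what it's worth, the arithmetic is trivial to reproduce; a rough
sketch in plain Python, with the q and qs values above hard-coded for
illustration:

# Client preferences from the Accept-Language header (language -> q).
client_q = {'de': 1.0, 'en': 0.9, 'it': 0.1}

# Server-side source quality from the type map (language -> qs).
server_qs = {'en': 1.0, 'fr': 0.9, 'de': 0.2}

# The effective quality is the product; a language missing on either
# side scores zero.
languages = set(client_q) | set(server_qs)
effective = {lang: client_q.get(lang, 0.0) * server_qs.get(lang, 0.0)
             for lang in languages}

best = max(effective, key=effective.get)
print(effective)  # e.g. {'de': 0.2, 'en': 0.9, 'fr': 0.0, 'it': 0.0} (order may vary)
print(best)       # 'en' -- the English variant wins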

Alan J. Flavell

unread,
Sep 18, 2005, 4:41:50 AM9/18/05
to
On Sun, 18 Sep 2005, Toby Inkster wrote:

> Richard Cornford wrote:
>
> > Shouldn't sensible content negotiation only send XHTML in place of
> > HTML when the UA expresses both support for it _and_ a preference
> > for receiving it?

What does the interworking specification say?

There may be some fine-details that aren't quite clear in the
specification, but I don't think any of them prevent us from answering
any of the questions raised on this thread.

> It should definitely take browser preference into account,

There /is/ a published interworking specification for content-type
negotiation. It isn't exactly new! Client agents get what they asked
for[1]. If it isn't what they wanted, they shouldn't have asked for
it! If they unilaterally make requests over which their user has no
control, the user might be well advised to get a better browser.

> but not necessarily treat it as a gold standard.

Sure. I'd recommend always offering alternative ways of accessing the
other variants.

> For example, say I'm running a site with mulitple languages:
> English, French and German. I create a new page for my site; my
> French translator translates it into French; my German translator is
> on holiday, so I run the page through an automatic translator to
> create a temporary poor German translation, which will be replaced
> by a good translation at a later date.

[snip details of qs= negotiation]

Indeed. But I repeat (and I say this also on my page
http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html ) that
negotiation should not be relied on as the sole way of accessing
the desired variant: alternative ways (i.e usually explicit links)
for users to choose the variants should also be provided.


[1] I note that in IE's case it means (seeing that our campus standard
MS Windows installation includes MS Office) that they always get MS
Word format, if available, in preference to HTML. Well, if that's
what they want, who am I to argue? :-}

Richard Cornford

unread,
Sep 18, 2005, 9:24:04 AM9/18/05
to
Spartanicus wrote:
> "Richard Cornford" <Ric...@litotes.demon.co.uk> wrote:
>
>>Don't current Opera versions announce that they support
>>XHTML while stating a preference for HTML?
>
> <= 7.2x do IIRC, more recent versions don't.

I haven't looked at Opera's Accept header recently so I can accept that
I am out of date here.

>>Shouldn't sensible content negotiation only send
>> XHTML in place of HTML when the UA expresses both support
>>for it _and_ a preference for receiving it?
>
> It shouldn't matter theoretically,

It is possibly interesting to note that while early versions of Opera 7
would render XHTML they would not script it, and intermediate versions
would allow the scripting of intrinsic event attributes but would not
recognise (or at least act upon) SCRIPT elements, while the latest
versions allow XHTML to be fully scripted. Given that Opera 7+'s HTML
DOM was always scriptable (subject to scripting being enabled) that may
have been part of the reason for the browser expressing a preference for
HTML over XHTML in the past. It may also be a reason for other browsers
to express the same preference in the future and so a good practical
reason for content negotiators to observe the expressed preference.

> if practicality is an issue (as it should be),
> then the whole XHTML exercise is imo at best pointless.

Even though my expressed attitude towards XHTML may seem negative I am
not of the opinion that it is a bad idea as such. As a programmer I
quite like the idea of a more formally rigid mark-up language, where a
syntax error (or its equivalent) is fatal and final (and so needs to be
fixed on the spot). I would like to see a more disciplined approach to
web authoring where the results of guesswork are not more often as not
perceived as "working", and must be replaced by decision making based on
a technical understanding. I am not entirely convinced that XHTML will
deliver that in reality but I am not opposed to the experiment.

(Of course HTML does not deny the possibility of formal rigour or
informed authoring decisions, but an awful lot of web sites are created
without either)

My concerns almost entirely stem from practicalities; in a public
commercial context IE is so significant that it must be accommodated,
and IE doesn't understand XHTML at all. And so if XHTML is to be used at
all it must be XHTML for those that understand it and HTML (possibly
formally malformed HTML in the guise of Appendix C style XHTML) for IE
(or formally malformed HTML in the guise of XHTML for all). And that
strikes me as introducing so many issues (and particularly in my own
area of browser scripting), and so few rewards (none as far as I am
concerned) that using XHTML now seems like a bad idea.

If, at some future time, it is possible to serve XHTML as
application/xhtml+xml with the expectation that all (or at least the
vast majority of) web browsers understand it, then the practical issues
become insignificant. (Issues around how well designed XHTML is remain)

The decision as to whether that future may eventually become a reality
is entirely down to Microsoft. If they introduce a browser that renders
XHTML then it may become viable in a commercial context and a switchover
might be perceived as having benefits. However, if they do that but take
their usual attitude and use a parser that error-corrects any old
nonsense into something useable/renderable, and compromise their XHTML
DOM implementation with all of the shortcuts, etc. that are in their
HTML DOM, then XHTML is lost forever. Becoming a different flavour of
tag soup, with all of Microsoft's competitors having to similarly
compromise their XHTML implementations in order not to seem broken
alongside this future IE version.

In the meanwhile HTML predictably delivers what is wanted with the least
trouble, effort and issues. The time for XHTML may come, but certainly
not yet.

<snip>
>>... . So


>>serving scripts depending on which content type you
>>previously sent with the page that wants to load the
>>script becomes a problem.
>
> I don't think you covered this specific issue in the alt.html post,
> I'll add another link to this further explanation :)

The alt.html post was in response to the notion of writing Appendix C
XHTML and only serving it as text/html (with the mistaken idea that it
may be possible to switch to application/xhtml+xml at some future time
without any consequences). So the issues arising from dynamically
choosing a content type based on Accept headers, and of content
negotiation to serve alternative mark-up, were not that relevant.

Richard.


Richard Cornford

unread,
Sep 18, 2005, 9:24:11 AM9/18/05
to
Toby Inkster wrote:
> Richard Cornford wrote:
>
>> Don't current Opera versions announce that they support
>> XHTML while stating a preference for HTML?
>
> Current ones: no.

Fair enough, I haven't looked at Opera's Accept headers recently. (As
you have probably guessed, writing/serving XHTML does not occupy much of
my web authoring time).

>> Shouldn't sensible content negotiation only send XHTML in
>> place of HTML when the UA expresses both support for it
>> _and_ a preference for receiving it?
>
> It should definitely take browser preference into account,
> but not necessarily treat it as a gold standard.
>
> For example, say I'm running a site with mulitple languages:
> English, French and German. I create a new page for my site;

...


> a temporary poor German translation, which will be replaced
> by a good translation at a later date.
>
> If somebody visits my site using:
>
> Accept-Language: de;q=1.0, en;q=0.9, it;q=0.1
>
> Then it might be more sensible to send the English
> than the German.

<snip>

OK, the quality of the resource is a factor, but we were talking about
sending the same content as HTML or XHTML depending on the user agent's
expression of support for XHTML. If it is the same content then it is
difficult to see how XHTML could be regarded as having a higher quality
by virtue of nothing more than being marked-up as XHTML.

The relevant part of my preceding response to you was mostly intended to
question the logic of saying: only send text/html when the browser's
Accept header does not announce support for XHTML. With the implication
that the browser's preferences would not be a factor in the decision. If
the content negotiation is to be done in accordance with the HTTP
specification (subject to correcting for IE's unhelpful Accept header)
then the browser's expressed preference is a factor in the decision.

I accept that your expression of your strategy may not have included
full details of your practice in content negotiation.

However, I am concerned that simplistic expressions of content
negotiation are resulting in actively bad manifestations of "content
negotiation" server-scripts. For example, my attention has recently been
drawn to two examples of PHP scripts that attempt to serve HTML or XHTML
based on a UA's Accept header by doing no more than searching the header
for the substring "application/xhtml+xml" and serving XHTML if it is
found. That is so far short of content negotiation that a UA that is, by
HTTP standards, expressing an absolute rejection of XHTML will be
served XHTML.

Any growth in (or popularisation of) this type of stupidly simplistic
"content negotiation" must stand some chance of doing to the Accept
header what server-side browser detection has done to the User Agent
header: turning it from a potentially useful source of information that
could maximise the user's experience into a meaningless sequence of
characters expediently chosen to do the browser manufacturers least harm
in the face of incompetent web developers.
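
To make the contrast concrete, here is a rough sketch (in Python rather
than PHP, ignoring wildcard types other than */* and the finer points of
RFC 2616) of what honouring the q values actually involves, as opposed to
a bare substring search; choose_markup() is a hypothetical helper, not
anybody's published code:

def parse_accept(header):
    # Parse an Accept header into {media_type: q}, with q defaulting to 1.0.
    prefs = {}
    for item in header.split(','):
        parts = [p.strip() for p in item.split(';')]
        if not parts[0]:
            continue
        q = 1.0
        for p in parts[1:]:
            if p.startswith('q='):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0
        prefs[parts[0]] = q
    return prefs

def choose_markup(accept_header):
    # Send XHTML only if it is acceptable (q > 0) and at least as
    # preferred as HTML; otherwise fall back to text/html.
    prefs = parse_accept(accept_header)
    xhtml = prefs.get('application/xhtml+xml', 0.0)
    html = prefs.get('text/html', prefs.get('*/*', 0.0))
    return 'application/xhtml+xml' if xhtml > 0 and xhtml >= html else 'text/html'

# A UA that names the type but explicitly rejects it gets HTML, not XHTML:
print(choose_markup('text/html;q=0.9, application/xhtml+xml;q=0'))  # text/html
print(choose_markup('application/xhtml+xml, text/html;q=0.9'))      # application/xhtml+xml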

Richard.


Tim

unread,
Sep 18, 2005, 10:02:53 AM9/18/05
to
Alan J. Flavell sent:

>>> Indeed. I'm increasingly attracted by the idea of using Apache
>>> Multiviews, even when there's only one document variant available, and
>>> avoiding exposing the "filename extension" to the web.


Tim:

>> However, I've discovered a problem with Apache 1.3 (which my host uses),
>> related to language: My pages are English, and described as such. If
>> someone browses using a browser configured only for some other language,
>> they don't get mine regardless, they get a 406 error.

Alan J. Flavell:

> My solution for that is described under
> http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html

The whole page? (It's quite comprehensive.)

> I take it your current pages are named internally as foobar.html.en or
> foobar.en.html ? Then symlink foobar.html.html to them, and it will work.
> Something like this:
>
> ln -s foobar.html.en foobar.html.html

Actually, mine are currently named simply as pagename.html, with
(default/add) language defined in the config file. Though, upon
reflection, the server probably isn't using that in the manner I'd need it
to.

I don't think I'll go around symlinking every single file that's on the
server, that'd be a few hundred operations I'd have to do, never mind the
mess it'd create of maintaining the server over time.

It's a while since I experimented with it, but I don't recall either
pagename.html or pagename.html.en helping any. I would have thought the
former would have provided some default page, with the latter being more
problematic as an unacceptable language variant. But they both seem as
bad as each other.

>> Apache 2 does allow the webmaster to preselect the document you'll get
>> regardless, when there's no other obviously suitable choice.

> Indeed, but, before we moved to 2, I needed a solution for 1.3

I've been having a little war of words with my webhost, virtually accusing
them of not knowing what they were doing (for various reasons, along with
prior comments asking when they were going to update to Apache 2),
resulting in being given SSH access without having to pay extra. I think
it's their way of saying to me, "well fix it yourself if you think you
know better".


>> I'd say this is a fault in two halves: The browser for not promoting
>> the concept of flagging more than one language when you configure it,

> Agreed. It's worse than that! For example, USA users get their MeSsIE
> configured for them to say that they accept only en-US. So when I honestly
> advertise my pages as en-GB, they get told there is no acceptable language
> available for them. Sigh.

They're not the only one. I'm Australian, and it's common for the
defaults to be en-US, not en-AU, despite the system asking for your
location when first installing (along with various other stupid things
that ignore your location, requiring individual manual configuration).

Linux seems less stupid than Windows, in that regard, but the latest
version of Fedora (4) seems to think that all English speaking people are
American. It didn't even bother to offer an option for localisation, and
it seems difficult to do post-installation.

>> and the older webserver for not having fallback options.

> Nevertheless, the workaround is simple enough and it works (under
> unix-like OSes, anyway - I'm not sure how that works out on win32
> platforms, I didn't actually try it, knowing that "shortcuts" aren't
> really a direct synomym of what in unix-like file systems we call a
> symlink or soft link).

My host is using Linux, so I'm sure shortcuts wouldn't cut it (they haven't
worked in other ways in the webserver files where a symlink would work,
when I've tried it), but I really can't see myself making a symlink for
every file. If my host won't lift its game, I'll be looking for a new
one, especially as I'm nearing the end of my first year (it bills
annually, in advance).

For some strange reason I seem to have a lot of French referrals to a page
on my website (decrying a few stupid website issues), and they mostly get
406 errors. Since I haven't managed to solve the problem, nor been able to
customise the 406 error in what I consider the sensible way (offer the
unacceptable variant, with an understandable explanation before the link),
I've written one with a fair bit of detail about the problem, suggesting
that if the reader can read this message, they should add English to the
languages their browser accepts; it'll help them with my site and many
others.

Alan J. Flavell

unread,
Sep 18, 2005, 10:45:06 AM9/18/05
to
On Sun, 18 Sep 2005, Tim wrote:

> Alan J. Flavell:
>
> > My solution for that is described under
> > http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html
>
> The whole page?

No, the detail which I included into the posting (discussed in more
detail somewhere on that page, though).

> (It's quite comprehensive.)

I'm not sure if that's a gripe or a compliment ;-)

> Actually, mine are currently named simply as pagename.html, with
> (default/add) language defined in the config file. Though, upon
> reflection, the server probably isn't using that in the manner I'd
> need it to.

I don't think that is compatible with what I'm suggesting.

> > I take it your current pages are named internally as foobar.html.en or
> > foobar.en.html ? Then symlink foobar.html.html to them, and it will work.
> > Something like this:
> >
> > ln -s foobar.html.en foobar.html.html

> I don't think I'll go around symlinking every single file that's on
> the server,

Up to you. It's the best I came up with for my requirements.
The alternative is to write typemap files.

If I wanted to do either, I'd write a Makefile, and run make on the
site, with a little Perl script to do the business.
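
As a rough sketch of how small that job actually is (Python here rather
than the Makefile-plus-Perl approach, with an invented document root, and
assuming pages are stored as foobar.html.en):

import os

DOCROOT = '/var/www/site'  # hypothetical document root

# For every foobar.html.en, create foobar.html.html alongside it so that
# MultiViews always has a language-neutral variant to fall back on.
for dirpath, dirnames, filenames in os.walk(DOCROOT):
    for name in filenames:
        if name.endswith('.html.en'):
            link = name[:-len('.en')] + '.html'   # foobar.html.html
            link_path = os.path.join(dirpath, link)
            if not os.path.lexists(link_path):
                os.symlink(name, link_path)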

> that'd be a few hundred operations I'd have to do,

Well, not you, but your script!

> never mind the mess it'd create of maintaining the server over time.

Oh, hardly! Moving to Makefiles would rate to make the server
more maintainable than before, IMHO.



> It's a while since I experimented with it, but I don't recall either
> pagename.html or pagename.html.en helping any.

I think it should, as long as you're not referencing the .html in your
URLs. If you /are/ so doing, then the .html.html trick in the
filenames will fix that, as I discuss in the related part of my page.

good luck.

Spartanicus

unread,
Sep 18, 2005, 12:00:37 PM9/18/05
to
"Richard Cornford" <Ric...@litotes.demon.co.uk> wrote:

>It is possibly interesting to note that while early versions of Opera 7
>would render XHTML they would not script it, and intermediate versions
>would allow the scripting of intrinsic event attributes but would not
>recognise (or at least act upon) SCRIPT elements, while the latest
>versions allow XHTML to be fully scripted. Given that Opera 7+'s HTML
>DOM was always scriptable (subject to scripting being enabled) that may
>have been part of the reason for the browser expressing a preference for
>HTML over XHTML in the past.

Another possible reason for Opera <=7.2 to declare a lesser ability with
regard to XHTML was that prior to V7.5 Opera's XML parser was not able
to render character entity references.

>Even though my expressed attitude towards XHTML may seem negative I am
>not of the opinion that it is a bad idea as such. As a programmer I
>quite like the idea of a more formally rigid mark-up language, where a
>syntax error (or its equivalent) is fatal and final (and so needs to be
>fixed on the spot). I would like to see a more disciplined approach to
>web authoring where the results of guesswork are not more often as not
>perceived as "working", and must be replaced by decision making based on
>a technical understanding. I am not entirely convinced that XHTML will
>deliver that in reality but I am not opposed to the experiment.

On that issue all that true XHTML has to offer is a check for well
formedness. Well formed code can still be invalid and, most importantly,
dreadful markup. Well formedness is imo merely a technical requirement
for code to be parsed with an XML parser; it has no other benefits. The
notion that browsers throwing parsing errors when they encounter
malformed code will somehow make authors write better markup is imo
wishful thinking.
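
For the record, that check amounts to no more than this; a rough sketch
using Python's standard XML parser, with an invented scrap of markup:

import xml.etree.ElementTree as ET

# An unclosed <li>: routine for a tag-soup HTML parser, fatal for XML.
markup = '<ul><li>first<li>second</ul>'

try:
    ET.fromstring(markup)
    print('well formed')
except ET.ParseError as err:
    # The XML parser must stop here; this says nothing about validity,
    # let alone about whether the markup is any good.
    print('not well formed:', err)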

Imo a potential benefit of true XHTML is mixed name space documents.

There's also the potential advantage that parsing true XHTML uses
significantly fewer resources; this would be an advantage on, for example,
mobile platforms. The snag here is content. A mobile phone with only an
XML parser will not be able to use the vast amount of text/html content
currently on the web. To utilize the resource advantage, new content
would have to be created for such clients. If I were a phone manufacturer
I would not be keen to make such a device; people wouldn't be able to do
much with it.

>If, at some future time, it is possible to serve XHTML as
>application/xhtml+xml with the expectation that all (or at least the
>vast majority of) web browsers understand it, then the practical issues
>become insignificant.

Maybe; UAs in general would have to become compatible, not just
browsers.

>The decision as to whether that future may eventually become a reality
>is entirely down to Microsoft. If they introduce a browser that renders
>XHTML then it may become viable in a commercial context and a switchover
>might be perceived as having benefits.

Even if IE7 will have that capability, it will take a long time for the
number of users whose browser is not capable of rendering true XHTML to
fall below a level where it becomes viable to ignore them.

--
Spartanicus

Richard Cornford

unread,
Sep 18, 2005, 5:39:28 PM9/18/05
to
Spartanicus wrote:
> Richard Cornford wrote:
<snip>
>>... . I would like to see a more disciplined approach to
>>web authoring where the results of guesswork are not more
>>often than not perceived as "working", and must be replaced
>>by decision making based on a technical understanding. I
>>am not entirely convinced that XHTML will deliver that in
>>reality but I am not opposed to the experiment.
>
> On that issue all that true XHTML has to offer is a check
> for well formedness.

There is well-formedness in XML terms and there is the possibility of
enforcing the DTD/Schema rules as to which elements may contain which
other elements, eliminating one type of tag soup nonsense.

> Well formed code can still be invalid and most importantly
> dreadful markup. Well formedness is imo merely a technical
> requirement for code to be parsed with an XML parser, it
> has no other benefits.

Much as following the syntax rules in a programming language in no way
ensures that the results will be good, or even functional. I think of it
more in terms of the possible influence on the attitude of the creators
of marked-up documents. Any unavoidable increase in the formal
requirements may encourage individuals who are forced to learn syntax
rules to go on to learn the full set of applicable rules.

As it stands almost any sequence of characters will produce a 'result'
of some sort or another in at least one tag soup browser. And that seems
to allow some individuals to work in web development for years without
apparently ever realising that HTML has anything objectively knowable
behind it.

People learning programming languages don't sustain the notion that what
they think they should be able to do has any relevance for very long, at
least in part because the imposition of syntax rules stop them short if
they try to make it up off the top of their heads. A switch to reading
and understanding documentation, specifications, etc. is easily seen as
the only productive way forward in that context. The practicalities
encourage a particular attitude towards the task.

> The notion that browsers throwing parsing errors when they
> encounter malformed code will somehow make authors write
> better markup is imo wishful thinking.

(: At the very least optimistic thinking.

> Imo a potential benefit of true XHTML is mixed name space
> documents.

Yes, one of the reasons that I will not dismiss XHTML out of hand. In my
own work there would be huge advantages in being able to include
architectural CAD drawings in documents. Mixing SVG and XHTML may allow
that, in the meanwhile there is lots of fun to be had with SVG plug-ins.

<snip>


>>The decision as to whether that future may eventually

>>become a reality is entirely down to Microsoft. ...
<snip>


> Even if IE7 will have that capability, it will take a long
> time for the number of users whose browser is not capable
> of rendering true XHTML to fall below a level where it
> becomes viable to ignore them.

Yes, two to eight years if the history of the widespread adoption of web
technologies has anything to say on the subject. And I have seen nothing
to suggest that IE 7 will represent Microsoft moving in this direction.

Richard.


Spartanicus

unread,
Sep 18, 2005, 8:53:33 PM9/18/05
to
"Richard Cornford" <Ric...@litotes.demon.co.uk> wrote:

>> On that issue all that true XHTML has to offer is a check
>> for well formedness.
>
>There is well-formedness in XML terms and there is the possibility of
>enforcing the DTD/Schema rules as to which elements may contain which
>other elements, eliminating one type of tag soup nonsense.

That is equally possible for HTML. But I hope that you are not seriously
suggesting that browsers should have a validator added to them. Apart
from the resource issue, users should not be bothered with such issues.

Not that it would do much good anyway, first because validity is also of
little consequence for the quality of the code as experienced by the
user. Secondly because browsers' HTML parsers will continue to need
their extensive error recovery mechanisms, modeled after the behaviour of
IE to deal with the legacy content out there. So even if browsers were
to be equipped with a validator, and authors would then suddenly all
start producing valid code (why should they, it works no?), it would
have little perceivable benefit.

>> Well formed code can still be invalid and most importantly
>> dreadful markup. Well formedness is imo merely a technical
>> requirement for code to be parsed with an XML parser, it
>> has no other benefits.
>
>Much as following the syntax rules in a programming language in no way
>ensures that the results will be good, or even functional. I think of it
>more in terms of the possible influence on the attitude of the creators
>of marked-up documents. Any unavoidable increase in the formal
>requirements

For it to be "unavoidable", a browser's validating error should result
in a no show, not just a status line message saying "this document is
invalid". That would impede user access to legacy content. If it doesn't
result in a no show, but only produces a status bar message, then I'm
not optimistic about the effect on authors.

>may encourage individuals who are forced to learn syntax
>rules to go on to learn the full set of applicable rules.

I don't share your optimism on this.

>As it stands almost any sequence of characters will produced a 'result'
>of some sort or another in at least one tag soup browser. And that seems
>to allow some individuals to work in web development for years without
>apparently ever realising that HTML has anything objectively knowable
>behind it.

Imo this has in no small measure contributed to lowering the threshold
for the lay person to publish on the web. This I value much more than
a technically sound construct.

>People learning programming languages don't sustain the notion that what
>they think they should be able to do has any relevance for very long, at
>least in part because the imposition of syntax rules stop them short if
>they try to make it up off the top of their heads.

In this context there is considerable merit in teaching to be strict in
what to produce, but lenient in what to accept. I'm not in favor of a
comparison with programming, it often fails imo. For one programming
syntax errors are produced to the author only, not the user (for
compiled code anyway).

>> Imo a potential benefit of true XHTML is mixed name space
>> documents.
>
>Yes, one of the reasons that I will not dismiss XHTML out of hand. In my
>own work there would be huge advantages in being able to include
>architectural CAD drawings in documents. Mixing SVG and XHTML may allow
>that

But SVG can be used as it stands; even in this case the benefits are
afaics mainly of a technical nature, only relevant to authors.

>in the meanwhile there is lots of fun to be had with SVG plug-ins.

Being able to author mixed name space documents would not have an effect
on the browser's ability to render SVG.

>> Even if IE7 will have that capability, it will take a long
>> time for the number of users whose browser is not capable
>> of rendering true XHTML to fall below a level where it
>> becomes viable to ignore them.
>
>Yes, two to eight years if the history of the widespread adoption of web
>technologies has anything to say on the subject.

I'd consider eight years the minimum; no IE7 for anyone except folk
running at least XP with the latest service packs, afaik.

--
Spartanicus

Brian

unread,
Sep 19, 2005, 4:55:42 PM9/19/05
to
Alan J. Flavell wrote:

> There /is/ a published interworking specification for content-type
> negotiation. It isn't exactly new! Client agents get what they
> asked for[1].

[...]


> [1] I note that in IE's case it means (seeing that our campus
> standard MS Windows installation includes MS Office) that they always
> get MS Word format, if available, in preference to HTML. Well, if
> that's what they want, who am I to argue? :-}

It's precisely this behavior that made me give up content negotiation
for a contest entry application that I make available in HTML markup,
plain text, and MS Word. Content negotiation in conjunction with
MSIE/Win caused the MS Word version to load. As you well know, the HTML
version was last in line. The confusion that was likely to cause IE/Win
users was unacceptable to me.

But since I have explicit links to the other variants anyway, was
content negotiation really necessary? I'm not sure if users would
explicitly configure their user agent to retrieve a plain text or MS
Word variant over an HTML one.

--
Brian

Richard Cornford

unread,
Sep 20, 2005, 8:26:33 PM9/20/05
to
Spartanicus wrote:

> Richard Cornford wrote:
>
>>> On that issue all that true XHTML has to offer is
>>> a check for well formedness.
>>
>>There is well-formedness in XML terms and there is the
>>possibility of enforcing the DTD/Schema rules as to which
>>elements may contain which other elements, eliminating one
>>type of tag soup nonsense.
>
> That is equally possible for HTML. But I hope that you
> are not seriously suggesting that browsers should have
> a validator added to them.

If 'seriously' means my having any expectation that it would ever
happen, then no, I am not seriously suggesting it.

> Apart from the resource issue, users should not be
> bothered with such issues.

If IE responded to structurally incorrect mark-up by putting up a big
dialog saying "This page has been incorrectly authored" (or anything
else that unambiguously blamed the web author for errors/faults) then
the user would never be bothered by it, because the people writing the
pages would be too embarrassed to publish pages that caused it to show.

<snip>


>>Much as following the syntax rules in a programming language
>>in no way ensures that the results will be good, or even
>>functional. I think of it more in terms of the possible
>>influence on the attitude of the creators of marked-up
>>documents. Any unavoidable increase in the formal requirements
>
> For it to be "unavoidable", a browser's validating error
> should result in a no show, not just a status line message
> saying "this document is invalid".

It doesn't have to represent a no-show (though unless progressive
rendering happened that would be a consequence), and that would tend to
make the browser/user's computer look broken. A nice obvious dialog box
that could not be switched off, and placed the blame where it belonged,
would be sufficient.

> That would impede user access to legacy content.

There is legacy content in XHTML? :)

> If it doesn't result in a no show, but only produces
> a status bar message, then I'm not optimistic about
> the effect on authors.

Yes, and in practice a status bar message is probably about the most
extreme reaction we can expect from browsers.

<snip>


>>As it stands almost any sequence of characters will
>>produce a 'result' of some sort or another in at
>>least one tag soup browser.

<snip>


>
> Imo this has in no small measure contributed to lowering
> the threshold for the lay person to publicize on the web.
> This I value much more than a technically sound construct.

It is quite nice that HTML + web browsers are friendly enough to allow
virtually anyone to create a 'web page'. It is just a bit irritating to
find people who would describe themselves as professionals using that as
an excuse for never actually understanding what they are doing.

>>People learning programming languages don't sustain the
>>notion that what they think they should be able to do has
>>any relevance for very long, at least in part because the
>>imposition of syntax rules stop them short if they try to
>>make it up off the top of their heads.

<snip>
> ... . I'm not in favor of a comparison with programming,


> it often fails imo. For one programming syntax errors are
> produced to the author only, not the user (for compiled
> code anyway).

Which is in part what I am getting at. If the browser's response to
errors was sufficiently obvious then they would only be a matter for
authors as the authors would be well motivated to correct them before
the user had a chance to experience them.

<snip>


>>> Even if IE7 will have that capability, it will take

>>> a long time for the number ...
<snip>


>>Yes, two to eight years if the history of the widespread
>>adoption of web technologies has anything to say on the
>>subject.
>
> I'd consider eight years as the minimum, no IE7 for anyone
> except folk running at least XP with the latest service
> packs afaik.

However long it may take XHTML is not for the present. Its future might
be a subject for idle speculation but that is about all.

Richard.


Nick Kew

unread,
Sep 21, 2005, 3:14:07 AM9/21/05
to
Richard Cornford wrote:

>>Apart from the resource issue, users should not be
>>bothered with such issues.
>
>
> If IE responded to structurally incorrect mark-up by putting up a big
> dialog saying "This page has been incorrectly authored" (or anything
> else that unambiguously blamed the web author for errors/faults) then
> the user would never be bothered by it, because the people writing the
> pages would be too embarrassed to publish pages that caused it to show.

No need to go that far. Just a little bright red broken-!! tucked away
somewhere in the manner of that favicon, or the SSL padlock. And
another one for when the browser error-corrects bogus HTTP (as if).

Users needn't be bothered with what it means at all. Some will, and
the manual will tell them. Maybe clicking any of the "broken" icons
could pop up an explanation. It's all simple UI design.

Lynx has flagged Bad Markup while also rendering it for many years.

--
Nick Kew

Alan J. Flavell

unread,
Sep 21, 2005, 4:27:43 AM9/21/05
to
On Wed, 21 Sep 2005, Nick Kew wrote:

> No need to go that far. Just a little bright red broken-!! tucked
> away somewhere in the manner of that favicon, or the SSL padlock.
> And another one for when the browser error-corrects bogus HTTP (as
> if).

I've little hope of that. This is a browser-like object whose primary
selling point seems to be to the *makers* of web pages, not to the
recipients. IMHO one only has to compare the documentation offered
for the two types of user to come to a conclusion about *that*.

I don't believe that the developers would do anything much to
embarrass the makers of web pages "du jour", no matter how much it
rated to inform the users. The best one can hope for is something
aimed at *malicious* web pages, but when e.g the user message for
javascript from untrusted web pages says nothing more threatening than
"Scripts are usually safe"[1], I don't expect a lot. If you want a
browser that's aimed at users, choose a competing product.

[1] My translation: scripts are occasionally disastrous, and executing
untrusted scripts has been known to result in trashing not only the
browser but the entire OS. There's been lots of sticking plaster
applied since I last saw that happen with IE, but I've seen few fixes
where I thought the fundamental issue had really been addressed.

One of our users happily infected himself with a Trojan yesterday. If
he'd observed the local recommendations and used a www-compatible
browser instead of IE, this wouldn't have happened. Fortunately, the
anti-virus product spotted what he'd done, but that isn't something
that one can rely on.

> Users needn't be bothered with what it means at all. Some will, and
> the manual will tell them. Maybe clicking any of the "broken" icons
> could pop up an explanation. It's all simple UI design.

It needs more than UI design - it needs motivation. Are IE users
telling MS in no uncertain terms that they demand this or else they
use a competing product? I doubt that many of those who chose a
competing product would put that reason very near the top of their
list, TBH, even if asked the question directly. I wish it were
otherwise, but that's how it seems to me.

> Lynx has flagged Bad Markup while also rendering it for many years.

To be fair, it's flagged *some kinds of* bad markup. The absence of a
warning doesn't by any means guarantee the absence of errors, in my
experience.

all the best

Alan J. Flavell

unread,
Sep 21, 2005, 4:36:26 AM9/21/05
to
On Wed, 21 Sep 2005, Richard Cornford wrote:

> There is legacy content in XHTML? :)

Oh yes :-{

There's been many years already for those attracted by sexy XHTML,
while caring nothing for the W3C's hopes and prayers, to accrete a
considerable volume of XHTML-flavoured tag soup to accompany the
existing legacy of HTML-flavoured tag soup.

> It is quite nice that HTML + web browsers are friendly enough to
> allow virtually anyone to create a 'web page'.

True, but it's even more unfortunate that it seems to need a real
expert to produce true simplicity. The average page that I've seen
from untrained beginners has been awash with incredibly complex hacks
that they seem to have inherited from somewhere without the slightest
comprehension of what they're doing.

Spartanicus

unread,
Sep 21, 2005, 6:55:24 AM9/21/05
to
"Richard Cornford" <Ric...@litotes.demon.co.uk> wrote:

>If IE responded to structurally incorrect mark-up by putting up an big
>dialog saying "This page has been incorrectly authored" (or anything
>else that unambiguously blamed the web author for errors/faults) then
>the user would never be bothered by it, because the people writing the
>pages would be too embarrassed to publish pages that caused it to show.

Only for new content authored by people who use that particular version
of IE. A huge part of the content on the web is legacy content that
isn't and won't be updated.

To weigh down a browser like that, and pester users with issues that are
none of their concern nor within their control to fix, in the hope that
a tiny percentage amongst those users are the people you want to shame
into learning how to appease a dumb bot (one that checks for rigid syntax
rules which have neither much influence on the quality of the code as
perceived by a user, nor necessarily any effect on browser rendering), is
ridiculous.

>> That would impede user access to legacy content.
>
>There is legacy content in XHTML? :)

Is your case for adding a validator to browsers intended to apply only
to true XHTML and not to XHTML served as text/html?

>> ... . I'm not in favor of a comparison with programming,
>> it often fails imo. For one programming syntax errors are
>> produced to the author only, not the user (for compiled
>> code anyway).
>
>Which is in part what I am getting at. If the browser's response to
>errors was sufficiently obvious then they would only be a matter for
>authors as the authors would be well motivated to correct them before
>the user had a chance to experience them.

(Leaving out the resource consideration for the sake of discussing this
specific point) Syntax errors often found in code in interpreted
languages such as HTML, CSS and javascript could be reduced
significantly if there was a requirement on the parser to present an
error message to the user. But for that to work it would have to be a
mandatory part of such parsers from the beginning. If it isn't, then
introducing it as an afterthought can only result in frustrating end
users.

--
Spartanicus

Michael Winter

unread,
Sep 21, 2005, 5:36:57 PM9/21/05
to
On 18/09/2005 14:24, Richard Cornford wrote:

[snip]

> The decision as to whether [widespread XHTML] may eventually become a
> reality is entirely down to Microsoft. If they [...] take their usual


> attitude and use a parser that error-corrects any old nonsense into
> something useable/renderable, and compromise their XHTML DOM
> implementation with all of the shortcuts, etc. that are in their HTML
> DOM, then XHTML is lost forever.

I recently read an article from ICEsoft that seems to suggest that they
have no interest in making ICEbrowser a conforming XML parser.

<URL:http://support.icesoft.com/jive/entry!default.jspa?categoryID=20&entryID=23>

Rather depressingly, the reasons they seem to be using are based solely
on their perception of HTML, rather than the fact that they are dealing
with a new language that can be treated differently (when served as
XHTML, of course). I can agree that they don't want to implement a
validating parser[1], but that they don't want to enforce XML syntax
rules is very disappointing.

Does this article actually apply to documents served as
application/xhtml+xml? It isn't specific in this regard.

[snip]

Mike


[1] I would /like/ to think that such an implementation was a good
idea. However, it's a shame that Spartanicus is correct in
saying that validation in no way implies good authoring
practices. After all, validity is frequently declared like a
badge of honour, rather than something that should be expected
(barring any overriding reasons).

--
Michael Winter
Prefix subject with [News] before replying by e-mail.

Toby Inkster

unread,
Sep 21, 2005, 6:35:38 PM9/21/05
to
Richard Cornford wrote:

> However, I am concerned that simplistic expressions of content
> negotiation are resulting in actively bad manifestations of "content
> negotiation" server-scripts.

http://groups.google.com/groups?q=author%3Ainkster+substr_count

Stewart Gordon

unread,
Sep 22, 2005, 6:49:15 AM9/22/05
to
Alan J. Flavell wrote:
> On Fri, 16 Sep 2005, Stewart Gordon wrote:
<snip>
> It's close enough to real life for me. If you insist on having two
> URLs that differ only in letter case, then you'd probably have
> difficulties, I suppose. I never tried it. Best read the
> win32-specific release notes if you want to know the sordid details.
<snip>

If you're developing under Windows, it goes without saying that you'd
avoid having two URLs that differ only in case but point to different
pages. But you still might want to check that you've done all the links
correctly.

But then again, a simple link checker tool would probably do this
checking for you. And personally, my routine is to use only lowercase
for names of files that I'm going to put on the web (except when the
names are generated by a program such as Doxygen...).

Stewart.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++@ a->--- UB@ P+ L E@ W++@ N+++ o K-@ w++@ O? M V? PS-
PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on
the 'group where everyone may benefit.

Richard Cornford

unread,
Sep 24, 2005, 10:43:34 PM9/24/05
to
Spartanicus wrote:
> "Richard Cornford" <Ric...@litotes.demon.co.uk> wrote:
<snip>

>>> That would impede user access to legacy content.
>>
>>There is legacy content in XHTML? :)
>
> Is your case for adding a validator to browsers intended to
> apply only to true XHTML and not to XHTML served as text/html?

Yes, it could never be practical to retroactively attempt to impose any
additional restrictions on HTML, and 'XHTML' served as text/html is HTML
(as far as the receiving software is concerned).

>>> ... . I'm not in favor of a comparison with programming,
>>> it often fails imo. For one programming syntax errors are
>>> produced to the author only, not the user (for compiled
>>> code anyway).
>>
>> Which is in part what I am getting at. If the browser's
>> response to errors was sufficiently obvious then they
>> would only be a matter for authors as the authors would
>> be well motivated to correct them before the user had a
>> chance to experience them.
>
> (Leaving out the resource consideration for the sake of
> discussing this specific point) Syntax errors often found
> in code in interpreted languages such as HTML, CSS and
> javascript could be reduced significantly if there was a
> requirement on the parser to present an error message to
> the user. But for that to work it would have to be a
> mandatory part of such parsers from the beginning. If it
> isn't, then introducing it as an after thought can only
> result in frustrating end users.

Yes, from the beginning, or sufficiently close to the beginning. Which is
why it can be the subject of speculation concerning a future adoption of
XHTML. It isn't going to happen though, is it? ;)

Richard.


Richard Cornford

unread,
Sep 24, 2005, 10:43:22 PM9/24/05
to
Alan J. Flavell wrote:
> On Wed, 21 Sep 2005, Richard Cornford wrote:
>
>> There is legacy content in XHTML? :)
>
> Oh yes :-{
>
> There's been many years already for those attracted by
> sexy XHTML, while caring nothing for the W3C's hopes and
> prayers, to accrete a considerable volume of XHTML-flavoured
> tag soup to accompany the existing legacy of HTML-flavoured
> tag soup.

But when does XHTML-flavoured tag soup (served as text/html) become
XHTML? We see plenty of instances of XHTML-style "<br />" appearing in
documents that have HTML doctypes, but wouldn't claim that those are
XHTML documents. So there are a number of possible criteria for
considering a document to be XHTML when it is not being served as
XHTML:-

1. The author thinks he/she is writing XHTML
2. The document has an XHTML doctype (and possibly exclusively XML/XHTML
(but probably Appendix C) style mark-up).
3. A (preferably bug free) validator can be persuaded to declare the
document valid XHTML

If a document is served as text/html the browser will treat it as HTML
tag soup so if it is considered to be XHTML it is only considered to be
such in the mind of some observer.

>> It is quite nice that HTML + web browsers are friendly
>> enough to allow virtually anyone to create a 'web page'.
>
> True, but it's even more unfortunate that it seems to need
> a real expert to produce true simplicity. The average page
> that I've seen from untrained beginners has been awash with
> incredibly complex hacks that they seem to have inherited
> from somewhere without the slightest comprehension of what
> they're doing.

Yes; authoring by mystical incantation. And justified because, by some
criteria, it "works". Unfortunately it is not a practice that is limited
to the beginner.

Richard.


Richard Cornford

unread,
Sep 24, 2005, 10:43:27 PM9/24/05
to
Nick Kew wrote:
> Richard Cornford wrote:
>>>Apart from the resource issue, users should not be
>>>bothered with such issues.
>>
>> If IE responded to structurally incorrect mark-up by putting
>> up an big dialog saying "This page has been incorrectly
>> authored" (or anything else that unambiguously blamed the
>> web author for errors/faults) then the user would never be
>> bothered by it, because the people writing the pages would
>> be too embarrassed to publish pages that caused it to show.
>
> No need to go that far. Just a little bright red broken-!!
> tucked away somewhere in the manner of that favicon, or the
> SSL padlock. And another one for when the browser
> error-corrects bogus HTTP (as if).

When a script error happens in IE a small yellow symbol appears in the
status bar. That just isn't enough. Because scripting browsers is a
significant proportion of what I do I invariably have IE configured to
actually pop-up its error dialog in addition to showing that symbol. And
when I browse the Internet that error dialog pops up on a daily basis
(google being a constant offender, but only because I use google quite a
lot). If the authors were seeing that error dialog (and expecting the
user to see it) those faulty scripts would not be being exposed on the
public Internet, but the little yellow symbol is apparently easy for an
author who doesn't care to disregard.

> Users needn't be bothered with what it means at all.

<snip>

In principle users should never even see them because they should never
be provoked. And in that case the nature of the statement is not an
issue for the user, only the developer, who should not be able to avoid
them.

Richard.


Richard Cornford

unread,
Sep 24, 2005, 10:43:11 PM9/24/05
to
Michael Winter wrote:
> On 18/09/2005 14:24, Richard Cornford wrote:
>> The decision as to whether [widespread XHTML] may eventually
>> become a reality is entirely down to Microsoft. If they [...]
>> take their usual attitude and use a parser that error-corrects
>> any old nonsense into something useable/renderable, and
>> compromise their XHTML DOM implementation with all of the
>> shortcuts, etc. that are in their HTML DOM, then XHTML is
>> lost forever.
>
> I recently read an article from ICEsoft that seems to
> suggest that they have no interest in making ICEbrowser
> a conforming XML parser.
>
>
> <URL:http://support.icesoft.com/jive/entry!default.jspa?categoryID=20&entryID=23>
>
> Rather depressingly, the reasons they seem to be using are
> based solely on their perception of HTML, rather than the
> fact that they are dealing with a new language that can be
> treated differently (when served as XHTML, of course).
>
> I can agree that they don't want to implement a
> validating parser[1], but that they don't want to enforce
> XML syntax rules is very disappointing.

Over the years ICEsoft have put a lot of effort into making their
browser behave like IE, for fairly obvious reasons of expedience. They
have done such a good job of it that many are not aware
IceBrowser exists, and are not going to be taking any measures to
accommodate its peculiarities. That may mean making IceBrowser
ridiculously tolerant.

> Does this article actually apply to documents served as
> application/xhtml+xml? It isn't specific in this regard.
<snip>

It certainly isn't clear, and I don't have a working evaluation version
of IceBrowser to check its Accept header. If they are announcing
acceptance of application/xhtml+xml then this decision may be in
anticipation of how they expect Microsoft to act, and I would not be
surprised to find an XHTML comprehending future version of IE being
exactly as tolerant.

Richard.


Spartanicus

unread,
Sep 25, 2005, 2:33:53 AM9/25/05
to
"Richard Cornford" <Ric...@litotes.demon.co.uk> wrote:

>> Is your case for adding a validator to browsers intended to
>> apply only to true XHTML and not to XHTML served as text/html?
>
>Yes, it could never be practical to retroactively attempt to impose any
>additional restrictions on HTML, and 'XHTML' served as text/html is HTML
>(as far as the receiving software is concerned).

>> (Leaving out the resource consideration for the sake of
>> discussing this specific point) Syntax errors often found
>> in code in interpreted languages such as HTML, CSS and
>> javascript could be reduced significantly if there was a
>> requirement on the parser to present an error message to
>> the user. But for that to work it would have to be a
>> mandatory part of such parsers from the beginning. If it
>> isn't, then introducing it as an after thought can only
>> result in frustrating end users.
>
>Yes, from the beginning, or sufficiently close to the beginning. Which is
>why it can be the subject of speculation concerning a future adoption of
>XHTML.

The same problems occur when you apply it only to true XHTML: since
validation isn't part of current-day XHTML parsers, the true XHTML
currently found on the net is likely to be full of validation errors.
There is far less true XHTML on the net than stuff served as text/html,
but the principal flaw remains.

>It isn't going to happen though, is it? ;)

Let's hope not, it makes no sense at all.

--
Spartanicus

Toby Inkster

unread,
Sep 25, 2005, 7:09:07 AM9/25/05
to
Richard Cornford wrote:

> But when does XHTML-flavoured tag soup (served as text/html) become
> XHTML.

If a document uses an XHTML namespace, XHTML doctype and validates against
the XHTML doctype supplied, then it is XHTML.
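
For illustration only, a minimal document that meets all three of those
conditions might look like the following (the title and body text are
made up; the doctype and namespace are the genuine XHTML 1.0 Strict ones):

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>A minimal XHTML 1.0 Strict document</title>
  </head>
  <body>
    <p>By the criteria above this is XHTML, however it happens to travel.</p>
  </body>
  </html>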

The Content-Type header is used by one possible transport mechanism and is
not part of the document itself.

Talk of "true XHTML documents must use application/xhtml+xml" becomes
laughable when you consider that the documents may very well be served
over FTP, via the local file system, or via some other transport mechanism
that doesn't specify a content type.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Now Playing ~ ./coldplay/parachutes/04_sparks.ogg

Norman L. DeForest

unread,
Sep 25, 2005, 2:33:45 PM9/25/05
to

On Fri, 16 Sep 2005, Alan J. Flavell wrote:

> On Fri, 16 Sep 2005, Stewart Gordon wrote:
>
> > Alan J. Flavell wrote:
> [...]
> >
> > Is it possible/easy to get hold of a staging server for Windows that
> > emulates a Unix webserver, or vice versa?
>
> see below re win32 Apache, close enough to the Apache-based servers
> which most service providers seem to use. Certainly I've never had
> any real problems relative to the linux-based production Apache server
> which we run at our research group.
>
> > Stuff like case-sensitivity....
>
> It's close enough to real life for me. If you insist on having two
^^^^^^^^^^^^^^^^^^^^^^^^^^^
> URLs that differ only in letter case, then you'd probably have
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> difficulties, I suppose. I never tried it. Best read the
^^^^^^^^^^^^^^^^^^^^^^^^
> win32-specific release notes if you want to know the sordid details.

[snip]

See:
http://www.chebucto.ns.ca/~af380/Profile.html
and:
http://www.chebucto.ns.ca/~af380/profile.html

The first filename is the default home page for users at my ISP.
The second solved the problem of users mistyping the filename for
the first one in all lower-case. "I tried to go to your home page
but all I got was a 404 error."

(No, they aren't fancy or pretty and maybe not even valid. My site just
sort of grew (like fungus) and only recently have I had graphical access
to see my own pages with anything other than Lynx. Conversion to valid
and graphically pleasing pages depends on finding the spare time needed
for the job. Maybe by 2015....)

--
``Why don't you find a more appropiate newsgroup to post this tripe into?
This is a meeting place for a totally differnt kind of "vision impairment".
Catch my drift?'' -- "jim" in alt.disability.blind.social regarding an
off-topic religious/political post, March 28, 2005

Richard Cornford

unread,
Sep 25, 2005, 6:59:56 PM9/25/05
to
Toby Inkster wrote:
> Richard Cornford wrote:
>
>> But when does XHTML-flavoured tag soup (served as text/html)
>> become XHTML.
>
> If a dcoument uses an XHTML namespace, XHTML doctype and
> validates against the XHTML doctype supplied, then it is XHTML.

An objective criterion, and one that eliminates any concern that legacy
XHTML-flavoured tag soup might suffer from any possible more rigid
handling of XHTML, by excluding any of it that may be problematic.

> The Content-Type header is used by one possible transport
> mechanism and is not part of the document itself.
>
> Talk of "true XHTML documents must use application/xhtml+xml"
> becomes laughable when you consider that the documents may very
> well be served over FTP, via the local file system, or via some
> other transport mechanism that doesn't specify a content type.

It clearly is ridiculous to say that XHTML that is not served as
application/xhtml+xml is not true XHTML. However, it is completely
reasonable to assert that XHTML (and XHTML-like tag soup) served as
text/html is tag soup HTML. It is HTML because the receiving browser
will treat it as (tag soup) HTML (whether it supports XHTML or not). If
a document is destined to be interpreted as (error-filled) HTML it seems
reasonable to question the circumstances under which it might also be
regarded as XHTML.
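
(As a sketch only, assuming an Apache server on which AddType may be
used: the switch between the two treatments is nothing more than the
media type the server attaches to the response, e.g.

  AddType text/html              .html
  AddType application/xhtml+xml  .xhtml

The same byte-for-byte markup is fed to the tag soup parser under the
first mapping, and to an XML parser - in browsers that have one - under
the second.)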

Richard.


Alan J. Flavell

unread,
Sep 26, 2005, 7:26:53 AM9/26/05
to
On Sun, 25 Sep 2005, Norman L. DeForest wrote:

> On Fri, 16 Sep 2005, Alan J. Flavell wrote:
>
> > On Fri, 16 Sep 2005, Stewart Gordon wrote:

[question about using a Win32 Apache for local verification
before uploading to a unix-style system...]

> > see below re win32 Apache, close enough to the Apache-based servers
> > which most service providers seem to use. Certainly I've never had
> > any real problems relative to the linux-based production Apache server
> > which we run at our research group.
> >
> > > Stuff like case-sensitivity....
> >
> > It's close enough to real life for me. If you insist on having two
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > URLs that differ only in letter case, then you'd probably have
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > difficulties, I suppose. I never tried it. Best read the
> ^^^^^^^^^^^^^^^^^^^^^^^^
> > win32-specific release notes if you want to know the sordid details.
> [snip]

Having reviewed the Win32 notes, I have to admit that they don't
seem to be as helpful as I had hoped. However, at this URL:

http://httpd.apache.org/docs-2.0/sections.html#file-and-web

there are some useful notes aimed at drawing the distinction between
URL paths and file system paths on a case-insensitive file
system, and the unfortunate effect of making an inappropriate choice
of <Location> directive for controlling access to a particular
resource.

Naturally, even Win32 Apache can see case differences in /URLs/ ,
and take appropriate actions via directives: it's only "if and when" a
URL finally hits a file path that the case insensitivity comes to
light.
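
As a sketch of the difference that cited page warns about (Apache
2.0-era syntax; the paths and URLs are only illustrative):

  # <Location> matches against the URL path, and that match is
  # case-sensitive: a request for /private/secret.html sails straight
  # past this block on any platform.
  <Location /Private>
    Order allow,deny
    Deny from all
  </Location>

  # <Directory> matches against the file system path; on a
  # case-insensitive Win32 file system it therefore covers every case
  # variant of a URL that ends up mapping to this directory.
  <Directory "C:/Program Files/Apache Group/Apache2/htdocs/Private">
    Order allow,deny
    Deny from all
  </Directory>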


> See:
> http://www.chebucto.ns.ca/~af380/Profile.html

Apparently a unix-based Apache 1.3 server, which does not
exhibit the case-insensitive behaviour that is under discussion.

> and:
> http://www.chebucto.ns.ca/~af380/profile.html
>
> The first filename is the default home page for users at my ISP.

Is it? Then its advertised URL ought, according to normal good
practice, to be http://www.chebucto.ns.ca/~af380/ , without the
specific file path at the end. However, that URL seems to go to a
different page, so I'm not sure in what sense Profile.html is the
"default home page".

> The second solved the problem of users mistyping the filename for
> the first one in all lower-case.

This is an unfortunate choice, since the second URL leads to a
different web page and returns 200 OK, meaning that to an indexing
robot it appears to be substantive content, rather than an anomalous
(error) condition. I would strongly recommend handling such issues
with some kind of status that indicates that the URL is irregular:
this could, for example, be done with a redirection status (301 would
be appropriate), or with a custom error page.

In fact, enabling mod_speling will handle this automatically, and for
all other pages too.

> "I tried to go to your home page but all I got was a 404 error."

The 404 error status *was* correct, since the requested page isn't
supposed to exist (indeed it didn't until you put something there!).
Better practice would be a 404 error page which suggested going to the
corrected URL. In fact your present page would seem good enough, but
it ought IMHO to be delivered with an error status (404 is
appropriate), instead of the 200 OK that's happening presently.

That could of course be done with an ErrorDocument directive in your
.htaccess, assuming that the provider hasn't disabled that facility.

So those IMHO are two choices that are better than what's currently
happening (both sketched below):

1. status 301 redirection to the corrected URL (or enable
mod_speling); or

2. status 404 to a helpful error page with link to the corrected URL.
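
For what it's worth, a sketch of both options as .htaccess directives
(this assumes the provider permits .htaccess overrides at all, and the
error-page filename is made up):

  # Option 1: a permanent redirect tells clients and robots that the
  # mis-cased URL is not the real one
  Redirect permanent /~af380/profile.html http://www.chebucto.ns.ca/~af380/Profile.html

  # ...or let mod_speling correct case/spelling slips for every URL at
  # once (the module itself has to be loaded in the server configuration)
  CheckSpelling On

  # Option 2: a helpful page, delivered with the error status it deserves
  ErrorDocument 404 /~af380/not-found.html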


Coming back to the original issue, though, I'm not sure what your
point was about reviewing pages on a case-insensitive file system
prior to uploading them to a case-sensitive server.

best regards

Norman L. DeForest

unread,
Sep 26, 2005, 9:04:26 AM9/26/05
to

On Mon, 26 Sep 2005, Alan J. Flavell wrote:

> On Sun, 25 Sep 2005, Norman L. DeForest wrote:
>
> > On Fri, 16 Sep 2005, Alan J. Flavell wrote:
> >
> > > On Fri, 16 Sep 2005, Stewart Gordon wrote:
>
> [question about using a Win32 Apache for local verification
> before uploading to a unix-style system...]
>
> > > see below re win32 Apache, close enough to the Apache-based servers
> > > which most service providers seem to use. Certainly I've never had
> > > any real problems relative to the linux-based production Apache server
> > > which we run at our research group.
> > >
> > > > Stuff like case-sensitivity....
> > >
> > > It's close enough to real life for me. If you insist on having two
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > URLs that differ only in letter case, then you'd probably have
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > difficulties, I suppose. I never tried it. Best read the
> > ^^^^^^^^^^^^^^^^^^^^^^^^
> > > win32-specific release notes if you want to know the sordid details.
> > [snip]

[snip]


> Naturally, even Win32 Apache can see case differences in /URLs/ ,
> and take appropriate actions via directives: it's only "if and when" a
> URL finally hits a file path that the case insensitivity comes to
> light.
>
>
> > See:
> > http://www.chebucto.ns.ca/~af380/Profile.html
>
> Apparently a unix-based Apache 1.3 server, which does not
> exhibit the case-insensitive behaviour that is under discussion.
>
> > and:
> > http://www.chebucto.ns.ca/~af380/profile.html
> >
> > The first filename is the default home page for users at my ISP.
>
> Is it? Then its advertised URL ought, according to normal good
> practice, to be http://www.chebucto.ns.ca/~af380/ , without the
> specific file path at the end. However, that URL seems to go to a
> different page, so I'm not sure in what sense Profile.html is the
> "default home page".

Each user gets a home page named "Profile.html" automatically generated
and a Lynx shortcut "go profile" allowed a user to edit their home page
without knowing too much about navigating their filespace. Users had to
create their own "index.html" file (the default file for the fileserver)
if they wanted one. That way, it reduced the number of times a
user-support person had to ask the user to temporarily rename their
"index.html" file in response to a support question of the type "What
did I do wrong with my web page?" or "Why don't the links on my web page
work?" or "Why can't anyone access my images?" Being able to view a
user's directory was often necessary to see if the user had done something
such as upload images in lower-case but use upper- or mixed-case in the
URLs (or just misspelled the URL). It was assumed that users who knew
enough to create their own "index.html" file to hide the directory listing
would also be less likely to make such simple errors and need less
hand-holding.

New Privacy Laws have required my ISP to stop the practice but there
used to be a script available to users only (and not outside visitors)
with a Lynx shortcut, "go who", that provided links to the home pages of
the users logged in at the time. I found many interesting sites that way.
The default file accessed with that script was "Profile.html".

>
> > The second solved the problem of users mistyping the filename for
> > the first one in all lower-case.
>
> This is an unfortunate choice, since the second URL leads to a
> different web page and returns 200 OK, meaning that to an indexing
> robot it appears to be substantive content, rather than an anomalous
> (error) condition. I would strongly recommend handling such issues

There are no links to that file so search-engines are unlikely to find it.

> with some kind of status that indicates that the URL is irregular:
> this could, for example, be done with a redirection status (301 would
> be appropriate), or with a custom error page.
>
> In fact, enabling mod_speling will handle this automatically, and for
> all other pages too.

User scripts, custom server configuration and custom error pages are not
available for users here.

>
> > "I tried to go to your home page but all I got was a 404 error."
>
> The 404 error status *was* correct, since the requested page isn't
> supposed to exist (indeed it didn't until you put something there!).
> Better practice would be a 404 error page which suggested going to the
> corrected URL. In fact your present page would seem good enough, but
> it ought IMHO to be delivered with an error status (404 is
> appropriate), instead of the 200 OK that's happening presently.

That's not under my control. Creating a new page with a lower-case name
*was* under my control.

>
> That could of course be done with an ErrorDocument directive in your
> .htaccess, assuming that the provider hasn't disabled that facility.

Users here cannot create or modify dot files (except for those indirectly
modified by applications such as (for example) pine modifying .pinerc,
.newsrc, and .addressbook).

>
> So those IMHO are two choices that are better than what's currently
> happening:
>
> 1. status 301 redirection to the corrected URL (or enable
> mod_speling); or
>
> 2. status 404 to a helpful error page with link to the corrected URL.
>
>
> Coming back to the original issue, though, I'm not sure what your
> point was about reviewing pages on a case-insensitive file system
> prior to uploading them to a case-sensitive server.
>
> best regards

I was commenting on the statement, "If you insist on having two URLs that
differ only in letter case, then you'd probably have difficulties, I
suppose."

There *was* a reason for my having two filenames differing only in case
and it does cause difficulties in mirroring my site on my Windows machine.
(I had to rename the lower-case name to "profile2.html" in order to have
both files in the same Windows directory. Fortunately, nothing links to
"profile.html" so that's relatively harmless.) However, there *was* a
practical reason for having such case-differing filenames on my ISP's
system and it solved an even worse problem.

Alan J. Flavell

unread,
Sep 26, 2005, 10:32:52 AM9/26/05
to
On Mon, 26 Sep 2005, Norman L. DeForest wrote:

> Each user gets a home page named "Profile.html" automatically
> generated and a Lynx shortcut "go profile" allowed a user to edit
> their home page without knowing too much about navigating their
> filespace. Users had to create their own "index.html" file (the
> default file for the fileserver) if they wanted one.

Yes, but I was talking about good-practice on the web as a whole, and
what might be expected by some typical clued web user; rather than
some eccentric convention deployed by your particular service
provider. No offence meant to yourself, of course.

> User scripts, custom server configuration and custom error pages are
> not available for users here.

I take your point, but the fact is that this restriction has resulted
in a sub-optimal solution. I've already set out my reasons for saying
that, so I'll leave it there.

> I was commenting on the statement, "If you insist on having two URLs
> that differ only in letter case, then you'd probably have
> difficulties, I suppose."

Yes: some difficulties in *fully* testing a draft site on a Win32
Apache, prior to uploading them to a unix-based server, indeed. The
difficulties are not insuperable, but beyond a certain point (if a
site made comprehensive use of URLs that differ only in letter case)
it might be too much trouble to be worth the effort. In your "case"
(no pun intended!), if this is the only issue of this kind that you
have, I don't think it's more than a minor itch. (Modulo the fact, as
I already said, that in an ideal world this would be resolved in a
different way anyhow.)

> There *was* a reason for my having two filenames differing only in
> case and it does cause difficulties in mirroring my site on my
> Windows machine. (I had to rename the lower-case name to
> "profile2.html" in order to have both files in the same Windows
> directory. Fortunately, nothing links to "profile.html" so that's
> relatively harmless.)

Somehow that point seems to have been elided, or taken for granted, in
your original posting. Sorry if I misunderstood at first.


[digression...]
A partial solution, in your specific case, is to configure your
*Win32 Apache* to alias the URL "profile.html" internally to your
alternative file, profile2.html. The local URLs "profile.html" and
"Profile.html" will then both appear to behave on the local Win32
server as you expect them to behave on the production unix server
(even though the internal mechanisms are different).

Note that Alias belongs not in a .htaccess file, but in the main
configuration, and its second argument is a *file* path, not URL: so
the directive might be something like (modulo line wraps):

Alias /profile.html "C:/Program Files/Apache Group/Apache2/htdocs/profile2.html"

(just tested on my own win32 installation).

Of course there will also be your extraneous URL "/profile2.html", but
if you don't reference it from anywhere, it won't matter.
[...digression.]


I still think it's fair to say to someone who develops pages on a
free-standing Windows system, i.e with no immediate access to a unix
system, that the use of a local Win32 Apache system for reviewing the
draft site is valuable, subject to the issues mentioned. In general,
after uploading, they should run a link checker to unearth any
possible glitches in URL letter-case settings which wouldn't show up
under link checking on the "draft" server.

> However, there *was* a practical reason for having such
> case-differing filenames on my ISP's system and it solved an even
> worse problem.

Understood. But you'll have to excuse me if I rate it as an
understandable one-off kludge which was forced on you by the limited
range of facilities at your disposal on the production server.

Not that you even suggested this: but the idea of extending that
scheme to implement a general solution to mis-cased URLs just doesn't
bear thinking about.

best regards
