??? Crazy XHTML Strict Validation Problem ???

rbronson1976

Oct 4, 2005, 11:39:52 PM
Hello all,

I have a very strange situation -- I have a page that validates (using
http://validator.w3.org/) as "XHTML 1.0 Strict" just fine. This page
uses this DOCTYPE:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

When I change the DOCTYPE to (what should be the equivalent):
<!DOCTYPE html SYSTEM
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

I get several validation errors.

The page I'm referring to is at:
http://www.absolutejava.com/testing.html

I know you are wondering *WHY* I would make this change. I'll get to
that in a moment. The important point is that, according to my
understanding, both DOCTYPEs use the *same* DTD and so the document
should validate (or not validate) consistently, right?
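
To make the comparison concrete, here is a minimal Python sketch that pulls the root element, public identifier, and system identifier out of each declaration. The name `parse_doctype` and the regular expression are illustrative assumptions; the pattern only handles double-quoted identifiers and is not a full SGML/XML parser.

```python
import re

# Matches <!DOCTYPE root PUBLIC "fpi" "uri"> or <!DOCTYPE root SYSTEM "uri">.
# Assumption: identifiers are double-quoted; single quotes are not handled.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+'
    r'(?:PUBLIC\s+"(?P<public>[^"]*)"\s+"(?P<system1>[^"]*)"'
    r'|SYSTEM\s+"(?P<system2>[^"]*)")\s*>',
    re.IGNORECASE,
)


def parse_doctype(text):
    """Return (root, public_id, system_id); public_id is None for SYSTEM."""
    match = DOCTYPE_RE.search(text)
    if match is None:
        raise ValueError("no DOCTYPE declaration found")
    system_id = match.group("system1")
    if system_id is None:
        system_id = match.group("system2")
    return match.group("root"), match.group("public"), system_id


with_public = parse_doctype(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)
system_only = parse_doctype(
    '<!DOCTYPE html SYSTEM\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)
# Same system identifier in both; only the public identifier differs.
print(with_public[2] == system_only[2], with_public[1], system_only[1])
```

Both declarations point at the same DTD URI, so any difference in validation results has to come from how the tool reacts to the presence or absence of the public identifier.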

I also tried copying the DTD from www.w3.org to my server and then
modifying the DOCTYPE accordingly, but I still got the same validation
errors.

The reason I'm doing this is that I want to use "XHTML Strict" *except*
for one small tweak I need to make to the DTD. But, before I can make
the tweak I need the document to validate against a local copy of the
DTD.

Can anyone explain why the different DOCTYPEs produce different
validation results, even though they use the same DTD?

Thanks....

Lachlan Hunt

Oct 5, 2005, 2:05:51 AM
rbronson1976 wrote:
> Hello all,
>
> I have a very strange situation -- I have a page that validates (using
> http://validator.w3.org/) as "XHTML 1.0 Strict" just fine. This page
> uses this DOCTYPE:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>
> When I change the DOCTYPE to (what should be the equivalent):
> <!DOCTYPE html SYSTEM
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>
> I get several validation errors.
>
> The page I'm referring to is at:
> http://www.absolutejava.com/testing.html

The problem is that without the public identifier, the validator does
not know that the system identifier is referencing an XML DTD, rather
than an SGML DTD, and because the document is being served with the
wrong MIME type (text/html instead of application/xhtml+xml) falls back
to using SGML based validation.

If you change the MIME type sent by the server in the HTTP Content-Type
header, to an XML MIME type, then the validator should behave as expected.
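
As a rough illustration of the distinction, the following Python sketch reads a Content-Type header value and decides whether it names an XML MIME type. The helper names `parse_content_type` and `is_xml_media_type` are hypothetical, and treating `text/xml`, `application/xml`, and any `+xml` suffix as "XML" is an assumption made for this example, not the validator's actual logic.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header value into (media_type, charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() returns the lowercased type/subtype;
    # get_content_charset() returns the charset parameter or None.
    return msg.get_content_type(), msg.get_content_charset()


def is_xml_media_type(media_type):
    """Assumed rule: generic XML types plus anything ending in +xml."""
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")


for value in ("text/html; charset=ISO-8859-1", "application/xhtml+xml"):
    media_type, charset = parse_content_type(value)
    print(value, "->", media_type, charset, is_xml_media_type(media_type))
```

Only the header actually sent by the server matters here; the sketch says nothing about any `<meta>` element inside the document.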

> The reason I'm doing this is that I want to use "XHTML Strict" *except*
> for one small tweak I need to make to the DTD. But, before I can make
> the tweak I need the document to validate against a local copy of the
> DTD.

Ignoring the question of why you want to modify the DTD, you should
consider using HTML and modifying the HTML 4.01 Strict DTD, rather than
trying to use XHTML incorrectly.

http://www.cs.tut.fi/~jkorpela/html/own-dtd.html

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox

rbronson1976

Oct 5, 2005, 12:46:58 PM
Lachlan,

The public identifier, to my knowledge, is optional. The SYSTEM
identifier is what provides the DTD.

I've also tried other (custom) PUBLIC Identifiers, to no avail. For
example, I tried this DOCTYPE, but I got the same validation failures.

<!DOCTYPE html PUBLIC "-//ABS//DTD XHTML 1.0 Strict Special Tweak//EN"
"http://www.absolutejava.com/DTD/xhtml1-strict.dtd">

You also said, "...the validator does not know that the system
identifier is referencing an XML DTD, rather than an SGML DTD, and
because the document is being served with the wrong MIME type..."

That's an odd thing to say considering the document *DOES* validate
correctly if I use the first DOCTYPE, even though, according to you, I
am using the wrong MIME type (text/html). So, if the MIME type is
causing the problem, why doesn't it cause a problem with the first
DOCTYPE?

I do not believe the "text/html" MIME type is wrong or causing the
problem, although it is not preferred for XHTML. According to
http://www.w3.org/TR/xhtml-media-types, "...the use of 'text/html'
SHOULD be limited to HTML-compatible XHTML 1.0 documents."

In addition, "... XHTML Documents which follow the guidelines set forth
in Appendix C, 'HTML Compatibility Guidelines' may be labeled with the
Internet Media Type "text/html", as they are compatible with most HTML
browsers."

So, I don't think you can say I'm using the wrong MIME type.

Finally, I did try changing the MIME type in the <meta> element to
"application/xhtml+xml" but the same problem occurred.

If anyone *really* knows why I am getting these validation errors,
please respond...but no more half-baked, useless guesses, please.

Andreas Prilop

Oct 5, 2005, 12:50:23 PM
On 5 Oct 2005, rbronson1976 wrote:

> Finally, I did try changing the MIME type in the <meta> element to
> "application/xhtml+xml" but the same problem occurred.

The MIME type is *not* set in the META thingy; it is set in the HTTP
header!

rbronson1976

Oct 5, 2005, 2:08:20 PM
Really? That's very interesting. I know there is an HTTP header that
specifies content-type, BUT, there is also a <meta> tag (a.k.a.,
"thingy") that supplements the HTTP headers.

So, if the HTTP header, proper, indicates content-type of 'text/html'
and the <meta> tag indicates something else, which one should a user
agent accept? And, which W3 spec indicates this?

Lachlan Hunt

Oct 5, 2005, 7:16:22 PM
rbronson1976 wrote:
> The public identifier, to my knowledge, is optional.

Yes, technically, it is according to the XML rec.

> The SYSTEM identifier is what provides the DTD.

Yes, it references the external DTD.

> I've also tried other (custom) PUBLIC Identifiers, to no avail.

You misunderstood what I meant. The validator switches to XML mode for
known PUBLIC identifiers for XML documents, such as XHTML, regardless of
the MIME type. Since the validator, obviously, does not know about your
custom PUBLIC identifier, it does not know that it should continue in
XML mode and, because it was served as text/html, defaults to SGML mode.
Using an XML MIME type, it uses XML mode.
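
The behaviour described above can be modelled in a few lines of Python. This is only a sketch of the decision as stated in this post, not the validator's source: the function name `choose_mode` and the contents of `KNOWN_XML_PUBLIC_IDS` are assumptions chosen for illustration (the validator's real list is not given in this thread).

```python
# Illustrative set of public identifiers assumed to be "known" as XML.
KNOWN_XML_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}


def choose_mode(public_id, media_type):
    """Return "xml" or "sgml" following the rules described in the post."""
    # 1. A known XML public identifier wins regardless of the MIME type.
    if public_id in KNOWN_XML_PUBLIC_IDS:
        return "xml"
    # 2. Otherwise an XML MIME type selects XML mode.
    if media_type in ("text/xml", "application/xml") or media_type.endswith("+xml"):
        return "xml"
    # 3. Anything else falls back to SGML mode.
    return "sgml"


custom = "-//ABS//DTD XHTML 1.0 Strict Special Tweak//EN"
print(choose_mode("-//W3C//DTD XHTML 1.0 Strict//EN", "text/html"))
print(choose_mode(None, "text/html"))
print(choose_mode(custom, "text/html"))
print(choose_mode(custom, "application/xhtml+xml"))
```

Under this model the four cases reported in the thread come out as XML, SGML, SGML, and XML respectively, which matches the validation results the original poster saw.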

> You also said, "...the validator does not know that the system
> identifier is referencing an XML DTD, rather than an SGML DTD, and
> because the document is being served with the wrong MIME type..."
>
> That's an odd thing to say considering the document *DOES* validate
> correctly if I use the first DOCTYPE, even though, according to you, I
> am using the wrong MIME type (text/html).

That's because the validator knows the XHTML DOCTYPEs.

> So, if the MIME type is causing the problem, why doesn't it cause a
> problem with the first DOCTYPE?

Because, upon encountering a document with a known XML DOCTYPE, the
validator knows that it should continue in XML mode.

> I do not believe the "text/html" MIME type is wrong or causing the
> problem...
>
> In addition, "... XHTML Documents which follow the guidelines set forth
> in Appendix C, 'HTML Compatibility Guidelines' may be labeled with the
> Internet Media Type "text/html", as they are compatible with most HTML
> browsers."

Although it is allowed by the recommendation, you should be aware that
doing so is considered harmful.

> So, I don't think you can say I'm using the wrong MIME type.

No, it is the *wrong* MIME type, even if it is technically allowed under
certain conditions.

> Finally, I did try changing the MIME type in the <meta> element to
> "application/xhtml+xml" but the same problem occurred.

Change the MIME type in the HTTP headers. The meta element is only
useful for setting the charset in text/html documents when the charset
parameter has been omitted from the HTTP Content-Type header, or when
the file is not being served over HTTP or another protocol that makes
such information available.

In the HTTP headers, for HTML, use:
Content-Type: text/html; charset=XXX
(where XXX is whatever encoding you have used)

For XHTML, use:
Content-Type: application/xhtml+xml

(XML documents are self describing and don't need charset information in
the HTTP headers)
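
For local experiments, one way to get these headers is a small static file server. The Python sketch below uses the standard library's `http.server`; the class name `XhtmlHandler`, the function `serve`, the port, the `.xhtml` file extension convention, and `utf-8` standing in for XXX are all assumptions for illustration. A production server (Apache, for instance) would be configured through its own mechanisms instead.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class XhtmlHandler(SimpleHTTPRequestHandler):
    # Map file extensions to the Content-Type header value that is sent.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        # Assumption: HTML files are encoded as UTF-8.
        ".html": "text/html; charset=utf-8",
        # XHTML is served with an XML MIME type and no charset parameter.
        ".xhtml": "application/xhtml+xml",
    }


def serve(port=8000):
    """Serve the current directory until interrupted (not called here)."""
    HTTPServer(("", port), XhtmlHandler).serve_forever()
```

Calling `serve()` from a directory that contains `testing.xhtml` would deliver that file with `Content-Type: application/xhtml+xml`, whatever any `<meta>` element inside it says.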

> If anyone *really* knows why I am getting these validation errors,
> please respond...but no more half-baked, useless guesses, please.

I do *really* know why you are getting these validation errors; it was
not a "half-baked, useless guess". If you can't remain civil and not
insult those who choose to take the time to assist you, simply because
you failed to understand the advice given, then don't expect too much
from anyone else in the future.

rbronson1976

Oct 5, 2005, 9:52:07 PM
Lachlan,

I'm sorry for being so snotty in my previous post -- this is all just
very frustrating and in the past I've found it's not uncommon for
people who seem to know nothing about a topic to post useless replies.
From your reply I see that you do seem to know what you're talking
about....sorry again.

Anyway, for anyone that may be reading, I took Lachlan's advice and I'd
like to describe what I found -- I also have one final question.

Lachlan seems to be correct regarding the use of a "known public
identifier" (e.g., "-//W3C//DTD XHTML 1.0 Strict//EN") -- When a "known
public identifier" is used in the DOCTYPE it causes the validator to go
into "XML mode", even if the content type of the document is non-XML
(e.g., "text/html"). In fact, using a "known public identifier" seems
to cause the validator to ignore the system identifier completely! For
example, using the following DOCTYPE, my document validated just fine:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "Queen
Victoria">

It seems that once the validator recognized the public identifier (
"-//W3C//DTD XHTML 1.0 Strict//EN", in this case) it used some existing
copy of the "xhtml1-strict.dtd" DTD to validate the document. It did
*not* consult the DTD in the system identifier ("Queen Victoria", in
this case), but instead, completely ignores it.

This leads to my last question: Is it possible to use a "known public
identifier" *AND* still tell the validator to use a custom DTD? In
other words, I want to use a "known public identifier" so that the
validator goes into "XML mode" BUT I want it to use my version of the
DTD, not the one it has cached somewhere. Or, equivalently, can I use a
custom, unknown public identifier yet still somehow force the validator
into "XML mode"?

As a final note, I find I am able to get the validator to use my custom
DTD but to do so I have to specify a "custom" public identifier *AND* I
have to serve the document as an XML document (e.g.,
"application/xhtml+xml") so that the validator stays in XML mode, just
as Lachlan indicated. The only reason I prefer to serve as "text/html"
is that IE 6, as you probably know, does not understand
"application/xhtml+xml".

Okay, thanks for the replies. At least I can get my documents to
validate using a custom DTD, which is much farther than I was 24 hours
ago.

Lachlan Hunt

Oct 5, 2005, 11:31:27 PM
rbronson1976 wrote:
> Lachlan seems to be correct regarding the use of a "known public
> identifier" (e.g., "-//W3C//DTD XHTML 1.0 Strict//EN") -- When a "known
> public identifier" is used in the DOCTYPE it causes the validator to go
> into "XML mode", even if the content type of the document is non-XML
> (e.g., "text/html"). In fact, using a "known public identifier" seems
> to cause the validator to ignore the system identifier completely! For
> example, using the following DOCTYPE, my document validated just fine:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "Queen
> Victoria">

The system identifier only needs to be dereferenced by a validating user
agent when it does not contain the public identifier within its catalogue.

> It seems that once the validator recognized the public identifier (
> "-//W3C//DTD XHTML 1.0 Strict//EN", in this case) it used some existing
> copy of the "xhtml1-strict.dtd" DTD to validate the document.

That's correct.

> It did *not* consult the DTD in the system identifier ("Queen Victoria", in
> this case), but instead, completely ignores it.

Ignoring the fact that the SI needs to be a URI, that is essentially
correct.

> This leads to my last question: Is it possible to use a "known public
> identifier" *AND* still tell the validator to use a custom DTD?

No. When you use a public identifier, it is expected that the DTD
referenced by the SI matches that identified by the public identifier.
You need to use <!DOCTYPE html SYSTEM "http://...">, but, for the purpose of
validation, you also need to use an XML validator, not an SGML
validator. In the case of the W3 validator, XML mode is triggered by an
XML MIME type, which is the correct way to do what you want.

However, you can make use of another validator, like Page Valet [1],
that allows you to manually select XML validation, if you choose to
ignore the fact that by serving as text/html, your document will not be
treated as XML by any other UA.

Not only will Page Valet allow you to force XML mode, but it's also a
much better XML validator than the W3 validator, which is just an SGML
validator with a few patches to make it act like an XML validator with
"some limitations".

I recommend that, unless you have a really compelling reason to continue
using XHTML on the client side, you deliver HTML 4.01 with a custom
DTD instead. If your authoring tool/process benefits from using XHTML,
that's fine, you can continue to use XHTML on the back end, but you
should consider transforming it to HTML for the client.

[1] http://valet.webthing.com/page/

Andreas Prilop

Oct 6, 2005, 8:09:44 AM
On 5 Oct 2005, rbronson1976 wrote:

> Organization: http://groups.google.com
> User-Agent: G2/0.2

The innocents abroad.

> Really? That's very interesting.

What? What is interesting?
Please quote the statement you refer to!

http://www.xs4all.nl/~wijnands/nnq/nquote.html
http://www.netmeister.org/news/learn2quote.html

Eric B. Bednarz

Oct 7, 2005, 1:00:26 PM
Lachlan Hunt <spam.m...@gmail.com> wrote:

> In the case of the W3
> validator, XML mode is triggered by an XML MIME type, which is the
> correct way to do what you want.

I wouldn't know of any 'XML mode'; this is basically not a question of
XML versus SGML but a question of locating an SGML declaration.

The SGML declaration for XML

<http://validator.w3.org/sgml-lib/xml.dcl>

is -- dramatically -- different from the one for HTML 4

<http://validator.w3.org/sgml-lib/REC-html401-19991224/HTML4.decl>

which in turn is slightly different from e.g. the one for HTML 3

<http://validator.w3.org/sgml-lib/REC-html32-19970114/HTML32.dcl>

And so on.

Using a custom DTD for validation on a remote system is likely to get
you in trouble sooner or later if you don't know in advance which
default declaration will be chosen (pick a card, and jolly good luck).


On a side note, it's no good to draw conclusions from 'how stuff works'
by observing some particular behaviour in the wild; whether or not the
FPI OVERRIDEs the system identifier is just something else to be
configured in the catalog, see e.g.

<http://validator.w3.org/sgml-lib/REC-html401-19991224/HTML4.cat>

Id est, on a different validating system, the OP and Queen Victoria
might encounter quite different behaviour.
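
The effect of that catalog setting can be sketched in Python. This is a simplified model of catalog-style entity resolution, not the code of any particular parser: the function `resolve_dtd`, the dictionary used as a catalog, and the local file name `xhtml1-strict.dtd` are assumptions for illustration.

```python
def resolve_dtd(public_id, system_id, catalog, override=True):
    """Return the DTD location a catalog-driven system would use.

    catalog maps public identifiers to local DTD locations.
    override=True models OVERRIDE YES: a catalog match for the public
    identifier is preferred even when a system identifier is given.
    override=False models OVERRIDE NO: an explicit system identifier wins.
    """
    mapped = catalog.get(public_id) if public_id is not None else None
    if mapped is not None and (override or system_id is None):
        return mapped
    return system_id


fpi = "-//W3C//DTD XHTML 1.0 Strict//EN"
catalog = {fpi: "xhtml1-strict.dtd"}  # hypothetical local copy

# With override, the bogus system identifier is never consulted.
print(resolve_dtd(fpi, "Queen Victoria", catalog))
# Without override, the same document is resolved via "Queen Victoria".
print(resolve_dtd(fpi, "Queen Victoria", catalog, override=False))
# An unknown public identifier always falls through to the system identifier.
print(resolve_dtd("-//ABS//DTD XHTML 1.0 Strict Special Tweak//EN",
                  "http://www.absolutejava.com/DTD/xhtml1-strict.dtd", catalog))
```

The first call mirrors what the original poster observed; the second shows how the same document could fail on a system whose catalog is configured the other way.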

(On yet another side note, if you, for example, always use the html
4.01 strict dtd, you'd add something sensible like

doctype html strict.dtd

to your catalog and could free all your documents from the obsolete
cruft and just use <!doctype html system> for validation purposes.
It's utterly silly to want remote validation after publication;
validation, if at all, is useful in the production process, starting
with a local validating system and an editor that can read the catalog
and the dtd as well.)

> Not only will Page Valet allow you to force XML mode, but it's
> also a much better XML validator

It isn't a question of 'better' but rather yes or no. Page Valet lets
you choose an XML parser; the W3C validator doesn't. Thus the latter
isn't an 'XML validator' (validating XML processor) at all.


--
Goodbye and keep cold
