
Case sensitivity of XHTML tags - where, how is this enforced?


Karl Smith

Sep 2, 2002, 10:09:01 AM
HTML tags are not case sensitive, <p> is the same thing as <P>. AIUI,
the HTML parser internally maps p->P, rather than P->p, but that isn't
really relevant.
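For a present-day illustration (not part of the original post), Python's standard html.parser module also treats tag names case-insensitively. It happens to fold them to lower case rather than upper case, which bears out the point that the direction of the folding doesn't matter. TagRecorder is a hypothetical helper written only for this sketch.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Record start and end tag names exactly as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


recorder = TagRecorder()
# Mixed-case start and end tags for the same element type.
recorder.feed("<P>Hello</p><p>again</P>")
recorder.close()
print(recorder.events)
# [('start', 'p'), ('end', 'p'), ('start', 'p'), ('end', 'p')]
```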

XML tags are case sensitive, so <p>...</P> is not well-formed XML and
should halt parsing. XHTML specifies that tags shall be lowercase.

Generally: where, and how, in the mysterious workings of validators and
DTDs is this restriction expressed?

Specifically: If I want to hack a copy of 15445.dtd to include the
requirement for lowercase tags only, howzeetdun?


--
Something about Britney falling into my lap,
if I just wait long enough...
falsely attributed to JRE, who now denies saying it.

Klaus Johannes Rusch

Sep 2, 2002, 10:54:42 AM
Karl Smith wrote:
>
> HTML tags are not case sensitive, <p> is the same thing as <P>. AIUI,
> the HTML parser internally maps p->P, rather than P->p, but that isn't
> really relevant.
>
> XML tags are case sensitive, so <p>...</P> is not well-formed XML and
> should halt parsing. XHTML specifies that tags shall be lowercase.
>
> Generally: where, and how, in the mysterious workings of validators and
> DTDs is this restriction expressed?
>
> Specifically: If I want to hack a copy of 15445.dtd to include the
> requirement for lowercase tags only, howzeetdun?

For an XML validator, regardless of the DTD used the element names are
always case sensitive (<p> </P> is, as you rightly noted, not
well-formed, regardless of an applicable DTD).
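A minimal sketch of this using an off-the-shelf XML parser, Python's xml.etree.ElementTree (expat underneath), with no DTD involved at all: the end tag that differs only in case is rejected as a well-formedness error. The is_well_formed helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>matched</p>"))     # True
print(is_well_formed("<p>mismatched</P>"))  # False: names differ in case
```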

For an SGML validator, you need to include an SGML declaration
specifying case sensitivity. The same is true for other SGML features
like the ability to omit tags. See HTML4.decl included with the HTML 4
specification for details.

--
Klaus Johannes Rusch
Klaus...@atmedia.net
http://www.atmedia.net/KlausRusch/

Alan J. Flavell

Sep 2, 2002, 10:51:45 AM
On Sep 2, Karl Smith inscribed on the eternal scroll:

> Generally: where, and how, in the mysterious workings of validators and
> DTDs is this restriction expressed?

None of the above. It's in the even more mysterious "SGML
declaration", see for example
http://www.w3.org/TR/html401/sgml/sgmldecl.html
NAMECASE GENERAL YES
ENTITY NO
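A rough model of what the NAMECASE GENERAL switch controls, assuming the simple rule that YES folds names (element type names, attribute names and the like, but not entity names) to upper case before they are compared, and NO compares them exactly. ENTITY NO is the part that keeps entity names case-sensitive. DECLARED, fold_name, same_element and is_declared are hypothetical names for this sketch; a real SGML parser such as SP applies the rule while parsing, not as a separate step.

```python
# Hypothetical lower-case DTD: the element types it declares.
DECLARED = {"p", "hr", "ul", "li"}


def fold_name(name: str, namecase_general: bool) -> str:
    """Normalise a non-entity name as NAMECASE GENERAL dictates."""
    # YES: names are folded to upper case before they are compared.
    # NO: names are kept exactly as written, so comparison is case-sensitive.
    return name.upper() if namecase_general else name


def same_element(start: str, end: str, namecase_general: bool) -> bool:
    """Would the end tag </end> close the start tag <start>?"""
    return fold_name(start, namecase_general) == fold_name(end, namecase_general)


def is_declared(name: str, namecase_general: bool) -> bool:
    """Is <name> an element type the DTD declares? (DTD names fold too.)"""
    declared = {fold_name(d, namecase_general) for d in DECLARED}
    return fold_name(name, namecase_general) in declared


# HTML 4 style (NAMECASE GENERAL YES): case doesn't matter.
print(same_element("p", "P", True), is_declared("P", True))    # True True
# NAMECASE GENERAL NO with a lower-case DTD: <P> is no longer accepted.
print(same_element("p", "P", False), is_declared("P", False))  # False False
```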

> Specifically: If I want to hack a copy of 15445.dtd to include the
> requirement for lowercase tags only, howzeetdun?

If you have a DTD which defines them as upper case (as indeed you
have), then you'd not only have to negate the case-folding in the SGML
declaration, but also convert the definitions all to lower-case in the
DTD. That goes for attribute tokens (like ALIGN=RIGHT), as well as for
the tag names themselves. I suspect there'd be a sneaky way of doing
that conversion with SP, but I don't know what it is.

I don't think the online validators make provision for you to use a
custom SGML declaration (as opposed to a custom DTD). You'd need to
install your own copy of the validator, presumably.

E&OE, maybe someone more competent will step up with a better answer.


Karl Smith

Sep 2, 2002, 11:48:02 AM
Klaus Johannes Rusch wrote:

> For an XML validator, regardless of the DTD used the element names are
> always case sensitive (<p> </P> is, as you rightly noted, not
> well-formed, regardless of an applicable DTD).

Does this mean that an XML validator does not use the SGML declaration
file? Simply removing the tag omission from the DTD would make it
usable with an XML validator?

But in the case of this particular DTD, all the element names are
written in UPPERCASE (hiss), but if hypothetically they were written in
lowercase, would that be all I needed to do?

Karl Smith

Sep 2, 2002, 12:23:58 PM
Alan J. Flavell wrote:
> On Sep 2, Karl Smith inscribed on the eternal scroll:
>
>
>>Generally: where, and how, in the mysterious workings of validators and
>>DTDs is this restriction expressed?
>
>
> None of the above. It's in the even more mysterious "SGML
> declaration", see for example
> http://www.w3.org/TR/html401/sgml/sgmldecl.html
> NAMECASE GENERAL YES
> ENTITY NO

Further clue required please.

NAMECASE GENERAL NO

Will that make it case sensitive?


>>Specifically: If I want to hack a copy of 15445.dtd to include the
>>requirement for lowercase tags only, howzeetdun?
>
>
> If you have a DTD which defines them as upper case (as indeed you
> have), then you'd not only have to negate the case-folding in the SGML
> declaration, but also convert the definitions all to lower-case in the
> DTD. That goes for attribute tokens (like ALIGN=RIGHT), as well as for
> the tag names themselves. I suspect there'd be a sneaky way of doing
> that conversion with SP, but I don't know what it is.

Nah, I'll just work through it "by hand", I can remove such abominations
as HR while I'm at it.


> I don't think the online validators make provision for you to use a
> custom SGML declaration (as opposed to a custom DTD). You'd need to
> install your own copy of the validator, presumably.

There's always a catch, isn't there?

Nick Kew

Sep 2, 2002, 1:24:56 PM
In article <3D738832...@domain.invalid>, one of infinite monkeys
at the keyboard of Karl Smith <user...@domain.invalid> wrote:

> Does this mean that an XML validator does not use the SGML declaration
> file?

The inference is dubious, but your conclusion is valid.

> Simply removing the tag ommission from the DTD would make it
> usable with an XML validator?

No. You'd need to do the kind of exercise the W3C did in deriving
XHTML1.0 from HTML4. In fact, not even that will work, because
ISO HTML uses implicit elements (not permitted in XML) to enforce
structure.

> But in the case of this particular DTD, all the element names are
> written in UPPERCASE (hiss),

Well that at least you could change with one line of Perl.
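The Perl one-liner itself isn't given, so here is a rough Python sketch of the same idea, under stated assumptions. It only touches <!ELEMENT ...> declarations, keeps the reserved words upper case (Arjun Ray's caveat further down), and leaves comments, %parameter; references and #PCDATA alone. It does not look inside parameter entity replacement text, and it deliberately skips ATTLIST declarations, because words like NAME, ID and NUMBER are both attribute names and declared-value keywords there and need position-aware handling (which is also where Alan's ALIGN=RIGHT style tokens live). KEYWORDS and lowercase_element_decls are hypothetical names.

```python
import re

# Reserved words that can appear in an ELEMENT declaration and must stay
# upper case. Assumed list for this sketch, not taken from any DTD.
KEYWORDS = {"ELEMENT", "EMPTY", "ANY", "CDATA", "RCDATA", "O"}

# A comment, or a name optionally prefixed by % (parameter entity) or
# # (as in #PCDATA). Punctuation and whitespace are left alone.
TOKEN = re.compile(r"--.*?--|[%#]?[A-Za-z][A-Za-z0-9.-]*", re.S)

# Assumes no '>' appears inside a comment within an ELEMENT declaration.
ELEMENT_DECL = re.compile(r"<!ELEMENT\b[^>]*>")


def _fold(match):
    tok = match.group(0)
    if tok.startswith(("--", "%", "#")) or tok in KEYWORDS:
        return tok  # comments, %entity refs, #PCDATA and keywords unchanged
    return tok.lower()


def lowercase_element_decls(dtd_text):
    """Lower-case element names inside <!ELEMENT ...> declarations only."""
    return ELEMENT_DECL.sub(lambda m: TOKEN.sub(_fold, m.group(0)), dtd_text)


sample = "<!ELEMENT (OL|UL) - - (LI)+>\n<!ELEMENT BR - O EMPTY -- line break -->"
print(lowercase_element_decls(sample))
# <!ELEMENT (ol|ul) - - (li)+>
# <!ELEMENT br - O EMPTY -- line break -->
```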

--
Nick Kew

Available for contract work - Programming, Unix, Networking, Markup, etc.

Arjun Ray

Sep 2, 2002, 7:20:39 PM
In <3D738832...@domain.invalid>, Karl Smith
<user...@domain.invalid> wrote:

| Does this mean that an XML validator does not use the SGML declaration
| file?

Not exactly. XML is an SGML profile in the sense that, from an SGML
perspective, XML is a particular specification of the contents of an
SGML declaration. A generic SGML parser/validator would still have to
read in such an SGML declaration to parse XML-ized documents, but an XML
parser/validator need not - and in general, does not - because the SGML
declaration in this case is already "known" - being fixed for all XML
applications - and can be hard-coded into the software.

| Simply removing the tag omission from the DTD would make it usable
| with an XML validator?

Mostly. SGML (and perhaps XML) goofed here. For one thing, the XML
spec does not even recognize the omissibility parameters in an element
type declaration, so their presence would choke an XML parser. So, yes,
one would have to remove the parameters to make a DTD usable by an XML
parser/validator.
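A minimal sketch of that removal step, assuming the usual layout where the two omissibility parameters ("-" or "O") come straight after the element name or name group. It only covers this one point; the OMITTAG dependence described next and the further changes listed below (exceptions, name groups, element types meant to have omitted tags) are not handled. OMISSIBILITY and strip_omissibility are hypothetical names.

```python
import re

# <!ELEMENT  name-or-(name|group)  -|O  -|O  content...>
# Group 1 keeps everything up to and including the name, so the two
# omissibility parameters that follow it can be dropped.
OMISSIBILITY = re.compile(r"(<!ELEMENT\s+(?:\([^)]*\)|[^\s(]+))\s+[-O]\s+[-O](?=\s)")


def strip_omissibility(dtd_text):
    """Drop SGML tag-omission parameters from ELEMENT declarations."""
    return OMISSIBILITY.sub(r"\1", dtd_text)


print(strip_omissibility("<!ELEMENT BR - O EMPTY>"))       # <!ELEMENT BR EMPTY>
print(strip_omissibility("<!ELEMENT (OL|UL) - - (LI)+>"))  # <!ELEMENT (OL|UL) (LI)+>
```

Declarations that already lack the parameters, such as <!ELEMENT title (#PCDATA)>, pass through unchanged.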

But, for another thing, in SGML these omissibility parameters are legal
only if OMITTAG is YES. This is a nasty dependence of DTD syntax itself
on a setting in the relevant SGML declaration. It would be much more
convenient if one could say OMITTAG NO in the SGML declaration and then
have an SGML parser simply ignore the omissibility parameters if and
when found, and instead enforce the rule in the instance; but this
isn't possible.


| But in the case of this particular DTD, all the element names are
| written in UPPERCASE (hiss), but if hypothetically they were written
| in lowercase, would that be all I needed to do?

At a minimum, yes - but not the keywords (ELEMENT, ATTLIST, etc.) too!
You would also have to ensure no content models with inclusions or
exclusions, expand all declarations with name groups for associated
element types into individual declarations for each element type, and
finally account for the possibility that some element types in the DTD
were *meant* to have their tags omitted in the instance (a nasty design
hack, found all too often.)
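The name-group expansion can be sketched along these lines, assuming ELEMENT declarations whose name group is written out literally. A group hidden behind a parameter entity such as (%heading;) would need the entity expanded first, and ATTLIST name groups are not covered. NAME_GROUP_DECL and expand_name_groups are hypothetical names.

```python
import re

# <!ELEMENT (A|B|C) rest-of-declaration>
NAME_GROUP_DECL = re.compile(r"<!ELEMENT\s+\(([^)]*)\)\s+([^>]*)>")


def expand_name_groups(dtd_text):
    """Split each name-group ELEMENT declaration into one per element type."""

    def expand(match):
        # SGML name groups may use |, & or , as the connector.
        names = [n.strip() for n in re.split(r"[|&,]", match.group(1)) if n.strip()]
        rest = match.group(2).rstrip()
        return "\n".join(f"<!ELEMENT {name} {rest}>" for name in names)

    return NAME_GROUP_DECL.sub(expand, dtd_text)


print(expand_name_groups("<!ELEMENT (ol|ul) (li)+>"))
# <!ELEMENT ol (li)+>
# <!ELEMENT ul (li)+>
```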

Alan J. Flavell

Sep 3, 2002, 10:11:54 AM
On Sep 3, Karl Smith inscribed on the eternal scroll:

> NAMECASE GENERAL NO
>
> Will that make it case sensitive?

I reckon so...

> >>Specifically: If I want to hack a copy of 15445.dtd to include the
> >>requirement for lowercase tags only, howzeetdun?

I'll have to excuse myself that my previous answer was on the basis
of doing nothing more than forcing ISO-HTML into lower-case - but
otherwise retaining the HTML flavour.

If in fact you wanted to validate according to an ISO-HTML-flavoured
XHTML (to coin a phrase), then the answers you got from other folks
are probably closer to what you want.

Just in case there's any misunderstanding: HTML and XHTML are
fundamentally incompatible at the SGML level, if only because of the
self-closing tags. The compatibility which is discussed in the
notorious "Appendix C" is a form of compatibility for typical web
browsers, _not_ for formal SGML validators. For example <br /> means
something entirely different to an SGML validator applying an HTML DTD
(see recent discussions of "SHORTTAG" and "NET"), than it does to
XHTML.
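To make the <br /> point concrete: under the HTML 4 SGML declaration (SHORTTAG YES), the "/" ends the start tag as a net-enabling start tag, and the following ">" is then ordinary character data. For an element declared EMPTY, the upshot is that <br /> reads like <br>&gt;. The toy rewrite below models only that EMPTY case (non-EMPTY elements behave differently, since a later "/" would end them) and only shows the result, not how a validator works internally. EMPTY_ELEMENTS is an assumed subset and the function name is hypothetical.

```python
import re

# Assumed subset of HTML 4 element types declared EMPTY.
EMPTY_ELEMENTS = ("br", "hr", "img", "meta", "link", "input")
ALT = "|".join(EMPTY_ELEMENTS)

# "<br />": the "/" closes the start tag (net-enabling start tag),
# leaving the ">" behind as plain text.
NET_START_TAG = re.compile(r"<(" + ALT + r")\b([^<>]*?)\s*/>", re.IGNORECASE)


def as_html4_sgml_reads_it(markup):
    """Rewrite XHTML-style empty tags as an HTML 4 SGML parser reads them."""
    return NET_START_TAG.sub(r"<\1\2>&gt;", markup)


print(as_html4_sgml_reads_it("line one<br />line two"))
# line one<br>&gt;line two
```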

cheers

--
A: Top posting
Q: What is the most irritating thing on Usenet?
- "Gordon" on apihna


Karl Smith

Sep 3, 2002, 10:42:59 AM
Alan J. Flavell wrote:
> On Sep 3, Karl Smith inscribed on the eternal scroll:
>
>
>> NAMECASE GENERAL NO
>>
>>Will that make it case sensitive?
>
>
> I reckon so...
>
>
>>>>Specifically: If I want to hack a copy of 15445.dtd to include the
>>>>requirement for lowercase tags only, howzeetdun?
>
> I'll have to excuse myself that my previous answer was on the basis
> of doing nothing more than forcing ISO-HTML into lower-case - but
> otherwise retaining the HTML flavour.

Yes that is what I wanted to do. I searched through the DTD for
something like the switch above but couldn't find it. So I didn't know
whether such a thing existed or whether HTML must always be
case-insensitive. Your answer to look in the .decl file was exactly what
I wanted to know. Thanks.


> If in fact you wanted to validate according to an ISO-HTML-flavoured
> XHTML (to coin a phrase), then the answers you got from other folks
> are probably closer to what you want.

No. I probably shouldn't have mentioned XHTML at all.


I'm sorry, did you mention "Appendix C"... GRR.
