
Shortest possible valid HTML document?


Stewart Gordon

May 30, 2006, 12:26:22 PM

I've just been experimenting with creating the shortest possible HTML
document that passes validation. Here's what I've come up with:

----------
<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN"><title//s
----------

Of course, it's only valid given a character encoding in the HTTP
headers. And the W3C validator's direct input interface happens to
behave as though one had been supplied.

Further challenges:

1. Find a browser that renders this correctly!

2. Find the shortest valid code in each version of (X)HTML.

Stewart.

Michael Winter

May 30, 2006, 4:00:10 PM

On 30/05/2006 17:26, Stewart Gordon wrote:

> I've just been experimenting with creating the shortest possible HTML
> document that passes validation.

We've done that before. It was Toby's challenge, and I seem to remember
'winning' (though Jukka picked a few holes). :-)

> Here's what I've come up with:
>
> ----------
> <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN"><title//s

It can be shorter. See

Subject: Minimal HTML
Author: Toby Inkster
Date: 2004-11-24 21:56:10
Message-ID: pan.2004.11.24...@tobyinkster.co.uk

and the discussion that followed. Note that if you read it through Google
Groups, the post order is slightly messed up.

[snip]

> 1. Find a browser that renders this correctly!

None of the major ones do (I mention that in the thread). You'd have to
find one implementing a proper SGML parser.

> 2. Find the shortest valid code in each version of (X)HTML.

They'll all be variations on the same theme, just changing the FPI and
coping with required content. For example, in Strict document types,
your 's' (my '.') could be replaced with <p// (HTML) or <p/> (XHTML).
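
For readers unfamiliar with the SHORTTAG feature these entries rely on: in SGML, `<title/` is a start tag that enables a null end tag (NET), and the next `/` closes the element, so `<title//` stands for an empty title element. The following is only a rough Python sketch of that expansion. The name `expand_net` and the regular expression are illustrative assumptions, not part of any SGML tool, and the sketch ignores attributes, nesting, and XHTML-style `<p/>` syntax.

```python
import re

# A NET-enabling start tag looks like "<name/"; the next "/" ends the element.
# Simplification (assumption): no attributes and no nested NET elements.
NET_ELEMENT = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/]*)/")


def expand_net(markup: str) -> str:
    """Rewrite <name/content/ as <name>content</name>."""
    return NET_ELEMENT.sub(r"<\1>\2</\1>", markup)


print(expand_net("<title//s"))  # <title></title>s
print(expand_net("<p//"))       # <p></p>
```

Read this way, the 's' (or '.') after `<title//` is body text following an empty title, which is why a Strict document type needs a block element such as `<p//` in its place.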

Mike

--
Michael Winter
Prefix subject with [News] before replying by e-mail.

Stewart Gordon

May 31, 2006, 9:57:51 AM

Michael Winter wrote:
> On 30/05/2006 17:26, Stewart Gordon wrote:
>
>> I've just been experimenting with creating the shortest possible HTML
>> document that passes validation.
<snip>

>> <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN"><title//s
>
> It can be shorter. See
>
> Subject: Minimal HTML
> Author: Toby Inkster
> Date: 2004-11-24 21:56:10
> Message-ID: pan.2004.11.24...@tobyinkster.co.uk

It's a shame message IDs look exactly like email addresses! NTS
SeaMonkey wanted to address an email to it when I clicked it.

> and the discussion that followed. Note that if you read it through Google
> Groups, the post order is slightly messed up.
<snip>

Indeed. But for anyone who wants to get at it quickly anyway:

http://tinyurl.com/rz52q

I'd no idea that there was a version of HTML that's just called HTML.
But that discussion has helped me to get down to 49 characters:

<!DOCTYPE HTML SYSTEM"http://eb.cx/2ef"><title//.

Now try and beat that!

Stewart.

Michael Winter

May 31, 2006, 11:15:22 AM

On 31/05/2006 14:57, Stewart Gordon wrote:

[snip]

> <!DOCTYPE HTML SYSTEM"http://eb.cx/2ef"><title//.

                      ^^
The SGML grammar requires at least one white space character between the
'SYSTEM' literal (or public identifier) and the system identifier, so I
don't think that quite qualifies. :-P

[73] external identifier (10.1.6, 379:1) =
     ( ( "SYSTEM"
       | ( "PUBLIC",
           +ps [65],
           public identifier [74] ) ),
       ?( +ps [65],
          system identifier [75] ) )

-- #73 External Identifier, SGML Productions[1]

> Now try and beat that!

As far as I know, the system identifier is a URI reference so a relative
reference (such as a single letter) is permissible, but I could very
well be wrong. In the previous discussion, Jukka seemed to imply that an
absolute URL was necessary, but that could have been due to a limitation
of the W3C Validator (which was the required validator for the
challenge). The validator is loosely based on James Clark's SP, which only
supports the http scheme (or so the documentation says[2]).

So, if a relative reference is allowed, surely /the/ shortest has to be:

<!DOCTYPE html SYSTEM "d"><title//. (35 bytes)

?
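
The byte counts quoted in this thread are easy to check mechanically. Below is a minimal Python sketch, assuming each entry is served as US-ASCII with no trailing newline; the helper name `byte_length` is made up for illustration.

```python
# Candidate documents quoted so far in this thread, in order of appearance.
candidates = [
    '<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN"><title//s',
    '<!DOCTYPE HTML SYSTEM"http://eb.cx/2ef"><title//.',
    '<!DOCTYPE html SYSTEM "d"><title//.',
]


def byte_length(document: str) -> int:
    """Size in bytes, assuming the document is encoded as US-ASCII."""
    return len(document.encode("ascii"))


for doc in candidates:
    print(byte_length(doc), doc)
```

The second and third entries should come out at 49 and 35, matching the figures quoted above.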

I can't believe I'm doing this again. *shakes head*

Mike


[1] SGML Productions
<ftp://ftp.ifi.uio.no/pub/SGML/productions>
[2] "System identifiers" in SP
<http://www.jclark.com/sp/sysid.htm>

David Håsäther

May 31, 2006, 11:56:31 AM

Michael Winter <m.wi...@blueyonder.co.uk> wrote:

> On 31/05/2006 14:57, Stewart Gordon wrote:
>
> [snip]
>
>> <!DOCTYPE HTML SYSTEM"http://eb.cx/2ef"><title//.
>                      ^^
> The SGML grammar requires at least one white space character
> between the 'SYSTEM' literal (or public identifier) and the system
> identifier, so I don't think that quite qualifies. :-P
>
> [73] external identifier (10.1.6, 379:1) =
>      ( ( "SYSTEM"
>        | ( "PUBLIC",
>            +ps [65],
>            public identifier [74] ) ),
>        ?( +ps [65],
>           system identifier [75] ) )

No, it requires a _parameter separator_. However, those are not
required in all circumstances. The SGML Handbook (372:15) says this:

| A required ps that is adjacent to a delimiter or another ps can be
| omitted if no ambiguity would be created thereby.

Therefore, the document type declaration above is correct.
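
To see why no ambiguity arises, here is a rough Python sketch of a tokenizer for such a declaration. It is not a real SGML parser: the regular expression and the name `doctype_tokens` are assumptions made for illustration. The point is only that the opening quote of the system identifier is itself a delimiter, so the keyword before it ends there whether or not white space intervenes.

```python
import re

# Either a quoted literal, or a run of characters that are neither
# white space, a quote, nor ">". A quote therefore always starts a new token.
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|[^\s"\'>]+')


def doctype_tokens(declaration: str) -> list[str]:
    """Split a markup declaration such as <!DOCTYPE ...> into its parameters."""
    if not (declaration.startswith("<!") and declaration.endswith(">")):
        raise ValueError("not a markup declaration")
    return TOKEN.findall(declaration[2:-1])


without_space = doctype_tokens('<!DOCTYPE HTML SYSTEM"http://eb.cx/2ef">')
with_space = doctype_tokens('<!DOCTYPE HTML SYSTEM "http://eb.cx/2ef">')
print(without_space)
print(without_space == with_space)
```

Both spellings yield the same four parameters, which is the sense in which the separator can be dropped here.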

> As far as I know, the system identifier is a URI reference so a
> relative reference (such as a single letter) is permissible, but I
> could very well be wrong.

You're right.

> In the previous discussion, Jukka seemed
> to imply that an absolute URL was necessary, but that could have
> been due to a limitation of the W3C Validator (which was the
> required validator for the challenge).

Yes, I believe the W3C validator only supports absolute URIs.

> So, if a relative reference is allowed, surely /the/ shortest has
> to be:
>
> <!DOCTYPE html SYSTEM "d"><title//. (35 bytes)

With a properly set up catalog, you can do it even shorter since e.g.
"<!DOCTYPE HTML>" is a syntactically correct document type declaration.
Something like the following should be able to validate against any
HTML DTD:

<!doctype p><p>

Again, this needs a properly set up catalog.

I'm not going to dig deeper into this though, since I don't really see
the point in this exercise :-)

--
David Håsäther

Michael Winter

May 31, 2006, 12:27:39 PM

On 31/05/2006 16:56, David Håsäther wrote:

> Michael Winter <m.wi...@blueyonder.co.uk> wrote:

[snip]

>> The SGML grammar requires at least one white space character
>> between the 'SYSTEM' literal (or public identifier) and the system
>> identifier, so I don't think that quite qualifies. :-P

[snip]

> No, it requires a _parameter separator_.

Yes, I went too far there. Even /if/ the separator couldn't be omitted,
it wouldn't necessarily require a white space character; anything
matching that production should do.

> However, those are not required in all circumstances.

I stand corrected on both counts. Thank you. :-)

[snip]

> <!doctype p><p>
>
> Again, this needs a properly set up catalog.

I wondered if there might be trickery along those lines. However, a(n
irrelevant) philosophical question: would that actually count as a valid
HTML document? Yes, it may validate against a HTML DTD, but without the
html element as the root element, is it still HTML, or just a fragment?

> I'm not going to dig deeper into this though, since I don't really
> see the point in this exercise :-)

It's an interesting diversion (and more pleasant than the content of
another composition window I have open :-( ).

Mike

David Håsäther

May 31, 2006, 3:44:25 PM

Michael Winter <m.wi...@blueyonder.co.uk> wrote:

>> <!doctype p><p>

[...]

> I wondered if there might be trickery along those lines. However,
> a(n irrelevant) philosophical question: would that actually count
> as a valid HTML document? Yes, it may validate against a HTML DTD,
> but without the html element as the root element, is it still
> HTML, or just a fragment?

Actually, it's impossible to tell from the doctype declaration alone,
whether a document is this or that. For a lengthier discussion on this
topic, see
<http://groups.google.com/group/comp.text.sgml/msg/c3e53dee2c152a81>

--
David Håsäther

Stewart Gordon

Jun 1, 2006, 10:32:20 AM

David Håsäther wrote:
<snip>

> No, it requires a _parameter separator_. However, those are not
> required in all circumstances. The SGML Handbook (372:15) says this:
>
> | A required ps that is adjacent to a delimiter or another ps can be
> | omitted if no ambiguity would be created thereby.
<snip>

"A required ps" ... "can be omitted". What a contradiction.

Stewart.

Andy Dingley <dingbat@codesmiths.com>

Jun 1, 2006, 11:20:10 AM

It's not a contradiction, it's a question of levels in the structure.
At a high level it's required (parameters must be distinguishable), at
a low lexical level it isn't (something else already makes them
distinct).

Toby Inkster

Jun 1, 2006, 8:08:26 PM

David Håsäther wrote:

> <!doctype p><p>

Is this really HTML any more though, or just flavourless SGML?

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

David Håsäther

Jun 2, 2006, 6:51:18 AM

Toby Inkster <usenet...@tobyinkster.co.uk> wrote:

> David Håsäther wrote:
>
>> <!doctype p><p>
>
> Is this really HTML any more though, or just flavourless SGML?

Could be HTML. Could be something else. See my reply to Michael Winter:
<http://groups.google.com/group/alt.html/msg/42cc2dfe6c71dd9f>


--
David Håsäther
