[OT] HTML5 for standards wonks like Keith (aka xhtml2 is dead long live html5)

Philip Chee

Jul 7, 2009, 5:46:06 AM
An Unofficial Q&A about the Discontinuation of the XHTML2 WG
<http://hsivonen.iki.fi/xhtml2-html5-q-and-a/>

Phil

--
Philip Chee <phi...@aleytys.pc.my>, <phili...@gmail.com>
http://flashblock.mozdev.org/ http://xsidebar.mozdev.org
Guard us from the she-wolf and the wolf, and guard us from the thief,
oh Night, and so be good for us to pass.

Keith F. Lynch

Jul 7, 2009, 8:09:16 PM
Philip Chee <phi...@aleytys.pc.my> wrote:
> An Unofficial Q&A about the Discontinuation of the XHTML2 WG
> <http://hsivonen.iki.fi/xhtml2-html5-q-and-a/>

Does W3C have any recommendations on HTML intended for archival
purposes, i.e. web pages that will never be touched after being
created? Or is it all about whatever is new and shiny, with nobody
caring about last week's content? Thanks.
--
Keith F. Lynch - http://keithlynch.net/
Please see http://keithlynch.net/email.html before emailing me.

Philip Chee

Jul 8, 2009, 4:54:48 AM
On 7 Jul 2009 20:09:16 -0400, Keith F. Lynch wrote:
> Philip Chee <phi...@aleytys.pc.my> wrote:
>> An Unofficial Q&A about the Discontinuation of the XHTML2 WG
>> <http://hsivonen.iki.fi/xhtml2-html5-q-and-a/>
>
> Does W3C have any recommendations on HTML intended for archival
> purposes, i.e. web pages that will never be touched after being
> created? Or is it all about whatever is new and shiny, with nobody
> caring about last week's content? Thanks.

Anything which is strictly standards compliant now will continue to
work. The problem with XHTML2 was that it wasn't backward compatible
with anything.

<http://dbaron.org/log/20090707-ex-html>

Tim McDaniel

Jul 8, 2009, 12:38:34 PM
In article <2uucc0....@news.alt.net>,
Philip Chee <phi...@aleytys.pc.my> wrote:
>On 7 Jul 2009 20:09:16 -0400, Keith F. Lynch wrote:
>> Philip Chee <phi...@aleytys.pc.my> wrote:
>>> An Unofficial Q&A about the Discontinuation of the XHTML2 WG
>>> <http://hsivonen.iki.fi/xhtml2-html5-q-and-a/>
>>
>> Does W3C have any recommendations on HTML intended for archival
>> purposes, i.e. web pages that will never be touched after being
>> created? Or is it all about whatever is new and shiny, with nobody
>> caring about last week's content? Thanks.
>
>Anything which is strictly standards compliant now will continue to
>work.

I'm afraid I took only a quick look at
<http://www.w3.org/TR/html5-diff/> and the HTML5 spec. Can you easily
explain this statement?

1.2 Backwards Compatible

HTML 5 is defined in a way that it is backwards compatible with
the way user agents handle deployed content. To keep the authoring
language relatively simple for authors several elements and
attributes are not included as outlined in the other sections of
this document, such as presentational elements that are better
dealt with using CSS.

User agents, however, will always have to support these older
elements and attributes and this is why the specification clearly
separates requirements for authors and user agents. ...

How does the spec define what to do with <center>, <marquee>, <a
name=...>, <table summary=...>, <meta http-equiv=...>, <img border="0"
...>, when they are "not included"? (I'm going by
<http://www.w3.org/TR/html5/the-xhtml-syntax.html#obsolete-features>
for a listing.) If they're mentioned in the spec, they're "included".

--
Tim McDaniel, tm...@panix.com

Philip Chee

Jul 8, 2009, 4:39:54 PM
On Wed, 8 Jul 2009 16:38:34 +0000 (UTC), Tim McDaniel wrote:
> In article <2uucc0....@news.alt.net>,
> Philip Chee <phi...@aleytys.pc.my> wrote:

>>Anything which is strictly standards compliant now will continue to
>>work.
>
> I'm afraid I took only a quick look at
> <http://www.w3.org/TR/html5-diff/> and the HTML5 spec. Can you easily
> explain this statement?

No.

Ben Yalow

Jul 8, 2009, 5:43:43 PM
In <h30o3c$47c$1...@panix3.panix.com> "Keith F. Lynch" <k...@KeithLynch.net> writes:

>Philip Chee <phi...@aleytys.pc.my> wrote:
>> An Unofficial Q&A about the Discontinuation of the XHTML2 WG
>> <http://hsivonen.iki.fi/xhtml2-html5-q-and-a/>

>Does W3C have any recommendations on HTML intended for archival
>purposes, i.e. web pages that will never be touched after being
>created? Or is it all about whatever is new and shiny, with nobody
>caring about last week's content? Thanks.

Nothing lasts forever.

But specifying the DTD helps.

>--
>Keith F. Lynch - http://keithlynch.net/
>Please see http://keithlynch.net/email.html before emailing me.

Ben
--
Ben Yalow yb...@panix.com
Not speaking for anybody

Keith F. Lynch

Jul 8, 2009, 9:18:43 PM
Ben Yalow <yb...@panix.com> wrote:
> "Keith F. Lynch" <k...@KeithLynch.net> writes:
>> Does W3C have any recommendations on HTML intended for archival
>> purposes, i.e. web pages that will never be touched after being
>> created? Or is it all about whatever is new and shiny, with nobody
>> caring about last week's content? Thanks.

> Nothing lasts forever.

> But specifying the DTD helps.

Nothing lasts forever, but I don't think there's any excuse for
continuing to let information be lost. This isn't the dark ages,
when it was thought reasonable to destroy the last copy of some work
of antiquity to free up the vellum for yet another Bible.

There's no reason why every rasff post, for instance, shouldn't last
as long as civilization. And no reason why civilizations shouldn't
last trillions of eons, or perhaps much longer.

Keith F. Lynch

Jul 8, 2009, 9:45:50 PM
Philip Chee <phi...@aleytys.pc.my> wrote:
> Anything which is strictly standards compliant now will continue
> to work.

Thanks.

> The problem with XHTML2 was that it wasn't backward compatible with
> anything.

So? Neither was the first version of HTML, or of anything else.
Neither, for instance, is digital television. I feel sorry for
anyone who created much content in XHTML2.

Ben Yalow

Jul 8, 2009, 11:15:27 PM
In <h33ghj$g9u$1...@panix3.panix.com> "Keith F. Lynch" <k...@KeithLynch.net> writes:

>Ben Yalow <yb...@panix.com> wrote:
>> "Keith F. Lynch" <k...@KeithLynch.net> writes:
>>> Does W3C have any recommendations on HTML intended for archival
>>> purposes, i.e. web pages that will never be touched after being
>>> created? Or is it all about whatever is new and shiny, with nobody
>>> caring about last week's content? Thanks.

>> Nothing lasts forever.

>> But specifying the DTD helps.

>Nothing lasts forever, but I don't think there's any excuse for
>continuing to let information be lost. This isn't the dark ages,
>when it was thought reasonable to destroy the last copy of some work
>of antiquity to free up the vellum for yet another Bible.

It isn't lost. It just can't be displayed/understood without special
tools. And those tools may not be web browsers.

>There's no reason why every rasff post, for instance, shouldn't last
>as long as civilization. And no reason why civilizations shouldn't
>last trillions of eons, or perhaps much longer.

That depends on somebody being willing to pay Google enough to keep the
data they have (where "enough" is defined by Google -- so far, the amount
is $0).

>--
>Keith F. Lynch - http://keithlynch.net/
>Please see http://keithlynch.net/email.html before emailing me.

Ben

Philip Chee

Jul 8, 2009, 11:48:22 PM
On 8 Jul 2009 21:45:50 -0400, Keith F. Lynch wrote:
> Philip Chee <phi...@aleytys.pc.my> wrote:
>> Anything which is strictly standards compliant now will continue
>> to work.
>
> Thanks.
>
>> The problem with XHTML2 was that it wasn't backward compatible with
>> anything.
>
> So? Neither was the first version of HTML, or of anything else.
> Neither, for instance, is digital television. I feel sorry for
> anyone who created much content in XHTML2.

All major browsers support what they call "quirks mode" so that version
should still work in these browsers. And as long as your current html
documents conform to HTML 4.01 Strict they will continue to be supported
indefinitely.

> I feel sorry for
> anyone who created much content in XHTML2.

Given that no major browser (IE, Firefox, Safari/Webkit, Opera) ever
implemented XHTML2, I have no sympathy for anyone who created any
content in XHTML2 (as opposed to XHTML1).

Phil (imho xhtml1 was a solution looking for a problem)

Keith F. Lynch

Jul 9, 2009, 9:55:52 PM
Ben Yalow <yb...@panix.com> wrote:
> "Keith F. Lynch" <k...@KeithLynch.net> writes:
>> There's no reason why every rasff post, for instance, shouldn't
>> last as long as civilization. And no reason why civilizations
>> shouldn't last trillions of eons, or perhaps much longer.

> That depends on somebody being willing to pay Google enough to keep
> the data they have (where "enough" is defined by Google -- so far,
> the amount is $0).

They've made enough of a botch of their user interface that it's
almost worthless as is. For instance I was unable to retrieve
James Nicoll's original "The problem with defending the purity
of the English language" post recently, when the subject came up
in alt.folklore.urban.

Disk space is cheap enough today that it would be practical for
individuals to keep a copy of their complete Usenet database -- if
Google was willing to sell copies. It's not as if it was doing them
or anyone else any good where it is.

Brett Paul Dunbar

Jul 9, 2009, 11:15:12 PM
In message <h32i29$ns4$1...@reader1.panix.com>, Tim McDaniel
<tm...@panix.com> writes

What it means is that, for backwards compatibility, a browser must
correctly display these elements; however, they do not form part of the
spec for new code and should be flagged as incorrect by an authoring
application. That is, they must be supported by browsers but are strongly
deprecated as bad practice and should not be used. Effectively, they are
incorrect but must be understood.
--
Great Internet Mersenne Prime Search http://www.mersenne.org/prime.htm
Livejournal http://brett-dunbar.livejournal.com/
Brett Paul Dunbar
To email me, use reply-to address

Ben Yalow

Jul 10, 2009, 9:45:18 PM
In <h36738$845$1...@panix1.panix.com> "Keith F. Lynch" <k...@KeithLynch.net> writes:

>Ben Yalow <yb...@panix.com> wrote:
>> "Keith F. Lynch" <k...@KeithLynch.net> writes:
>>> There's no reason why every rasff post, for instance, shouldn't
>>> last as long as civilization. And no reason why civilizations
>>> shouldn't last trillions of eons, or perhaps much longer.

>> That depends on somebody being willing to pay Google enough to keep
>> the data they have (where "enough" is defined by Google -- so far,
>> the amount is $0).

>They've made enough of a botch of their user interface that it's
>almost worthless as is. For instance I was unable to retrieve
>James Nicoll's original "The problem with defending the purity
>of the English language" post recently, when the subject came up
>in alt.folklore.urban.

>Disk space is cheap enough today that it would be practical for
>individuals to keep a copy of their complete Usenet database -- if
>Google was willing to sell copies. It's not as if it was doing them
>or anyone else any good where it is.

They clearly think it is. But feel free to ask them to quote a price.

And, of course, Usenet takes a lot of disk space -- pr0n and warez are
big. The text groups are small, of course -- but that's a tiny fraction
of Usenet.

>--
>Keith F. Lynch - http://keithlynch.net/
>Please see http://keithlynch.net/email.html before emailing me.

Ben

Keith F. Lynch

Jul 11, 2009, 2:12:53 PM
Ben Yalow <yb...@panix.com> wrote:
> "Keith F. Lynch" <k...@KeithLynch.net> writes:
>> They've made enough of a botch of their user interface that it's
>> almost worthless as is. For instance I was unable to retrieve
>> James Nicoll's original "The problem with defending the purity of
>> the English language" post recently, when the subject came up in
>> alt.folklore.urban.

>> Disk space is cheap enough today that it would be practical for
>> individuals to keep a copy of their complete Usenet database -- if
>> Google was willing to sell copies. It's not as if it was doing
>> them or anyone else any good where it is.

> They clearly think it is. But feel free to ask them to quote
> a price.

I may do so once my economic position improves.

> And, of course, Usenet takes a lot of disk space -- pr0n and warez
> are big. The text groups are small, of course -- but that's a tiny
> fraction of Usenet.

Does Google save the binaries, or just the text groups? I was
thinking only of the latter -- and mostly of the earlier postings. If
I were to dedicate a 1 terabyte drive to it, how many years of Usenet,
starting at the beginning, excluding binaries, would that hold? Would
it cover all of the '80s? All of the '90s? More? Does anyone know?

netcat

Jul 14, 2009, 10:22:29 AM
In article <h3akn5$hmo$1...@panix3.panix.com>, k...@KeithLynch.net says...

> Does Google save the binaries, or just the text groups?

They don't have binaries.



> thinking only of the latter -- and mostly of the earlier postings. If
> I were to dedicate a 1 terabyte drive to it, how many years of Usenet,
> starting at the beginning, excluding binaries, would that hold? Would
> it cover all of the '80s? All of the '90s? More? Does anyone know?

Google says their archive has over a billion messages. What is a
reasonable average size of a Usenet post?

rgds,
netcat

cryptoguy

Jul 14, 2009, 10:40:22 AM

When DejaNews first started, they included binaries in their archives.
However, these were soon dropped for space reasons. This was well
before Google's purchase of DN.

I'm another person who'd pay a reasonable fee for an unencumbered copy
of Google's complete Usenet archives on DVDs or a hard drive, but only if
they included all the messages that have been dropped over the years.

pt

Ben Yalow

Jul 14, 2009, 10:58:20 AM

Probably a couple of K. So the non-binaries will probably all fit onto a
few T. Not shippable over the net in any reasonable amount of time, but
trivial to move around on a directly connected hard drive.
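
As a rough check on that estimate, here is a back-of-envelope sketch in
Python. The message count is the "over a billion" figure quoted in this
thread; the 2,000-byte average post size and the 1 Mbit/s link speed are
assumptions chosen for illustration, not measured values, and the function
names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def archive_size_bytes(message_count: int, avg_bytes_per_message: int) -> int:
    """Estimate total archive size as count times average message size."""
    return message_count * avg_bytes_per_message


def transfer_days(total_bytes: int, bits_per_second: int) -> float:
    """Estimate days needed to move total_bytes over a link of the given speed."""
    return total_bytes * 8 / bits_per_second / SECONDS_PER_DAY


# "Over a billion messages" is the figure quoted in the thread.
MESSAGES = 1_000_000_000
# Assumed average size of a text post ("a couple of K").
AVG_BYTES = 2_000
# Assumed link speed, purely illustrative.
LINK_BITS_PER_SECOND = 1_000_000

total = archive_size_bytes(MESSAGES, AVG_BYTES)
print(f"Estimated archive size: {total / 1e12:.1f} TB")
print(f"Estimated transfer time: {transfer_days(total, LINK_BITS_PER_SECOND):.0f} days")
```

With these assumed inputs the text archive comes to about 2 TB, and moving
it over a 1 Mbit/s link would take roughly half a year. Different
assumptions about average post size or link speed scale the result
proportionally.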

>rgds,
>netcat

David Friedman

Jul 14, 2009, 11:20:36 AM
In article <h3i6ec$oma$1...@reader1.panix.com>,
Ben Yalow <yb...@panix.com> wrote:

I wonder how practical it would be for them to sell selections--all
groups in a hierarchy, say, or the whole archive up to 1995, or ... .

--
http://www.daviddfriedman.com/ http://daviddfriedman.blogspot.com/
Author of
_Future Imperfect: Technology and Freedom in an Uncertain World_,
Cambridge University Press.

Michael Stemper

Jul 14, 2009, 12:55:19 PM
In article <h38qre$hif$1...@reader1.panix.com>, Ben Yalow <yb...@panix.com> writes:
>In <h36738$845$1...@panix1.panix.com> "Keith F. Lynch" <k...@KeithLynch.net> writes:
>>Ben Yalow <yb...@panix.com> wrote:

>>> That depends on somebody being willing to pay Google enough to keep
>>> the data they have (where "enough" is defined by Google -- so far,
>>> the amount is $0).
>
>>They've made enough of a botch of their user interface that it's
>>almost worthless as is. For instance I was unable to retrieve
>>James Nicoll's original "The problem with defending the purity
>>of the English language" post recently, when the subject came up
>>in alt.folklore.urban.

>>Disk space is cheap enough today that it would be practical for
>>individuals to keep a copy of their complete Usenet database -- if
>>Google was willing to sell copies. It's not as if it was doing them
>>or anyone else any good where it is.

>And, of course, Usenet takes a lot of disk space -- pr0n and warez are
>big. The text groups are small, of course -- but that's a tiny fraction
>of Usenet.

Interestingly, rec.arts.sf.written is often listed by them as being
one of the top-ten traffic groups (as is another that I read).

--
Michael F. Stemper
#include <Standard_Disclaimer>
No animals were harmed in the composition of this message.

Keith F. Lynch

Jul 14, 2009, 10:16:56 PM
David Friedman <dd...@daviddfriedman.nopsam.com> wrote:
> I wonder how practical it would be for them to sell selections--all
> groups in a hierarchy, say, or the whole archive up to 1995, or ... .

I wonder if anyone has asked them.

Keith F. Lynch

Jul 14, 2009, 10:18:23 PM
cryptoguy <treif...@gmail.com> wrote:
> I'm another person who'd pay a reasonable fee for an unencumbered
> copy of Google's complete Usenet archives on DVDs or a hard drive, but
> only if they included all the messages that have been dropped over
> the years.

You don't think they've permanently and irrevocably deleted those
messages from their archives?

David G. Bell

Jul 14, 2009, 10:49:49 AM
On Tuesday, in article
<MPG.24c6c1204...@news.octanews.com>
net...@devnull.eridani.eol.ee "netcat" wrote:

> Google says their archive has over a billion messages. What is a
> reasonable average size of a Usenet post?

Back in the days when I was downloading newsgroups from Demon, over a
phone line, reckoning 2000 bytes per article was near enough for a
useful prediction of the time I'd need.

--
David G. Bell -- SF Fan, Filker, and Punslinger.

On the horizon, a carrier task force of the Salvation Navy was
turning into the wind, preparing to launch Zeppelins.

cryptoguy

Jul 15, 2009, 2:48:54 PM
On Jul 14, 10:18 pm, "Keith F. Lynch" <k...@KeithLynch.net> wrote:

> cryptoguy <treifam...@gmail.com> wrote:
> > I'm another person who'd pay a reasonable fee for an unencumbered
> > copy of Google's complete Usenet archives on DVDs or a hard drive, but
> > only if they included all the messages that have been dropped over
> > the years.
>
> You don't think they've permanently and irrevocably deleted those
> messages from their archives?

No, I don't.

Google isn't noted either for refusing to gather information or for
throwing it away. Short of a court order to delete it, I expect they
still have it.

pt

Seth

Jul 27, 2009, 1:16:53 PM
In article <h3je9f$mp9$1...@panix1.panix.com>,
Keith F. Lynch <k...@KeithLynch.net> wrote:
>cryptoguy <treif...@gmail.com> wrote:
>> I'm another person who'd pay a reasonable fee for an unencumbered
>> copy of Google's complete Usenet archives on DVDs or a hard drive, but
>> only if they included all the messages that have been dropped over
>> the years.
>
>You don't think they've permanently and irrevocably deleted those
>messages from their archives?

If you mean articles with expiration headers, they haven't deleted
them, they just won't show them to free searches.

Seth

Keith F. Lynch

Jul 27, 2009, 9:45:29 PM
Seth <se...@panix.com> wrote:
> If you mean articles with expiration headers, they haven't deleted
> them, they just won't show them to free searches.

What about x-no-archive messages? What about old messages whose
authors asked Google to remove all of them?

Welcome back to rasff. I missed you. It's been six months and 14,000
messages. Are you reading *all* of them?

David G. Bell

Jul 28, 2009, 4:51:49 AM
On Monday, in article <h4kne5$l76$1...@reader1.panix.com>
se...@panix.com "Seth" wrote:

It is rumoured that there are people who would sue if certain deleted
messages were published, and who routinely Google for their names
appearing. Long-time net-users in the UK may recall a court case
involving one of the pioneer ISPs in the UK.

(I also recall that the NNTP articles which triggered that case were
pretty vile stuff.)
