Thesis on how to cope with incorrect HTML

Dagfinn R. Parnas

Jan 23, 2003, 6:58:05 AM
A year ago I wrote my master's thesis on How to cope with incorrect HTML.
The main part of the thesis explains how one can parse incorrect HTML in
an orderly fashion. There is also some material on the SGML->HTML link and
the first known statistical evaluation of invalid HTML (2.5 million web pages
validated). The statistics show that 0.7% of web pages on the net are
valid (the statistics also cover what kinds of errors occur).

All in all it should be an interesting read.

The thesis is available from
http://www.ub.uib.no/elpub/2001/h/413001/
in pdf format

and from
http://www.ii.uib.no/~dagfinn/hfag.ps
Postscript version (recommended)


At the same time I want to thank the people on this newsgroup who have
helped me, especially Arjun Ray and Jan Roland Eriksson.

Cheers
Dagfinn Parnas

Stephen Poley

Jan 23, 2003, 11:35:59 AM
On Thu, 23 Jan 2003 12:58:05 +0100, "Dagfinn R. Parnas"
<dag...@ii.uib.no> wrote:

>A year ago I wrote my master thesis on How to cope with incorrect HTML.
>The main part of the thesis explains how one can parse incorrect HTML in
>an orderly fashion. There is also some material on SGML->HTML link and the
>first known statistic evaluation into invalid HTML (2.5 million webpages
>validated). The statistics shows that 0.7 % of web pages on the net are
>valid (statistics also include what kinds of errors occur)

Good grief! I knew things were bad, but not that bad.

--
Stephen Poley
Barendrecht, Holland

http://www.xs4all.nl/~sbpoley/webmatters/

Bertil Wennergren

Jan 23, 2003, 11:44:34 AM
Stephen Poley:

>>The statistics shows that 0.7 % of web pages on the net are
>>valid

> Good grief! I knew things were bad, but not that bad.

0.7% valid pages sounds like a huge improvement. It used to be more like
0.0007%. Things are looking brighter!

--
Bertil Wennergren <bert...@gmx.net> <http://www.bertilow.com>

Brian

Jan 23, 2003, 12:33:58 PM
Bertil Wennergren wrote:
>
>>> The statistics shows that 0.7 % of web pages on the net are
>>> valid
>
>> Good grief! I knew things were bad, but not that bad.
>
> 0.7% valid pages sounds like a huge improvement. It used to be more
> like 0.0007%. Things are looking brighter!

It does seem dreary, but until a year ago, I didn't validate my pages.
I didn't know such a thing existed. I used to produce a radio
program, and thought I was writing good code. Since discovering
online validators, I checked my old pages. Error-ridden!

--
Brian
follow the directions in my address to email me

Kris

Jan 23, 2003, 12:42:26 PM
In article <aIVX9.10666$6G4.4285@sccrnsc02>,
Brian <br...@wfcr.deletethispart.org> wrote:

> >>> The statistics shows that 0.7 % of web pages on the net are
> >>> valid
> >
> >> Good grief! I knew things were bad, but not that bad.
> >
> > 0.7% valid pages sounds like a huge improvement. It used to be more
> > like 0.0007%. Things are looking brighter!
>
> It does seem dreary, but until a year ago, I didn't validate my pages.
> I didn't know such a thing existed. I used to produce a radio
> program, and thought I was writing good code. Since discovering
> online validators, I checked my old pages. Error-ridden!

Until a year ago, me neither. Almost two years ago even, I knew nothing
about websites, HTML, CSS and the like. All I did was Flash. And not
even bad at it, if I may say so :)

Still, my mom is proud of me. :D

--
It's a web site Jim, but not as we know it.

andkonDOTcom

Jan 23, 2003, 3:19:34 PM
"Dagfinn R. Parnas" <dag...@ii.uib.no> wrote in message news:<Pine.SOL.4.44.030123...@apal.ii.uib.no>...

> A year ago I wrote my master thesis on How to cope with incorrect HTML.
> The main part of the thesis explains how one can parse incorrect HTML in
> an orderly fashion. There is also some material on SGML->HTML link and the
> first known statistic evaluation into invalid HTML (2.5 million webpages
> validated). The statistics shows that 0.7 % of web pages on the net are
> valid (statistics also include what kinds of errors occur)
>

Oh how sweet! I am in the top 0.7%...

> All in all it should be an interesting read.
>
> The thesis is available from
> http://www.ub.uib.no/elpub/2001/h/413001/
> in pdf format
>

The irony of pdf format is staggering ;)

> and from
> http://www.ii.uib.no/~dagfinn/hfag.ps
> Postscript version (recommended)
>

> Dagfinn Parnas

Nick Kew

Jan 23, 2003, 5:44:59 PM
In article <Pine.SOL.4.44.030123...@apal.ii.uib.no>, one of infinite monkeys
at the keyboard of "Dagfinn R. Parnas" <dag...@ii.uib.no> wrote:

> All in all it should be an interesting read.

Indeed, from your post I thought it might be not just interesting,
but worth linking to. Unfortunately I just get a blank grey screen
in my PDF reader. Have you considered an HTML (or plain text) version?

--
Nick Kew

Available for contract work - Programming, Unix, Networking, Markup, etc.

Pete Wilson

Jan 23, 2003, 8:33:37 PM
In article <b9rp0b...@jarl.webthing.com>,
Nick Kew <ni...@webthing.com> wrote:
>In article <Pine.SOL.4.44.030123...@apal.ii.uib.no>, one of infinite monkeys
> at the keyboard of "Dagfinn R. Parnas" <dag...@ii.uib.no> wrote:
>
>> All in all it should be an interesting read.
>
>Indeed, from your post I thought it might be not just interesting,
>but worth linking to. Unfortunately I just get a blank grey screen
>in my PDF reader. Have you considered an HTML (or plain text) version?

Yes, I surely agree. And did you know that Adobe offers an online
PDF-to-HTML service? Very handy:

http://www.adobe.com/products/acrobat/access_simple_form.html

It would be just great if you'd convert it, if it's not too much
trouble.
--
Pete Wilson
http://www.pwilson.net/
