Structured HTML

Roedy Green

Nov 20, 2011, 11:19:26 PM

jTextArea.setText( someHTML ); takes FOREVER for even a moderately
long document.

It is parsing the HTML and turning it into some sort of tree structure
for rendering.

It would be nice if:

1. you could pre-parse the HTML and quickly feed the digested tree
to the JTextArea.

2. There were methods you could use to build a tree directly without
going through HTML. The whole process of composing and rendering
complex documents would be much faster.

3. Some browsers were taught to eat this stuff and render compact
compressed pre-parsed pages very quickly.

4. Gradually text-HTML pages would disappear to be replaced by such
trees that CAN'T have syntax errors, at least not ones caused by
webdesigner error.
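
Point 2 is partly possible in Swing already: a styled document can be assembled through API calls, with no markup and no parser involved. What follows is a minimal sketch, not a definitive recipe. It assumes the display component is a JTextPane (a JTextArea shows plain text only), and the class name DirectDocumentSketch and the sample strings are invented for illustration.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

// Hypothetical example class; not part of the original thread.
public class DirectDocumentSketch {

    /** Builds a small styled document through API calls; no HTML text is parsed. */
    static StyledDocument buildDocument() throws BadLocationException {
        StyledDocument doc = new DefaultStyledDocument();

        // Attributes play the role that tags would play in markup.
        SimpleAttributeSet heading = new SimpleAttributeSet();
        StyleConstants.setBold(heading, true);
        StyleConstants.setFontSize(heading, 18);

        SimpleAttributeSet body = new SimpleAttributeSet();
        StyleConstants.setFontSize(body, 12);

        // Append each run of text together with its attributes.
        doc.insertString(doc.getLength(), "Structured HTML\n", heading);
        doc.insertString(doc.getLength(), "Built without a parser.\n", body);
        return doc;
    }

    public static void main(String[] args) throws BadLocationException {
        StyledDocument doc = buildDocument();
        System.out.println(doc.getLength() + " characters in the document");
        // In a GUI, a JTextPane could show it via textPane.setStyledDocument(doc).
    }
}
```

Because the structure is created by method calls, there is no markup to get wrong, which is the property point 4 asks for. Whether this is actually faster than setText for a long document would have to be measured.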
--
Roedy Green Canadian Mind Products
http://mindprod.com
I can't come to bed just yet. Somebody is wrong on the Internet.

Silvio Bierman

Nov 21, 2011, 3:54:58 AM

On 11/21/2011 05:19 AM, Roedy Green wrote:
> jTextArea.setText( someHTML ); takes FOREVER for even a moderately
> long document.
>
> It is parsing the HTML and turning it into some sort of tree structure
> for rendering.
>
> It would be nice if:
>
> 1. you could pre-parse the HTML and quickly feed the digested tree
> to the JTextArea.
>
> 2. There were methods you could use to build a tree directly without
> going through HTML. The whole process of composing and rendering
> complex documents would be much faster.
>
> 3. Some browsers were taught to eat this stuff and render compact
> compressed pre-parsed pages very quickly.
>
> 4. Gradually text-HTML pages would disappear to be replaced by such
> trees that CAN'T have syntax errors, at least not ones caused by
> webdesigner error.

Parsing HTML is hardly a very time-consuming action in general. In my
experience using something like nu.validator.htmlparser.sax.HtmlParser
allows parsing of HTML about as quickly as one could parse XML.

JTextArea HTML rendering has been broken from the beginning. I haven't
used it for a couple of years now but I don't remember reading anywhere
that lots of effort has been put into this lately.

I would guess that most of the time is spent rendering the HTML, not
parsing it. Especially when the HTML markup contains a bunch of (nested)
tables it can easily bring a poorly written renderer to a grinding halt.
This description certainly fits JTextArea as I remember it.
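
One way to test that guess is to time the parse on its own, with no component attached and therefore no layout or painting. The sketch below is only a rough measurement aid under stated assumptions: it assumes the component in question is really a JEditorPane or JTextPane with an HTMLEditorKit (the classes that handle HTML in Swing; JTextArea displays plain text), and the class name ParseTimingSketch, the helper sampleHtml and the row count are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical example class; not part of the original thread.
public class ParseTimingSketch {

    /** Parses HTML text into a Swing HTMLDocument without creating any component. */
    static HTMLDocument parse(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    /** Generates a table-heavy test page; the content is arbitrary. */
    static String sampleHtml(int rows) {
        StringBuilder sb = new StringBuilder("<html><body><table>");
        for (int i = 0; i < rows; i++) {
            sb.append("<tr><td>row ").append(i).append("</td><td>value</td></tr>");
        }
        return sb.append("</table></body></html>").toString();
    }

    public static void main(String[] args) throws Exception {
        String html = sampleHtml(2000);
        long start = System.nanoTime();
        HTMLDocument doc = parse(html);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Parsed " + doc.getLength() + " characters in " + millis + " ms");
    }
}
```

A single run like this includes class loading and JIT warm-up, so repeating the measurement a few times gives a fairer number. If the parse alone is quick, the remaining cost of setText has to lie in building and laying out the views.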

I hope you are not advocating a regression from text-based protocols
back into binary crap with your last point? All things taken into
consideration, XHTML + CSS is pretty optimal for the task at hand.

Roedy Green

Nov 21, 2011, 5:10:06 PM

On Mon, 21 Nov 2011 09:54:58 +0100, Silvio Bierman <sil...@moc.com>
wrote, quoted or indirectly quoted someone who said :

>I would guess that most of the time is spent rendering the HTML, not
>parsing it. Especially when the HTML markup contains a bunch of (nested)
>tables it can easily bring a poorly written renderer to a grinding halt.
>This description certainly fits JTextArea as I remember it.

The setText is what takes all the time. The repaint is pretty quick.

What the heck is it doing? All it has to do is create a parse tree,
and possibly decide the locations where it will render the various
tokens.

Has anyone defined a predigested web page format? Perhaps a new
component like JTextArea that understood it would be a way to get
fancier displays in decent time.

Silvio Bierman

Nov 21, 2011, 5:17:37 PM

On 11/21/2011 11:10 PM, Roedy Green wrote:
> On Mon, 21 Nov 2011 09:54:58 +0100, Silvio Bierman <sil...@moc.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> I would guess that most of the time is spent rendering the HTML, not
>> parsing it. Especially when the HTML markup contains a bunch of (nested)
>> tables it can easily bring a poorly written renderer to a grinding halt.
>> This description certainly fits JTextArea as I remember it.
>
> The setText is what takes all the time. The repaint is pretty quick.
>

That does not have to mean anything. During setText it might do both
parsing and rendering. Considering how Swing repaints work this is the
most likely method.
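
If setText really does both jobs in one call, the parse at least can be moved off the event dispatch thread and the finished document handed over afterwards. This is a sketch under assumptions, not a tested cure: it assumes a JEditorPane with content type text/html, the names BackgroundParseSketch, parseInBackground and attach are invented here, and the view layout that follows setDocument still runs on the event dispatch thread.

```java
import java.io.StringReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.swing.JEditorPane;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical example class; not part of the original thread.
public class BackgroundParseSketch {

    /** Parses on a worker thread so the event dispatch thread is not blocked by parsing. */
    static Future<HTMLDocument> parseInBackground(String html, ExecutorService pool) {
        return pool.submit(() -> {
            HTMLEditorKit kit = new HTMLEditorKit();
            HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
            kit.read(new StringReader(html), doc, 0);
            return doc;
        });
    }

    /** Attaches an already parsed document; call this on the event dispatch thread. */
    static void attach(JEditorPane pane, HTMLDocument doc) {
        pane.setContentType("text/html");
        pane.setDocument(doc);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String html = "<html><body><p>Hello</p></body></html>";
            HTMLDocument doc = parseInBackground(html, pool).get();
            System.out.println("Parsed " + doc.getLength() + " characters off the EDT");
            // In a real GUI: SwingUtilities.invokeLater(() -> attach(editorPane, doc));
        } finally {
            pool.shutdown();
        }
    }
}
```

If the user interface still freezes at the attach step, that would support the view that layout, not parsing, is the expensive part.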

> What the heck is it doing? All it has to do is create a parse tree,
> and possibly decide the locations where it will render the various
> tokens.

Yes, deciding the locations is the HTML rendering part. It is quite a
complex algorithm, even if implemented in a non-standards-conforming way
like in JTextArea.

>
> Has anyone defined a predigested web page format? Perhaps a new
> component like JTextArea that understood it would be a way to get
> fancier displays in decent time.

PDF, SVG and even PostScript would fit the bill...

Roedy Green

Nov 21, 2011, 6:57:26 PM

On Mon, 21 Nov 2011 23:17:37 +0100, Silvio Bierman <sil...@moc.com>
wrote, quoted or indirectly quoted someone who said :

>PDF, SVG and even PostScript would fit the bill...

The early Java days promised Display PostScript. Then all mention of it
vanished without comment. Steve Jobs wanted to use it as what the
display driver ate. Maybe the problem was that Adobe could never get it
fast enough for word processing.