[Dillo-dev] New design of tag nesting cleanup within the parser.


Jorge Arellano Cid

Jun 21, 2016, 6:22:23 PM
to Dillo mailing list
Hi there,

This is an important patchset that's just been committed.
There are plenty of details in the hg comments (hg log -v).

These are changes that affect the way dillo copes with Tag Soup
(i.e. badly formed HTML). The idea is to create a valid doc tree
before passing it to Dw for rendering, so yes, rendering may change.

Please check the comments, then browse your regular sites and give it
a test drive, then send some feedback.

At least in my case there were nice surprises!

--
Cheers
Jorge.-

_______________________________________________
Dillo-dev mailing list
Dill...@dillo.org
http://lists.dillo.org/cgi-bin/mailman/listinfo/dillo-dev

eocene

Jun 22, 2016, 10:20:21 AM
to dill...@dillo.org
Jorge wrote:
> Please check the comments, then browse your regular sites and give it
> a test drive, then send some feedback.

The links to articles on arstechnica.com aren't working for me today.
The bug meter has lots of "<a> can't contain <figure>".

Nick Warne

Jun 22, 2016, 11:39:06 AM
to dill...@dillo.org
On Wed, 22 Jun 2016 14:18:00 +0000
eocene <eoc...@gmx.com> wrote:

> Jorge wrote:
> > Please check the comments, then browse your regular sites and give
> > it a test drive, then send some feedback.
>
> The links to articles on arstechnica.com aren't working for me today.
> The bug meter has lots of "<a> can't contain <figure>".

Yes, same type of errors, and no links work, on the BBC news site:

http://www.bbc.co.uk/news

Nick
--
Gosh that takes me back... or is it forward? That's the trouble with
time travel, you never can tell."
-- Doctor Who "Androids of Tara"

eocene

Jun 22, 2016, 1:20:28 PM
to dill...@dillo.org
Jorge wrote:
> Please check the comments, then browse your regular sites and give it
> a test drive, then send some feedback.

A fltk.org page gave the warning
"...<small> must have been closed before </td>..."
and it wasn't clear what that meant.

The src has
<td><small>some text</td>

and in that case "<small> must be closed before </td>" sounds right
to me.
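
As a rough illustration of the rule behind that warning, here is a small
sketch. It is not dillo's code: the function name unclosedBefore and the
plain vector-of-names stack are assumptions made only for this example.
When the parser meets a close tag, it can look down its stack of open
elements and report everything still open above the matching element:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of dillo): given the stack of open element
// names (outermost first) and the name of a close tag, return one message
// per element that is still open above the matching open tag.
std::vector<std::string> unclosedBefore(const std::vector<std::string> &openStack,
                                        const std::string &closing)
{
   assert(!closing.empty());   // sketch-level sanity check

   std::vector<std::string> msgs;

   // Find the innermost open element that matches the close tag.
   int match = -1;
   for (int i = static_cast<int>(openStack.size()) - 1; i >= 0; --i) {
      if (openStack[i] == closing) {
         match = i;
         break;
      }
   }
   if (match < 0)
      return msgs;             // stray close tag: nothing to report here

   // Everything above the match was left open by the page author.
   for (int i = static_cast<int>(openStack.size()) - 1; i > match; --i) {
      msgs.push_back("<" + openStack[i] + "> must be closed before </" +
                     closing + ">");
   }
   return msgs;
}
```

For the fltk.org case, a stack of table, tr, td, small and a closing td
would yield the single message about <small>.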

Jorge Arellano Cid

Jun 22, 2016, 3:33:15 PM
to dill...@dillo.org
On Wed, Jun 22, 2016 at 02:18:00PM +0000, eocene wrote:
> Jorge wrote:
> > Please check the comments, then browse your regular sites and give it
> > a test drive, then send some feedback.
>
> The links to articles on arstechnica.com aren't working for me today.
> The bug meter has lots of "<a> can't contain <figure>".

Yes, this is HTML5's new "transparent" content model.

The A element is a transparent container, which is not yet handled by dillo.

Let me think about it a bit...

--
Cheers
Jorge.-

Jorge Arellano Cid

Jun 22, 2016, 3:46:45 PM
to dill...@dillo.org
Hi,

On Wed, Jun 22, 2016 at 05:18:12PM +0000, eocene wrote:
> Jorge wrote:
> > Please check the comments, then browse your regular sites and give it
> > a test drive, then send some feedback.

Oh, I meant the comments in hg! :-)

Anyway checking the bug meter's messages doesn't hurt.

> A fltk.org page gave the warning
> "...<small> must have been closed before </td>..."
> and it wasn't clear what that meant.
>
> The src has
> <td><small>some text</td>
>
> and in that case "<small> must be closed before </td>" sounds right
> to me.

Oh, I didn't know until now that the phrase "must have been" can
have two meanings in English.

Yes, "must be closed" is the intended meaning.

Is there a way to have the _same_ message in a softer tone?

--
Cheers
Jorge.-

eocene

Jun 22, 2016, 4:48:33 PM
to dill...@dillo.org
Jorge wrote:
> Yes, "must be closed" is the intended meaning.
>
> Is there a way to have the _same_ message in a softer tone?

Umm...in ordinary spoken English, we'd be more likely to say
"has to be closed", but I might be missing what you're getting at.

Jorge Arellano Cid

Jun 22, 2016, 6:58:46 PM
to dill...@dillo.org
On Wed, Jun 22, 2016 at 08:46:28PM +0000, eocene wrote:
> Jorge wrote:
> > Yes, "must be closed" is the intended meaning.
> >
> > Is there a way to have the _same_ message in a softer tone?
>
> Umm...in ordinary spoken English, we'd be more likely to say
> "has to be closed", but I might be missing what you're getting at.

I like it!

I just don't want the bug meter to sound harsh or pedantic.

This phrase sounds good to me: (*advice*)

"<small> has to be closed before </td>" // *advice*

instead of:

"<small> must be closed before </td>" // *reproach*


If that's the correct interpretation of meaning, I'd prefer to
use the "has to be closed" expression.


--
Cheers
Jorge.-

eocene

Jun 22, 2016, 7:17:54 PM
to dill...@dillo.org
Jorge wrote:
> I just don't want the bug meter to sound harsh or pedantic.
>
> This phrase sounds good to me: (*advice*)
>
> "<small> has to be closed before </td>" // *advice*
>
> instead of:
>
> "<small> must be closed before </td>" // *reproach*
>
>
> If that's the correct interpretation of meaning, I'd prefer to
> use the "has to be closed" expression.

Yet another possibility is: "needs to be closed".

Jorge Arellano Cid

Jun 22, 2016, 7:34:07 PM
to dill...@dillo.org
On Wed, Jun 22, 2016 at 03:30:30PM -0400, Jorge Arellano Cid wrote:
> On Wed, Jun 22, 2016 at 02:18:00PM +0000, eocene wrote:
> > Jorge wrote:
> > > Please check the comments, then browse your regular sites and give it
> > > a test drive, then send some feedback.
> >
> > The links to articles on arstechnica.com aren't working for me today.
> > The bug meter has lots of "<a> can't contain <figure>".
>
> Yes, this is HTML5's new "transparent" content model.
>
> The A element is a transparent container, which is not yet handled by dillo.
>
> Let me think about it a bit...

OK, we don't yet have HTML5 handling in dillo (besides ad-hoc fixes),
but this new design is simpler to tailor to a new set of rules, so
that's the way to go.

Although it'll take some time, incremental patches will give a better
HTML5 experience step by step.

Just as a quick and dirty patch so that arstechnica, BBC and others can
be read before I come up with something more structured, you may try:

diff -r ca794cd4c182 src/html.cc
--- a/src/html.cc Tue Jun 21 18:04:00 2016 -0400
+++ b/src/html.cc Wed Jun 22 19:30:15 2016 -0400
@@ -3735,6 +3735,8 @@ static void Html_stack_cleanup_at_open(D
          continue; // close
       } else if (!(Tags[ni].Flags & 1) && !(Tags[ti].Flags & 4)) {
          // Block element over a NON block container
+         if (ti == i_A && html->DocTypeVersion >= 5.0f)
+            break;
          continue; // close
       }
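
To illustrate what the two added lines change, here is a simplified,
self-contained sketch of the check. It is not dillo's code: TagInfo, its
two boolean fields and elementsToClose are hypothetical stand-ins for the
Tags[] flags and the loop in Html_stack_cleanup_at_open, and the meaning
of the flag bits is inferred only from the "Block element over a NON
block container" comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified tag record; dillo's real Tags[] table differs.
struct TagInfo {
   std::string name;
   bool isBlock;            // block-level element
   bool isBlockContainer;   // may contain block-level elements
};

// Returns how many open elements would be closed, starting from the top of
// the stack, before `incoming` is opened. `html5` enables the exception for
// <a> as a transparent container.
int elementsToClose(const std::vector<TagInfo> &openStack,
                    const TagInfo &incoming, bool html5)
{
   assert(!incoming.name.empty());   // sketch-level sanity check

   int closed = 0;
   for (auto it = openStack.rbegin(); it != openStack.rend(); ++it) {
      if (incoming.isBlock && !it->isBlockContainer) {
         // Block element over a NON block container
         if (html5 && it->name == "a")
            break;        // transparent container: leave it open
         ++closed;        // close it and keep checking further down
         continue;
      }
      break;              // nesting is acceptable from here on
   }
   return closed;
}
```

With a stack of body, a and an incoming figure, the sketch closes the a
when html5 is false and leaves it open when html5 is true, which is the
behaviour that should let the links work again.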


HTH.

Jorge Arellano Cid

Jun 23, 2016, 12:48:55 AM
to dill...@dillo.org
On Wed, Jun 22, 2016 at 11:15:36PM +0000, eocene wrote:
> Jorge wrote:
> > I just don't want the bug meter to sound harsh or pedantic.
> >
> > This phrase sounds good to me: (*advice*)
> >
> > "<small> has to be closed before </td>" // *advice*
> >
> > instead of:
> >
> > "<small> must be closed before </td>" // *reproach*
> >
> >
> > If that's the correct interpretation of meaning, I'd prefer to
> > use the "has to be closed" expression.
>
> Yet another possibility is: "needs to be closed".

It also sounds OK to me.

Which one is better?

--
Cheers
Jorge.-

Johannes Hofmann

Jun 23, 2016, 8:59:59 AM
to dill...@dillo.org
Hi Jorge,

it seems to fix the too-narrow forum posts on
http://forum.xda-developers.com/ - very nice!

Cheers,
Johannes

eocene

Jun 23, 2016, 9:02:41 AM
to dill...@dillo.org
Jorge wrote:
> On Wed, Jun 22, 2016 at 11:15:36PM +0000, eocene wrote:
> > Yet another possibility is: "needs to be closed".
>
> It also sounds OK to me.
>
> Which one is better?

I would choose 'needs'.

Ben Woolley

Jun 23, 2016, 10:23:56 AM
to eocene, dill...@dillo.org


> On Jun 23, 2016, at 9:00 AM, eocene <eoc...@gmx.com> wrote:
>
> Jorge wrote:
>>> On Wed, Jun 22, 2016 at 11:15:36PM +0000, eocene wrote:
>>> Yet another possibility is: "needs to be closed".
>>
>> It also sounds OK to me.
>>
>> Which one is better?
>
> I would choose 'needs'.
>

It seems that compilers use the word "expected" to convey a softer tone, as if it is just the way that they interpret the input. "Expected <small> to be closed" would be the way to phrase it. "Needs" implies that you need to do it in any case, but "expected" implies that you just need to do it to fix this particular case.

However, I do like using "needs" for the bug meter, because it is a bug meter, not a warning message. It is there to tell you what to fix. :)

Nick Warne

Jun 23, 2016, 12:59:17 PM
to dill...@dillo.org
On Wed, 22 Jun 2016 19:30:51 -0400
Jorge Arellano Cid <jc...@dillo.org> wrote:

> Just as a quick and dirty patch so that arstechnica, BBC and others can
> be read before I come up with something more structured, you may try:
>
> diff -r ca794cd4c182 src/html.cc
> --- a/src/html.cc Tue Jun 21 18:04:00 2016 -0400
> +++ b/src/html.cc Wed Jun 22 19:30:15 2016 -0400
> @@ -3735,6 +3735,8 @@ static void Html_stack_cleanup_at_open(D
>           continue; // close
>        } else if (!(Tags[ni].Flags & 1) && !(Tags[ti].Flags & 4)) {
>           // Block element over a NON block container
> +         if (ti == i_A && html->DocTypeVersion >= 5.0f)
> +            break;
>           continue; // close
>        }
>

Thanks, that works fine.

Nick
--
Gosh that takes me back... or is it forward? That's the trouble with
time travel, you never can tell."
-- Doctor Who "Androids of Tara"

Jorge Arellano Cid

Jun 24, 2016, 11:15:08 AM
to dill...@dillo.org
On Thu, Jun 23, 2016 at 02:57:40PM +0200, Johannes Hofmann wrote:
> Hi Jorge,
>
> it seems to fix the too-narrow forum posts on
> http://forum.xda-developers.com/ - very nice!

Good!

That's what I was expecting...

The parser is tuned for HTML 4.01 now, and as these days I see
lots of sites using HTML5, most probably we'll see more cases
of improved rendering as the parser is made aware of HTML5-specific
rules.

The new design is simpler to customize, as it separates HTML rules
from parsing decisions more clearly.

Jorge Arellano Cid

Jun 24, 2016, 11:34:01 AM
to dill...@dillo.org
On Thu, Jun 23, 2016 at 10:10:42AM -0400, Ben Woolley wrote:
>
>
> > On Jun 23, 2016, at 9:00 AM, eocene <eoc...@gmx.com> wrote:
> >
> > Jorge wrote:
> >>> On Wed, Jun 22, 2016 at 11:15:36PM +0000, eocene wrote:
> >>> Yet another possibility is: "needs to be closed".
> >>
> >> It also sounds OK to me.
> >>
> >> Which one is better?
> >
> > I would choose 'needs'.
> >
>
> It seems that compilers use the word "expected" to connote a
> softer tone, like it is just the way that they interpret it.
> "Expected <small> to be closed" would be the way to say it in
> that way. "Needs" implies that you need to do it in any case,
> but "expected" implies that you just need to do it to fix this
> particular case.
>
> However, I do like using "needs" for the bug meter, because it
> is a bug meter, not a warning message. It is there to tell you
> what to fix. :)

Thanks for the feedback.

Committed.

--
Cheers
Jorge.-

Jorge Arellano Cid

Jun 24, 2016, 11:35:01 AM
to dill...@dillo.org
On Thu, Jun 23, 2016 at 05:57:05PM +0100, Nick Warne wrote:
> On Wed, 22 Jun 2016 19:30:51 -0400
> Jorge Arellano Cid <jc...@dillo.org> wrote:
>
> > Just as a quick and dirty patch so that arstechnica, BBC and others can
> > be read before I come up with something more structured, you may try:
> >
> > diff -r ca794cd4c182 src/html.cc
> > --- a/src/html.cc Tue Jun 21 18:04:00 2016 -0400
> > +++ b/src/html.cc Wed Jun 22 19:30:15 2016 -0400
> > @@ -3735,6 +3735,8 @@ static void Html_stack_cleanup_at_open(D
> >           continue; // close
> >        } else if (!(Tags[ni].Flags & 1) && !(Tags[ti].Flags & 4)) {
> >           // Block element over a NON block container
> > +         if (ti == i_A && html->DocTypeVersion >= 5.0f)
> > +            break;
> >           continue; // close
> >        }
> >
>
> Thanks, that works fine.

Committed.

--
Cheers
Jorge.-

Nick Warne

Jun 24, 2016, 2:07:34 PM
to dill...@dillo.org
On Fri, 24 Jun 2016 11:32:04 -0400
Jorge Arellano Cid <jc...@dillo.org> wrote:

> On Thu, Jun 23, 2016 at 05:57:05PM +0100, Nick Warne wrote:
> > On Wed, 22 Jun 2016 19:30:51 -0400
> > Jorge Arellano Cid <jc...@dillo.org> wrote:
> >
> > > Just as a quick and dirty patch so that arstechnica, BBC and others
> > > can be read before I come up with something more structured, you may
> > > try:
> > >
> > > diff -r ca794cd4c182 src/html.cc
> > > --- a/src/html.cc Tue Jun 21 18:04:00 2016 -0400
> > > +++ b/src/html.cc Wed Jun 22 19:30:15 2016 -0400
> > > @@ -3735,6 +3735,8 @@ static void Html_stack_cleanup_at_open(D
> > >           continue; // close
> > >        } else if (!(Tags[ni].Flags & 1) && !(Tags[ti].Flags & 4)) {
> > >           // Block element over a NON block container
> > > +         if (ti == i_A && html->DocTypeVersion >= 5.0f)
> > > +            break;
> > >           continue; // close
> > >        }
> > >
> >
> > Thanks, that works fine.
>
> Committed.
>

Thanks Jorge - just in time to view the BBC news site for the best news
I have heard in over 50 years :)

Nick
--
Gosh that takes me back... or is it forward? That's the trouble with
time travel, you never can tell."
-- Doctor Who "Androids of Tara"
