<p> inside <a> parsed outside <a> - how to customize TagSoup parsing?


reyz

Nov 8, 2015, 8:13:09 AM
to tagsoup-friends

I have a problem with the following structure:


<a>
  <li></li>
  <p></p>
</a>


When parsed by TagSoup, it becomes:


<a>
  <li></li>
</a>
<p></p>


However, this structure is left unchanged when parsed by standard engines such as JavaFX WebKit or Chrome.

I have searched for ways to customize TagSoup parsing, but I could not find any documentation on it.


Perhaps this can be done by overriding the html.tssl file (in the tagsoup-1.2.1\src\definitions folder), or by implementing HTMLModels in a custom schema, which could be set like this: saxParser.setProperty(Parser.schemaProperty, <schema implementing HTMLModels>)?

John Cowan

Nov 8, 2015, 8:48:02 AM
to reyz, tagsoup-friends
reyz wrote:

> There's maybe something to do by overriding the html.tssl file (in
> tagsoup-1.2.1\src\definitions jar folder)

Indeed, that's what you want. Modify html.tssl so that the description
of the "a" element involves whatever groups of child elements you wish
to include. You can create new groups if you are careful not to
overrun the limit of 32 groups.
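
To illustrate why the group count is capped, here is a minimal, self-contained sketch of how bitmask-based content groups can work. It assumes each group is one bit of an int (hence 32 groups at most) and that a parent may contain a child when the parent's model shares a bit with the groups the child belongs to. The class name, the constant values, and the group assignments for "a" and "p" are hypothetical and chosen only for illustration; this is not TagSoup's own code.

```java
// Simplified, hypothetical model of bitmask-based content groups.
// The constants below are illustrative and are not TagSoup's actual values.
final class GroupMaskSketch {
    // One bit per group; an int has 32 bits, hence at most 32 groups.
    static final int M_INLINE = 1 << 0;
    static final int M_BLOCK  = 1 << 1;
    static final int M_LI     = 1 << 2;

    /** A parent may contain a child if its model shares a bit with the child's groups. */
    static boolean canContain(int parentModel, int childMemberOf) {
        return (parentModel & childMemberOf) != 0;
    }

    /** Returns the model widened so that it also accepts members of extraGroup. */
    static int allowGroup(int model, int extraGroup) {
        return model | extraGroup;
    }

    public static void main(String[] args) {
        int aModel = M_INLINE | M_LI; // hypothetical model for "a"
        int pMemberOf = M_BLOCK;      // hypothetical: "p" belongs to the block group

        System.out.println(canContain(aModel, pMemberOf));                      // prints false
        System.out.println(canContain(allowGroup(aModel, M_BLOCK), pMemberOf)); // prints true
    }
}
```

Under these assumptions, adding a group to the "a" description amounts to OR-ing one more bit into its model, which is why the parser would then keep a block-level child inside the element instead of closing it first.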

--
John Cowan http://www.ccil.org/~cowan co...@ccil.org
This great college [Trinity], of this ancient university [Cambridge],
has seen some strange sights. It has seen Wordsworth drunk and Porson
sober. And here am I, a better poet than Porson, and a better scholar
than Wordsworth, somewhere betwixt and between. --A.E. Housman

Raymooz

Nov 8, 2015, 11:14:23 AM
to John Cowan, tagsoup-friends
Thanks for your answer.
I finally found a solution inside my code by extending HTMLSchema:
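import org.ccil.cowan.tagsoup.ElementType;
import org.ccil.cowan.tagsoup.HTMLSchema;
import org.ccil.cowan.tagsoup.Parser;
import org.ccil.cowan.tagsoup.jaxp.SAXParserImpl;

// Declared as a nested class (a top-level class cannot be private).
private class CustomHTMLSchema extends HTMLSchema
{
    public CustomHTMLSchema()
    {
        super();
        // Widen the content model of <a> so it also accepts block-level
        // children such as <p>, while keeping what it already allows.
        ElementType elA = getElementType("a");
        elA.setModel(elA.model() | M_BLOCK);
    }
}

...

// Create the parser and install the customized schema before parsing.
saxParser = SAXParserImpl.newInstance(null);
saxParser.setProperty(Parser.schemaProperty, new CustomHTMLSchema());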
reyz