I have a problem with the following structure:
<a>
<li></li>
<p></p>
</a>
When parsed by TagSoup, it becomes:
<a>
<li></li>
</a>
<p></p>
However, this structure is left unmodified by standard parsers such as JavaFX WebKit or Chrome.
I have searched for ways to customize TagSoup's parsing, but this is not documented.
Maybe something can be done by overriding the html.tssl file (in the tagsoup-1.2.1\src\definitions folder of the jar) or by implementing the HTMLModels interface (which could be set like this: saxParser.setProperty(Parser.schemaProperty, schema implementing HTMLModels))?
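// Custom schema that relaxes the content model of <a>.
private class CustomHTMLSchema extends HTMLSchema
{
    public CustomHTMLSchema()
    {
        super();
        // HTMLSchema implements HTMLModels, so M_BLOCK is in scope here.
        ElementType elA = getElementType("a");
        // Add block-level elements (such as <p>) to what <a> may contain.
        elA.setModel(elA.model() | M_BLOCK);
    }
}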
...
saxParser = SAXParserImpl.newInstance(null);
saxParser.setProperty(Parser.schemaProperty, new CustomHTMLSchema());
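
To see why OR-ing M_BLOCK into the model of <a> should keep <p> nested, here is a minimal stand-alone sketch of a bitmask content model. It is not TagSoup's code: the class ContentModelSketch, its Element type, and the bit values are hypothetical, and it assumes that containment is decided by a bitwise AND between the parent's model and the child's group membership, which is how I understand TagSoup's ElementType to work.

```java
// Simplified stand-in for a bitmask content model; NOT TagSoup's real classes.
final class ContentModelSketch {
    // Hypothetical group bits (the real values live in TagSoup's HTMLModels).
    static final int M_PCDATA = 1 << 0;
    static final int M_INLINE = 1 << 1;
    static final int M_BLOCK  = 1 << 2;

    static final class Element {
        final String name;
        int model;          // groups this element is allowed to contain
        final int memberOf; // groups this element belongs to

        Element(String name, int model, int memberOf) {
            this.name = name;
            this.model = model;
            this.memberOf = memberOf;
        }

        // Assumed rule: a parent accepts a child if they share at least one group bit.
        boolean canContain(Element child) {
            return (model & child.memberOf) != 0;
        }
    }

    public static void main(String[] args) {
        Element a = new Element("a", M_PCDATA | M_INLINE, M_INLINE);
        Element p = new Element("p", M_PCDATA | M_INLINE, M_BLOCK);

        System.out.println(a.canContain(p)); // false: <p> would be moved out of <a>

        a.model |= M_BLOCK;                  // same idea as elA.setModel(elA.model() | M_BLOCK)
        System.out.println(a.canContain(p)); // true: <p> may stay inside <a>
    }
}
```

Under that assumption, a parser that closes the parent whenever canContain is false would stop hoisting <p> out of <a> once the extra bit is set.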