Woodstox Without Validation (for HTML5)


augus...@gmail.com

Oct 21, 2019, 1:57:31 PM
to Woodstox User Mailing List
Hi!

I'm using Woodstox to iterate through HTML5 documents and write each one to a file with a <script> tag inserted into the body. I don't need validation or any feature other than finding the <body> tag and inserting <script>. Woodstox works great even though it wasn't designed for HTML, but it fails when it encounters a lowercase doctype declaration, boolean attributes, unclosed tags, and other HTML5 quirks.

Is there a way to disable validation?
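For documents that are already well-formed XHTML, the copy-and-insert approach described above can be sketched with the standard StAX event API (javax.xml.stream), which Woodstox implements. This is a minimal sketch, not the poster's actual code: the InsertScript class, the insertScript method and the script src value are hypothetical, and it runs on whichever StAX implementation is on the classpath. Any input that is not well-formed XML surfaces as an exception, which is the same failure described above for HTML5 quirks.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: copies an XHTML document event by event and inserts
// <script src="..."></script> right after the opening <body> tag.
class InsertScript {
    static String insertScript(String xhtml, String scriptSrc) {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();
        StringWriter out = new StringWriter();
        try {
            XMLEventReader reader = inFactory.createXMLEventReader(new StringReader(xhtml));
            XMLEventWriter writer = outFactory.createXMLEventWriter(out);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                // Skip document start/end so no XML declaration is written out.
                if (event.isStartDocument() || event.isEndDocument()) {
                    continue;
                }
                writer.add(event); // copy the original event unchanged
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("body")) {
                    // Emit the new <script> element as the first child of <body>.
                    writer.add(events.createStartElement("", "", "script"));
                    writer.add(events.createAttribute("src", scriptSrc));
                    writer.add(events.createEndElement("", "", "script"));
                }
            }
            writer.flush();
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            // Lowercase doctypes, valueless attributes, unclosed tags, etc. end up here.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return out.toString();
    }
}
```

On real-world HTML5 pages this sketch will throw for the reasons the question lists, since the parser cannot be told to accept non-well-formed markup.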

Tatu Saloranta

Oct 21, 2019, 2:01:15 PM
to augus...@gmail.com, Woodstox User Mailing List
That is not technically validation (validation refers to checking XML Schema or DTD conformance), but a well-formedness check.
But no, there is no way to disable that in general: it is a core part of what XML parsers are required to do (validation is actually optional, fwiw).
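To illustrate the distinction, here is a minimal sketch using the JDK's StAX API (javax.xml.stream), which Woodstox also implements; the WellFormedness class and isWellFormed method are hypothetical names. It leaves the factory at its defaults, assuming (as is the case by default) that DTD validation is not switched on, and simply pulls every event. Well-formed markup passes, while HTML5 shortcuts such as an unclosed <br> or a valueless boolean attribute still raise XMLStreamException, because those checks are not optional.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: returns true only if the whole document parses as well-formed XML.
class WellFormedness {
    static boolean isWellFormed(String document) {
        // Default factory settings: no validation requested, yet well-formedness is still enforced.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(new StringReader(document));
            while (reader.hasNext()) {
                reader.next(); // pull every event; a well-formedness error throws here
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do if closing fails
                }
            }
        }
    }
}
```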

You will probably want to use an HTML parser like jsoup instead


(or NekoHTML, JTidy or other choices -> https://mvnrepository.com/tags/html )

I hope this helps.

-+ Tatu +-
 

August Nagro

Oct 21, 2019, 2:08:55 PM
to Tatu Saloranta, Woodstox User Mailing List
Ok, thank you!