I am from Yahoo (now Verizon Media). Our team actively maintains the tagchowder project, a fork of the TagSoup 1.2.1 Java implementation that builds with Maven. Changes relative to TagSoup 1.2.1 include:
- Improved parser performance.
- Added the STRING_INTERNING_FEATURE feature, which allows String.intern() calls to be disabled.
- Increased the parser buffer size to 20000 bytes.
- Initial work on AttributesImpl to reduce attribute lookups from O(n) to O(1) using a hash map, which improves parser performance on larger files.
- Fixed html.tssl to support the ul, ol, and a tags as defined in HTML5.
- Replaced Ant with Maven, and added Checkstyle and code coverage checks.
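The AttributesImpl change comes down to pairing the ordered attribute list with a hash map from attribute name to position, so a lookup by name no longer scans every attribute. Below is a minimal sketch of that idea under our own assumptions: `IndexedAttributes` and its methods are hypothetical names loosely modeled on the SAX `Attributes` lookups (`getIndex` returns -1 and `getValue` returns null when a name is absent). This is not tagchowder's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not tagchowder's AttributesImpl: attributes are kept
 * in insertion order in a list, and a HashMap from qualified name to list
 * index makes lookups by name expected O(1) instead of a linear scan.
 */
class IndexedAttributes {
    // One attribute entry; immutable, replaced wholesale on update.
    private static final class Attr {
        final String qName;
        final String type;
        final String value;

        Attr(String qName, String type, String value) {
            this.qName = qName;
            this.type = type;
            this.value = value;
        }
    }

    private final List<Attr> attrs = new ArrayList<>();
    private final Map<String, Integer> indexByQName = new HashMap<>();

    /** Adds an attribute, or replaces type and value if the name already exists. */
    void addAttribute(String qName, String type, String value) {
        Integer existing = indexByQName.get(qName);
        if (existing != null) {
            attrs.set(existing, new Attr(qName, type, value));
        } else {
            indexByQName.put(qName, attrs.size());
            attrs.add(new Attr(qName, type, value));
        }
    }

    /** Expected O(1): a single hash lookup instead of scanning the list. */
    int getIndex(String qName) {
        Integer i = indexByQName.get(qName);
        return i == null ? -1 : i;
    }

    /** Returns the value for qName, or null if there is no such attribute. */
    String getValue(String qName) {
        int i = getIndex(qName);
        return i < 0 ? null : attrs.get(i).value;
    }

    String getType(int index) {
        return attrs.get(index).type;
    }

    int getLength() {
        return attrs.size();
    }

    /** Removes by position; later entries shift left, so their indices are rewritten (O(n)). */
    void removeAttribute(int index) {
        Attr removed = attrs.remove(index);
        indexByQName.remove(removed.qName);
        for (int i = index; i < attrs.size(); i++) {
            indexByQName.put(attrs.get(i).qName, i);
        }
    }
}
```

In this sketch, lookups by name are constant time on average, while removing an attribute from the middle remains O(n) because every later index must be updated. That trade-off suits a parser, which mostly adds attributes and then looks them up.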
Let us know if you are interested and want to contribute.