Does the Jericho HTML parser need replacing?

jimbo

Apr 5, 2022, 11:21:30 AM
to Group: okapi-devel

Jericho, which our HTML, XMLStream and AbstractMarkup filters use, hasn't been updated since 2015. It may be time to refactor with a newer parser.

Jsoup looks like a good alternative. Jericho has a stream-based parser, which is nice for large XML files. But with memory cheap these days, and the XML filter already DOM-based, is this an issue?

Any other modern HTML parsing alternatives?

I don't think I would tackle this any time soon, and definitely not in this release, which should be light on changes since its main purpose is to get Java 11 support out.

Jim
