Thanks, all of you, for your suggestions.
What I need to do is get information from tags with specific class
values or attributes. JSoup has great functionality that supports
this. I compared it with some other parsers (Jericho, HTMLParser and
Validator.nu), and JSoup turned out to be faster than all of them.
NekoHTML ended up giving me a stack overflow error. I wasn't
successful in using Apache Tika for this task, but I think I'll give
it another go. I also plan to try regexes and boilerpipe; it would be
great if any of these ended up being faster than JSoup.
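
For the regex attempt, something like the following is roughly what I
have in mind. It is only a sketch using java.util.regex from the
standard library: the class name ClassTextRegex, the method extract and
the sample HTML are made up for illustration. It assumes lowercase tag
names, double-quoted attribute values and no nesting of the same tag,
so it will break on real-world markup that a proper parser handles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of JSoup or any other library.
class ClassTextRegex {

    // Returns the inner text of every <tag ... class="... cls ..."> ... </tag>.
    // Assumptions: lowercase tag names, double-quoted attributes,
    // and no element of the same tag nested inside a match.
    static List<String> extract(String html, String tag, String cls) {
        String t = Pattern.quote(tag);
        String c = Pattern.quote(cls);
        Pattern p = Pattern.compile(
            "<" + t + "\\s(?:[^>]*\\s)?class=\""   // opening tag up to the class attribute
            + "(?:[^\"]*\\s)?" + c + "(?:\\s[^\"]*)?\"" // cls as one whole token in the value
            + "[^>]*>(.*?)</" + t + ">",             // rest of the tag, then the inner text
            Pattern.DOTALL);

        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input.
        String html = "<div><span class=\"price\">42</span>"
                    + "<span class=\"note\">n/a</span>"
                    + "<span id=\"x\" class=\"big price\">7</span></div>";
        System.out.println(extract(html, "span", "price")); // prints [42, 7]
    }
}
```

Whether this ends up faster than a real parser is something I still
have to measure; the pattern is compiled on every call here, so for
timing it should be compiled once and reused.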
Thanks,
Amrutha