It parses HTML the way your browser does (it implements the WHATWG HTML parsing algorithm without any simplifications), and it's pretty fast. Learn more on the project's GitHub page: https://github.com/inikulin/parse5
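Here is a minimal TypeScript sketch of that browser-like behavior. It assumes a recent parse5 release installed from npm that provides `parse` and `serialize` as named exports. The API has changed between major versions, so check the docs for the version you install. `normalizeHtml` is a hypothetical helper name used only for this example.

```typescript
// Assumes parse5 is installed (npm install parse5) and provides the
// named exports `parse` and `serialize`, as recent major versions do.
import { parse, serialize } from "parse5";

// Hypothetical helper: parse a (possibly sloppy) HTML string into a full
// document tree, as a browser would, then serialize the tree back to
// markup so the parser's error recovery becomes visible.
function normalizeHtml(html: string): string {
  const document = parse(html);
  return serialize(document);
}

// The implied <html>, <head> and <body> elements are created, and the
// first <p> is closed implicitly when the second one starts.
console.log(normalizeHtml("<p>a<p>b"));
// → <html><head></head><body><p>a</p><p>b</p></body></html>

// A <tbody> is inserted around the table row, just as browsers do.
console.log(normalizeHtml("<table><tr><td>x</td></tr></table>"));
// → <html><head></head><body><table><tbody><tr><td>x</td></tr></tbody></table></body></html>
```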
You can also play with a Cheerio fork that uses parse5: https://github.com/inikulin/whacko
I hope it will be useful to you.