Hi,
I'm working on a very old web app that uses JSPs. I'm using Beautiful Soup to parse the JSP source files used by the web server, not the dynamically rendered HTML sent to the browser.
With the html5lib parser, Beautiful Soup fails to parse the HTML unless I first replace the JSP scriptlets, denoted by <% %> and <%= %>, with HTML comments <!-- -->. Surprisingly, html.parser is able to parse the same input without replacing <% %> with <!-- -->. Both html5lib and html.parser parse the HTML once the scriptlet tags have been replaced by HTML comments.
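The replacement step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name scriptlets_to_comments and the regular expression are hypothetical, and the sketch assumes that no scriptlet body contains "%>" (for example inside a Java string literal) or "-->", and that scriptlets do not appear inside attribute values, where an HTML comment would not be recognised as one.

```python
import re

# Matches any JSP block from "<%" up to the nearest "%>", non-greedily.
# This covers <% ... %> and <%= ... %>; DOTALL lets a block span lines.
# Assumption: "%>" never occurs inside the scriptlet body itself.
JSP_BLOCK = re.compile(r"<%(.*?)%>", re.DOTALL)


def scriptlets_to_comments(source: str) -> str:
    """Rewrite each JSP block as an HTML comment with the same body."""
    return JSP_BLOCK.sub(lambda m: "<!--" + m.group(1) + "-->", source)


if __name__ == "__main__":
    jsp = '<p><%= user.getName() %></p><% if (x) { %><b>hi</b><% } %>'
    print(scriptlets_to_comments(jsp))
```

The converted string can then be passed to BeautifulSoup with either parser, for example BeautifulSoup(converted, "html5lib"). Each scriptlet ends up as a Comment node whose text is the original scriptlet body, so the Java code stays recoverable from the tree.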
Is there a way to extend Beautiful Soup's comment recognition logic so that JSP scriptlet blocks are treated the same way as HTML comments?
I'm using:
Beautiful Soup 4.12.3
Python 3.12.1
html5lib 1.1
lxml 5.2.1.0
Regards
Mike