strings generator adds semi-colon for unescaped ampersands

Mike Ottum

Apr 26, 2017, 7:57:28 PM
to beautifulsoup

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<p>example.com/foo?a=1&b=6</p>', 'html.parser')
>>> [s for s in soup.strings]
[u'example.com/foo?a=1&b;=6']


Note that in the example above, the `strings` generator turns the unescaped `&b` into `&b;`. The input is admittedly invalid HTML, since a literal ampersand should be escaped as `&amp;`, but markup like this is quite common in the wild.

Expected output is:

[u'example.com/foo?a=1&b=6']

Is this intended behavior? If so, is there any workaround?
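
The only workaround I can think of is to pre-escape bare ampersands before handing the markup to the parser. Below is a minimal sketch of that idea. The helper name `escape_bare_ampersands` is hypothetical (it is not part of Beautiful Soup), and the regex assumes that only `&name;`, `&#123;` and `&#x1F;` forms count as real character references; anything else starting with `&` is treated as a literal ampersand.

```python
import re

# An ampersand that is NOT the start of a well-formed character reference:
# a named (&amp;), decimal (&#38;) or hex (&#x26;) reference ending in ';'.
_BARE_AMPERSAND = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)'
)


def escape_bare_ampersands(markup):
    """Replace each bare '&' with '&amp;' and leave existing references alone.

    Hypothetical helper for illustration only.
    """
    return _BARE_AMPERSAND.sub('&amp;', markup)


if __name__ == '__main__':
    print(escape_bare_ampersands('<p>example.com/foo?a=1&b=6</p>'))
```

The escaped string would then be passed to `BeautifulSoup(..., 'html.parser')` in place of the raw markup. I would expect `soup.strings` to give back the original `&b=6` text that way, but this also rewrites ampersands inside attribute values and script blocks, so it may be too blunt for some pages.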

Thanks,
Mike