leonardr
Nov 30, 2025, 10:15:54 AM
to beautifulsoup
= 4.14.3 (20251130)
* When using one of the lxml tree builders, you can pass in
huge_tree=True to disable lxml's security restrictions and process
files that include huge text nodes. ("huge" means more than
10,000,000 bytes of text in a single node). Without this, lxml may
silently stop processing the file after encountering a huge text
node. [bug=2072424]
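Because huge_tree=True turns off lxml's security restrictions, one option is to enable it only for input that actually needs it. The sketch below uses nothing but the standard library's html.parser to measure the longest run of text between markup events, then returns the extra keyword arguments to pass along, as in `BeautifulSoup(markup, "lxml", **lxml_kwargs_for(markup))`. The helper names are hypothetical, the UTF-8 byte count only approximates how lxml measures a node, and passing huge_tree through the BeautifulSoup constructor assumes it forwards extra keyword arguments to the lxml tree builder, as it does for other tree-builder options.

```python
from html.parser import HTMLParser

# Limit quoted in the 4.14.3 changelog: lxml may silently stop after a
# single text node holding more than 10,000,000 bytes of text.
LXML_TEXT_NODE_LIMIT = 10_000_000


class _TextRunMeter(HTMLParser):
    """Record the longest run of text seen between two markup events."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.current = 0  # UTF-8 bytes in the run being read
        self.largest = 0  # longest completed run so far

    def end_run(self) -> None:
        self.largest = max(self.largest, self.current)
        self.current = 0

    def handle_data(self, data: str) -> None:
        # html.parser may split one text node across several calls,
        # so keep adding until a tag or comment ends the run.
        self.current += len(data.encode("utf-8"))

    def handle_starttag(self, tag, attrs) -> None:
        self.end_run()

    def handle_endtag(self, tag) -> None:
        self.end_run()

    def handle_comment(self, data) -> None:
        self.end_run()


def lxml_kwargs_for(markup: str, limit: int = LXML_TEXT_NODE_LIMIT) -> dict:
    """Return extra keyword arguments for BeautifulSoup(markup, "lxml", ...).

    Opts in to huge_tree=True only when some text run exceeds `limit`
    bytes. Assumes (not verified here) that BeautifulSoup forwards
    huge_tree to the lxml tree builder.
    """
    meter = _TextRunMeter()
    meter.feed(markup)
    meter.close()
    meter.end_run()  # count trailing text that no tag closed
    return {"huge_tree": True} if meter.largest > limit else {}
```

This trades an extra html.parser pass over the document for keeping lxml's limits in place on ordinary input; if every document you handle is trusted, passing huge_tree=True unconditionally is simpler.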
* The html.parser tree builder processes numeric character references
using the algorithm described in the HTML spec. If this means
replacing some other character with REPLACEMENT CHARACTER (U+FFFD), it
will set BeautifulSoup.contains_replacement_characters. [bug=2126753]
The other tree builders rely on the underlying parser to do this
sort of replacement, so Beautiful Soup never sees the original
character reference and can't tell whether a REPLACEMENT CHARACTER
was present in the original content. As a result, the html.parser
tree builder will set contains_replacement_characters in situations
where the other tree builders won't.
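The replacement rules from the HTML spec's "numeric character reference end state" can be sketched in plain Python. This is an illustration, not Beautiful Soup's code: resolve_numeric_reference and decode_numeric_references are hypothetical names, only semicolon-terminated references are handled, parse errors aren't reported, and the returned boolean plays the role of contains_replacement_characters. A literal &#xFFFD; doesn't set it, because no other character was replaced.

```python
import re

# Decimal (&#65;) or hexadecimal (&#x41;) references ending in ';'.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def resolve_numeric_reference(code: int) -> tuple[str, bool]:
    """Map a reference's code point to a character, per the HTML spec.

    Returns (character, replaced), where replaced is True only when
    some other code point was swapped for U+FFFD REPLACEMENT CHARACTER.
    """
    if code == 0x00 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        # Null, out-of-range, and surrogate references become U+FFFD.
        return "\ufffd", True
    if 0x80 <= code <= 0x9F:
        # C1 controls are remapped through the spec's table, which
        # matches the windows-1252 code page; the five code points that
        # cp1252 leaves undefined are kept unchanged.
        try:
            return bytes([code]).decode("cp1252"), False
        except UnicodeDecodeError:
            pass
    # Everything else, including noncharacters and other controls,
    # passes through (the spec flags a parse error but keeps them).
    return chr(code), False


def decode_numeric_references(text: str) -> tuple[str, bool]:
    """Decode numeric references; also report whether U+FFFD was introduced."""
    replaced_any = False

    def substitute(match: re.Match) -> str:
        nonlocal replaced_any
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        char, replaced = resolve_numeric_reference(code)
        replaced_any = replaced_any or replaced
        return char

    return _NUMERIC_REF.sub(substitute, text), replaced_any
```

A parser that only hands over the decoded text loses exactly the information the boolean records: "\ufffd" looks the same whether it came from &#0; or from &#xFFFD;.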
* Added a general test of the html.parser tree builder's ability to
turn any parsing exception from html.parser into a
ParserRejectedMarkup exception. This makes it possible to remove
version-specific tests that relied on the existence of specific
bugs in html.parser. [bug=2121335,2121335]
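The idea behind such a general test can be sketched with the standard library alone: instead of triggering a particular html.parser bug, inject an arbitrary failure with unittest.mock and check that it comes out as a single exception type. This is not Beautiful Soup's actual test; ParserRejectedMarkup below is a local stand-in for the library's exception of the same name, and parse_tags and rejects_any_parser_exception are hypothetical helpers.

```python
from html.parser import HTMLParser
from unittest import mock


class ParserRejectedMarkup(Exception):
    """Local stand-in for Beautiful Soup's exception of the same name."""

    def __init__(self, original: BaseException) -> None:
        super().__init__(f"{type(original).__name__}: {original}")
        self.original = original


class _TagCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        self.tags.append(tag)


def parse_tags(markup: str) -> list[str]:
    """Parse markup, converting *any* html.parser failure into one type."""
    parser = _TagCollector()
    try:
        parser.feed(markup)
        parser.close()
    except Exception as e:
        raise ParserRejectedMarkup(e) from e
    return parser.tags


def rejects_any_parser_exception(exc_type: type[Exception]) -> bool:
    """Force html.parser to raise exc_type and check that it is converted."""
    # Patching HTMLParser.feed simulates a parser bug without depending
    # on which bugs a given Python version actually has.
    with mock.patch.object(HTMLParser, "feed", side_effect=exc_type("boom")):
        try:
            parse_tags("<p>hello</p>")
        except ParserRejectedMarkup as e:
            return isinstance(e.original, exc_type)
    return False
```

Because the failure is injected rather than found in the wild, the same check works on every Python version, which is what makes the bug-specific tests unnecessary.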