Beautiful Soup 4.14.3

leonardr

Nov 30, 2025, 10:15:54 AM

to beautifulsoup

= 4.14.3 (20251130)

* When using one of the lxml tree builders, you can pass in
huge_tree=True to disable lxml's security restrictions and process
files that include huge text nodes ("huge" means more than
10,000,000 bytes of text in a single node). Without this, lxml may
silently stop processing the file after encountering a huge text
node. [bug=2072424]

* The html.parser tree builder processes numeric character entities
using the algorithm described in the HTML spec. If this means
replacing some other character with REPLACEMENT CHARACTER, it will
set BeautifulSoup.contains_replacement_characters. [bug=2126753]

The other tree builders rely on the underlying parser to do this
sort of replacement. That means Beautiful Soup never sees the
original character reference, so it can't tell whether a
REPLACEMENT CHARACTER was in the original content or was substituted
by the parser. As a result, the html.parser tree builder will set
contains_replacement_characters in situations where the other tree
builders won't.

* Added a general test of the html.parser tree builder's ability to
turn any parsing exception from html.parser into a
ParserRejectedMarkup exception. This makes it possible to remove
version-dependent tests that depended on the existence of specific
bugs in html.parser. [bug=2121335,2121335]
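
For readers curious about the numeric character entity item above, here is a rough sketch of the kind of replacement the HTML spec calls for. It uses only the standard library and is not Beautiful Soup's actual code. The function name resolve_numeric_reference, the C1_REMAP table name, and the "replaced" flag are made up for illustration; the flag only approximates when something like contains_replacement_characters would be set. Malformed references are not handled.

```python
# Illustrative sketch only; this is NOT Beautiful Soup's implementation.
REPLACEMENT_CHARACTER = "\N{REPLACEMENT CHARACTER}"  # U+FFFD

# C1 control code points that the HTML spec remaps to other characters
# (the same assignments as Windows-1252).
C1_REMAP = {
    0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
    0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160,
    0x8B: 0x2039, 0x8C: 0x0152, 0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019,
    0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
    0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
    0x9E: 0x017E, 0x9F: 0x0178,
}


def resolve_numeric_reference(reference):
    """Resolve a reference such as '&#128;' or '&#x80;'.

    Returns (text, replaced). 'replaced' is True only when some other
    code point was swapped for REPLACEMENT CHARACTER.
    """
    body = reference[2:].rstrip(";")  # drop the leading '&#' and trailing ';'
    if body[:1] in ("x", "X"):
        code_point = int(body[1:], 16)
    else:
        code_point = int(body, 10)

    # Null, out-of-range values, and surrogates become U+FFFD.
    if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return REPLACEMENT_CHARACTER, True

    # Some C1 controls are remapped; everything else passes through as-is.
    code_point = C1_REMAP.get(code_point, code_point)
    return chr(code_point), False


if __name__ == "__main__":
    for ref in ("&#65;", "&#x80;", "&#0;", "&#xD800;", "&#xFFFD;"):
        print(ref, resolve_numeric_reference(ref))
```

Note the last case: "&#xFFFD;" yields the same character as "&#0;", but nothing was replaced. That distinction is only visible to code that sees the original reference, which is the difference between html.parser and the other tree builders described above.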

Chris Papademetrious

Nov 30, 2025, 3:16:48 PM

to beautifulsoup

Thank you, Leonard! It is great to see Beautiful Soup continue to chug along and crank out improvements release after release.