Beautiful Soup 4.11.0

leonardr
Apr 7, 2022, 7:39:20 PM
to beautifulsoup
I've just released version 4.11.0 of Beautiful Soup. This version contains bug fixes, improvements to the warnings issued in unusual-seeming situations, and minor enhancements to existing features. This version also begins the process of modernizing the code base, now that Python 2 is no longer supported.

Changelog:

* Ported unit tests to use pytest.

* Added special string classes, RubyParenthesisString and RubyTextString,
  to make it possible to treat ruby text specially in get_text() calls.
  [bug=1941980]

* It's now possible to customize the way output is indented by
  providing a value for the 'indent' argument to the Formatter
  constructor. The 'indent' argument works very similarly to the
  argument of the same name in the Python standard library's
  json.dump() function. [bug=1955497]

* If the charset-normalizer Python module
  (https://pypi.org/project/charset-normalizer/) is installed, Beautiful
  Soup will use it to detect the character sets of incoming documents.
  This is also the module used by newer versions of the Requests library.
  For the sake of backwards compatibility, chardet and cchardet both take
  precedence if installed. [bug=1955346]

* Added a workaround for an lxml bug
  (https://bugs.launchpad.net/lxml/+bug/1948551) that causes
  problems when parsing a Unicode string beginning with BYTE ORDER MARK.
  [bug=1947768]

* Issue a warning when an HTML parser is used to parse a document that
  looks like XML but not XHTML. [bug=1939121]

* Do a better job of keeping track of namespaces as an XML document is
  parsed, so that CSS selectors that use namespaces will do the right
  thing more often. [bug=1946243]

* Some time ago, the misleadingly named "text" argument to find-type
  methods was renamed to the more accurate "string". But this supposed
  "renaming" didn't make it into important places like the method
  signatures or the docstrings. That's corrected in this
  version. "text" still works, but will give a DeprecationWarning.
  [bug=1947038]

* Fixed a crash when pickling a BeautifulSoup object that has no
  tree builder. [bug=1934003]

* Fixed a crash when overriding multi_valued_attributes and using the
  html5lib parser. [bug=1948488]

* Standardized the wording of the MarkupResemblesLocatorWarning
  warnings to omit untrusted input and make the warnings less
  judgmental about what you ought to be doing. [bug=1955450]

* Removed support for the iconv_codec library, which doesn't seem
  to exist anymore and was never put up on PyPI. (The closest
  replacement on PyPI, iconv_codecs, is GPL-licensed, so we can't use
  it--it's also quite old.)