Beautiful Soup 3.2.1, and Beautiful Soup 4 beta 6
Leonard Richardson  
From: Leonard Richardson <leona...@segfault.org>
Date: Thu, 16 Feb 2012 08:49:49 -0500
Subject: Beautiful Soup 3.2.1, and Beautiful Soup 4 beta 6
Not one but two releases today. First, the first real 3.x release in
almost two years.

http://www.crummy.com/software/BeautifulSoup/bs3/download/3.x/Beautif...

This fixes a bug that can allow cross-site scripting attacks if
Beautiful Soup is used to sanitize HTML:

https://bugs.launchpad.net/beautifulsoup/+bug/868921

On output, angle brackets and bare ampersands are now escaped to XML
entities in strings. Previously they were only escaped in attribute
values. Beautiful Soup 4 escapes XML entities by default, so the
problem does not exist there unless you deliberately cause it (e.g. by
setting formatter=None).
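
To illustrate the Beautiful Soup 4 behavior described above, here is a minimal sketch. It assumes the bs4 package is installed and uses the standard library's html.parser tree builder. The sample markup is made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the text holds an escaped script tag and an
# escaped ampersand.
markup = "<p>&lt;script&gt;alert(1)&lt;/script&gt; AT&amp;T</p>"
soup = BeautifulSoup(markup, "html.parser")

# Default output: angle brackets and ampersands in strings are escaped.
escaped = soup.p.decode()
print(escaped)

# formatter=None turns the escaping off, so the text comes out raw.
# Avoid this when the output is meant to be sanitized HTML.
raw = soup.p.decode(formatter=None)
print(raw)
```

With the default formatter the script tag stays inert text. With formatter=None it is written out as live markup, which is the situation the 3.2.1 fix is meant to prevent.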

-----

Now, on to the BS4 beta.

http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/beautif...

It's almost done at this point. All the reported bugs are fixed except
the lack of namespace support. I'd like to add that before the
release, but I don't know how much work it'll be.

Changelog:

* Multi-valued attributes like "class" always have a list of values,
  even if there's only one value in the list.

* Added a number of multi-valued attributes defined in HTML5.

* Stopped generating a space before the slash that closes an
  empty-element tag. This may come back if I add a special XHTML mode
  (http://www.w3.org/TR/xhtml1/#C_2), but right now it's pretty
  useless.

* Passing text along with tag-specific arguments to a find* method:

   find("a", text="Click here")

  will find tags that contain the given text as their
  .string. Previously, the tag-specific arguments were ignored and
  only strings were searched.

* Fixed a bug that caused the html5lib tree builder to build a
  partially disconnected tree. Generally cleaned up the html5lib tree
  builder.

* If you restrict a multi-valued attribute like "class" to a string
  that contains spaces, Beautiful Soup will only consider it a match
  if the values correspond to that specific string.

That last one is implemented as a big hack, but I can remove the hack
later without changing the API.
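
A few of the changelog items above can be tried out with a short sketch. It assumes a current bs4 install and the html.parser tree builder, and the markup is invented for the example. Later Beautiful Soup 4 releases spell the text argument as string, so the sketch uses that name; the class searches go through an attrs dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document.
html = """
<p class="body">One class</p>
<p class="body strikeout">Two classes</p>
<p class="strikeout body">Same classes, other order</p>
<a href="/next">Click here</a>
<a href="/help">Help</a>
"""
soup = BeautifulSoup(html, "html.parser")

# "class" is multi-valued: a list even when there is only one value.
first_classes = soup.p["class"]
print(first_classes)

# Tag name plus text: finds the <a> whose .string is "Click here".
# (The changelog spells this argument text=.)
link = soup.find("a", string="Click here")
print(link["href"])

# A class string containing a space must match the attribute exactly.
exact = soup.find_all("p", attrs={"class": "body strikeout"})
exact_texts = [p.get_text() for p in exact]
print(exact_texts)

# A single class name matches any tag that carries that class.
any_body = soup.find_all("p", attrs={"class": "body"})
print(len(any_body))

# Empty-element tags are closed with no space before the slash.
br_output = str(BeautifulSoup("<p>a<br>b</p>", "html.parser"))
print(br_output)
```

Searching for "body strikeout" finds only the second paragraph, while searching for "body" finds all three.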

Leonard

leonardr  
From: leonardr <leonard.richard...@gmail.com>
Date: Thu, 16 Feb 2012 06:15:44 -0800 (PST)
Subject: Re: Beautiful Soup 3.2.1, and Beautiful Soup 4 beta 6
BTW, this would be a great time to try and port your BS3 scripts to
BS4, and let me know how difficult it was and what you had to change.

Leonard

Bruce Eckel  
From: Bruce Eckel <brucetec...@gmail.com>
Date: Thu, 16 Feb 2012 13:57:14 -0700
Subject: Re: Beautiful Soup 3.2.1, and Beautiful Soup 4 beta 6

Beta 6 installed without problem using pip.

One issue I came across when running my new app with beta 6 is the lists
returned by find_all, based on the new search behavior. Here's my code:

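    # klass is either one class name or a one-element list of names;
    # remove() needs a single name, so unwrap it once, before the loop.
    name = klass[0] if isinstance(klass, list) else klass
    for tag in soup.body.find_all(True, klass):
        tag['class'].remove(name)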

So here, klass can be either a string (I think with only a single class id
in it, right?) or a list of strings (each string with an individual class
id).

find_all will find all tags with any of those class ids in them. remove(),
however, requires that klass be a single item, not a list. In my case klass
could only be a list of one, so I pulled off the only element.

This is not a bug report, just an observation.
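
A minimal sketch of one way to handle both cases described above, a single class name or a list of names of any length. The helper name strip_classes and the sample markup are hypothetical, and the sketch assumes a current bs4 install with the html.parser tree builder.

```python
from bs4 import BeautifulSoup

def strip_classes(soup, klass):
    """Remove the given class name(s) from every tag that carries them.

    strip_classes is a hypothetical helper, not part of Beautiful Soup.
    """
    # Normalize to a list so a string and a list are handled alike.
    names = [klass] if isinstance(klass, str) else list(klass)
    # A list of class names matches tags that have any one of them.
    for tag in soup.find_all(True, attrs={"class": names}):
        # Rebuild the list instead of calling remove() once per name.
        tag["class"] = [c for c in tag["class"] if c not in names]
    return soup

doc = '<body><p class="a b">x</p><p class="b">y</p><p class="c">z</p></body>'
soup = BeautifulSoup(doc, "html.parser")
strip_classes(soup, ["b"])
remaining = [p["class"] for p in soup.find_all("p")]
print(remaining)
```

Filtering the list avoids the ValueError that remove() raises when a tag matched on one name but does not carry another.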

-- Bruce Eckel
www.Reinventing-Business.com
www.MindviewInc.com
