CSS-Style selectors in BS4


Mike Axiak

Feb 3, 2012, 7:38:28 PM
to beautifulsoup
Hey all,

I apologize if this has been asked before -- I didn't find it when
searching the Google group. I'm wondering if you guys have considered
shipping a selector-based search function, such as the one in the
soupselect package [1]. I think its merits are pretty self-evident,
and I guess the question is how far you want to go in that direction
now that less emphasis is put on parsing.

Thanks for writing such a wonderful library!

Cheers,
Mike


1: http://code.google.com/p/soupselect/

Leonard Richardson

Feb 8, 2012, 1:43:41 PM
to beauti...@googlegroups.com
I'm not going to implement CSS selectors in Beautiful Soup, because
other parsers already support them.

Leonard


Derek Litz

Feb 8, 2012, 11:09:45 PM
to beautifulsoup
Will you provide a standard interface to CSS selectors when the
underlying parser supports them?

Leonard Richardson

Feb 9, 2012, 7:55:26 AM
to beauti...@googlegroups.com
> Will you provide a standard interface to CSS selectors when the
> underlying parser supports them?

That's a good question. If it's possible, I'll do it, but I don't
think it's possible. By the time the document is parsed, the parser
has given up control. lxml knows how to run CSS selectors on its own
DOM objects, but it doesn't know how to run CSS selectors on BS
objects.

Basically, it's a big new feature and I don't want to add any more big
new features to Beautiful Soup. But, I just looked at soupselect, and
if it's as simple as that, I might just add it.

Leonard
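
For readers who have not looked at soupselect, here is a minimal sketch of
the general idea discussed above: split the selector on whitespace and, for
each token, search the descendants of the nodes matched so far. This is not
soupselect's code and not a Beautiful Soup API. Node, TreeBuilder, parse,
matches and select are hypothetical names, and the tree is built with the
standard library's html.parser so the example runs on its own. It only
handles tag names, #id, .class and the descendant combinator.

```python
import re
from html.parser import HTMLParser


class Node:
    """Hypothetical stand-in for a parsed tag (not a Beautiful Soup class)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    def descendants(self):
        """Yield every nested node in document order."""
        for child in self.children:
            yield child
            yield from child.descendants()


class TreeBuilder(HTMLParser):
    """Builds a tiny tree of Node objects; assumes reasonably tidy HTML."""

    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.root = Node("[document]", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name, then close it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def matches(node, token):
    """Check one compound token such as 'a', '#nav' or 'li.item.active'."""
    classes = (node.attrs.get("class") or "").split()
    for prefix, name in re.findall(r"([#.]?)([\w-]+)", token):
        if prefix == "#" and node.attrs.get("id") != name:
            return False
        if prefix == "." and name not in classes:
            return False
        if prefix == "" and node.tag != name:
            return False
    return True


def select(root, selector):
    """Return nodes matching a whitespace-separated descendant selector."""
    current = [root]
    for token in selector.split():
        found, seen = [], set()
        for context in current:
            for node in context.descendants():
                if id(node) not in seen and matches(node, token):
                    seen.add(id(node))
                    found.append(node)
        current = found
    return current


html = (
    '<div id="nav"><ul>'
    '<li class="item active"><a href="/a">A</a></li>'
    '<li class="item"><a href="/b">B</a></li>'
    '</ul></div><p><a href="/c">C</a></p>'
)
root = parse(html)
print([n.attrs["href"] for n in select(root, "#nav li.active a")])  # ['/a']
```

The same loop could in principle run over any tree that exposes tag names,
attributes and descendants, which is why a selector layer like this does not
need help from the underlying parser.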

Simon Willison

Feb 29, 2012, 10:04:31 AM
to beautifulsoup
On Feb 9, 12:55 pm, Leonard Richardson <leona...@segfault.org> wrote:
> Basically, it's a big new feature and I don't want to add any more big
> new features to Beautiful Soup. But, I just looked at soupselect, and
> if it's as simple as that, I might just add it.

I just popped on to the mailing list to ask about soupselect, so it's
neat to see there's already a recent thread.

The most recent version of soupselect is here: https://github.com/simonw/soupselect
- I don't actively maintain it (though I did merge a pull request a
few days ago).

I've always wanted a CSS selector implementation for Python that can
work against a bunch of different backends - parsing HTML with CSS
selectors is incredibly useful, and it's the thing I've always missed
when working with alternative scraping libraries such as scrapy (which
uses XPath - not nearly as intuitive in my opinion).

I was going to suggest that BeautifulSoup 4 might be the right library
for that kind of work to take place, since it's now a wrapper that
sits in front of lxml and html5lib. I understand completely if
that's not something you would want to maintain in the project though.

That said, if you're interested I'm happy to donate soupselect to the
BeautifulSoup project under whatever license would be the best fit.

Thanks,

Simon Willison