Finding only jpg and png images?

Yansky

Jun 6, 2008, 2:24:50 AM
to beautifulsoup
Hi, I was wondering if it was possible to search for all images on a
page and filter them by their image extension, so that I only get jpg
and png files and not gif et al.

I tried the following, but without success:

import urllib
import BeautifulSoup

f = urllib.urlopen('http://foo.com')
b = BeautifulSoup.BeautifulSoup(f)

getStuff = b.findAll(lambda img: src.endswith('jpg' or 'jpeg' or 'png'))
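The lambda attempt fails for two reasons: inside the lambda the argument is the tag being tested, so `src` is an undefined name, and `'jpg' or 'jpeg' or 'png'` evaluates to just `'jpg'`, because `or` returns its first truthy operand. A minimal sketch of a working predicate follows. The helper names `is_wanted_src` and `is_wanted_img` are hypothetical, and it assumes BeautifulSoup calls a function passed to findAll once per Tag, and that a Tag exposes `.name` and `.get()`.

```python
# Why the original attempt misbehaves: `or` returns its first truthy
# operand, so this whole expression is just the string 'jpg'.
collapsed = 'jpg' or 'jpeg' or 'png'

# str.endswith accepts a tuple and returns True if any suffix matches.
WANTED_SUFFIXES = ('.jpg', '.jpeg', '.png')


def is_wanted_src(src):
    """True if src ends in a JPEG or PNG extension, ignoring case."""
    return src is not None and src.lower().endswith(WANTED_SUFFIXES)


def is_wanted_img(tag):
    """findAll predicate: read src from the tag being tested, not a free name."""
    return tag.name == 'img' and is_wanted_src(tag.get('src'))


print(collapsed)
print(is_wanted_src('photos/cat.JPG'))
print(is_wanted_src('icons/spin.gif'))
```

With the page parsed as `b`, `getStuff = b.findAll(is_wanted_img)` should then return only the jpg, jpeg and png images.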

Is it possible to do it that way or do I have to do it manually with
findAll('img') and then iterate through all the results and filter out
what I don't want with if statements?
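The manual approach described here (collect every img, then filter with if statements) can be sketched as below. This swaps in the standard library's `html.parser.HTMLParser` (Python 3) for BeautifulSoup so it runs without extra packages; `ImgSrcCollector` and `jpg_and_png_sources` are hypothetical names.

```python
from html.parser import HTMLParser

WANTED_SUFFIXES = ('.jpg', '.jpeg', '.png')


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (the findAll('img') step)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.srcs.append(src)


def jpg_and_png_sources(html):
    """Return img src values ending in .jpg, .jpeg or .png, in document order."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    # The "iterate and filter with if statements" step.
    wanted = []
    for src in collector.srcs:
        if src.lower().endswith(WANTED_SUFFIXES):
            wanted.append(src)
    return wanted


page = '<p><img src="a.jpg"><img src="b.gif"><img src="c.PNG"/><img alt="no src"></p>'
print(jpg_and_png_sources(page))
```

The loop works, but a single filtered search keeps the filtering next to the query instead of spreading it across post-processing code.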

Cheers.

Jim Tittsler

Jun 6, 2008, 3:57:50 AM
to beauti...@googlegroups.com
On Fri, Jun 6, 2008 at 6:24 PM, Yansky <thego...@gmail.com> wrote:
>
> Hi, I was wondering if it was possible to search for all images on a
> page and filter them by their image extension so that I only get jpg
> and png files and not gif et al.
>
> I tried the following, but without success:
>
> f = urllib.urlopen('http://foo.com')
> b = BeautifulSoup.BeautifulSoup(f)
>
> getStuff = b.findAll(lambda img: src.endswith('jpg' or 'jpeg' or 'png'))

You could use a regular expression to filter based on the src attribute:

import re
getStuff = b.findAll('img', {'src': re.compile(r'\.(jpe?g|png)$')})
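BeautifulSoup tests a compiled pattern against the attribute value with the pattern's search() method, so a match can occur anywhere in the src. In `(jpe?g)|(png)$` the `$` binds only to the png branch and no dot is required, so a src that merely contains "jpg" is accepted. The stdlib-only check below compares that pattern with a grouped, anchored variant on made-up sample paths; `re.IGNORECASE` is an optional extra so that uppercase extensions also match.

```python
import re

original = re.compile(r'(jpe?g)|(png)$')
# Group the alternatives so the literal dot and the $ anchor apply to all of them.
anchored = re.compile(r'\.(jpe?g|png)$', re.IGNORECASE)

samples = [
    'photos/cat.jpg',
    'photos/cat.JPEG',
    'icons/logo.png',
    'jpg-archive/spinner.gif',  # contains 'jpg' but is a GIF
    'images/notapng',           # ends in 'png' but has no extension dot
]

for src in samples:
    # Use search(), mirroring how BeautifulSoup applies a compiled pattern.
    print(src, bool(original.search(src)), bool(anchored.search(src)))
```

The GIF and the extension-less name slip through the original pattern, while the anchored one rejects both and also accepts the uppercase `.JPEG`.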


--
Jim Tittsler http://www.OnJapan.net/ GPG: 0x01159DB6
Python Starship http://Starship.Python.net/crew/jwt/
Mailman IRC irc://irc.freenode.net/#mailman

Yansky

Jun 6, 2008, 7:59:08 AM
to beautifulsoup
Thanks, that seemed to do the trick. :)

On Jun 6, 5:57 pm, "Jim Tittsler" <jtitts...@gmail.com> wrote: