On Sep 30, 2:23 pm, "Tal Einat" <talei...@gmail.com> wrote:
> tag['href']
>
> In general, 'contents' refers to stuff inside the tag (between the opener
> and closer), while the 'href' is an attribute of the tag, not part of its
> contents.
>
> - Tal
>
I agree that this is not easy to find in the documentation. But this
is Python, so just try things out in an interactive shell; things
usually work the way you would expect. This is especially true of
BeautifulSoup, which has a very Pythonic interface.
Example shell session:
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("""<div>You <i>bet</i>
<a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>
rocks!</div>""")
>>>
>>> soup('a') # equivalent to soup.findAll('a')
[<a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>]
>>> soup('a')[0]
<a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>
>>> soup('a')[0].__class__ # this is a Tag object
<class BeautifulSoup.Tag at 0x014907B0>
>>> soup('a')[0]['href'] # get the 'href' attribute
u'http://www.crummy.com/software/BeautifulSoup/'
>>>
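
To make the attribute-versus-contents distinction concrete without relying on BeautifulSoup at all, here is a rough sketch using only the standard library's html.parser (Python 3). This is not BeautifulSoup's API, and the LinkCollector class and its field names are my own, purely for illustration. The point is that the parser hands over the attributes together with the opening tag, while the contents arrive separately as the data between the opener and the closer:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (href, contents) for each <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, contents) pairs
        self._in_a = False   # True while we are between <a ...> and </a>
        self._href = None    # 'href' attribute of the <a> currently open
        self._text = []      # text chunks seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        # Attributes come with the opening tag, as a list of (name, value).
        if tag == "a":
            self._in_a = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Contents are whatever appears between the opener and the closer.
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            self.links.append((self._href, "".join(self._text)))
            self._in_a = False


markup = """<div>You <i>bet</i>
<a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>
rocks!</div>"""

collector = LinkCollector()
collector.feed(markup)
collector.close()

href, contents = collector.links[0]
print(href)      # the attribute of the tag
print(contents)  # the text inside the tag
```

In BeautifulSoup you get the same two pieces far more directly: tag['href'] for the attribute, as shown in the session above, and the tag's 'contents' for what is inside it.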
- Tal