How to get the address a Link/a tag refers to?

SiWi

Sep 30, 2007, 8:06:13 AM
to beautifulsoup
I have a tag like this, for example:
<a href="link.html">LINKY</a>
With contents[0] I can get LINKY, but how can I get link.html?
I searched the documentation, but I didn't find anything.

Tal Einat

Sep 30, 2007, 8:23:25 AM
to beauti...@googlegroups.com
tag['href']

In general, 'contents' refers to stuff inside the tag (between the opener and closer), while the 'href' is an attribute of the tag, not part of its contents.
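
A minimal sketch of that distinction, applied to the tag from the question. It assumes the current bs4 package (Beautiful Soup 4 on Python 3) is installed, which is not the BeautifulSoup 3 module this thread was written against; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Parse the single tag from the question with the stdlib-backed parser.
soup = BeautifulSoup('<a href="link.html">LINKY</a>', "html.parser")
tag = soup.a  # the first <a> tag in the document

text = tag.contents[0]      # first child between <a> and </a>
link = tag["href"]          # value of the href attribute
missing = tag.get("title")  # .get() returns None for an absent attribute

print(text)
print(link)
print(missing)
```

Indexing with tag["href"] raises KeyError when the attribute is absent, so tag.get("href") is the gentler choice for links that may lack one.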

- Tal

SiWi

Oct 3, 2007, 5:21:05 AM
to beautifulsoup
Ok, but then how can I get the attributes?

Tal Einat

Oct 3, 2007, 6:12:22 AM
to beauti...@googlegroups.com
I agree that this is not easy to find in the documentation. But this
is Python, so just try things out in a shell, and usually things just
work as you would expect. This is especially true for BeautifulSoup,
which has a very Pythonic interface.

Example shell session:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("""<div>You <i>bet</i>
... <a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>
... rocks!</div>""")
>>>
>>> soup('a') # equivalent to soup.findAll('a')
[<a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>]
>>> soup('a')[0]
<a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>
>>> soup('a')[0].__class__ # this is a Tag object
<class BeautifulSoup.Tag at 0x014907B0>
>>> soup('a')[0]['href'] # get the 'href' attribute
u'http://www.crummy.com/software/BeautifulSoup/'
>>>
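
For readers on Python 3, here is a rough equivalent of the session above that also lists every attribute of a tag. It is only a sketch, assuming the bs4 package (Beautiful Soup 4) rather than the BeautifulSoup 3 module imported above; the title attribute is a made-up addition so that there is more than one attribute to show.

```python
from bs4 import BeautifulSoup

# Same markup as the session above, plus a hypothetical title attribute.
html = """<div>You <i>bet</i>
<a href="http://www.crummy.com/software/BeautifulSoup/" title="home page">BeautifulSoup</a>
rocks!</div>"""

soup = BeautifulSoup(html, "html.parser")

first = soup.find_all("a")[0]  # find_all is the bs4 spelling of findAll
attrs = dict(first.attrs)      # all attributes of the tag, as a plain dict

# href=True keeps only the <a> tags that actually carry an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(attrs)
print(hrefs)
```

Filtering with href=True avoids a KeyError on anchors that have no href, such as named anchors.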

- Tal
