How do I get Beautiful Soup to bring back a link?


Colemen

Dec 11, 2011, 5:16:24 AM
to beautifulsoup
Hey!
I just need Beautiful Soup to retrieve a link from a site. Can someone
describe the basics of using Beautiful Soup to me?

mehdi moradi

Dec 11, 2011, 7:33:18 AM
to beauti...@googlegroups.com
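You need to know which HTML tag on your page (e.g. "p") contains the
link. Then a few lines of code like the ones below will retrieve all
the links:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(open("test.html"))
# look inside every <p> tag and collect the <a> tags that carry an href
for p_tag in soup.findAll('p'):
    for i, a_tag in enumerate(p_tag.findAll('a', href=True)):
        print str(i) + " " + a_tag['href']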
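
For readers on Python 3, here is a minimal sketch of the same idea using the newer bs4 package instead of the BeautifulSoup 3 import used in this thread. The HTML string and the variable names are made up for illustration. As far as the basics go: find_all returns every match, while find returns only the first match (or None if there is none).

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; in practice this would be the page source.
html = """
<p>First paragraph with <a href="/one/">a link</a>.</p>
<p>Second paragraph with <a href="/two/">another</a> and <a href="/three/">a third</a>.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the href of every <a> tag that sits inside a <p> tag.
all_links = [
    a["href"]
    for p in soup.find_all("p")
    for a in p.find_all("a", href=True)
]

# find() stops at the first match and returns None if nothing matches.
first = soup.find("a", href=True)
first_link = first["href"] if first is not None else None

print(all_links)
print(first_link)
```

If only one link is wanted, find is usually enough and avoids building the whole list.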


Colemen Atwood

Dec 11, 2011, 9:46:44 AM
to beauti...@googlegroups.com
That makes sense, but I need just one link. How can I get it to take the first link out of a body of links?

mehdi moradi

Dec 11, 2011, 9:50:07 AM
to beauti...@googlegroups.com
Send me your page's HTML code and specify the link that you want to retrieve.


Colemen Atwood

Dec 11, 2011, 10:08:24 AM
to beauti...@googlegroups.com
<tr>
<td class="type"><a href="/sub/2/0/">Divx/Xvid</a></td>
<td class="name"><a href="/torrent/255004/The-Hangover-2-2011-DVDRip-XviD-MAXSPEED/" class="icon commented" >The Hangover 2 (2011) DVDRip XviD-MAXSPEED</a></td>
<td class="seeds">23631</td>
<td class="leeches">34022</td>
<td class="completed">108872</td>
<td class="size">1.47GB</td>
<td class="health"><div class="bar"><div class="inline health" style="width: 41%;"></div></div></td>
<td class="author"><a href="/user/piratepedia/" class="icon VIP">piratepedia</a></td>
</tr>


/torrent/255004/The-Hangover-2-2011-DVDRip-XviD-MAXSPEED/
I need that link, but it varies because it's based on user input. I need the link at that exact position; what the title is doesn't matter.
And out of curiosity, is there any way I could get the seeds and leeches as well?

mehdi moradi

Dec 11, 2011, 10:37:45 AM
to beauti...@googlegroups.com
Are you there?
Can you send me the URL of the main page?


Colemen Atwood

Dec 11, 2011, 10:45:37 AM
to beauti...@googlegroups.com
<tr>
<td class="type"><a href="/sub/2/0/">Divx/Xvid</a></td>
<td class="name"><a href="/torrent/255004/The-Hangover-2-2011-DVDRip-XviD-MAXSPEED/" class="icon commented" >The Hangover 2 (2011) DVDRip XviD-MAXSPEED</a></td>
<td class="seeds">23631</td>
<td class="leeches">34022</td>
<td class="completed">108872</td>
<td class="size">1.47GB</td>
<td class="health"><div class="bar"><div class="inline health" style="width: 41%;"></div></div></td>
<td class="author"><a href="/user/piratepedia/" class="icon VIP">piratepedia</a></td>
</tr>
"/torrent/255004/The-Hangover-2-2011-DVDRip-XviD-MAXSPEED/"
Will this only bring back this specific link, or will it also work on other pages with other links?
And is there any way you know of that I could get the seeds and leeches?

mehdi moradi

Dec 11, 2011, 10:49:30 AM
to beauti...@googlegroups.com
Can you send me the web page address (http://...)?



Colemen Atwood

Dec 11, 2011, 10:52:20 AM
to beauti...@googlegroups.com
http://1337x.org/search/the%20hangover/0/
I'm sorry if I'm being an annoyance. I just don't understand this portion; everything else I've figured out, but this is odd.

mehdi moradi

Dec 11, 2011, 2:32:51 PM
to beauti...@googlegroups.com
# -*- coding: utf-8 -*-

import codecs
from BeautifulSoup import BeautifulSoup

# "corpora/11.txt" is read as the HTML source of the results page
fw = codecs.open("corpora/11.txt", "r", "utf-8")
soup = BeautifulSoup(fw.read())
fw.close()

# find every <td> tag that has the attribute class="name"
for node in soup.findAll('td', attrs={'class': 'name'}):
    # the first <a> inside the cell carries the link; read its href
    # directly instead of stripping the markup with a regex
    link = node.find('a')
    if link is not None:
        print (link['href'])
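
On the seeds and leeches question, here is a rough sketch of one way to pull the link and both counts out of each row. It assumes Python 3 and the newer bs4 package rather than the BeautifulSoup 3 import above, and it runs on a shortened copy of the row posted earlier in this thread. The function name parse_rows and the dictionary keys are made up for illustration, and the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# One row copied from the HTML posted earlier in this thread (shortened).
html = """
<table>
<tr>
<td class="type"><a href="/sub/2/0/">Divx/Xvid</a></td>
<td class="name"><a href="/torrent/255004/The-Hangover-2-2011-DVDRip-XviD-MAXSPEED/" class="icon commented" >The Hangover 2 (2011) DVDRip XviD-MAXSPEED</a></td>
<td class="seeds">23631</td>
<td class="leeches">34022</td>
</tr>
</table>
"""


def cell_as_int(row, css_class):
    """Return the integer text of the <td> with the given class, or None."""
    cell = row.find("td", class_=css_class)
    if cell is None:
        return None
    return int(cell.get_text(strip=True))


def parse_rows(page_html):
    """Return one dict per table row that has a class="name" cell with a link."""
    soup = BeautifulSoup(page_html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        name_cell = tr.find("td", class_="name")
        if name_cell is None or name_cell.a is None:
            continue  # header rows or rows without a link
        rows.append({
            "link": name_cell.a.get("href"),
            "seeds": cell_as_int(tr, "seeds"),
            "leeches": cell_as_int(tr, "leeches"),
        })
    return rows


results = parse_rows(html)
print(results[0] if results else "no rows found")
```

Because the lookup goes by the class names (name, seeds, leeches) and not by the title text, the same code should work for any search term, and results[0] is the first row on the page.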

