soup = BeautifulSoup(open("test.html"))
p_tags = soup.findAll('p',text=True)
for i, p_tag in enumerate(p_tags):
    print str(i) + p_tag
> --
> You received this message because you are subscribed to the Google Groups "beautifulsoup" group.
> To post to this group, send email to beauti...@googlegroups.com.
> To unsubscribe from this group, send email to beautifulsou...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/beautifulsoup?hl=en.
>
On Sun, Dec 11, 2011 at 6:16 PM, Colemen Atwood
<coleymol...@gmail.com> wrote:
> That makes sense, but I need just one link. How can I get it to take the
> first link out of a body of links?
>
>
> On Sun, Dec 11, 2011 at 6:33 AM, mehdi moradi <mehdi.m...@gmail.com>
> wrote:
>>
>> You need to know which tag (e.g. "p") in the HTML of your page contains
>> the link; then, with a few lines of code like those shown below, you can
>> retrieve all the links wan
On Sun, Dec 11, 2011 at 6:38 PM, Colemen Atwood
On Sun, Dec 11, 2011 at 7:15 PM, Colemen Atwood
import codecs, re, math
from BeautifulSoup import *

fw = codecs.open("corpora/11.txt", "r", "utf-8")
soup = BeautifulSoup(fw.read())
for node in soup.findAll('td', attrs={'class': 'name'}):
    # find all "td" tags that have the attribute class="name"
    mm = str(node.next)
    # remove the part of the line that starts with '/"' and ends with
    # '...</a>', then strip the leading '<a href="'
    jj = re.sub(re.compile('/".*?...</a>', re.DOTALL), "", mm).replace('<a href="', "")
    print (jj)
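
For the "just the first link" case asked about above, here is a minimal sketch. It is not the code from this thread: it assumes Python 3 and uses only the standard library's html.parser rather than BeautifulSoup, and the names FirstLinkParser, first_link and the sample markup are made up for illustration. With BeautifulSoup itself, find returns only the first match where findAll returns every match, so the same idea applies there.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Hypothetical helper: remembers the href of the first <a> that has one."""

    def __init__(self):
        super().__init__()
        self.first_href = None

    def handle_starttag(self, tag, attrs):
        # Only record a value while nothing has been stored yet,
        # so later links are ignored.
        if tag == "a" and self.first_href is None:
            self.first_href = dict(attrs).get("href")


def first_link(html):
    """Return the href of the first link in html, or None if there is none."""
    parser = FirstLinkParser()
    parser.feed(html)
    return parser.first_href


# Illustrative markup only; the real page structure is not shown in the thread.
sample = '<td class="name"><a href="/one">One</a> <a href="/two">Two</a></td>'
print(first_link(sample))
```

The parser still walks the whole document; it simply stops recording after the first href, which is usually fine for pages of ordinary size.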
On Sun, Dec 11, 2011 at 7:22 PM, Colemen Atwood