Re: Having trouble with table, tr, td elements

cdv...@gmail.com

Oct 17, 2012, 2:59:44 AM
to beauti...@googlegroups.com
Hello,

Somehow my mail from Oct 9th did not make it to the list. Here it is again...thanks.

On Tue, Oct 9, 2012 at 2:27 PM, Vick <cdv...@gmail.com> wrote:
Hello,

I am working with a basic BeautifulSoup script. This is what my HTML file looks like:

<html>
<head>
<script>...</script>
</head>

<body>...
<center>
<script>
<table><table></table>
</table>
<table>...</table>

<table border="1px"><tbody><tr><td><b>Date</b></td><td><b>Email</b></td><td><b>Opened</b></td><td><b>Link 1</b></td><td><b>Link 2</b></td><td><b>Link 3</b></td><td><b>MES</b></td><td><b>Evals</b></td><td><b>Pricing</b></td><td><b>Newsletter</b></td><td><b>DVD Updates</b></td><td><b>Options</b></td></tr>
<tr>
<td>2012-07-11</td>
<td>21...@msn.com</td>
<td><center><img src="./Campaign Manager 184_files/opened.png"></center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center><a name="0"></a><a href="/campaignTracking/viewCampaign.php?campaignID=184#0" onclick="showEmail(&#39;184&#39;, &#39;216...@msn.com&#39;)">Details</a> - <a target="_NEW" href="/campaignTracking/viewEmailFull.php?email=21...@msn.com">Full Details</a></center></td>
</tr>
<tr>
<td>2012-07-11</td>
<td>a....@tiscali.it</td>
<td><center><img src="./Campaign Manager 184_files/opened.png"></center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center> </center></td>
<td><center><a name="1"></a><a href="/campaignTracking/viewCampaign.php?campaignID=184#1" onclick="showEmail(&#39;184&#39;, &#39;a.c...@tiscali.it&#39;)">Details</a> - <a target="_NEW" href="/campaignTracking/viewEmailFull.php?email=a.c...@tiscali.it">Full Details</a></center></td>
</tr>

and many more <tr>'s (a big list) -- basically a table of email addresses and whether they were opened.


What I want to do: count the number of opened emails for each date in the document.

My method:  

loop through the rows, searching for an email address in a <td>
   if found, then
      note the date (start that date's counter at 0 if the date is new)
      if the next <td> contains the "opened" <img> tag, then counter++
   else
      continue looping
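
The method above could be sketched roughly as follows. This is only an illustration under assumptions, not tested against the real 184.htm: the sample rows and addresses are made up, count_opened_by_date is a hypothetical helper name, the stdlib "html.parser" backend is an arbitrary choice, and the sketch assumes the date is the first cell, the email the second, and that an "opened" row has an <img> whose src contains "opened" in the third cell. It works row by row and keeps one running count per date.

```python
from bs4 import BeautifulSoup

# Hypothetical sample in the same shape as the table above; addresses are invented.
SAMPLE = """
<table border="1px"><tbody>
<tr><td><b>Date</b></td><td><b>Email</b></td><td><b>Opened</b></td></tr>
<tr><td>2012-07-11</td><td>alice@example.com</td><td><center><img src="opened.png"></center></td></tr>
<tr><td>2012-07-11</td><td>bob@example.com</td><td><center> </center></td></tr>
<tr><td>2012-07-12</td><td>carol@example.com</td><td><center><img src="opened.png"></center></td></tr>
<tr><td>2012-07-13</td><td>dave@example.com</td><td><center> </center></td></tr>
</tbody></table>
"""


def count_opened_by_date(html):
    """Return {date: number of opened emails} for rows that hold an email address."""
    soup = BeautifulSoup(html, "html.parser")
    counts = {}
    for row in soup.find_all("tr"):
        # Only the row's own cells, not cells of any nested table.
        cells = row.find_all("td", recursive=False)
        if len(cells) < 3 or "@" not in cells[1].get_text():
            continue  # header row, or a row that is not an email row
        date = cells[0].get_text(strip=True)
        counts.setdefault(date, 0)  # first time this date is seen
        img = cells[2].find("img")
        if img is not None and "opened" in img.get("src", ""):
            counts[date] += 1
    return counts


counts = count_opened_by_date(SAMPLE)
for date in sorted(counts):
    print("%s %d" % (date, counts[date]))
```

Walking the rows instead of the individual cells avoids having to step backwards and forwards from the email cell, at the cost of relying on the column order.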

 My code: 
#!/usr/bin/env python

from bs4 import BeautifulSoup
import re

page = open('./184.htm')
soup=BeautifulSoup(page)
# print soup.prettify()

#for string in soup.stripped_strings:
#    print(repr(string))

srchtxt = re.compile(r'@', re.IGNORECASE)
xdate = soup.find_all("td", text=srchtxt)
print "\n", xdate[0], "\n", xdate[1]
# print "\n", xdate[0].next #  --- gives a "max recursion depth exceeded" error

# works print soup.table.tr
    
#works
#for atsym in soup.findAll("td", text=srchtxt):
    #print tr.text works
#    print atsym
        
#for eml in xdate:     # does not work
#    print xdate.previous_sibling, xdate.next_sibling 

# print xdate.td.previous_sibling
# print xdate.td.next_sibling

The output of this script is shown below.

As you can see, I tried several things, but none of them worked. I am a Python newbie!
I just do not know how to read the content of the previous/next <td> element (using BeautifulSoup) once I have found an email address. Can you please help? Thank you very much.
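
For reference, one way the previous/next <td> could be reached from a matched cell is with find_previous_sibling() and find_next_sibling(), which are Beautiful Soup 4 methods on each individual tag (not on the list that find_all returns). The sketch below is a hedged illustration only: the one-row sample and its address are invented, and "html.parser" is an assumed parser choice. Note that the bare .previous_sibling attribute returns the immediately preceding node, which in markup like this is the newline between the cells rather than the previous <td>.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row sample in the same shape as the table; the address is invented.
ROW = """
<table><tr>
<td>2012-07-11</td>
<td>alice@example.com</td>
<td><center><img src="opened.png"></center></td>
</tr></table>
"""

soup = BeautifulSoup(ROW, "html.parser")

# Cells whose visible text contains an "@" are treated as email cells.
email_cells = [td for td in soup.find_all("td") if "@" in td.get_text()]

for cell in email_cells:
    # Siblings are looked up on each cell, one at a time.
    date_cell = cell.find_previous_sibling("td")
    opened_cell = cell.find_next_sibling("td")
    date_text = date_cell.get_text(strip=True)
    was_opened = opened_cell.find("img") is not None
    print("%s %s %s" % (date_text, cell.get_text(strip=True), was_opened))

# The plain attribute gives the whitespace text node between the two <td> tags.
whitespace_node = email_cells[0].previous_sibling
```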


Output:

<td>21...@msn.com</td> 

Traceback (most recent call last):
  File "I:\Python27\readhtml\test.py", line 17, in <module>
    print "\n", xdate[0].next, "\n", xdate[1], "\n", xdate[2]
  File "I:\Python27\lib\idlelib\rpc.py", line 595, in __call__
    value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
  File "I:\Python27\lib\idlelib\rpc.py", line 210, in remotecall
    seq = self.asynccall(oid, methodname, args, kwargs)
  File "I:\Python27\lib\idlelib\rpc.py", line 225, in asynccall
    self.putmessage((seq, request))
  File "I:\Python27\lib\idlelib\rpc.py", line 324, in putmessage
    s = pickle.dumps(message)
  File "I:\Python27\lib\copy_reg.py", line 74, in _reduce_ex
    getstate = self.__getstate__
RuntimeError: maximum recursion depth exceeded
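
The traceback passes through idlelib's rpc.py and pickle.dumps, which suggests that the error is raised while IDLE tries to pickle the object being printed, not while the document is parsed. Here xdate[0].next is the text node inside the <td>, a NavigableString rather than a built-in string. A possible workaround, assuming that reading of the traceback is right, is to copy the node into a built-in string before printing it. The sample markup and address below are invented, and "html.parser" is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal sample; the address is invented.
soup = BeautifulSoup(
    "<table><tr><td>alice@example.com</td></tr></table>", "html.parser"
)

cell = soup.find("td")
node = cell.next_element  # the text node; .next is an older alias for this

# type(u"") is unicode on Python 2 and str on Python 3.
text_type = type(u"")
plain = text_type(node)  # copy the node's text into a built-in string

print(type(node).__name__)
print(plain)
```

Calling cell.get_text() is another option when only the text of the cell is needed.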

