can't slice list elements from find_all('p')

Givon

Aug 15, 2016, 11:36:32 PM
to beautifulsoup
diagnose info:

>>> from bs4.diagnose import diagnose
>>> diagnose(soup)
Diagnostic running on Beautiful Soup 4.4.1
Python version 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330]
Found lxml version 3.5.0.0
Found html5lib version 0.999
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bs4/diagnose.py", line 56, in diagnose
    data = data.read()
TypeError: 'NoneType' object is not callable

I'm a bit of a newbie. I have successfully used BS4 to build a parse tree from many web pages, and I was able to scrape the hrefs. Now I want to scrape the data. I can generate a (Python) list of all the paragraphs, but I can't slice them the way I could slice the list of hrefs. The only difference is that when scraping the pages I used an extra line of code, "section=z.get('href')". That doesn't work for 'p': "z.get('p')" returns None. Any suggestions welcome. Code below.

>>> from bs4 import BeautifulSoup
>>> from urllib.request import urlopen
>>> kentStart=urlopen("http://www.homeoint.org/books/kentrep/kent0000.htm").read()
>>> soup=BeautifulSoup(kentStart,"html.parser")
>>> y=soup.find_all('p')
>>> rubrics=[]
>>> for x in range(0,len(y)):
...     z=soup.find_all('p')[x]
...     rubrics.append(z)
>>> print(z[3:1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
TypeError: unhashable type: 'slice'

>>> print(z)
<p>Copyright © 1998 MEDI-T</p>
>>> a=z.get('p')
>>> print(a)
None

>>> print(rubrics[2])
<p><a href="kentmind.htm">MIND</a><b><font color="#ff0000"> <a name="P1">p. 1</a></font></b></p>
>>> a=rubrics[2]
>>> print(a)
<p><a href="kentmind.htm">MIND</a><b><font color="#ff0000"> <a name="P1">p. 1</a></font></b></p>
>>> print(a[0:1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
TypeError: unhashable type: 'slice'
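
For reference, here is a minimal sketch of what the tracebacks above seem to point at. The element.py frame shows that indexing a single tag goes to self.attrs[key], i.e. an attribute lookup, so a slice only makes sense on the list returned by find_all() or on a tag's text. The inline HTML is a stand-in built from the two paragraphs printed above (an assumption, so that nothing has to be downloaded), and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page: the two <p> elements shown in the post.
html = """
<p><a href="kentmind.htm">MIND</a><b><font color="#ff0000"> <a name="P1">p. 1</a></font></b></p>
<p>Copyright © 1998 MEDI-T</p>
"""

soup = BeautifulSoup(html, "html.parser")
paragraphs = soup.find_all("p")      # list-like collection of Tag objects

# Slicing the collection itself works like slicing any list.
first_one = paragraphs[0:1]

# A single Tag is not a sequence: tag[key] is an attribute lookup,
# so a slice fails (TypeError or KeyError, depending on the Python version).
first = paragraphs[0]
try:
    first[0:1]
    slice_failed = False
except (TypeError, KeyError):
    slice_failed = True

# To slice the content, take the text first and slice the resulting string.
text = first.get_text()
text_slice = text[0:4]

# get() reads attributes of a tag; 'href' lives on the <a> inside the <p>.
href = first.find("a").get("href")
missing = first.get("p")             # a <p> tag has no attribute named 'p'

print(first_one)
print(slice_failed, repr(text), repr(text_slice), href, missing)
```

In the session above, rubrics is already an ordinary list of tags, so rubrics[0:3] would be the list-level slice, while z and a are single tags.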
