can't slice list elements from find_all('p')

Givon

Aug 15, 2016, 11:36:32 PM

to beautifulsoup

diagnose info:

>>> from bs4.diagnose import diagnose
>>> diagnose(soup)
Diagnostic running on Beautiful Soup 4.4.1
Python version 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330]
Found lxml version 3.5.0.0
Found html5lib version 0.999
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bs4/diagnose.py", line 56, in diagnose
    data = data.read()
TypeError: 'NoneType' object is not callable

I'm a bit of a newbie. I have successfully used BS4 to build a parse tree across many web pages, and I was able to scrape the hrefs. Now I want to scrape the data. I can generate a (Python) list of all the paragraphs, but I can't slice them the way I could slice the list of hrefs. The only difference is that when scraping the pages I used an extra line of code, "section=z.get('href')". That doesn't work for 'p': "z.get('p')" doesn't get anything. Any suggestions welcome. Code below.

>>> from bs4 import BeautifulSoup
>>> from urllib.request import urlopen
>>> kentStart=urlopen("http://www.homeoint.org/books/kentrep/kent0000.htm").read()
>>> soup=BeautifulSoup(kentStart,"html.parser")
>>> y=soup.find_all('p')
>>> rubrics=[]
>>> for x in range(0,len(y)):
...     z=soup.find_all('p')[x]
...     rubrics.append(z)
>>> print(z[3:1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
TypeError: unhashable type: 'slice'

>>> print(z)
Copyright © 1998 MEDI-T
>>> a=z.get('p')
>>> print(a)
None

>>> print(rubrics[2])
<a href="kentmind.htm">MIND</a> <a name="P1">p. 1</a>
>>> a=rubrics[2]
>>> print(a)
<a href="kentmind.htm">MIND</a> <a name="P1">p. 1</a>
>>> print(a[0:1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
TypeError: unhashable type: 'slice'
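
For reference, here is a minimal sketch of what the tracebacks above seem to point at. It assumes the goal is either a sub-list of paragraphs or a substring of one paragraph's text. The HTML string is a made-up stand-in for the real page so that the example runs offline, and the variable names are only illustrative. The traceback shows that indexing a Tag goes through self.attrs[key], so square brackets on a single element look up an HTML attribute rather than a position.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real kent0000.htm).
html = """
<p>First paragraph</p>
<p>Second paragraph</p>
<p><a href="kentmind.htm">MIND</a> <a name="P1">p. 1</a></p>
<p>Copyright 1998 MEDI-T</p>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet, so slicing the result works.
paragraphs = soup.find_all("p")
first_two = paragraphs[0:2]

# Each element is a Tag. Square brackets on a Tag look up an attribute,
# so link["href"] works, while a slice such as third[0:1] raises an error.
third = paragraphs[2]
link = third.find("a")
href = link["href"]  # same value as link.get("href")

# To slice the characters of a paragraph, slice its text instead.
text = third.get_text()
prefix = text[0:4]

print(first_two)
print(href)
print(prefix)
```

In this sketch, get('p') on a paragraph would return None for the same reason: 'p' is the tag's name, not one of its attributes.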
