Diagnose info:
>>> from bs4.diagnose import diagnose
>>> diagnose(soup)
Diagnostic running on Beautiful Soup 4.4.1
Python version 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330]
Found lxml version 3.5.0.0
Found html5lib version 0.999
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bs4/diagnose.py", line 56, in diagnose
    data = data.read()
TypeError: 'NoneType' object is not callable
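The traceback above comes from passing the parsed soup, rather than the raw markup, to diagnose(). Judging by the traceback, diagnose() calls data.read() on its argument when that argument has a read attribute. On a BeautifulSoup object, soup.read is shorthand for soup.find('read'), a search for a <read> tag, so it is None, and calling it fails. A minimal sketch, assuming a short inline HTML string (hypothetical, modeled on this page) stands in for the downloaded data:

```python
import io
from contextlib import redirect_stdout

from bs4 import BeautifulSoup
from bs4.diagnose import diagnose

# Hypothetical stand-in for urlopen(...).read(); avoids a network call.
markup = '<p><a href="kentmind.htm">MIND</a></p><p>Copyright © 1998 MEDI-T</p>'
soup = BeautifulSoup(markup, "html.parser")

# Attribute access on a soup searches for a tag with that name.
# There is no <read> tag, so this is None rather than a method.
print(soup.read)  # None

# Pass the raw markup instead of the soup. diagnose() prints its report
# to stdout; it is captured here only so it can be inspected.
buffer = io.StringIO()
with redirect_stdout(buffer):
    diagnose(markup)
report = buffer.getvalue()
print(report)
```

In the original session, the equivalent call would be diagnose(kentStart), since kentStart already holds the page bytes.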
I'm a bit of a newbie. I have successfully used BS4 to build parse trees for many web pages, and I was able to scrape the hrefs. Now I want to scrape the data. I can generate a (Python) list of all the paragraphs, but I can't slice them the way I could slice the list of hrefs. The only difference is that when scraping the pages I used an extra line of code, "section=z.get('href')", but that doesn't work for 'p': "z.get('p')" doesn't return anything. Any suggestions welcome. Code below.
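For context on the get() behaviour described above: in BS4, Tag.get(name) looks up an HTML attribute of the tag. That is why get('href') works on an <a> tag, while get('p') returns None, since a <p> tag has no attribute called "p". The paragraph's text is the tag's content, not an attribute; get_text() returns it as an ordinary str, and a str can be sliced. A minimal sketch, assuming an inline snippet (hypothetical, copied from the rubric printed in the session) instead of the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modeled on one <p> rubric from kent0000.htm
markup = ('<p><a href="kentmind.htm">MIND</a><b><font color="#ff0000"> '
          '<a name="P1">p. 1</a></font></b></p>')
soup = BeautifulSoup(markup, "html.parser")

link = soup.find("a")
para = soup.find("p")

# get() reads attributes: <a> has an href attribute, <p> has no "p" attribute.
href = link.get("href")   # 'kentmind.htm'
missing = para.get("p")   # None

# get_text() joins all text inside the tag into a plain str, so slicing works.
text = para.get_text()    # 'MIND p. 1'
print(href, missing, text[0:4])  # kentmind.htm None MIND
```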
>>> from bs4 import BeautifulSoup
>>> from urllib.request import urlopen
>>> kentStart=urlopen("http://www.homeoint.org/books/kentrep/kent0000.htm").read()
>>> soup=BeautifulSoup(kentStart,"html.parser")
>>> y=soup.find_all('p')
>>> rubrics=[]
>>> for x in range(0,len(y)):
...     z=soup.find_all('p')[x]
...     rubrics.append(z)
...
>>> print(z[3:1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
TypeError: unhashable type: 'slice'
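The last frame of that traceback shows what square brackets do on a Tag: they index the tag's attribute dictionary (self.attrs[key]), not its contents. A slice such as 3:1 is passed through as a dictionary key, which fails with the TypeError shown; on Python 3.12 and later, where slice objects are hashable, the same mistake surfaces as a KeyError instead. The tag's direct children are available as the ordinary list .contents, which can be sliced. A minimal sketch, assuming the same hypothetical inline rubric rather than the downloaded page:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modeled on the rubric printed in the session
markup = ('<p><a href="kentmind.htm">MIND</a><b><font color="#ff0000"> '
          '<a name="P1">p. 1</a></font></b></p>')
para = BeautifulSoup(markup, "html.parser").p

# Square brackets on a Tag look up an attribute by name.
link = para.a
print(link["href"])  # kentmind.htm

# A slice is treated as an attribute key: TypeError on older Pythons
# (as in the traceback), KeyError on 3.12+ where slices are hashable.
try:
    para[0:1]
    slice_failed = False
except (TypeError, KeyError) as exc:
    slice_failed = True
    print(type(exc).__name__, exc)

# .contents is a plain list of direct children, so slicing it works.
children = para.contents     # [<a ...>MIND</a>, <b>...</b>]
first_child = children[0:1]  # [<a href="kentmind.htm">MIND</a>]
print(len(children), first_child)
```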
>>> print(z)
<p>Copyright © 1998 MEDI-T</p>
>>> a=z.get('p')
>>> print(a)
None
>>> print(rubrics[2])
<p><a href="kentmind.htm">MIND</a><b><font color="#ff0000"> <a name="P1">p. 1</a></font></b></p>
>>> a=rubrics[2]
>>> print(a)
<p><a href="kentmind.htm">MIND</a><b><font color="#ff0000"> <a name="P1">p. 1</a></font></b></p>
>>> print(a[0:1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
TypeError: unhashable type: 'slice'
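Two points from the session above may help. First, find_all() already returns a list-like ResultSet, so the index loop is not needed and that result can be sliced directly; it is each individual Tag that cannot be sliced. Second, instead of slicing a rubric, its pieces can be pulled out by searching inside it. A minimal sketch, assuming a hypothetical two-paragraph string in place of kent0000.htm; the names rows, text, href and page are illustrative only:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded kent0000.htm markup
markup = """
<p>Copyright © 1998 MEDI-T</p>
<p><a href="kentmind.htm">MIND</a><b><font color="#ff0000"> <a name="P1">p. 1</a></font></b></p>
"""
soup = BeautifulSoup(markup, "html.parser")

# find_all() returns a list subclass, so it can be sliced as-is.
rubrics = soup.find_all("p")
first_two = rubrics[0:2]

# Extract named pieces from each paragraph rather than slicing the Tag.
rows = []
for p in rubrics:
    link = p.find("a", href=True)                # first <a> with an href
    anchor = p.find("a", attrs={"name": True})   # page marker like <a name="P1">
    rows.append({
        "text": p.get_text(" ", strip=True),     # all text, whitespace-trimmed
        "href": link["href"] if link else None,
        "page": anchor.get_text() if anchor else None,
    })

for row in rows:
    print(row)
```

Searching with find() keeps the code working even if the number of children in a rubric varies, which fixed positional slicing would not.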