BeautifulSoup failed to run

Andrei Uporov

Apr 22, 2023, 6:04:31 AM
to beautifulsoup
Hi Gents,

I would like to ask for your help with a problem I hit when running a Python script that invokes BeautifulSoup.
I unzipped the bs4.zip file and put the resulting bs4 folder into the same folder as my .py files.
When I ran the script, the program crashed with a long traceback:

Enter - http://www.dr-chuck.com/page2.htm
Traceback (most recent call last):
  File "C:\Books\py4e - Training\Exercises\Exercise12.4.py", line 18, in <module>
    soup = BeautifulSoup(html, 'html.parser')
  File "C:\Books\py4e - Training\Exercises\bs4\__init__.py", line 215, in __init__
    self._feed()
  File "C:\Books\py4e - Training\Exercises\bs4\__init__.py", line 239, in _feed
    self.builder.feed(self.markup)
  File "C:\Books\py4e - Training\Exercises\bs4\builder\_htmlparser.py", line 164, in feed
    parser.feed(markup)
  File "C:\Users\Наталья\AppData\Local\Programs\Python\Python311\Lib\html\parser.py", line 110, in feed
    self.goahead(0)
  File "C:\Users\Наталья\AppData\Local\Programs\Python\Python311\Lib\html\parser.py", line 170, in goahead
    k = self.parse_starttag(i)
  File "C:\Users\Наталья\AppData\Local\Programs\Python\Python311\Lib\html\parser.py", line 337, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "C:\Books\py4e - Training\Exercises\bs4\builder\_htmlparser.py", line 62, in handle_starttag
    self.soup.handle_starttag(name, None, None, attr_dict)
  File "C:\Books\py4e - Training\Exercises\bs4\__init__.py", line 404, in handle_starttag
    self.currentTag, self._most_recent_element)
  File "C:\Books\py4e - Training\Exercises\bs4\element.py", line 1001, in __getattr__
    return self.find(tag)
  File "C:\Books\py4e - Training\Exercises\bs4\element.py", line 1238, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "C:\Books\py4e - Training\Exercises\bs4\element.py", line 1259, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "C:\Books\py4e - Training\Exercises\bs4\element.py", line 516, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "C:\Books\py4e - Training\Exercises\bs4\element.py", line 1560, in __init__
    self.text = self._normalize_search_value(text)
  File "C:\Books\py4e - Training\Exercises\bs4\element.py", line 1565, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'

The code is very simple and shown below:
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
if len(url)<1:
    url = 'http://data.pr4e.org'
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))


I have no idea what might have gone wrong.
I would highly appreciate any help with this.

leonardr

Apr 22, 2023, 7:34:38 AM
to beautifulsoup
It looks like you installed a very old version of Beautiful Soup from the zipfile included in the course "Python For Everybody." This version won't work with newer versions of Python.
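
For anyone wondering why the old copy breaks: the last line of the traceback refers to collections.Callable, an old alias that was removed from the collections module in Python 3.10. The supported spelling is collections.abc.Callable. The following is only a minimal sketch to illustrate the difference; is_callable_like is a hypothetical helper written for this post, not part of Beautiful Soup.

```python
import collections
import collections.abc
import sys

# On Python 3.10 and later the old alias is gone, so this is False there.
old_alias_present = hasattr(collections, "Callable")
print("Python", sys.version_info[:2], "has collections.Callable:", old_alias_present)


def is_callable_like(value):
    """Hypothetical helper: the same kind of check, using the supported name."""
    return isinstance(value, collections.abc.Callable)


print(is_callable_like(len))     # a built-in function is callable
print(is_callable_like("text"))  # a plain string is not
```

On the Python 3.11 install shown in the traceback, the first print should report that the alias is missing, which matches the AttributeError.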

According to the response to the bug I filed against py4e about this:

The course provides a working bs4.zip as part of the downloadable course code, which is unzipped in the working code folder. It's also covered in pinned posts in the discussions. From the switch to Python 3 several years ago until Python 3.10 this has worked. From 3.10 the students are told to - pip install beautifulsoup4 and delete the unzipped bs4 folder.

As you noticed, the issue was installing and having the unzipped version present. 1 of the 4 bullet points in the pinned post is

  • make sure that you have deleted the bs4 folder if you downloaded it.
I can't see the pinned post the author mentions, since the discussions require a login, but their advice (deleting the C:\Books\py4e - Training\Exercises\bs4 directory) is what worked the last time someone wrote to this list with this problem.
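
If it helps, here is a rough sketch of how one could check whether a leftover bs4 folder is sitting next to the scripts and which bs4 Python would actually pick up. The function names find_local_bs4 and importable_bs4_location are made up for this example, and the check assumes the scripts are run from the folder that holds them.

```python
import importlib.util
from pathlib import Path


def find_local_bs4(script_dir):
    """Return the path of a bs4 package folder inside script_dir, or None."""
    candidate = Path(script_dir) / "bs4"
    return candidate if (candidate / "__init__.py").is_file() else None


def importable_bs4_location():
    """Return the file that 'import bs4' would resolve to, or None if not found."""
    spec = importlib.util.find_spec("bs4")
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Assumption: the current working directory is the folder with the .py files.
    print("bs4 folder next to the scripts:", find_local_bs4(Path.cwd()))
    print("bs4 that Python would import:", importable_bs4_location())
```

If the second line still points into the exercises folder after running pip install beautifulsoup4, the unzipped copy is still shadowing the installed package and should be deleted.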

Leonard

Andrei Uporov

Apr 23, 2023, 4:20:23 AM
to beautifulsoup
Yes, I followed your recommendation and it worked. Thanks!
Before that, I had also updated pip to the latest version, which may have helped as well.
