`from bs4 import BeautifulSoup` ==> AttributeError: module 'html5lib.treebuilders' has no attribute '_base'

Jonathan Morgan

Jul 16, 2016, 6:40:22 PM
to beautifulsoup
Hello,

When I try to run `from bs4 import BeautifulSoup` at the ipython or python prompt, I get the following:

In [1]: from bs4 import BeautifulSoup
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-9144e80cd349> in <module>()
----> 1 from bs4 import BeautifulSoup

/home/jonathanmorgan/.virtualenvs/sourcenet3/lib/python3.5/site-packages/bs4/__init__.py in <module>()
     28 import warnings
     29
---> 30 from .builder import builder_registry, ParserRejectedMarkup
     31 from .dammit import UnicodeDammit
     32 from .element import (

/home/jonathanmorgan/.virtualenvs/sourcenet3/lib/python3.5/site-packages/bs4/builder/__init__.py in <module>()
    312 register_treebuilders_from(_htmlparser)
    313 try:
--> 314     from . import _html5lib
    315     register_treebuilders_from(_html5lib)
    316 except ImportError:

/home/jonathanmorgan/.virtualenvs/sourcenet3/lib/python3.5/site-packages/bs4/builder/_html5lib.py in <module>()
     68
     69
---> 70 class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
     71
     72     def __init__(self, soup, namespaceHTMLElements):

AttributeError: module 'html5lib.treebuilders' has no attribute '_base'

I am using Python 3.5.1 on Ubuntu 16.04 in a virtualenv.

I just updated all out-of-date packages in the virtualenv, and I wasn't having this problem before those updates (the versions of all installed packages are listed below).

I tried beautifulsoup4 versions 4.4.0 and 4.4.1 with html5lib versions 0.99999999, 0.999999999, and 1.0b10, and no combination lets `from bs4 import BeautifulSoup` complete successfully. The import does work if I uninstall the `html5lib` package entirely, but that isn't a good long-term answer for me: the HTML I need to parse often includes old-school markup that html5lib handles best. Luckily, I'm not parsing HTML at the moment, so this has me back in business.

Is there something I am doing wrong?  Any help or advice on how I can get this resolved will be greatly appreciated, and if I need to provide more information, please let me know!

Thanks,

Jonathan Morgan

========================================

Installed packages:
  • backports.shutil-get-terminal-size (1.0.0)
  • beautifulsoup4 (4.4.1)
  • bleach (1.4.3)
  • decorator (4.0.10)
  • Django (1.9.7)
  • django-ajax-selects (1.4.3)
  • django-taggit (0.20.2)
  • ipython (5.0.0)
  • ipython-genutils (0.1.0)
  • lxml (3.6.0)
  • nameparser (0.4.0)
  • numpy (1.11.1)
  • pandas (0.18.1)
  • path.py (8.2.1)
  • pexpect (4.2.0)
  • pickleshare (0.7.3)
  • pip (8.1.2)
  • prompt-toolkit (1.0.3)
  • psycopg2 (2.6.2)
  • ptyprocess (0.5.1)
  • py (1.4.31)
  • Pygments (2.1.3)
  • pyRserve (0.8.4)
  • pytest (2.9.2)
  • python-dateutil (2.5.3)
  • pytz (2016.6.1)
  • regex (2016.7.14)
  • requests (2.10.0)
  • scipy (0.17.1)
  • setuptools (24.0.3)
  • simplegeneric (0.8.1)
  • six (1.10.0)
  • SQLAlchemy (1.0.14)
  • traitlets (4.2.2)
  • wcwidth (0.1.7)
  • webencodings (0.5)
  • wheel (0.29.0)
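To see which bs4/html5lib combination is actually installed, and whether the import succeeds, without reading through the whole list, a small diagnostic like the one below can help. It is a sketch, not part of either library: the function name `diagnose_bs4_html5lib` is hypothetical, and it relies only on `importlib` plus the `__version__` attribute both packages define (falling back to "unknown" if that attribute is missing). It also checks whether html5lib ships its tree-builder base module as `base` or `_base`, the rename discussed later in this thread.

```python
import importlib


def diagnose_bs4_html5lib():
    """Collect installed versions and whether `from bs4 import BeautifulSoup`
    works. Every value in the returned dict is a string.

    Uses str.format() instead of f-strings so it also runs on Python 3.5.
    """
    report = {}

    # html5lib: its version, plus which tree-builder base module it ships
    # ("base" after the rename, "_base" before it).
    try:
        html5lib = importlib.import_module("html5lib")
    except ImportError as exc:
        report["html5lib"] = "not importable: {}".format(exc)
    else:
        report["html5lib"] = getattr(html5lib, "__version__", "unknown")
        for name in ("base", "_base"):
            try:
                importlib.import_module("html5lib.treebuilders." + name)
            except ImportError:
                continue
            report["treebuilders module"] = name
            break

    # bs4: this import is exactly the step that fails in the traceback above.
    try:
        from bs4 import BeautifulSoup  # noqa: F401
    except ImportError as exc:
        report["bs4 import"] = "not importable: {}".format(exc)
    except AttributeError as exc:
        report["bs4 import"] = "failed: {}".format(exc)
    else:
        import bs4
        report["bs4"] = getattr(bs4, "__version__", "unknown")
        report["bs4 import"] = "ok"

    return report


if __name__ == "__main__":
    for key, value in sorted(diagnose_bs4_html5lib().items()):
        print("{}: {}".format(key, value))
```

With the broken combination described here, the "bs4 import" entry would carry the same AttributeError message as the traceback.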

Jonathan Morgan

Jul 16, 2016, 8:10:02 PM
to beautifulsoup
A brief update: I didn't go back far enough (sorry). With html5lib version 0.9999999 (7 nines), everything works with BeautifulSoup 4.4.1 under both Python 2 and 3. The problem appears to have started with a change introduced in html5lib 0.99999999 (8 nines), and that change is also present in the html5lib 1.0 branch.

More specifically, looking at GitHub, the only change appears to be that the file was renamed from "_base.py" to "base.py" (in this commit from three days ago, https://github.com/html5lib/html5lib-python/commit/7bb34c7bb6e1ebddcfe70592ee072535b30cea56, whose message says nothing changed other than the name). So the fix is probably just to remove the underscore wherever you reference it.
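Since both old and new html5lib releases are in circulation, one way to apply that fix is an import that tries the new module name first and falls back to the old one. This is a minimal sketch, assuming (as the commit message says) that only the module name changed; `load_treebuilder_base`, `treebuilder_base`, and `ExampleTreeBuilder` are hypothetical names for illustration, not the code that eventually shipped in Beautiful Soup.

```python
import importlib


def load_treebuilder_base():
    """Return html5lib's tree-builder base module under whichever name the
    installed html5lib uses, or None if html5lib cannot be imported."""
    candidates = (
        "html5lib.treebuilders.base",   # name after the rename (0.99999999 on)
        "html5lib.treebuilders._base",  # name up to 0.9999999
    )
    for module_name in candidates:
        try:
            return importlib.import_module(module_name)
        except ImportError:
            continue
    return None


treebuilder_base = load_treebuilder_base()

if treebuilder_base is not None:
    # Hypothetical subclass: the point is only that it refers to the module
    # through `treebuilder_base` instead of hard-coding `_base`.
    class ExampleTreeBuilder(treebuilder_base.TreeBuilder):
        pass
```

Returning None instead of raising keeps html5lib optional, in the same spirit as the `except ImportError` around `from . import _html5lib` visible in the traceback.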

Thanks,

Jon

leonardr

Jul 16, 2016, 10:35:52 PM
to beautifulsoup
Jonathan,

Thanks for researching this. This was also filed as an issue:

https://bugs.launchpad.net/beautifulsoup/+bug/1603299

The fix will be in the next release of Beautiful Soup, within the next few days.

Leonard

