Cannot use lxml in bs4


Alexey Kuntsevich

May 3, 2023, 7:28:17 AM
to beautifulsoup
lxml and bs4 are both installed. bs4 works for HTML, but not with lxml, even though I can import and use lxml as a standalone library. Both were installed from pip, and I have tried force-reinstalling. Could anybody help, please?

Here is the IPython session output:

Python 3.10.9 | packaged by conda-forge | (main, Feb  2 2023, 20:26:08) [Clang 14.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import lxml

In [2]: from bs4 import BeautifulSoup

In [3]: BeautifulSoup("", features="xml")
---------------------------------------------------------------------------
FeatureNotFound                           Traceback (most recent call last)
Cell In[3], line 1
----> 1 BeautifulSoup("", features="xml")

File /opt/homebrew/Caskroom/miniforge/base/envs/pyds310/lib/python3.10/site-packages/bs4/__init__.py:250, in BeautifulSoup.__init__(self, markup, features, builder, parse_only, from_encoding, exclude_encodings, element_classes, **kwargs)
    248     builder_class = builder_registry.lookup(*features)
    249     if builder_class is None:
--> 250         raise FeatureNotFound(
    251             "Couldn't find a tree builder with the features you "
    252             "requested: %s. Do you need to install a parser library?"
    253             % ",".join(features))
    255 # At this point either we have a TreeBuilder instance in
    256 # builder, or we have a builder_class that we can instantiate
    257 # with the remaining **kwargs.
    258 if builder is None:

FeatureNotFound: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?

In [4]: lxml.__version__
Out[4]: '4.9.2'

In [5]: import bs4

In [6]: bs4.__version__
Out[6]: '4.12.2'

leonardr

May 3, 2023, 8:02:22 AM
to beautifulsoup
This might be an issue specific to your conda environment.

Here are two things to try and report back:

from bs4.builder import _lxml

If this raises an ImportError, whatever's causing the ImportError is your underlying problem.

from bs4 import builder
print(builder.builder_registry.builders_for_feature)

If lxml is being imported but the LXML tree builders aren't being registered for some reason, the contents of this dictionary might provide a clue.
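Both checks can be run in one go. The following is only a rough sketch: the helper names `try_import` and `summarize_registry` are made up for illustration and are not part of bs4. It assumes that bs4 silently skips a tree builder whose import fails, which is why the underlying error only shows up when the builder module is imported directly.

```python
import importlib
import traceback


def try_import(module_name):
    """Import a module by name and return (ok, traceback_text).

    Hypothetical helper: it only catches ImportError, so the full
    traceback is preserved for inspection instead of being hidden.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False, traceback.format_exc()
    return True, ""


def summarize_registry(builders_for_feature):
    """Map each feature name to the names of the builder classes registered for it."""
    return {
        feature: [cls.__name__ for cls in classes]
        for feature, classes in sorted(builders_for_feature.items())
    }


if __name__ == "__main__":
    # Check 1: does the lxml tree builder module import cleanly?
    ok, details = try_import("bs4.builder._lxml")
    print("bs4.builder._lxml import:", "ok" if ok else "FAILED")
    if not ok:
        print(details)

    # Check 2: which builders are registered for which features?
    try:
        from bs4 import builder
    except ImportError as exc:
        print(f"bs4 itself is not importable: {exc}")
    else:
        registry = builder.builder_registry.builders_for_feature
        for feature, names in summarize_registry(registry).items():
            print(f"{feature}: {names}")
```

If the first check fails, the printed traceback should contain the real cause. If it succeeds but no builder is listed under "xml", the registration step is the place to look.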

Leonard

Alexey Kuntsevich

May 3, 2023, 8:49:25 AM
to beautifulsoup
The first import check helped, many thanks.

Details just in case:

Somehow the lxml etree binary was built for x86_64, but arm64 was required. The message was:

dlopen(/opt/homebrew/Caskroom/miniforge/base/envs/pyds310/lib/python3.10/site-packages/lxml/etree.cpython-310-darwin.so, 0x0002): tried: '/opt/homebrew/Caskroom/miniforge/base/envs/pyds310/lib/python3.10/site-packages/lxml/etree.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/Caskroom/miniforge/base/envs/pyds310/lib/python3.10/site-packages/lxml/etree.cpython-310-darwin.so' (no such file), '/opt/homebrew/Caskroom/miniforge/base/envs/pyds310/lib/python3.10/site-packages/lxml/etree.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))
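For anyone hitting a similar error, a small sketch like the one below may help compare what the running interpreter reports with what the loader complained about. The function name `parse_arch_mismatch` and the regular expression are illustrative assumptions based only on the wording of the dlopen message in this post; other macOS versions may phrase it differently.

```python
import platform
import re

# Matches the "(have 'x86_64', need 'arm64')" fragment of the dlopen message.
# The pattern is an assumption derived from the message quoted in this thread.
ARCH_MISMATCH = re.compile(r"have '([^']+)', need '([^']+)'")


def parse_arch_mismatch(message):
    """Return (have, need) from a dlopen error message, or None if not found."""
    match = ARCH_MISMATCH.search(message)
    return match.groups() if match else None


if __name__ == "__main__":
    # Architecture the current Python process reports for itself.
    print("interpreter architecture:", platform.machine())

    sample = (
        "mach-o file, but is an incompatible architecture "
        "(have 'x86_64', need 'arm64')"
    )
    print("binary vs. required:", parse_arch_mismatch(sample))
```

If the interpreter reports one architecture while the compiled extension was built for another, the extension was most likely built or downloaded under a different architecture than the one the environment runs in.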

I followed https://stackoverflow.com/questions/71737316/problems-installing-lxml-on-m1-mac. It didn't help at first, because the problem was deeper in the environment. My guess is that my iTerm2 installation predated the Apple silicon version and may have been affecting the build process. Switching to the Apple silicon iTerm2, recreating the environment, and rebuilding lxml according to the SO link helped. Just please don't run 'sudo pip install' as they advise there ;)

Alexey
