Stuck on beautifulsoup error when using cron

Alessandro Mantovan

May 19, 2021, 7:30:56 AM
to beautifulsoup
Hello all,

I am working on a project that uses beautifulsoup4 with Python 3.7.3.

page_source = requests.get('http://www.centrometeolombardo.com/content.asp?CatId=332&ContentType=Dati').text
soup = BeautifulSoup(page_source, 'lxml')
a = soup.select('table > tr:has(> td > a:-soup-contains("Bollate")) td')

This is part of the script. It runs fine from the terminal.

When I launch the same script with cron, I get the following message:

Traceback (most recent call last):
  File "/home/pi/Documents/TwitterBot/main.py", line 17, in <module>
    a = soup.select('table > tr:has(> td > a:-soup-contains("Bollate")) td')
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1376, in select
    return soupsieve.select(selector, self, namespaces, limit, **kwargs)
  File "/usr/lib/python3/dist-packages/soupsieve/__init__.py", line 112, in select
    return compile(select, namespaces, flags, **kwargs).select(tag, limit)
  File "/usr/lib/python3/dist-packages/soupsieve/__init__.py", line 63, in compile
    return cp._cached_css_compile(pattern, namespaces, custom, flags)
  File "/usr/lib/python3/dist-packages/soupsieve/css_parser.py", line 205, in _cached_css_compile
    CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(),
  File "/usr/lib/python3/dist-packages/soupsieve/css_parser.py", line 1010, in process_selectors
    return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
  File "/usr/lib/python3/dist-packages/soupsieve/css_parser.py", line 852, in parse_selectors
    has_selector, is_html = self.parse_pseudo_class(sel, m, has_selector, iselector, is_html)
  File "/usr/lib/python3/dist-packages/soupsieve/css_parser.py", line 510, in parse_pseudo_class
    has_selector = self.parse_pseudo_open(sel, pseudo, has_selector, iselector, m.end(0))
  File "/usr/lib/python3/dist-packages/soupsieve/css_parser.py", line 654, in parse_pseudo_open
    sel.selectors.append(self.parse_selectors(iselector, index, flags))
  File "/usr/lib/python3/dist-packages/soupsieve/css_parser.py", line 852, in parse_selectors
    has_selector, is_html = self.parse_pseudo_class(sel, m, has_selector, iselector, is_html)
  File "/usr/lib/python3/dist-packages/soupsieve/css_parser.py", line 585, in parse_pseudo_class
    "'{}' pseudo-class is not implemented at this time".format(pseudo)
NotImplementedError: ':-soup-contains' pseudo-class is not implemented at this time

Can you help me interpret this error message? Understanding the root cause of the problem would help me more than guessing, which is what I am doing now :/

facelessuser

May 19, 2021, 8:37:27 AM
to beautifulsoup

What version of soupsieve do you have installed? The error seems to be with that library (which I am the author of). It’s complaining about the pseudo class, so I suspect you are on an older version that maybe still uses the old name :contains().
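
One way to answer the version question from inside the same interpreter that runs the script is to print each library's version and file location. This is only a sketch: `describe_module` is a hypothetical helper name, and it assumes the modules expose `__version__` and `__file__`.

```python
import importlib


def describe_module(name):
    """Return (version, file) for an importable module, or (None, None) if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None, None
    # Not every module defines __version__, so fall back to None.
    version = getattr(module, "__version__", None)
    path = getattr(module, "__file__", None)
    return version, path


if __name__ == "__main__":
    for name in ("bs4", "soupsieve"):
        version, path = describe_module(name)
        print(f"{name}: version={version} file={path}")
```

The traceback above shows cron loading soupsieve from /usr/lib/python3/dist-packages. If the same check run from the terminal prints a different path, two separate installations are probably in play.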

facelessuser

May 19, 2021, 8:47:27 AM
to beautifulsoup

Just an FYI, soupsieve 2.1 is where the rename took place: https://facelessuser.github.io/soupsieve/selectors/pseudo-classes/#:-soup-contains.
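
If the script has to run against whichever soupsieve happens to be installed, the selector could be built from the version string. This is a rough sketch, assuming that versions before 2.1 accept the old `:contains()` name as described above. `contains_pseudo_class` and `build_selector` are hypothetical helpers, not part of soupsieve.

```python
import re


def contains_pseudo_class(soupsieve_version):
    """Pick the text-matching pseudo-class name for a version string such as '2.2.1'."""
    numbers = []
    for piece in soupsieve_version.split(".")[:2]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each part
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 2:
        numbers.append(0)
    # The rename to :-soup-contains happened in soupsieve 2.1.
    return ":-soup-contains" if tuple(numbers) >= (2, 1) else ":contains"


def build_selector(soupsieve_version, text="Bollate"):
    """Build the selector from the original question with the matching pseudo-class."""
    pseudo = contains_pseudo_class(soupsieve_version)
    return f'table > tr:has(> td > a{pseudo}("{text}")) td'


if __name__ == "__main__":
    print(build_selector("2.2.1"))
    print(build_selector("1.8"))
```

This only works around the symptom. Making cron and the terminal use the same installation is the cleaner fix.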

Alessandro Mantovan

May 19, 2021, 9:45:11 AM
to beautifulsoup
Thanks, nobody could help me better than you :))

I just checked: it is version 2.2.1.

facelessuser

May 19, 2021, 10:27:04 AM
to beautifulsoup
Let me run some tests. I don't see anything immediately wrong with your syntax.

```python
>>> import soupsieve as sv
>>> sv.__version__
'2.2.1'
>>> sv.compile('table > tr:has(> td > a:-soup-contains("Bollate")) td', flags=sv.DEBUG)
## PARSING: 'table > tr:has(> td > a:-soup-contains("Bollate")) td'
TOKEN: 'tag' --> 'table' at position 0
TOKEN: 'combine' --> ' > ' at position 5
TOKEN: 'tag' --> 'tr' at position 8
TOKEN: 'pseudo_class' --> ':has(' at position 10
    is_pseudo: True
    is_open: True
    is_relative: True
TOKEN: 'combine' --> '> ' at position 15
TOKEN: 'tag' --> 'td' at position 17
TOKEN: 'combine' --> ' > ' at position 19
TOKEN: 'tag' --> 'a' at position 22
TOKEN: 'pseudo_contains' --> ':-soup-contains("Bollate")' at position 23
TOKEN: 'pseudo_close' --> ')' at position 49
TOKEN: 'combine' --> ' ' at position 50
TOKEN: 'tag' --> 'td' at position 51
## END PARSING
SoupSieve(pattern='table > tr:has(> td > a:-soup-contains("Bollate")) td', namespaces=None, custom=None, flags=1)
```

facelessuser

May 19, 2021, 10:30:48 AM
to beautifulsoup

I ran your snippet:

from bs4 import BeautifulSoup
import requests

page_source = requests.get('http://www.centrometeolombardo.com/content.asp?CatId=332&ContentType=Dati').text
soup = BeautifulSoup(page_source, 'lxml')
a = soup.select('table > tr:has(> td > a:-soup-contains("Bollate")) td')
print(a)

And I got this:

[<td align="left" height="1" width="45%">
<a href="http://www.centrometeolombardo.com/content.asp?contentid=7299&amp;ContentType=Stazioni" target="_blank"><font color="#000080" face="Verdana" size="2"><b>Bollate</b></font></a>
</td>, <td align="center" height="4" width="8%"><font color="#000080" face="Verdana" size="2"><b>5.1</b></font></td>, <td align="center" height="4" width="8%"><font color="#000080" face="Verdana" size="2"><b>25.1</b></font></td>, <td align="center" height="4" width="27%"><font color="#000080" face="Verdana" size="2"><b>Pioggia</b></font></td>, <td align="center" height="1" width="8%"><font color="#000080" face="Verdana" size="2"><b>1.0</b></font></td>, <td align="center" height="1" width="8%"><font color="#000080" face="Verdana" size="2"><b>-</b></font></td>]

So, I’m not seeing your error. I’d double-check that what you are actually running is what you think you are running.

Alessandro Mantovan

May 19, 2021, 11:37:35 AM
to beautifulsoup
Yes, I double-checked and I do have this version installed.

However, when cron runs the script (only cron; it works from the terminal), it looks like cron uses an old soupsieve version. I can't understand why.

facelessuser

May 19, 2021, 11:50:00 AM
to beautifulsoup

Unfortunately, giving advice on this is beyond my expertise.

It is possible that you have multiple versions of Python and cron uses a different version than you are using in the terminal. Or maybe you have your libraries installed via pip as a user, and cron is running under root, so it accesses a different soupsieve?
My main concern, of course, is whether soupsieve is broken, and it appears it is not. If the correct versions of bs4 and soupsieve are used, there is no issue. I’m not sure what is going on with cron, but I assume that if you can figure out how to make the expected versions of bs4 and soupsieve available to it, your issues will be resolved.

Alessandro Mantovan

May 20, 2021, 8:11:55 AM
to beautifulsoup
Yeah, last night I came to the same conclusions: either too many Python versions are installed, or the libraries and cron belong to different users.

There is no problem with soupsieve at all, since it runs fine in every other way.

A last question: would, in your opinion
-reinstalling the OS on the Raspberry Pi
-installing both cron and the libraries with pip
solve the problem?

thx, again

facelessuser

May 20, 2021, 9:52:47 AM
to beautifulsoup

A last question: would, in your opinion
-reinstall the os on raspberry
-install both cron and libraries with pip
solve the problem?

Any advice I could give at this point would be guessing. Without understanding the root cause, I would just be wasting your time.

I would try things like logging the Python version and Python path from cron to see whether the same Python is being run. Do you use --user when installing packages as a user? If not, maybe it isn’t a user vs root issue, or maybe it is more complicated than I think.

I, unfortunately, do not have any really good advice except to first try to understand what the issue is before wasting time guessing at solutions.
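
A minimal sketch of that kind of logging, assuming it is placed at the top of the script or run as its own file. The function names and the log path are hypothetical, so adjust them to your setup.

```python
import getpass
import os
import site
import sys


def current_user():
    """Return the login name, or 'unknown' if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def environment_report():
    """Collect the interpreter, user and import-path details as plain text."""
    lines = [
        f"executable: {sys.executable}",
        f"version: {sys.version.split()[0]}",
        f"user: {current_user()}",
        f"user site enabled: {site.ENABLE_USER_SITE}",
        f"user site: {site.getusersitepackages()}",
        f"PATH: {os.environ.get('PATH', '')}",
        "sys.path:",
    ]
    lines.extend(f"  {entry}" for entry in sys.path)
    return "\n".join(lines)


def write_report(path):
    """Append the report to a log file so a cron run leaves a trace."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(environment_report() + "\n\n")


if __name__ == "__main__":
    # Hypothetical location; any path writable by the cron user works.
    write_report(os.path.join(os.path.expanduser("~"), "python_env_report.log"))
```

Running it once from the terminal and once from cron, then comparing the two reports, should show whether the interpreter, the user, or the import path differs.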

Alessandro Mantovan

May 20, 2021, 6:14:53 PM
to beautifulsoup
yeah you are right.

I will investigate that.

Thanks again :))