Hi, I'm just wondering if anyone else has encountered this issue, or whether there is something I am doing wrong here.
When using BeautifulSoup4 and type checking with BasedPyright, I've noticed that the return types for find() and find_all() are _AtMostOneElement and _QueryResults respectively.
I understand _AtMostOneElement is a typing class that in practical terms equates to Tag | NavigableString | PageElement | None. And that _QueryResults is a list of _AtMostOneElement objects.
This does, however, cause a slight issue with strict typing: if I want to assign the result directly to a variable, I have no way to declare its type.
For _AtMostOneElement I can do:
    if soup.find("foo"):
        bar: Tag | PageElement | NavigableString = soup.find("foo")
Then refine it further to narrow down which element is returned.
Or just do:
    if isinstance(soup.find("foo"), Tag):
        bar: Tag = soup.find("foo")
But that gets repetitive.
For _QueryResults that wouldn't work, as it only confirms the existence of a list. I would have to do it for each element of the list.
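One way the per-element check could be wrapped up, as a rough sketch only: a small generic helper that keeps the items of a wanted type, so the result is typed as list[T]. The name only_instances is hypothetical and not part of BeautifulSoup, and the demo uses built-in types as stand-ins so it runs without bs4.

```python
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def only_instances(items: Iterable[object], expected: type[T]) -> list[T]:
    """Return the items that are instances of `expected`, in original order."""
    # isinstance() narrows each item, so a type checker should see list[T].
    return [item for item in items if isinstance(item, expected)]


# Stand-in data (assumption); with bs4 the first argument would be
# the result of soup.find_all("foo") and the second would be Tag.
mixed: list[object] = ["a", 1, "b", None]
strings: list[str] = only_instances(mixed, str)
```

With BeautifulSoup the call would presumably look like only_instances(soup.find_all("foo"), Tag). Anything that is not a Tag is silently dropped, which may or may not be what you want.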
I know I can also use:
    if soup.foo:
        bar: Tag = soup.foo
But this doesn't allow for extra conditions.
For actual safety we *should* be doing this checking on every return value, and I am fine with that, even though it prevents the chaining of the
But the current return types make it difficult to even use a helper function for this as with something like:
    def _ensure_tag(element) -> Tag:
        """
        Ensures that the given element is a BeautifulSoup Tag.
        If the element is not a Tag, raises a ValueError.
        """
        if isinstance(element, Tag):
            return element
        raise ValueError("No Tag returned from BeautifulSoup query")

    bar: Tag = _ensure_tag(soup.find("foo"))
I cannot assign a type to the 'element' argument, which receives an _AtMostOneElement object.
I can import _AtMostOneElement from bs4._typing and assign it but it's obviously not supposed to be used that way.
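A possible workaround that avoids the private import, offered as a sketch and not as the library's intended usage: annotate the parameter as object, which accepts any return value, and let isinstance() do the narrowing. The name ensure_type is hypothetical, and the demo uses built-in types as stand-ins so it runs without bs4.

```python
from typing import TypeVar

T = TypeVar("T")


def ensure_type(element: object, expected: type[T]) -> T:
    """Return `element` unchanged if it is an instance of `expected`.

    Raises ValueError otherwise, including when `element` is None.
    """
    if isinstance(element, expected):
        return element
    raise ValueError(
        f"expected {expected.__name__}, got {type(element).__name__}"
    )


# Stand-in value (assumption); with bs4 this would be
# ensure_type(soup.find("foo"), Tag).
value: str = ensure_type("hello", str)
```

Because the parameter is plain object, the helper never has to name _AtMostOneElement, and the declared return type is whatever class is passed as the second argument.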
As usual with things like this, it's more likely that this is user error and I am missing something, using the wrong syntax, or have some kind of configuration wrong. But if not, could someone take a look at the return types and either change them to be
Tag | NavigableString | PageElement | None
and
list[Tag | NavigableString | PageElement | None]
or assign a return type that is exposed for import for the purposes of type safety.
I think this would allow proper checking of the return types while ensuring correct documentation for the type checkers.
There is also the possibility of using one of the ElementFilter or SoupStrainer classes to restrict the search process itself, but when I tried this by setting an ElementFilter that returned isinstance(element, Tag), the result was still typed as _AtMostOneElement. Plus, I couldn't really figure out the correct syntax to combine that with other attribute criteria.
Again, I'm very much expecting that I'm using it wrong and I'm looking forward to learning something.
Thanks in advance for any help,
And many thanks to all the maintainers who work on this project.
Tom