Typing of objects returned from find() and find_all().

Thomas Wylie

May 4, 2025, 10:14:01 AM
to beautifulsoup
Hi, I'm just wondering if anyone else has encountered this issue, or whether there is something I am doing wrong here.

When using Beautiful Soup 4 and type checking with BasedPyright, I've noticed that the return types for find() and find_all() are _AtMostOneElement and _QueryResults respectively.

I understand that _AtMostOneElement is a typing class that in practical terms equates to Tag | NavigableString | PageElement | None, and that _QueryResults is a list of _AtMostOneElement objects.

This does, however, cause a slight issue with strict typing: if I want to assign the results directly to a variable, I have no way to declare the type.

For _AtMostOneElement I can do:

if soup.find("foo"):
   bar: Tag | PageElement | NavigableString = soup.find("foo")

Then refine it further to narrow down which element is returned.

Or just do:
if isinstance(soup.find("foo"), Tag):
    bar: Tag = soup.find("foo")

But that gets repetitive.

For _QueryResults that wouldn't work, as it only confirms the existence of a list. I would have to do it for each element of the list.
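
A minimal sketch of that per-element check, assuming every item in the result is a PageElement of some kind. only_tags is a hypothetical helper name, and the parameter annotation is spelled out from public classes rather than taken from bs4's own aliases:

```python
from typing import Iterable, List, Union

from bs4 import BeautifulSoup
from bs4.element import NavigableString, PageElement, Tag


def only_tags(
    results: Iterable[Union[PageElement, Tag, NavigableString]],
) -> List[Tag]:
    """Keep only the Tag objects, so the result can be declared as List[Tag]."""
    return [element for element in results if isinstance(element, Tag)]


# Made-up markup, purely for illustration.
soup = BeautifulSoup("<div><p class='a'>one</p><p>two</p></div>", "html.parser")
paragraphs = only_tags(soup.find_all("p"))
```

The isinstance() test inside the comprehension is what lets a checker narrow each element, so nothing is silenced with a cast.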

I know I can also use:

if soup.foo:
  bar: Tag = soup.foo

But this doesn't allow for extra conditions.

For actual safety we *should* be doing this check on every return value, and I am fine with that. Even though it prevents the chaining of the

But the current return types make it difficult to even use a helper function for this as with something like:

def _ensure_tag(element) -> Tag:
    """
    Ensures that the given element is a BeautifulSoup Tag.
    If the element is not a Tag, raises a ValueError.
    """
    if isinstance(element, Tag):
        return element
    raise ValueError("No Tag returned from BeautifulSoup query")

bar: Tag = _ensure_tag(soup.find("foo"))

I cannot assign a type to the 'element' argument, which is an _AtMostOneElement object.
I can import _AtMostOneElement from bs4._typing and use it as the annotation, but it's obviously not supposed to be used that way.
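
One possible workaround, sketched under the assumption that the private alias really is equivalent to the union of public classes described earlier, is to spell that union out as a local alias. MaybeElement and ensure_tag are hypothetical names, not something bs4 provides:

```python
from typing import Optional, Union

from bs4 import BeautifulSoup
from bs4.element import NavigableString, PageElement, Tag

# Hypothetical local alias built only from public classes.
# Assumption: it covers everything find() can return.
MaybeElement = Optional[Union[PageElement, Tag, NavigableString]]


def ensure_tag(element: MaybeElement) -> Tag:
    """Return the element if it is a Tag, otherwise raise ValueError."""
    if isinstance(element, Tag):
        return element
    raise ValueError("No Tag returned from BeautifulSoup query")


# Made-up markup, purely for illustration.
soup = BeautifulSoup("<div><p>hello</p></div>", "html.parser")
first_paragraph = ensure_tag(soup.find("p"))
```

This avoids the private import, at the cost of repeating the union in user code.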

As usual with things like this, it's more likely that this is user error: I am missing something, using the wrong syntax, or have some kind of configuration wrong. But if not, could someone take a look at the return types and either change them to be

Tag | NavigableString | PageElement | None
and 
list[Tag | NavigableString | PageElement | None]

or assign a return type that is exposed for import for the purposes of type safety.

I think this would allow proper actual checking of the return types while ensuring correct documentation for the type checkers.

There is also the possibility of using one of the ElementFilter or SoupStrainer classes to restrict the search process itself, but when I tried this by setting an ElementFilter that returned isinstance(element, Tag), the result was still typed as _AtMostOneElement. Plus, I couldn't really figure out the correct syntax to combine that with other attribute criteria.

Again, I'm very much expecting that I'm using it wrong and I'm looking forward to learning something.

Thanks in advance for any help,

And many thanks to all the maintainers who work on this project.

Tom


leonardr

May 4, 2025, 11:07:41 AM
to beautifulsoup
Tom,

I don't think you're doing anything wrong. The find() API was designed to be used with duck typing. Although its type hints are technically correct, if you're doing type-safe programming you need to basically undo the duck typing every time based on what you know about the arguments to the method call.
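
As a rough illustration of undoing the duck typing at a call site: when find() is given a tag name, a match can only be a Tag, so an isinstance assertion is enough for a checker to narrow the result. The markup and variable names here are made up:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up markup, purely for illustration.
soup = BeautifulSoup('<a href="/x">link</a>', "html.parser")

link = soup.find("a")
# We searched by tag name, so anything found must be a Tag.
# The assertion turns that knowledge into something a checker can see,
# and it also fails loudly if nothing was found.
assert isinstance(link, Tag)

href = link.get("href")
```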

In bug 2099772 I talk about some possibilities for changing this. I'm leaning towards recommending the low-level API introduced in 4.13 as the basis for type-safe programming. I think it will only take a few minor changes to get something that's usable.

Leonard

Thomas Wylie

May 4, 2025, 5:26:53 PM
to beautifulsoup
Hi Leonard,

Thanks for the quick reply! 

Those suggestions look like good options. Or possibly, rather than having duplicate find methods (which it didn't seem like you were keen on anyway), would it be possible to add a "return_type" argument which internally is just an ElementFilter that returns isinstance(element, <type>)?

Or the docs could have some simple templates for ElementFilters that achieve the same thing, and make it clear how to combine them in a modular way with the other arguments. We can't really chain methods under strict type checking, so it might be that the searching-the-tree functions need to be able to accept multiple filters? (They might already be able to; I couldn't really work it out during my brief play with them.)

For example: you define an isinstance(element, Tag) filter and a "class": "bar" filter and then just pass them both in - soup.find(is_tag, class_is_bar). Maybe with some kind of decorator for the ones that filter the return object to make them easily identifiable? I've only just started looking at BeautifulSoup in the last few days so I don't know what would be practical/easier for you.
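
A rough sketch of what composing filters could look like today without any new API, relying only on the long-standing ability to pass a function to find_all() (the function is called with each Tag). The names all_of, is_paragraph and class_is_bar are hypothetical, and this is not a claim about how ElementFilter works internally:

```python
from typing import Callable, List

from bs4 import BeautifulSoup
from bs4.element import Tag

TagPredicate = Callable[[Tag], bool]


def all_of(*predicates: TagPredicate) -> TagPredicate:
    """Combine several predicates into one that requires all of them."""
    def combined(tag: Tag) -> bool:
        return all(predicate(tag) for predicate in predicates)
    return combined


def is_paragraph(tag: Tag) -> bool:
    return tag.name == "p"


def class_is_bar(tag: Tag) -> bool:
    # get_attribute_list() always returns a list, even for a missing attribute.
    return "bar" in tag.get_attribute_list("class")


# Made-up markup, purely for illustration.
soup = BeautifulSoup(
    '<p class="bar">x</p><p>y</p><div class="bar">z</div>', "html.parser"
)
found = soup.find_all(all_of(is_paragraph, class_is_bar))
# The isinstance() filter is still needed to satisfy a strict checker.
matches: List[Tag] = [element for element in found if isinstance(element, Tag)]
```

The predicates stay small and reusable, but the return type of find_all() is unchanged, so the final narrowing step does not go away.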


The other way is maybe just to expose the intermediate classes for import for typing purposes and make it clear that's what they're for. It's no problem using them in the code (and actually easier than writing None | Tag | NavigableString | PageElement). When I imported _AtMostOneElement it actually worked fine.

After an "if:" check on _AtMostOneElement to exclude None the type checkers see it as Tag | NavigableString | PageElement

def _ensure_tag(element: _AtMostOneElement) -> Tag:
    if isinstance(element, Tag):
        return element
    raise ValueError("No Tag returned from BeautifulSoup query")

Works fine too.

And 

foo: Tag = _ensure_tag(soup.find("bar"))
baz: Tag = _ensure_tag(foo.find("qux"))

looks cleaner than

if soup.find("bar"):
    foo: Tag = soup.find("bar")
if foo.find("qux"):
    baz: Tag = foo.find("qux")

Which I think we'd still have to do if you implemented it in the API, unless you had it internally raise an error when there are no results, so that the return type was Tag and not Tag | None.

Importing _AtMostOneElement only threw up the one error, on the import itself, because it's a private class; everything else worked as expected. We just need an exposed type to assign them to.

Would something like that be an option? Perhaps in the short term? Plus if you then implemented the ElementFilter option afterwards it should be easy for users to turn the helper functions into filters.

Tom