Possible bug with new_tag and soupsieve's select

João Seckler

Jul 7, 2025, 7:36:00 AM
to beautifulsoup
Hello,

I think I encountered a bug. I'd like help determining whether it is in fact a bug or whether I'm misunderstanding something.

My problem is: if I create a new tag with a prefixed attribute, I can't seem to find it with soupsieve's select using attribute prefix matching (i.e. [prefix|name=value]).

My minimal reproducible example is based on this xhtml document:

from bs4 import BeautifulSoup

soup = BeautifulSoup("""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.id\
rg/2007/ops">
<body>
<p epub:type="toc"></p>
</body>
</html>
""", "xml")


If I try:

# Existing tag is found:
print(soup.select("[epub|type=toc]"))  # [<p epub:type="toc"/>]

# Creating new tag
new_tag = soup.new_tag("p", attrs={"epub:type": "pagebreak"})
soup.body.append(new_tag)

# Created tag is not found
print(soup.select("[epub|type=pagebreak]"))  # []


As a workaround, I've found that I can copy an existing tag and alter it to make it discoverable:

# Copy an existing tag and insert it into the document
tag = soup.p
new_tag = tag.copy_self()
new_tag.attrs["epub:type"] = "pagebreak"
soup.body.append(new_tag)

# Created tag is found
print(soup.select("[epub|type=pagebreak]"))  # [<p epub:type="pagebreak"/>]


I've tried reproducing the copy_self method by calling the Tag constructor directly, but had no success. I've done a little digging into soupsieve's code but couldn't work out what the problem was. Any pointers to help me understand this bug, if it is a bug at all, are welcome.


If this is in fact a bug, I'd also like help understanding where to file it: at beautifulsoup or at soupsieve.


Thanks in advance,

João


Isaac Muse

Jul 8, 2025, 3:06:59 AM
to beautifulsoup

This is not a bug in soupsieve. Normally, when namespaced attributes are created, the attribute key looks like a normal string but carries special attributes that contain the namespace:

>>> element = soup.select("[epub|type=toc]")[0]
>>> [(k.namespace, k.name) for k in element.attrs.keys()]
[('http://www.id rg/2007/ops', 'type')]
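
The same check can be run against a tag built with new_tag to see the difference. This is only a minimal sketch: it assumes the lxml-backed "xml" parser is installed, and the namespace URI and the describe_keys helper are placeholders made up for illustration, not part of the original document or of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Placeholder namespace URI, used only for this illustration.
NS = "http://example.com/ns/ops"

doc = f'<html xmlns:epub="{NS}"><body><p epub:type="toc"/></body></html>'
soup = BeautifulSoup(doc, "xml")


def describe_keys(tag):
    """Hypothetical helper: report each attribute key's type and namespace."""
    return [(type(k).__name__, getattr(k, "namespace", None)) for k in tag.attrs]


# Key produced by the parser: carries the namespace.
parsed = describe_keys(soup.p)

# Key passed to new_tag as a plain string: no namespace information.
created = describe_keys(soup.new_tag("p", attrs={"epub:type": "pagebreak"}))

print(parsed)
print(created)
```

If the parsed key reports a namespace and the created one reports None, the created attribute is an ordinary attribute as far as soupsieve is concerned.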

It seems Beautiful Soup doesn’t create the new tag with the namespace context, maybe because it has no parent with the namespace or maybe for other reasons; I haven’t looked into it. Regardless, soupsieve expects namespaced attributes to be constructed as namespaced attributes, and when they are not, it treats them as normal attributes. We can force the tag to have namespaced attributes by copying the attribute key, which is what copy_self does.

element = soup.select("[epub|type=toc]")[0]
attrs2 = element.attrs.copy()
attrs2["epub:type"] = "pagebreak"
new_tag = soup.new_tag("p", attrs=attrs2)
soup.body.append(new_tag)

Anyway, BeautifulSoup would need a way to create a new tag directly under another element, so that it could use that context to populate namespaces correctly based on where the tag is being inserted. Soupsieve is simply doing what is expected: it checks whether the attribute follows the namespaced attribute convention, and when it doesn’t, the attribute is assumed to be a normal one.
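
One possible way to get the same effect without copying from an existing tag is to build the key yourself with bs4.element.NamespacedAttribute, the str subclass the parser uses for such keys. The following is a sketch, not a confirmed recommendation from the Beautiful Soup maintainers: it assumes lxml is installed, uses a placeholder namespace URI, and passes the namespaces mapping to select explicitly so that it does not depend on what the soup registered while parsing.

```python
from bs4 import BeautifulSoup
from bs4.element import NamespacedAttribute

# Placeholder namespace URI, used only for this illustration.
NS = "http://example.com/ns/ops"

doc = f'<html xmlns:epub="{NS}"><body><p epub:type="toc"/></body></html>'
soup = BeautifulSoup(doc, "xml")

# Build a key that carries prefix, local name, and namespace,
# like the keys the parser creates for namespaced attributes.
key = NamespacedAttribute("epub", "type", NS)

new_tag = soup.new_tag("p", attrs={key: "pagebreak"})
soup.body.append(new_tag)

# Map the selector prefix to the namespace explicitly.
found = soup.select("[epub|type=pagebreak]", namespaces={"epub": NS})
print(found)
```

The key still compares equal to the plain string "epub:type", so ordinary lookups such as new_tag["epub:type"] keep working.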
