Cannot detect boolean attribute

Will Abbott

Jun 11, 2024, 6:23:57 AM
to beautifulsoup
I am trying to detect if an attribute was a boolean. There seems to be no difference if the tag contains either `test=""` or `test`, it always returns an empty string. I was hoping for a boolean True:

```python
soup = BeautifulSoup("<a test hello='1'>test</a>", "html.parser")

print(testsoup.div.attrs)
# prints: {'test': '', 'hello': ''}
       
# even though:
print(testsoup.encode(formatter="html5"))
# prints: <div hello="" test>test</div>
```

Am I missing something?

Isaac Muse

Jun 11, 2024, 10:02:51 AM
to beautifulsoup

Why are you checking div for hello when hello is on the a element?

Isaac Muse

Jun 11, 2024, 10:08:30 AM
to beautifulsoup

To clarify, you seem to be creating a `soup` object but then operating on a `testsoup` object. It is very unclear what you are doing.

From my tests, things are working just fine:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<a test hello='1'>test</a>", "html.parser")
    print(soup.a.attrs)
    print(soup.encode(formatter="html5"))

    {'test': '', 'hello': '1'}
    b'<a hello="1" test>test</a>'

Keep in mind that HTML uses text, not boolean types. If you are using a literal boolean as opposed to a string, it is possible that you are running into an issue. In the future, Beautiful Soup might coerce those to strings, but instead of waiting for or relying on Beautiful Soup to do such things, it is better to get in the habit of using strings.

Will Abbott

Jun 11, 2024, 11:48:06 AM
to beautifulsoup
Yes apologies! I wrote it whilst tired. 

Is there a way we can detect when the attribute is alone without the =""? At the moment, `test=""` and `test` in the tag will both give an empty string when we inspect the `.attrs`. 

I'm using BS4 to transpile one template syntax to another. A feature of the source syntax says that if a lone attribute is present, it will pass boolean True to the target implementation. But at the moment, I can't detect in BS4 if the attribute was indeed alone or with a value of "".

Isaac Muse

Jun 11, 2024, 11:58:20 AM
to beautifulsoup

Probably something like

a.get('test') == ''

It will get the attribute if it is available and return None if it is not. The value is then compared to the empty string.
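A minimal sketch of that check, reusing the markup from earlier in the thread. The helper name `has_empty_attr` is made up for illustration and is not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<a test hello='1'>test</a>", "html.parser")
a = soup.a


def has_empty_attr(tag, name):
    # Hypothetical helper: True only when the attribute exists
    # and its value is the empty string.
    return tag.get(name) == ""


print(has_empty_attr(a, "test"))     # True: present, empty value
print(has_empty_attr(a, "hello"))    # False: present, value is '1'
print(has_empty_attr(a, "missing"))  # False: get() returns None
```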

Will Abbott

Jun 11, 2024, 12:24:42 PM
to beautifulsoup
Both forms of the attribute, `test` and `test=""`, return True.

BS v 4.12.2, trying with both html.parser and html5lib

Isaac Muse

Jun 11, 2024, 12:38:55 PM
to beautifulsoup

That is because the parser normalizes them: an attribute without a value is given an empty value. You are showing one with an explicit empty value and one with an implicit, undefined value, but they are the same as far as the parser is concerned.

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<a test hello>test</a>", "html.parser")
    print(soup.a.attrs)
    soup = BeautifulSoup("<a test hello=''>test</a>", "html.parser")
    print(soup.a.attrs)

    {'test': '', 'hello': ''}
    {'test': '', 'hello': ''}

Isaac Muse

Jun 11, 2024, 12:40:56 PM
to beautifulsoup
This is pretty much the way browsers treat them.

Will Abbott

Jun 11, 2024, 12:46:28 PM
to beautifulsoup
Frameworks such as Vue will pass a boolean True for valueless/boolean attributes. It's the same behaviour I would like to replicate in my use case.

Your example is what I was seeing and what led me to post.

The release notes for 4.10 raised my hopes a bit...

* The 'html5' formatter now treats attributes whose values are the
  empty string as HTML boolean attributes. Previously (and in other
  formatters), an attribute value must be set as None to be treated as
  a boolean attribute. In a future release, I plan to also give this
  behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424]

But I guess that's just in the way it normalizes the attributes, not how it's stored in the attrs on the instance.
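To illustrate that distinction, here is a small sketch with the same markup as above. The stored value stays an empty string, and only the `html5` formatter changes how it is written back out. The output of the default formatter is printed without an expected value, since it may differ between releases:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<a test hello='1'>test</a>", "html.parser")

# What is stored on the tag: a plain empty string, not True.
stored = soup.a.attrs["test"]
print(repr(stored))  # ''

# The 'html5' formatter only affects serialization.
html5_out = soup.a.decode(formatter="html5")
print(html5_out)  # <a hello="1" test>test</a>

# Default formatter, shown for comparison.
print(soup.a.decode())
```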

leonardr

Jun 11, 2024, 1:01:43 PM
to beautifulsoup
I can confirm that all of the underlying HTML parsers (html.parser, lxml, and html5lib) treat the markup <tag value> and <tag value=""> the same way: by passing Beautiful Soup a dict that maps value to the empty string. This follows the HTML spec, which says that attribute values must always be strings. Theoretically there could be a parser that used a special stringlike object to preserve the distinction, but none of the parsers we have do that.

The fix for bug #2065525, which will be in the 4.13 release, changes the way Beautiful Soup handles non-string attribute values on output, to be consistent with the HTML spec. But the only way those values can show up is if they are set on a tag in Python code after parsing. The release note you saw about the html5 formatter is also dealing with the way tags in Beautiful Soup are output back into strings. We have a lot of control over how markup is output, but when it comes to the way markup is preprocessed on input, we're at the mercy of the parsers.
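One possible workaround, sketched here outside Beautiful Soup: when the standard library's `html.parser.HTMLParser` class is used directly, its `handle_starttag` callback reports a valueless attribute with a value of `None` and an explicit `test=""` with an empty string. The class and function names below are hypothetical, and this is only a pre-scan of the raw markup under that assumption, not something Beautiful Soup exposes:

```python
from html.parser import HTMLParser


class ValuelessAttrFinder(HTMLParser):
    """Hypothetical pre-scan: records attributes written without '='."""

    def __init__(self):
        super().__init__()
        self.found = []  # (tag name, attribute name) pairs

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # The stdlib parser passes None when no value was given.
            if value is None:
                self.found.append((tag, name))


def find_valueless_attrs(markup):
    finder = ValuelessAttrFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


print(find_valueless_attrs("<a test hello=''>test</a>"))  # [('a', 'test')]
print(find_valueless_attrs('<a test="">test</a>'))        # []
```

The result would still have to be matched back to tags in the parsed tree, for example by position, which this sketch does not attempt.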

Leonard

Isaac Muse

Jun 11, 2024, 1:09:11 PM
to beautifulsoup

It should be noted that in a browser, an empty string value is no different to no value. While they are treated intuitively as boolean, they aren’t really booleans.

You can use the CSS selector [test=""] in a browser and it will find elements with both explicit and implicit empty values. There is no difference, and parsers should really have differences in this regards.
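The same selector can be tried with Beautiful Soup's `select()` method. A minimal sketch, with markup made up for illustration:

```python
from bs4 import BeautifulSoup

markup = (
    "<p>"
    "<a test>implicit</a>"
    '<a test="">explicit</a>'
    '<a test="x">other</a>'
    "</p>"
)
soup = BeautifulSoup(markup, "html.parser")

# Both the valueless and the explicitly empty attribute match.
matches = soup.select('[test=""]')
matched_text = [tag.get_text() for tag in matches]
print(matched_text)  # ['implicit', 'explicit']
```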

Isaac Muse

Jun 11, 2024, 1:10:10 PM
to beautifulsoup
I should have proofread. There is NO difference, and parsers should really have NO differences in this regards.

Will Abbott

Jun 11, 2024, 1:26:12 PM
to beautifulsoup
Ok, thanks for clearing that up for me, it closes a path but opens some potential new ones.