How to preserve spaces in the html class


Paras Jain

Aug 24, 2023, 7:23:19 AM
to beautifulsoup
Hi All,

I have an HTML block whose class attributes I need to keep exactly as written.
When I parse the HTML with Beautiful Soup, it normalises the whitespace and alters the class value.

In this example:

<div class="class_with         spaces">Content 1</div>

When I load this with Beautiful Soup, it is converted into:
<div class="class_with spaces">Content 1</div>


How can I preserve the spaces while using Beautiful Soup?


I would really appreciate any help.



leonardr

Aug 24, 2023, 2:09:20 PM
to beautifulsoup
The default Beautiful Soup behavior when parsing HTML is to treat the 'class' attribute as a multi-valued attribute, and split its string value on whitespace to create a list of values. The HTML5 spec calls this a "set of space-separated tokens" and says that the whitespace isn't relevant.

The simplest way to stop Beautiful Soup from treating the 'class' attribute as a multi-valued attribute is to turn off multi-valued attributes altogether, which you can do by passing multi_valued_attributes=None into the BeautifulSoup constructor:

>>> from bs4 import BeautifulSoup
>>> markup = '<div class="class_with         spaces">Content 1</div>'
>>> BeautifulSoup(markup, 'html.parser')
<div class="class_with spaces">Content 1</div>
>>> BeautifulSoup(markup, 'html.parser', multi_valued_attributes=None)
<div class="class_with         spaces">Content 1</div>
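
To make the difference concrete, here is a minimal sketch of what the attribute value looks like in each mode. It assumes the built-in 'html.parser' backend; the variable names are only illustrative. By default the class value comes back as a list of tokens, while with multi_valued_attributes=None it stays the original string.

```python
from bs4 import BeautifulSoup

markup = '<div class="class_with         spaces">Content 1</div>'

# Default behavior: "class" is multi-valued, so its value is split on
# whitespace into a list of tokens and re-joined with single spaces on output.
default_soup = BeautifulSoup(markup, "html.parser")
default_class = default_soup.div["class"]

# With multi_valued_attributes=None the value is kept as one plain string,
# so the original run of spaces survives a parse/serialize round trip.
raw_soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)
raw_class = raw_soup.div["class"]

print(list(default_class))  # the two tokens, whitespace discarded
print(repr(raw_class))      # the untouched attribute string
print(raw_soup)             # serialized markup with the spaces preserved
```

One side effect to keep in mind: with this setting, code that expects tag["class"] to be a list will receive a string instead.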


The documentation on multi-valued attributes has a bit more information if you need finer-grained control than this.
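
As a rough sketch of that finer-grained control: multi_valued_attributes also accepts a dictionary mapping tag names (or '*' for every tag) to the attribute names that should be split. The mapping below is a made-up example, not a recommended default; it keeps 'rel' on <a> tags multi-valued while leaving 'class' as a plain string. It assumes the 'html.parser' backend.

```python
from bs4 import BeautifulSoup

markup = '<a class="btn   primary" rel="nofollow noopener">link</a>'

# Hypothetical mapping for illustration: only <a rel="..."> is treated as
# multi-valued. "class" is not listed, so it is left as an unsplit string.
only_rel = {"a": ["rel"]}

soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=only_rel)
link = soup.a

print(repr(link["class"]))  # original string, internal spaces intact
print(list(link["rel"]))    # split into separate tokens
```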

Leonard