How to shuffle only p tags (with their structure)

Unk

Aug 24, 2023, 7:28:44 PM
to beauti...@googlegroups.com
Is there any way to get all the p tags from an HTML document, shuffle them, and put them back where they came from, keeping the surrounding structure intact?
I have tried for 12+ hours and still can't find a solution.

from bs4 import BeautifulSoup
import random

with open('scram.html') as file:
    html_content = file.read()

soup = BeautifulSoup(html_content, 'html.parser')

p_tags = soup.find_all('p')
shuffled_p = p_tags[:]
random.shuffle(shuffled_p)

for tag in p_tags:
    childs = tag.parent.find_all(recursive=False)
    parent_ = childs[0].parent
    shuffled_parent = soup.new_tag(parent_.name)
    for child in childs:
        if child in p_tags:
            shuffled_parent.append(shuffled_p[0])
            shuffled_p.pop(0)
        else:
            shuffled_parent.append(child)
    parent_.replace_with(shuffled_parent)

modified_html = str(soup)

with open('scram__.html', 'w') as file:
    file.write(modified_html)
scram.html

Isaac Muse

Aug 24, 2023, 8:02:41 PM
to beautifulsoup

Yeah, you can do something like this. There are probably lots of ways to do it; this is just one.

from bs4 import BeautifulSoup, Tag
import random

with open('scram.html') as file:
    html_content = file.read()

soup = BeautifulSoup(html_content, 'html.parser')

def get_position(tag):
    """Get the position of the p tag and return the parent and the position under the parent."""
    parent = tag.parent
    for e, child in enumerate(parent.children):
        if isinstance(child, Tag) and child is tag:
            return parent, e
    raise RuntimeError('Could not find tag position under a parent')

p_tags = soup.css.select('p')
tag_positions = [get_position(p) for p in p_tags]
random.shuffle(tag_positions)

for pos, p in zip(tag_positions, p_tags):
    parent, index = pos
    parent.insert(index, p)

with open('scram2.html', 'wb') as file:
    file.write(soup.encode())
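
For comparison, here is a different sketch of the same idea; it is not from the reply above. It swaps every p tag for a temporary placeholder first, then drops the shuffled p tags into those placeholders, so no index shifts while tags are being moved. The helper name shuffle_p_tags, the seed parameter, and the placeholder tag name shuffle-slot are all made up for illustration, and the sketch assumes no p tag is nested inside another p tag.

```python
import random

from bs4 import BeautifulSoup


def shuffle_p_tags(html, seed=None):
    """Return html with its <p> tags shuffled among the slots they occupied.

    Hypothetical helper; assumes no <p> is nested inside another <p>.
    """
    soup = BeautifulSoup(html, 'html.parser')
    p_tags = soup.find_all('p')

    # Step 1: mark every <p> position with a placeholder tag.
    # 'shuffle-slot' is an arbitrary name chosen for this sketch.
    placeholders = []
    for p in p_tags:
        slot = soup.new_tag('shuffle-slot')
        p.replace_with(slot)  # detaches p and leaves slot in its place
        placeholders.append(slot)

    # Step 2: shuffle a copy of the detached <p> tags.
    shuffled = p_tags[:]
    random.Random(seed).shuffle(shuffled)

    # Step 3: put one shuffled <p> back into each marked position.
    for slot, p in zip(placeholders, shuffled):
        slot.replace_with(p)

    return str(soup)


if __name__ == '__main__':
    sample = '<div><p>a</p><span>x</span><p>b</p></div><section><p>c</p></section>'
    print(shuffle_p_tags(sample))
```

Because each p tag is detached before anything is reinserted, the non-p siblings never move, and every parent ends up with the same number of p tags in the same places as before.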