A simple text extraction of a website

Julius Hamilton

Sep 22, 2021, 12:12:05 PM
to beauti...@googlegroups.com

I'm new to Beautiful Soup and I was wondering if someone could sketch out a very simple program to pull out some article text for me.

I inspected some elements in Firefox. I'm pretty sure I just want all the text content of the id="main" element (HTML node), minus a few informational captions. That basically means all heading tags (h1, h2), p tags, and also some tables inside div tags.

So, what would be a simple Beautiful Soup script to extract the text of all heading, p and div tags inside the id="main" node?

Thanks very much,
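
For reference, here is a minimal sketch of what such a script might look like. It assumes the page HTML is already available as a string and that the bs4 package is installed. The function name extract_main_text, the helper inside_table, and the skip_selector parameter are made up for illustration, and any selector passed for the captions is a guess that has to be adjusted to the real page. Tables are collected directly rather than through their wrapping div tags, because taking the text of a div would repeat the text of every p and table nested inside it.

```python
from bs4 import BeautifulSoup

# Tags whose text we want; "table" stands in for the tables wrapped in div tags.
WANTED = ["h1", "h2", "h3", "h4", "h5", "h6", "p", "table"]


def inside_table(tag, root):
    """Return True if tag has a <table> ancestor located below root."""
    for parent in tag.parents:
        if parent is root:
            return False
        if parent.name == "table":
            return True
    return False


def extract_main_text(html, skip_selector=None):
    """Return a list of text chunks from headings, paragraphs and tables in id="main"."""
    soup = BeautifulSoup(html, "html.parser")
    main = soup.find(id="main")
    if main is None:
        return []

    # Optionally remove unwanted nodes (for example captions) before collecting text.
    # skip_selector is a hypothetical CSS selector; it is not taken from the real page.
    if skip_selector:
        for node in main.select(skip_selector):
            node.decompose()

    chunks = []
    for tag in main.find_all(WANTED):
        # Anything nested in a table is already covered by that table's own text.
        if inside_table(tag, main):
            continue
        text = tag.get_text(" ", strip=True)
        if text:
            chunks.append(text)
    return chunks
```

A possible way to use it, assuming the captions happen to carry a class such as "caption" (an assumption, not something known about the page), would be print("\n\n".join(extract_main_text(html, skip_selector=".caption"))). Joining table cells with a single space is a simplification; a table that needs its row and column structure preserved would have to be walked tr by tr instead.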