A simple text extraction of a website

Julius Hamilton

Sep 22, 2021, 12:12:05 PM

to beauti...@googlegroups.com

Hey,

I'm new to Beautiful Soup and I was wondering if someone could sketch out a very simple program to pull out some article text for me.

Here is an example article: https://docs.microsoft.com/en-us/office/vba/api/excel.filters

I inspected the page in Firefox. I'm pretty sure I just want all the text content inside the element with id="main", minus a few informational captions. That basically means all the heading tags (h1, h2), the p tags, and some tables that sit inside div tags.

So, what would be a simple Beautiful Soup script to extract the text of all heading, p and div tags inside the id="main" node?

Thanks very much,

Julius
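
For what it's worth, here is a minimal sketch of one way the request above could be approached, not a definitive answer. It assumes the page really does expose an element with id="main" in the raw HTML (as described in the post; this was not verified here) and that the tables are ordinary table/tr/td markup. The function names extract_main_text and fetch_html are made up for illustration. One deliberate change from the question: instead of pulling text from every div, it collects table rows, because a div usually wraps the headings and paragraphs as well and its text would come out twice.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Some sites reject requests without a User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_main_text(html):
    """Return heading, paragraph and table-row text found inside id="main"."""
    soup = BeautifulSoup(html, "html.parser")
    main = soup.find(id="main")
    if main is None:
        # Assumption failed: the page has no element with id="main".
        return []

    chunks = []
    for tag in main.find_all(["h1", "h2", "h3", "p", "tr"]):
        # Anything nested inside a table row is already covered by that row,
        # so skip it to avoid duplicated text.
        if tag.find_parent("tr") is not None:
            continue
        if tag.name == "tr":
            # Join the cells of a row so the table stays readable as text.
            cells = tag.find_all(["th", "td"], recursive=False)
            text = " | ".join(cell.get_text(" ", strip=True) for cell in cells)
        else:
            text = tag.get_text(" ", strip=True)
        if text:
            chunks.append(text)
    return chunks
```

To try it on the article, something like print("\n".join(extract_main_text(fetch_html(url)))) with url set to the link above should print one line per heading, paragraph or table row. The "informational captions" would still need to be filtered out by hand, for example by skipping tags with a particular class once you know which class they use.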
