Need help skipping removal of non-HTML tags while parsing


Srilalitha S

Apr 21, 2021, 9:32:38 AM
to beautifulsoup
Hello,
I am trying to parse plain-text documents that contain a mixture of HTML tags and custom text inside angle brackets (<>). I want to remove only the HTML tags and leave the custom bracketed text in place. Can you help me with how to do it?

For example, this is the text I'm trying to parse:

Input: "Should be <i>the</i> <answer> here"
CurrentOutput: Should be the here
ExpectedOutput: Should be the <answer> here or Should be the answer here.

The sample code I tried:
```
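from bs4 import BeautifulSoup
from bs4.diagnose import diagnose

def html_to_text(html):
    parser = BeautifulSoup(html, 'html.parser')
    diagnose(html)  # prints how each installed parser handles the markup
    return parser.get_text()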
```
Thanks,
Srilalitha S

Computer Learn Point

Apr 21, 2021, 12:43:48 PM
to beauti...@googlegroups.com
I can help you.


Srilalitha S

Apr 22, 2021, 7:26:05 AM
to beauti...@googlegroups.com
Yes please. Can you tell me how we do it?

Srilalitha S
Consultant
Email: sril...@thoughtworks.com
Telephone: +91 9940075542
ThoughtWorks
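
No solution was posted in the thread, so here is one possible approach, offered as a rough sketch rather than a recommended Beautiful Soup recipe. The idea is to escape any angle-bracket token whose name is not a recognized HTML tag before parsing, so that the parser treats it as plain text and `get_text()` keeps it. The names `KNOWN_HTML_TAGS`, `TAG_RE`, and `protect_custom_tags` are made up for this illustration, and the tag set is deliberately short; you would need to extend it to cover the tags that actually occur in your documents.

```python
import re
from html import escape

# Assumption: a small illustrative subset of HTML tag names, not a complete list.
KNOWN_HTML_TAGS = {
    "a", "b", "body", "br", "div", "em", "h1", "h2", "h3", "head", "html",
    "i", "li", "ol", "p", "span", "strong", "table", "td", "th", "tr", "u", "ul",
}

# Matches an opening or closing tag-like token and captures its name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9-]*)[^<>]*>")


def protect_custom_tags(text, keep_brackets=True):
    """Escape tag-like tokens whose name is not in KNOWN_HTML_TAGS."""
    def protect(match):
        name = match.group(1)
        if name.lower() in KNOWN_HTML_TAGS:
            return match.group(0)  # real HTML tag: leave it for the parser
        # Custom token: keep it as literal text, with or without the brackets.
        return escape(match.group(0) if keep_brackets else name)

    return TAG_RE.sub(protect, text)


def html_to_text(text, keep_brackets=True):
    # Imported here so protect_custom_tags stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    protected = protect_custom_tags(text, keep_brackets)
    # The parser decodes the escaped entities back to "<" and ">" in the text.
    return BeautifulSoup(protected, "html.parser").get_text()
```

With the input from the question, `html_to_text("Should be <i>the</i> <answer> here")` should give `Should be the <answer> here`, and passing `keep_brackets=False` should give `Should be the answer here`. A limitation of this sketch is that a custom token whose name happens to match a real HTML tag would still be stripped.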

