Need help keeping non-HTML tags when stripping HTML while parsing

Srilalitha S

Apr 21, 2021, 9:32:38 AM

to beautifulsoup

Hello,

I am trying to parse plain-text documents that contain a mixture of HTML tags and custom text inside angle brackets (<>). I want to remove only the HTML tags and leave the custom angle-bracket text in place. Can you help me with how to do it?

For example, this is the text I'm trying to parse:

Input: "Should be <i>the</i> <answer> here"

CurrentOutput: "Should be the  here"

ExpectedOutput: "Should be the <answer> here" or "Should be the answer here"

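Here is the sample code I tried:
```
from bs4 import BeautifulSoup
from bs4.diagnose import diagnose


def html_to_text(html):
    parser = BeautifulSoup(html, 'html.parser')
    diagnose(html)  # prints a report of how each installed parser handles the markup
    return parser.get_text()
```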

Thanks,

Srilalitha S

Computer Learn Point

Apr 21, 2021, 12:43:48 PM

to beauti...@googlegroups.com

I can help you.

--
You received this message because you are subscribed to the Google Groups "beautifulsoup" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beautifulsou...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beautifulsoup/7bf0638a-af48-4ac6-8d3d-9b9d3d0e8a42n%40googlegroups.com.

Srilalitha S

Apr 22, 2021, 7:26:05 AM

to beauti...@googlegroups.com

Yes please. Can you tell me how we do it?

Srilalitha S
Consultant
Email	sril...@thoughtworks.com
Telephone	+91 9940075542
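
For readers who find this thread: no solution was posted in it, so what follows is only a sketch of one possible approach, not an answer from the list. The idea is to escape any angle-bracket token whose name is not a known HTML tag before handing the text to BeautifulSoup, so the parser treats it as literal text. The `HTML_TAGS` whitelist, the `TAG_RE` pattern, and the `_escape_unknown` helper are assumptions made up for this illustration. The whitelist is deliberately short and would need to be extended to cover the tags your documents actually use.

```python
import re

from bs4 import BeautifulSoup

# Assumption: a small illustrative whitelist, not a complete list of HTML tags.
HTML_TAGS = {"a", "b", "br", "div", "em", "i", "p", "span", "strong", "u"}

# Matches anything shaped like an opening or closing tag, e.g. <i>, </i>, <answer>.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def _escape_unknown(match):
    """Leave whitelisted HTML tags alone; escape the brackets of anything else."""
    name = match.group(2).lower()
    if name in HTML_TAGS:
        return match.group(0)
    # Turn <answer> into &lt;answer&gt; so the parser sees it as text.
    return "&lt;" + match.group(0)[1:-1] + "&gt;"


def html_to_text(html):
    protected = TAG_RE.sub(_escape_unknown, html)
    # The parser decodes &lt; and &gt; back to < and > in the extracted text.
    return BeautifulSoup(protected, "html.parser").get_text()


print(html_to_text("Should be <i>the</i> <answer> here"))
# → Should be the <answer> here
```

If the second expected form ("Should be the answer here") is preferred, the helper could return `match.group(0)[1:-1]` instead of the escaped version, which drops the brackets and keeps the inner text.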

