Hi, I'm wondering if this is possible with Beautiful Soup, or if anyone can suggest how to do it.
My application sends HTML-formatted messages to Telegram. These are HTML fragments: they don't contain a <body> tag, just various tags for marking up the text.
Telegram limits the message length to 4096 bytes.
Longer messages must be broken up into multiple smaller messages.
Splitting at the 4096-byte boundary raises an exception whenever the split lands in the middle of an HTML element (inside a tag, or between an opening tag and its closing tag).
The allowed HTML tags are a reduced set: b, strong, i, u, img, a.
I thought I might be able to customize html.parser to do this, by creating a stack that tracked the position in the file of start/end tags, and then splitting the file only in places where the stack was empty (i.e. outside of any html tags). However, this was starting to look complicated.
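For reference, here is a rough sketch of that stack idea using only html.parser from the standard library. It is not a tested solution. The names SafeSplitFinder, split_html and VOID_TAGS are made up for illustration, and it assumes the fragment is well formed, that img (and br) never have a closing tag, and that the limit can be measured in characters with len() rather than in encoded bytes.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br"}  # assumption: these tags never have a closing tag


class SafeSplitFinder(HTMLParser):
    """Collects character offsets at which no tag is open."""

    def __init__(self, text):
        # convert_charrefs=False keeps handle_data() offsets aligned with the raw text.
        super().__init__(convert_charrefs=False)
        self.text = text
        # Absolute offset of each line start, to translate getpos() results.
        self.line_starts = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.stack = []    # currently open tags
        self.safe = set()  # offsets where the stack is empty
        self.feed(text)
        self.close()

    def _offset(self):
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if not self.stack:
            self.safe.add(self._offset())  # just before a top-level tag
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            if not self.stack:
                # Just after the ">" that closes a top-level element.
                self.safe.add(self.text.index(">", self._offset()) + 1)

    def handle_data(self, data):
        if not self.stack:
            start = self._offset()
            # Top-level text may be split after any whitespace character.
            for i, ch in enumerate(data):
                if ch.isspace():
                    self.safe.add(start + i + 1)


def split_html(text, limit=4096):
    """Greedily cut text at the last safe offset that fits within limit."""
    safe = sorted(SafeSplitFinder(text).safe)
    parts, start = [], 0
    while len(text) - start > limit:
        candidates = [p for p in safe if start < p <= start + limit]
        if not candidates:
            raise ValueError("no safe split point within the limit")
        cut = candidates[-1]
        parts.append(text[start:cut])
        start = cut
    parts.append(text[start:])
    return parts
```

Calling split_html(message) would return a list of strings to send one after another. A single element longer than the limit raises ValueError here, since handling that case would mean closing and reopening tags across messages, which this sketch does not attempt.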
Any suggestions on how to go about this, and might it be possible with Beautiful Soup?