For nested lists in HTML, the nested list is supposed to sit inside an <li> element rather than be a direct child of the <ol> or <ul>. However, consider this slightly non-conformant HTML snippet produced by Gmail:
from bs4 import BeautifulSoup
x = "<ol><li>1</li><ol><li>2</li></ol></ol>"
soup = BeautifulSoup(x, "lxml")
The snippet above works the way I would expect and creates the following tree (minus the html and body wrapper tags):
"<ol><li>1</li><ol><li>2</li></ol></ol>"
However, this very similar snippet behaves differently:
y = "<ol><li>1</li><ul><li>*</li></ul></ol>"
soup = BeautifulSoup(y, "lxml")
This time we get the following tree:
"<ol><li>1</li></ol><ul><li>*</li></ul>"
Unless there is something in the HTML spec that I don't know about prohibiting nested lists of mixed types, this seems inconsistent and/or broken: the nested unordered list now sits outside the ordered list rather than inside it.
I would instead expect bs4 to produce one of the following two trees:
"<ol><li>1</li><ul><li>*</li></ul></ol>"
"<ol><li>1</li><li><ul><li>*</li></ul></li></ol>"
Either would be fine, but what currently gets produced is a real problem because it substantially changes the structure, and therefore the meaning, of the text.
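
For comparison, here is a minimal sketch that runs the same mixed-type input through more than one tree builder. It assumes the built-in "html.parser" builder is an acceptable alternative; as far as I can tell it does not apply lxml's auto-closing rules and leaves the nesting as written. The helper name parse_with is just something made up for this example, and the lxml line is skipped if lxml is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

y = "<ol><li>1</li><ul><li>*</li></ul></ol>"


def parse_with(markup, parser):
    """Return the serialized tree, or None if the parser is not installed.

    Hypothetical helper for this comparison, not part of bs4.
    """
    try:
        return str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        return None


# Serialize the same markup with each builder and print the results side by side.
results = {p: parse_with(y, p) for p in ("html.parser", "lxml")}
for parser, tree in results.items():
    print(f"{parser:12} {tree if tree is not None else '(not installed)'}")

# Check structurally whether the <ul> is still a descendant of the <ol>.
soup = BeautifulSoup(y, "html.parser")
ul_still_nested = soup.ul.find_parent("ol") is not None
print("ul nested in ol with html.parser:", ul_still_nested)
```

This does not explain why lxml treats <ol> inside <ol> differently from <ul> inside <ol>; it only makes it easy to see which builder changes the nesting for a given input.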