LXML/html5lib + BOM

AntonOfTheWoods

Feb 18, 2021, 10:07:43 AM
to beautifulsoup
I saw that there were issues when trying to load a document into a soup with LXML (only with the "iterparse" load method?), but it looked like they were "fixed" according to the LXML folks. If I try:

BeautifulSoup(codecs.BOM_UTF8.decode('utf8') + '<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head><title>Hi</title></head><body></body></html>', "lxml").head

with 4.6.2, then I get "None". Without the BOM it works fine, and with the BOM it works fine using "html.parser". It also doesn't work with "html5lib".
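Since it works without the BOM, one stopgap would be to drop a leading U+FEFF before the markup reaches the parser. This is only a minimal sketch of that idea, not something I found documented: the helper name strip_bom is made up for illustration, and the markup is a shortened stand-in for the document above.

```python
import codecs

# The UTF-8 BOM decodes to the single character U+FEFF.
BOM = codecs.BOM_UTF8.decode("utf-8")


def strip_bom(markup: str) -> str:
    """Return markup without a leading U+FEFF, if one is present.

    Hypothetical helper, not part of Beautiful Soup or lxml.
    """
    return markup[len(BOM):] if markup.startswith(BOM) else markup


# Shortened stand-in for the XHTML document in the post.
doc = BOM + '<?xml version="1.0" encoding="utf-8"?><html><head><title>Hi</title></head><body></body></html>'
cleaned = strip_bom(doc)

# When starting from bytes instead of str, the "utf-8-sig" codec
# drops the BOM while decoding.
decoded = (codecs.BOM_UTF8 + b"<html></html>").decode("utf-8-sig")

# The cleaned string would then be handed to the parser as before, e.g.
# BeautifulSoup(cleaned, "lxml").head
```

This only sidesteps the problem on the caller's side; it doesn't say anything about why the "lxml" and "html5lib" builders behave differently from "html.parser" here.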

Google didn't throw up any current documentation around this. Is this documented somewhere? Thanks!