I am using BeautifulSoup to parse DITA (XML) files and I am running into an issue with processing instructions. Unfortunately, I don't understand XML well enough to know whether this is a bug in BeautifulSoup or an error in the XML source.
Running the following in an interactive Python session:
import bs4
xml = '<?xml version="1.0" encoding="utf-8"?><p>Test xml with PI <?dtall break="line"?>a..z<?dtall break="line"?></p>'
soup = bs4.BeautifulSoup(xml, 'xml')
str(soup)
Produces the following output:
'<?xml version="1.0" encoding="utf-8"?>\n<p>Test xml with PI <?dtall break="line">a..z<?dtall break="line"></p>'
The closing ? has been stripped from the <?dtall ... ?> processing instructions, so each one now ends in > instead of ?>.
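One way to test whether the XML source is at fault is to parse the same string with an independent parser. The sketch below assumes Python 3 and uses only the standard library's `xml.dom.minidom` (not BeautifulSoup). If the string parses without an `ExpatError` and the processing instructions serialize back with their closing `?>`, that suggests the source is well-formed and the missing `?` is introduced when BeautifulSoup writes the document out.

```python
# Parse the same XML with the standard library's minidom (an independent,
# expat-based parser, not BeautifulSoup) and check whether the processing
# instructions survive a parse/serialize round trip.
from xml.dom import minidom

xml = ('<?xml version="1.0" encoding="utf-8"?>'
       '<p>Test xml with PI <?dtall break="line"?>a..z<?dtall break="line"?></p>')

# parseString raises xml.parsers.expat.ExpatError if the XML is malformed.
doc = minidom.parseString(xml)
p = doc.documentElement

# Collect the processing-instruction nodes found inside <p>.
pis = [n for n in p.childNodes if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
for pi in pis:
    print(pi.target, pi.data)

# Serialize just the <p> element; minidom writes each PI as <?target data?>.
serialized = p.toxml()
print(serialized)
```

Both `dtall` processing instructions should be reported, and the serialized `<p>` element should contain `<?dtall break="line"?>` twice, closing `?>` included.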
Any help/advice will be appreciated.
Andrew