Processing Instructions

Andrew Mercer

Sep 30, 2015, 9:02:31 PM
to beautifulsoup
I am using BeautifulSoup to parse DITA (XML) files and have run into a problem with processing instructions. I don't know XML well enough to tell whether this is a bug in BeautifulSoup or an error in the XML source.

Running the following script:
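    import bs4

    # Minimal document with two <?dtall ...?> processing instructions inside <p>.
    xml = '<?xml version="1.0" encoding="utf-8"?><p>Test xml with PI <?dtall break="line"?>a..z<?dtall break="line"?></p>'
    soup = bs4.BeautifulSoup(xml, 'xml')  # 'xml' selects the lxml-based XML parser
    print(repr(str(soup)))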

Produces the following output:
    '<?xml version="1.0" encoding="utf-8"?>\n<p>Test xml with PI <?dtall break="line">a..z<?dtall break="line"></p>'

The closing ? has been stripped from both <?dtall ... ?> processing instructions, while the <?xml ... ?> declaration is left intact.
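
To help separate "bad source" from "bad serialization", here is a minimal sketch that runs the same string through the standard library's xml.dom.minidom instead of BeautifulSoup. This check is not part of the script above, and it rests on the assumption that a successful parse plus a clean round trip of the processing instructions is a fair test of the input. It says nothing about what BeautifulSoup does internally.

```python
from xml.dom import minidom

xml = '<?xml version="1.0" encoding="utf-8"?><p>Test xml with PI <?dtall break="line"?>a..z<?dtall break="line"?></p>'

# parseString raises xml.parsers.expat.ExpatError if the input is not
# well-formed, so reaching the next line means the source parsed cleanly.
doc = minidom.parseString(xml.encode("utf-8"))

# Collect the processing instructions that are direct children of <p>.
p = doc.documentElement
pis = [n for n in p.childNodes if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]

for pi in pis:
    # target is the PI name, data is everything after it,
    # and toxml() is how minidom writes the node back out.
    print(pi.target, pi.data, pi.toxml())
```

If this parses without an exception and the serialized form still ends in ?>, the source XML looks fine and the missing ? is more likely introduced when BeautifulSoup writes the document back out.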

Any help/advice will be appreciated.

Andrew
