Processing Instructions

Andrew Mercer

Sep 30, 2015, 9:02:31 PM

to beautifulsoup

I am using BeautifulSoup to parse DITA (XML) files and have run into an issue with processing instructions. Unfortunately, I don't understand XML well enough to know whether this is a bug in BeautifulSoup or an error in the XML source.

Running the following script:

import bs4

xml = '<?xml version="1.0" encoding="utf-8"?><p>Test xml with PI <?dtall break="line"?>a..z<?dtall break="line"?></p>'

soup = bs4.BeautifulSoup(xml, 'xml')

print(repr(str(soup)))  # repr() keeps the quotes and \n visible

Produces the following output:

'<?xml version="1.0" encoding="utf-8"?>\n<p>Test xml with PI <?dtall break="line">a..z<?dtall break="line"></p>'

The closing ? has been stripped from both <?dtall ... ?> processing instructions, while the XML declaration on the first line keeps its ?>.
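To help narrow down whether the source is at fault, the same string can be fed to a different parser. The sketch below uses xml.dom.minidom from the Python standard library; that is only my choice of a comparison parser and has nothing to do with how BeautifulSoup works internally. The names pis and roundtrip are just illustrative. If parsing raises no error and the serialized <p> element still closes each instruction with ?>, that would suggest the XML itself is well-formed.

```python
from xml.dom import minidom

xml = '<?xml version="1.0" encoding="utf-8"?><p>Test xml with PI <?dtall break="line"?>a..z<?dtall break="line"?></p>'

# parseString raises xml.parsers.expat.ExpatError if the input is not well-formed.
# Bytes are passed so the encoding declaration in the XML header is honoured.
doc = minidom.parseString(xml.encode('utf-8'))

# Collect the processing instructions that are direct children of <p>.
pis = [node for node in doc.documentElement.childNodes
       if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]

for pi in pis:
    print(pi.target, pi.data)

# Serialize the <p> element again to see how this parser writes the instructions.
roundtrip = doc.documentElement.toxml()
print(roundtrip)
```

This only checks the source against one other parser; it does not say anything about which part of BeautifulSoup drops the ?.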

Any help/advice will be appreciated.

Andrew