I have this problem a lot with Python and XML. Even with Uche's
excellent yearly roundups I have a hard time finding how to do fancy
things with XML in Python. I think it's a bit like web server
frameworks in Python - too many choices.
http://www.xml.com/pub/a/2004/10/13/py-xml.html
My own favorite is libxml2. Something like the following:
#!/usr/bin/env python
import libxml2
import sys

def grep(what, where):
    doc = libxml2.parseDoc(where)
    for found in doc.xpathEval(what):
        found.saveTo(sys.stdout, format=True)

if __name__ == '__main__':
    try:
        what = sys.argv[1]
    except IndexError:
        sys.exit("Usage: %s pattern file ..." % sys.argv[0])
    else:
        for where in sys.argv[2:]:
            grep(what, file(where).read())
although you might want to be smarter with the errors...
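One way to be smarter about the errors might look like the sketch below. It swaps in the standard library's xml.etree.ElementTree for libxml2 so that it runs without third-party packages (ElementTree understands only a subset of XPath, so this is not a drop-in replacement). The names grep_xml and main, and the (matches, error) return convention, are made up for illustration.

```python
import sys
import xml.etree.ElementTree as ET


def grep_xml(pattern, text):
    """Return (matches, error): serialized matching elements, or a message."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        # Malformed XML: report it rather than letting a traceback escape.
        return [], "parse error: %s" % exc
    try:
        found = root.findall(pattern)
    except SyntaxError as exc:
        # ElementTree raises SyntaxError for path expressions it rejects.
        return [], "bad pattern: %s" % exc
    return [ET.tostring(node, encoding="unicode") for node in found], None


def main(argv):
    """Hypothetical command-line driver; returns a process exit status."""
    if len(argv) < 3:
        sys.stderr.write("Usage: %s pattern file ...\n" % argv[0])
        return 2
    status = 0
    for where in argv[2:]:
        try:
            # Read bytes so the parser can honor the XML encoding declaration.
            with open(where, "rb") as handle:
                text = handle.read()
        except OSError as exc:
            sys.stderr.write("%s: %s\n" % (where, exc))
            status = 1
            continue
        matches, error = grep_xml(pattern=argv[1], text=text)
        if error:
            sys.stderr.write("%s: %s\n" % (where, error))
            status = 1
        for match in matches:
            print(match)
    return status
```

To use it as a script, end the file with sys.exit(main(sys.argv)). A bad file or pattern is then reported on stderr and the remaining files are still processed.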
--
John Lenton (jo...@grulic.org.ar)
I figured this out. Thanks for the help, John! Examples below.
I used this exercise as an opportunity to get something off my chest
about XML and Python - it's kind of a mess! More here:
http://www.nelson.monkey.org/~nelson/weblog/tech/python/xpath.html
Here are my samples, in three libraries:
# PyXML
from xml.dom.ext.reader import Sax2
from xml import xpath
doc = Sax2.FromXmlFile('foo.opml').documentElement
for url in xpath.Evaluate('//@xmlUrl', doc):
    print url.value

# libxml2
import libxml2
doc = libxml2.parseFile('foo.opml')
for url in doc.xpathEval('//@xmlUrl'):
    print url.content

# ElementTree
from elementtree import ElementTree
tree = ElementTree.parse("foo.opml")
for outline in tree.findall("//outline"):
    print outline.get('xmlUrl')
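
For comparison, here is a rough sketch of the ElementTree variant against the copy of ElementTree that ships in the standard library of current Python versions as xml.etree.ElementTree. The inline OPML text and the feed_urls helper are invented for the example and stand in for foo.opml. ElementTree's path language selects elements rather than attribute nodes, so there is no direct equivalent of '//@xmlUrl'. The sketch filters on the attribute with a predicate and then reads it with get().

```python
import xml.etree.ElementTree as ET

# Made-up OPML sample standing in for foo.opml.
OPML = """<opml version="1.1">
  <body>
    <outline text="News">
      <outline text="Example A" xmlUrl="http://example.com/a.xml"/>
      <outline text="Example B" xmlUrl="http://example.org/b.xml"/>
    </outline>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return the xmlUrl attribute of every outline element that has one."""
    root = ET.fromstring(opml_text)
    # The predicate [@xmlUrl] keeps only outline elements carrying the attribute.
    return [node.get("xmlUrl") for node in root.findall(".//outline[@xmlUrl]")]


for url in feed_urls(OPML):
    print(url)
```

The grouping outline ("News") has no xmlUrl attribute, so the predicate skips it instead of yielding None.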
Please see my blog entry for more commentary:
http://www.nelson.monkey.org/~nelson/weblog/tech/python/xpath.html
http://www.oreillynet.com/pub/wlg/6224
http://www.oreillynet.com/pub/wlg/6225
Meanwhile, please don't make the mistake of bothering with XQuery.
It's despicable crap. And a huge impedance mismatch with Python.
--Uche