<channel name='#sandbox'>
  <message user='PeterScott'>Hello, my bot</message>
  <message user='PeterScott'>This is a message</message>
  <nickchange>
    <oldnick>PeterScott</oldnick>
    <newnick>PeterSc</newnick>
  </nickchange>
</channel>
I'm writing another program that should parse that sort of XML on its
stdin, printing out a more user-friendly representation. For this, I
need to parse the XML as it comes in, not all at once.
I wrote a parser using xml.sax, and it works well, provided that I
read in the whole document. However, I want to be able to just read
the XML piece by piece, calling event handlers whenever something
happens and waiting for more to happen.
Is there some way to do this with the standard python xml parsers?
Will I need to use PyXML? Or what?
Thanks,
-Peter
> Is there some way to do this with the standard python xml parsers?
> Will I need to use PyXML? Or what?
xml.parsers.expat can parse things in pieces. It shouldn't be *too* much
work to convert over.
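For what it's worth, here is a minimal sketch of what piecewise parsing with xml.parsers.expat might look like. The handler functions, the `events` list and the way the input is chopped into `chunks` are my own illustration, not anything from the original program; the chunks are split in awkward places on purpose.

```python
import xml.parsers.expat

events = []  # collected here only so the result is easy to inspect


def start_element(name, attrs):
    events.append(("start", name, dict(attrs)))


def end_element(name):
    events.append(("end", name))


def char_data(data):
    if data.strip():
        events.append(("text", data))


parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True  # merge adjacent character data into one callback
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

# Hypothetical input pieces; tags and attribute values are split mid-token.
chunks = [
    "<channel name='#sand",
    "box'><message user='Peter",
    "Scott'>Hello, my bot</mess",
    "age></channel>",
]
for chunk in chunks:
    parser.Parse(chunk, False)  # False: more data will follow
parser.Parse("", True)          # True: the document is complete

for event in events:
    print(event)
```

The second argument to Parse() is what makes this incremental: the parser keeps its state between calls until it is told the input is final.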
Peter,
Check out the IncrementalParser class in the library module
Lib/xml/sax/xmlreader.py
This extension of the standard XMLReader class acts just like a SAX
parser, in that it delivers SAX2 events to your ContentHandler as it
processes the tokens from the source XML document.
But rather than the parser itself controlling when and how it gets its
input, you control that through the use of the .feed() method. So you
can "drip feed" the parser with input if you wish.
Not all XML parsers support an IncrementalParser interface. In order
for an XML parser to support incremental parsing, it must have been
coded specifically to do so. Fortunately, the expat wrapper supplied
with the base distribution of python does support incremental parsing.
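If you want to confirm that the parser you were handed really supports incremental parsing, a quick check along these lines should do. This is only a sketch; the variable name `supports_feed` is mine.

```python
import xml.sax
from xml.sax.xmlreader import IncrementalParser

# make_parser() with no arguments returns the default (expat-based) reader.
parser = xml.sax.make_parser()

# Only parsers derived from IncrementalParser provide feed() and close().
supports_feed = isinstance(parser, IncrementalParser)
print("incremental parsing supported:", supports_feed)
```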
I think incremental parsing should solve your problem quite nicely. When
you start up your process for the first time, feed() the IncrementalParser
an opening tag for a document element (every XML document must have
exactly one document element). Then simply feed the output of your
logging stream directly to the IncrementalParser, as and when you
receive it.
You should not have any problems with XML tokens being split over two
different .feed() calls either. For example, this should work just
fine:
import xml.sax
ip = xml.sax.make_parser()  # the bundled expat reader is an IncrementalParser
ip.feed('<docu')
ip.feed('ment')
ip.feed('/>')
ip.close()  # signal end of input
When your logging stream is closing, simply feed a close tag for your
document element to your IncrementalParser, and everything will clean
up nicely.
Here is some sample code:
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
import xml.sax
from xml.sax.handler import ContentHandler
logentry = """
<channel name='#sandbox'>
<message user='PeterScott'>Hello, my bot</message>
<message user='PeterScott'>This is a message</message>
<nickchange>
<oldnick>PeterScott</oldnick>
<newnick>PeterSc</newnick>
</nickchange>
</channel>
"""
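# make_parser() expects a list of parser module names
incr_parser = xml.sax.make_parser(['xml.sax.expatreader'])
# the base ContentHandler ignores every event; subclass it to do real work
incr_parser.setContentHandler(ContentHandler())
incr_parser.feed('<mylogstream>')
incr_parser.feed(logentry)
incr_parser.feed('</mylogstream>')
incr_parser.close()  # tell the parser that the input is complete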
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
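To get the user-friendly output you were after, you would replace the do-nothing ContentHandler with a subclass of your own. The following is only a rough sketch: the `LogPrinter` class, its output format and the 7-character chunking are assumptions of mine, made up to mimic a stream that trickles in.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class LogPrinter(ContentHandler):
    """Hypothetical handler that turns log events into readable lines."""

    def __init__(self, out):
        super().__init__()
        self.out = out      # list that receives the formatted lines
        self.user = None    # user attribute of the current <message>
        self.text = []      # character data pieces of the current element
        self.fields = {}    # oldnick/newnick of the current <nickchange>

    def startElement(self, name, attrs):
        self.text = []
        if name == "channel":
            self.out.append("Channel %s" % attrs.get("name"))
        elif name == "message":
            self.user = attrs.get("user")
        elif name == "nickchange":
            self.fields = {}

    def characters(self, content):
        # Text can arrive in several pieces, so collect and join it later.
        self.text.append(content)

    def endElement(self, name):
        text = "".join(self.text).strip()
        self.text = []
        if name == "message":
            self.out.append("<%s> %s" % (self.user, text))
        elif name in ("oldnick", "newnick"):
            self.fields[name] = text
        elif name == "nickchange":
            self.out.append("%s is now known as %s"
                            % (self.fields["oldnick"], self.fields["newnick"]))


lines = []
parser = xml.sax.make_parser()
parser.setContentHandler(LogPrinter(lines))

stream = ("<channel name='#sandbox'>"
          "<message user='PeterScott'>Hello, my bot</message>"
          "<nickchange><oldnick>PeterScott</oldnick>"
          "<newnick>PeterSc</newnick></nickchange>"
          "</channel>")

parser.feed("<mylogstream>")            # synthetic document element
for i in range(0, len(stream), 7):      # 7-character pieces
    parser.feed(stream[i:i + 7])
parser.feed("</mylogstream>")
parser.close()

print("\n".join(lines))
```

In the real program the loop would presumably read from sys.stdin instead of slicing a string, and the handler would print each line as soon as it is complete rather than collecting them in a list.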
regards,
--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan