
xml.sax parsing elements with the same name


amadain

Jan 11, 2010, 2:13:01 PM
I have an event log with 100s of thousands of entries with logs of the
form:

<event eventTimestamp="2009-12-18T08:22:49.035"
uniqueId="1261124569.35725_PFS_1_1340035961">
<result value="Blocked"/>
<filters>
<filter code="338" type="Filter_Name">
<diagnostic>
<result value="Triggered"/>
</diagnostic>
</filter>
<filter code="338" type="Filter_Name">
<diagnostic>
<result value="Blocked"/>
</diagnostic>
</filter>
</filters>
</event>

I am using xml.sax to parse the event log. The trouble with the file
above is when I parse for result value I get the last result value
(Blocked from above). I want to get the result value triggered (the
second in the event).

My code is as follows:

def startElement(self, name, attrs):
    if name == 'event':
        self.eventTime = attrs.get('eventTimestamp', "")
        self.eventUniqueId = attrs.get('uniqueId', "")
    if name == 'result':
        self.resultValue = attrs.get('value', "")
    return

def endElement(self, name):
    if name == "event":
        result = eval(self.filter)
        if result:
            ...

How do I get the result value I require when events have the same
names like above?

John Bokma

Jan 11, 2010, 2:26:30 PM
amadain <mfmd...@gmail.com> writes:

You have to keep track if you're inside a filters section, and keep
track of the filter elements (first, second, etc.) assuming you want the
result value of the first filter.

--
John Bokma

Read my blog: http://johnbokma.com/
Hire me (Perl/Python): http://castleamber.com/

amadain

Jan 11, 2010, 3:08:13 PM
On Jan 11, 7:26 pm, John Bokma <j...@castleamber.com> wrote:

How do I keep track? The first result value is outside a filters
section and the rest are inside it. Do you mean something like:

def startElement(self, name, attrs):
    if name == 'event':
        self.eventTime = attrs.get('eventTimestamp', "")
        self.eventUniqueId = attrs.get('uniqueId', "")
    if name == 'result':
        self.resultValue = attrs.get('value', "")

    if name == 'filters':
        if name == 'result':
            self.resultValueF = attrs.get('value', "")
    return

A

John Bokma

Jan 11, 2010, 4:03:44 PM
amadain <mfmd...@gmail.com> writes:

> On Jan 11, 7:26 pm, John Bokma <j...@castleamber.com> wrote:
>> amadain <mfmdev...@gmail.com> writes:


>> > <event eventTimestamp="2009-12-18T08:22:49.035"
>> > uniqueId="1261124569.35725_PFS_1_1340035961">
>> >    <result value="Blocked"/>
>> >       <filters>
>> >           <filter code="338" type="Filter_Name">
>> >               <diagnostic>
>> >                    <result value="Triggered"/>
>> >               </diagnostic>
>> >           </filter>
>> >           <filter code="338" type="Filter_Name">
>> >               <diagnostic>
>> >                    <result value="Blocked"/>
>> >               </diagnostic>
>> >           </filter>
>> >       </filters>
>> > </event>

> how do I keep track? The first result value is outside a filters
> section and the rest are. Do you mean something like:
>
> def startElement(self, name, attrs):
> if name == 'event':
> self.eventTime = attrs.get('eventTimestamp',"")
> self.eventUniqueId = attrs.get('uniqueId', "")
> if name == 'result':
> self.resultValue = attrs.get('value',"")
> if name == filters:
> if name == 'result':
> self.resultValueF = attrs.get('value',"")
> return

I was thinking about something like:

    self.filterIndex = 0

in startElement:

    if name == 'filter':
        self.filterIndex += 1
        return
    if name == 'result' and self.filterIndex == 1:
        ... = attrs.get('value', '')

in endElement:

    if name == 'filters':
        self.filterIndex = 0

if you want the result of the first filter in filters.
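
For reference, the counter idea above can be put together into a complete handler. This is only a sketch: the class name EventHandler, the attribute names eventResult and firstFilterResult, the events list, and the inline SAMPLE string are made up for illustration and are not from the original code. It assumes every result element of interest sits either directly under event or inside a filter.

```python
import xml.sax

# Sample event copied from the original post (whitespace condensed).
SAMPLE = b"""<event eventTimestamp="2009-12-18T08:22:49.035"
uniqueId="1261124569.35725_PFS_1_1340035961">
<result value="Blocked"/>
<filters>
<filter code="338" type="Filter_Name">
<diagnostic><result value="Triggered"/></diagnostic>
</filter>
<filter code="338" type="Filter_Name">
<diagnostic><result value="Blocked"/></diagnostic>
</filter>
</filters>
</event>"""


class EventHandler(xml.sax.ContentHandler):
    """Hypothetical handler: keeps the event-level result and the
    result of the first filter separately."""

    def __init__(self):
        super().__init__()
        self.filterIndex = 0          # number of <filter> elements seen so far
        self.eventUniqueId = ''
        self.eventResult = ''
        self.firstFilterResult = ''
        self.events = []              # collected (id, event result, first filter result)

    def startElement(self, name, attrs):
        if name == 'event':
            # Reset per-event state.
            self.eventUniqueId = attrs.get('uniqueId', '')
            self.eventResult = ''
            self.firstFilterResult = ''
        elif name == 'filter':
            self.filterIndex += 1
        elif name == 'result':
            if self.filterIndex == 0:
                # No filter seen yet: this is the event's own result.
                self.eventResult = attrs.get('value', '')
            elif self.filterIndex == 1:
                # Result belonging to the first filter.
                self.firstFilterResult = attrs.get('value', '')

    def endElement(self, name):
        if name == 'filters':
            self.filterIndex = 0
        elif name == 'event':
            self.events.append(
                (self.eventUniqueId, self.eventResult, self.firstFilterResult))


handler = EventHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.events)
```

Storing the two values under different attribute names is what keeps the later result elements from overwriting the earlier one, which was the problem in the original handler.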

amadain

Jan 11, 2010, 4:24:14 PM

Thank you. I will try that

Stefan Behnel

Jan 12, 2010, 3:13:35 AM
amadain, 11.01.2010 20:13:

> I have an event log with 100s of thousands of entries with logs of the
> form:
>
> <event eventTimestamp="2009-12-18T08:22:49.035"
> uniqueId="1261124569.35725_PFS_1_1340035961">
> <result value="Blocked"/>
> <filters>
> <filter code="338" type="Filter_Name">
> <diagnostic>
> <result value="Triggered"/>
> </diagnostic>
> </filter>
> <filter code="338" type="Filter_Name">
> <diagnostic>
> <result value="Blocked"/>
> </diagnostic>
> </filter>
> </filters>
> </event>
>
> I am using xml.sax to parse the event log.

You should give ElementTree's iterparse() a try (xml.etree package).
Instead of a stream of simple events, it will give you a stream of
subtrees, which are a lot easier to work with. You can intercept the event
stream on each 'event' tag, handle it completely in one obvious code step,
and then delete any content you are done with to save memory.

It's also very fast; you will likely not lose much performance compared to
xml.sax.
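
A rough sketch of what the iterparse() approach could look like for this log. The function name first_filter_results and the wrapping <log> root element are assumptions for illustration; the original post does not show the real root element of the file.

```python
import io
import xml.etree.ElementTree as ET

# One event from the original post, wrapped in an assumed <log> root.
SAMPLE = b"""<log>
<event eventTimestamp="2009-12-18T08:22:49.035"
uniqueId="1261124569.35725_PFS_1_1340035961">
<result value="Blocked"/>
<filters>
<filter code="338" type="Filter_Name">
<diagnostic><result value="Triggered"/></diagnostic>
</filter>
<filter code="338" type="Filter_Name">
<diagnostic><result value="Blocked"/></diagnostic>
</filter>
</filters>
</event>
</log>"""


def first_filter_results(source):
    """Yield (uniqueId, event result, first filter result) per event.

    Hypothetical helper; 'source' is a filename or a binary file object.
    """
    for _, elem in ET.iterparse(source, events=('end',)):
        if elem.tag != 'event':
            continue
        # Direct child <result> of the event.
        event_result = elem.find('result')
        # find() returns the first match in document order,
        # i.e. the result of the first filter.
        filter_result = elem.find('filters/filter/diagnostic/result')
        yield (
            elem.get('uniqueId', ''),
            event_result.get('value', '') if event_result is not None else '',
            filter_result.get('value', '') if filter_result is not None else '',
        )
        # Drop the event's children once it has been handled.
        elem.clear()


rows = list(first_filter_results(io.BytesIO(SAMPLE)))
print(rows)
```

Note that elem.clear() only empties each event; the emptied event elements stay attached to the root, so for a very large log it may be worth removing them from the root as well.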

Stefan

Adam Tauno Williams

Jan 15, 2010, 2:08:25 PM
to pytho...@python.org
> Thank you. I will try that

If your document is reasonably complex I usually define some modes like:

BPML_BOOTSTRAP_MODE = 0
BPML_PACKAGE_MODE = 1
BPML_PROCESS_MODE = 2
BPML_CONTEXT_MODE = 3
....
BPML_EVENT_MODE = 10
BPML_FAULTS_MODE = 11
BPML_ATTRIBUTES_MODE = 12

- so I can self.mode.append(BPML_PROCESS_MODE) when I enter an element
(startElement) and self.mode = self.mode[:-1] when I leave an element
(endElement). This provides you with a complete 'stack trace' of how
you got where you are and still lets you efficiently process the stream
[versus using evil document model]. In startElement you can check the
current mode and tag with something like -
...
elif (name == 'event' and self.mode[-1] == BPML_PROCESS_MODE):
...
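
Applied to the event log from the start of the thread, a mode stack might look roughly like this. The mode constants, the MODES mapping, and the ModeHandler class are invented for this sketch (they are not the BPML modes above), and it uses list.pop() in endElement, which has the same effect as self.mode = self.mode[:-1].

```python
import xml.sax

# Hypothetical modes for the event log (not the BPML ones).
EVENT_MODE = 0
FILTERS_MODE = 1
FILTER_MODE = 2
OTHER_MODE = 3

MODES = {'event': EVENT_MODE, 'filters': FILTERS_MODE, 'filter': FILTER_MODE}

SAMPLE = b"""<event eventTimestamp="2009-12-18T08:22:49.035"
uniqueId="1261124569.35725_PFS_1_1340035961">
<result value="Blocked"/>
<filters>
<filter code="338" type="Filter_Name">
<diagnostic><result value="Triggered"/></diagnostic>
</filter>
<filter code="338" type="Filter_Name">
<diagnostic><result value="Blocked"/></diagnostic>
</filter>
</filters>
</event>"""


class ModeHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.mode = []            # stack of modes for the currently open elements
        self.eventResult = ''
        self.filterResults = []   # one entry per <result> found inside a <filter>

    def startElement(self, name, attrs):
        if name == 'result':
            if self.mode and self.mode[-1] == EVENT_MODE:
                # <result> directly under <event>
                self.eventResult = attrs.get('value', '')
            elif FILTER_MODE in self.mode:
                # <result> somewhere below a <filter>
                self.filterResults.append(attrs.get('value', ''))
        # Push a mode for every element so endElement can always pop.
        self.mode.append(MODES.get(name, OTHER_MODE))

    def endElement(self, name):
        self.mode.pop()


handler = ModeHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.eventResult, handler.filterResults)
```

Because the stack records the whole path to the current element, the same result tag can be told apart by where it occurs, without any per-tag counters.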

--
OpenGroupware developer: awil...@whitemice.org
<http://whitemiceconsulting.blogspot.com/>
OpenGroupware & Cyrus IMAPd documentation @
<http://docs.opengroupware.org/Members/whitemice/wmogag/file_view>

dontcare

Feb 8, 2010, 11:40:54 PM
If you are using Jython, then you might also want to consider VTD-XML,
which is a lot easier to use and faster than SAX. Its native XPath
support may be useful too.

http://vtd-xml.sf.net

On Jan 12, 12:13 am, Stefan Behnel <stefan...@behnel.de> wrote:
> amadain, 11.01.2010 20:13:
>
> > I have an event log with 100s of thousands of entries with logs of the
> > form:
>
> > <event eventTimestamp="2009-12-18T08:22:49.035"
> > uniqueId="1261124569.35725_PFS_1_1340035961">
> >    <result value="Blocked"/>
> >       <filters>
> >           <filter code="338" type="Filter_Name">
> >               <diagnostic>
> >                    <result value="Triggered"/>
> >               </diagnostic>
> >           </filter>
> >           <filter code="338" type="Filter_Name">
> >               <diagnostic>
> >                    <result value="Blocked"/>
> >               </diagnostic>
> >           </filter>
> >       </filters>
> > </event>
>

> > I am using xml.sax to parse the event log.


>
> You should give ElementTree's iterparse() a try (xml.etree package).
> Instead of a stream of simple events, it will give you a stream of
> subtrees, which are a lot easier to work with. You can intercept the event
> stream on each 'event' tag, handle it completely in one obvious code step,
> and then delete any content you are done with to safe memory.
>
> It's also very fast, you will like not loose much performance compared to xml.sax.
>
> Stefan
