Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

SAX questions...

28 views
Skip to first unread message

Timothy Grant

unread,
Jul 22, 2003, 1:22:09 PM7/22/03
to
I've worked with DOM before, but never SAX, I have a script that seems to work
quite well, but every time I look at it I think it's amazingly unweildly and
that I must be doing something wrong.

def startElement(self, name, attr):
if name == "foo":
do_foo_stuff()
elif name == "bar":
do_bar_stuff()
elif name == "baz":
do_baz_stuff()

There's similar code in endElement() and characters() and of course, the more
tags that need processing the more unweildly each method becomes.

I could create a dictionary and dispatch to the correct method based on a
dictionary key. But it seems to me there must be a better way.

Is there?


--
Stand Fast,
tjg.

Timothy Grant
www.craigelachie.org


Alan Kennedy

unread,
Jul 22, 2003, 3:01:31 PM7/22/03
to
Timothy Grant wrote:

> I've worked with DOM before, but never SAX, I have a script that
> seems to work quite well, but every time I look at it I think it's
> amazingly unweildly and that I must be doing something wrong.
>
> def startElement(self, name, attr):
> if name == "foo":
> do_foo_stuff()
> elif name == "bar":
> do_bar_stuff()
> elif name == "baz":
> do_baz_stuff()
>
> There's similar code in endElement() and characters() and of course,
> the more tags that need processing the more unweildly each method
> becomes.
>
> I could create a dictionary and dispatch to the correct method based on a
> dictionary key. But it seems to me there must be a better way.

Tim,

No, you're not doing anything wrong in your above code. But, yes, code
like that can get pretty unwieldy when you've got deeply nested
structures. A couple of suggested alternative approaches:

1. As you already suggested, build up a dictionary mapping tag names
to functions. But that still means that you have to maintain state
information between your function calls.

2. Represent each new element that comes in as a python object
(object()), i.e. construct a "Python Object Model" or POM. All xml
attributes on the element become python attributes of the python
object. That object then goes onto a simple stack. Each object() also
has a list of children. When a new xml child element is passed,
through a nested startElement call, it too gets turned into a python
object, and made a child of the python object on the top of the stack.
Pop objects off the stack as you receive endElement events.

This paradigm is described here, with working code

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/149368

3. Have a specialised python class for each element type (i.e. tag
name), that implements the "business logic" you want associated with
the element type. Write descriptors or properties for each specialised
class that, for example, only allow child python objects of a certain
type. Create a dictionary mapping tag names to your classes, and
instantiate a new python object of that class everytime you receive
the relevant tag name through a startElement call. Put it onto a
stack, as above, store nested xml elements as children of the
top-of-stack, and pop the stack when you receive endElement calls.

That's just a couple of approaches. There might be other, perhaps more
appropriate, methods, if you could describe what you're trying to
achieve. Are you extracting data from the documents? Or are you
checking that the documents comply with some form of hierarchical
structure? Or something else entirely?

HTH,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan

Bob Gailer

unread,
Jul 22, 2003, 2:02:45 PM7/22/03
to
At 10:22 AM 7/22/2003 -0700, Timothy Grant wrote:

>I've worked with DOM before, but never SAX, I have a script that seems to
>work
>quite well, but every time I look at it I think it's amazingly unweildly and
>that I must be doing something wrong.
>
>def startElement(self, name, attr):
> if name == "foo":
> do_foo_stuff()
> elif name == "bar":
> do_bar_stuff()
> elif name == "baz":
> do_baz_stuff()
>
>There's similar code in endElement() and characters() and of course, the more
>tags that need processing the more unweildly each method becomes.
>
>I could create a dictionary and dispatch to the correct method based on a
>dictionary key. But it seems to me there must be a better way.

Please note that this is not a SAX-specific question. I suggest a subject
of How To Manage Function Dispatching.

Your question comes down to what's the "best" way to dispatch a piece of
logic based on some condition. You have already given 2 good alternatives:
your example, and using a dictionary. The other approach I sometimes use is
defining a class for each condition and then instantiating the class as needed.

Class foo:
def __init__(self):
# do foo stuff

etc for each case, then

try: x = eval(name+'()')
except: raise "Unknown name " + name

Which, of course, is just a dictionary lookup in disguise.

Bob Gailer
bga...@alum.rpi.edu
303 442 2625

Andrew Dalke

unread,
Jul 22, 2003, 6:18:32 PM7/22/03
to
Timothy Grant:

> def startElement(self, name, attr):
> if name == "foo":
> do_foo_stuff()
> elif name == "bar":
> do_bar_stuff()
> elif name == "baz":
> do_baz_stuff()
...

> I could create a dictionary and dispatch to the correct method based on a
> dictionary key. But it seems to me there must be a better way.

def startElement(self, name, attr):
f = getattr(self, "do_" + name + "_stuff", None)
if f is not None:
f()

However, the dictionary interface is probably faster, and you can
build it up using introspection, like

class MyHandler:
def __init__(self):
self.start_d = {}
self.end_d = {}
for name in MyHandler.__dict__:
if name.startswith("do_") and name.endswith("_stuff"):
s = name[3:-6]
self.start_d[s] = getattr(self, name)
elif name.startswith("done_") and ...
s = name[5:-6]
self.end_d[s] = getattr(self, name)

def startElement(self, name, attrs):
f = self.start_d.get(name)
if f:
f()

...

Andrew
da...@dalkescientific.com


Timothy Grant

unread,
Jul 23, 2003, 11:35:03 PM7/23/03
to
Thanks to everyone who replied to this request. I'm going to try each method
suggested just for the fun of it and see which fits my thinking patterns the
best.

I appreciate your input!

On Tuesday 22 July 2003 11:02 am, Bob Gailer wrote:
> At 10:22 AM 7/22/2003 -0700, Timothy Grant wrote:
> >I've worked with DOM before, but never SAX, I have a script that seems to
> >work
> >quite well, but every time I look at it I think it's amazingly unweildly
> > and that I must be doing something wrong.
> >

> >def startElement(self, name, attr):
> > if name == "foo":
> > do_foo_stuff()
> > elif name == "bar":
> > do_bar_stuff()
> > elif name == "baz":
> > do_baz_stuff()
> >

> >There's similar code in endElement() and characters() and of course, the
> > more tags that need processing the more unweildly each method becomes.

--

0 new messages