Generate empty strings instead of AttributeError

Joel Bender

Mar 25, 2008, 1:20:00 PM

to amara...@googlegroups.com

Uche, et al.,

Before ranting and rambling begin, let me say thank you for an
outstanding toolkit and everything else you've contributed to the
XML/Python/RDF community.

If I can be so bold as to summarize the problem, the expression 'x.y'
needs to return something that distinguishes between "not a component",
"not there", and the empty string. When I use "component" I mean
an attribute or element in the model; I use that term to avoid
confusing XML attributes with Python attributes.

As things stand now, if 'y' is a recognizable component of x, then x.y
should be None to indicate it does not exist and the empty string if it
exists and is empty. If 'y' is not a recognizable component of x, x.y
raises AttributeError.

I've been struggling with knowing if 'x.y' represents following the
attribute axis or the child element axis of a document, and to be able
to direct it to one or the other on demand. It is also a struggle to
deal with XML documents with namespaces, or at least the SVG documents
generated by Microsoft :-).

I've also been experimenting with building derived classes of Element to
provide a list of components, describing them as attributes or elements,
and namespace details if necessary. I'll start a new thread for that,
because this one is too long already :-).

I propose a new component of bindery object, 'xml_schema'. The
xml_schema attribute has isAttribute, isElement, and isUndefined functions:

x = amara.parse("<x/>")
assert x.xml_schema.isUndefined('y')

x = amara.parse("<x y='z'/>")
assert x.xml_schema.isAttribute('y')

x = amara.parse("<x><y>z</y></x>")
assert x.xml_schema.isElement('y')

Note that isAttribute and isElement can both be true at the same
time, and if that is the case, 'x.y' raises AttributeError, "ambiguous
attribute 'y'", unless the programmer has described how it should be
resolved.
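
To make the three predicates concrete, here is a minimal sketch. It is
not Amara code: the ContentModel class and its resolve() method are
hypothetical names, and the standard library's xml.etree.ElementTree
stands in for the parsed node. resolve() only illustrates what 'x.y'
might do with the answers.

```python
import xml.etree.ElementTree as ET


class ContentModel:
    """Hypothetical model object; not part of Amara."""

    def __init__(self, element):
        # Record which names appear as XML attributes and as child elements.
        self.attributes = set(element.attrib)
        self.elements = {child.tag for child in element}

    def isAttribute(self, name):
        return name in self.attributes

    def isElement(self, name):
        return name in self.elements

    def isUndefined(self, name):
        return not (self.isAttribute(name) or self.isElement(name))

    def resolve(self, name):
        """Return 'attribute' or 'element', or raise AttributeError."""
        if self.isAttribute(name) and self.isElement(name):
            raise AttributeError("ambiguous attribute %r" % name)
        if self.isAttribute(name):
            return "attribute"
        if self.isElement(name):
            return "element"
        raise AttributeError(name)


assert ContentModel(ET.fromstring("<x/>")).isUndefined("y")
assert ContentModel(ET.fromstring("<x y='z'/>")).isAttribute("y")
assert ContentModel(ET.fromstring("<x><y>z</y></x>")).isElement("y")

both = ContentModel(ET.fromstring("<x y='z'><y>z</y></x>"))
try:
    both.resolve("y")
except AttributeError as err:
    print(err)  # ambiguous attribute 'y'
```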

Attributes can be defined, both when they don't already exist, and when
they do:

x = Element()
print x.y # raises AttributeError

x.xml_schema.asAttribute('y')
assert x.y is None # no attribute yet
x.y = 'z'
assert x.xml() == """<x y='z'/>"""

Continuing as a child element:

x.xml_schema.asElement('y') # redefined as an element
assert x.y is None # no child element yet
x.y = 'z'
assert x.xml() == """<x y='z'><y>z</y></x>"""

There would be a similar interface for dealing with qnames,
isAttributeNS, isElementNS, etc:

x = amara.parse("<x/>"
, prefixes={u'spam': u'http://example.com/spam'}
)
x.xml_schema.asElementNS(
u'http://example.com/spam'
, 'eggs'
)

x.spam = u'yum'
assert x.xml() == """<x xmlns:spam='http://example.com/spam'>
<spam:eggs>yum</spam:eggs>
</x>"""

The asAttributeNS and asElementNS have a third optional parameter where
there would be a simple name conflict:

x.xml_schema.asElementNS(
u'http://example.com/spam'
, 'eggs'
, 'spam_eggs'
)

x.xml_schema.asElementNS(
u'http://example.com/bacon'
, 'eggs'
, 'bacon_eggs'
)

Now I can have both x.spam_eggs and x.bacon_eggs on the same (or
different) axis, and leave x.eggs undefined.
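
A rough sketch of the name-mapping idea, assuming nothing about Amara
internals: NameMap and its find() method are made-up names, and
xml.etree.ElementTree is used only because it spells qualified names
as '{namespace}local', which makes the lookup easy to show.

```python
import xml.etree.ElementTree as ET


class NameMap:
    """Hypothetical registry: Python name -> (namespace, local name)."""

    def __init__(self):
        self._elements = {}

    def asElementNS(self, namespace, local, python_name=None):
        # Without a third argument the Python name is the local name.
        self._elements[python_name or local] = (namespace, local)

    def find(self, parent, python_name):
        """Return the mapped child element, or None if it is absent."""
        namespace, local = self._elements[python_name]
        return parent.find("{%s}%s" % (namespace, local))


doc = ET.fromstring(
    "<x xmlns:spam='http://example.com/spam'"
    " xmlns:bacon='http://example.com/bacon'>"
    "<spam:eggs>yum</spam:eggs><bacon:eggs>crispy</bacon:eggs></x>"
)

names = NameMap()
names.asElementNS("http://example.com/spam", "eggs", "spam_eggs")
names.asElementNS("http://example.com/bacon", "eggs", "bacon_eggs")

print(names.find(doc, "spam_eggs").text)   # yum
print(names.find(doc, "bacon_eggs").text)  # crispy
```

The plain name 'eggs' is never registered here, so it stays undefined
while both qualified forms remain reachable.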

The xml_schema has an 'auto_chain' option which can be turned on or off.
If it is turned on, 'x.y' no longer raises AttributeError if it is
undefined, but returns an instance of a ChainElement object that builds
child element nodes on demand and can be chained to other nodes.

x = amara.parse("<x/>")
x.xml_schema.auto_chain = True
x.y.z = 'eggs'
assert x.xml() == """<x><y><z>eggs</z></y></x>"""

Since x.y in this mode no longer returns None, to test to see if the
component is already there, check the chain for being an instance of a
special class:

if isinstance(x.w.p, amara.ChainElement):
    print "not there yet"
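
Here is one way the chaining could work, as a toy sketch in plain
Python 3 rather than anything built on Amara. The Node class, the
auto_chain constructor argument, and the _materialize() helper are all
assumptions made for illustration; only the ChainElement name comes
from the proposal above.

```python
class Node:
    """Toy element: children are kept in a dict of name -> Node or text."""

    def __init__(self, name, auto_chain=False):
        # Use object.__setattr__ because Node.__setattr__ stores children.
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "children", {})
        object.__setattr__(self, "auto_chain", auto_chain)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in self.children:
            return self.children[name]
        if self.auto_chain:
            return ChainElement(self, name)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        self.children[name] = value

    def xml(self):
        inner = "".join(
            child.xml() if isinstance(child, Node)
            else "<%s>%s</%s>" % (name, child, name)
            for name, child in self.children.items()
        )
        return "<%s>%s</%s>" % (self.name, inner, self.name)


class ChainElement:
    """Placeholder for a child element that does not exist yet."""

    def __init__(self, parent, name):
        object.__setattr__(self, "_parent", parent)
        object.__setattr__(self, "_name", name)

    def __getattr__(self, name):
        # Extending the chain creates no nodes.
        return ChainElement(self, name)

    def __setattr__(self, name, value):
        # Assignment is the point where the chain becomes real nodes.
        setattr(self._materialize(), name, value)

    def _materialize(self):
        parent = self._parent
        if isinstance(parent, ChainElement):
            parent = parent._materialize()
        node = parent.children.get(self._name)
        if node is None:
            node = Node(self._name, auto_chain=parent.auto_chain)
            parent.children[self._name] = node
        return node


x = Node("x", auto_chain=True)
x.y.z = "eggs"
print(x.xml())  # <x><y><z>eggs</z></y></x>

# Reading a missing path only yields a placeholder.
print(isinstance(x.w.p, ChainElement))  # True
```

One thing the sketch makes visible: with chaining on, hasattr() is true
for every name, which is why the isinstance() test is needed.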

Comments?

Joel

Joel Bender

Mar 26, 2008, 8:15:26 AM

to amara...@googlegroups.com

> I propose a new component of bindery object, 'xml_schema'.

I would like to subtly change this to xml_content_model. 'Schema' is
too overloaded.

Joel

Uche Ogbuji

Mar 26, 2008, 10:52:57 PM

to Amara

Wow. What a very thoughtful post. I was tempted to mull on it for a
day or two, but first of all, knowing how things go for me, I'd
probably never get back to it. Secondly, I think my immediate
reaction to your proposals is important, and that if we all keep an
open mind, an iterative discussion will bear a lot of fruit. So don't
take what I say in this message as my settled opinion :-)

On Mar 25, 11:20 am, Joel Bender <j...@cornell.edu> wrote:
> Uche, et al.,
>
> Before ranting and rambling begin, let me say thank you for an
> outstanding toolkit and everything else you've contributed to the
> XML/Python/RDF community.
>
> If I can be so bold as to summarize the problem, the expression 'x.y'
> needs to return something that distinguishes between "not a component",
> "not there", and the empty string. When I use "component" I mean
> attribute or element in the model, to distinguish it between XML
> attributes and Python attributes.

Good summary, and I'll use it for the Wiki page except that I'll use
the more standard XML term "information item".

> As things stand now, if 'y' is a recognizable component of x, then x.y
> should be None to indicate it does not exist and the empty string if it
> exists and is empty. If 'y' is not a recognizable component of x, x.y
> raises AttributeError.

I'm still reluctant to use None to indicate it does not exist, because
None is so prevalent in Python, and I think using it as a blanket
return for all attributes could interfere with specialization (i.e.
subclassing, delegation, etc.)

Side note: I'll start another thread on how I think people should be
specializing Amara nodes. It's a tricky topic.

Anyway, I am wavering a bit in my resistance to None, I think. After
all, if someone specializes Amara nodes, the code they write will
already know how to handle the specialized behavior, even if it uses
None differently from Amara core. Meanwhile for generic code, the
specialized use of None would be no more confusing than any other
specialized behavior.

If I change my mind on None, and I'm not all the way there yet :-),
you wouldn't get AttributeError any more for any access to a node
object. You would always get None. This would make some people
happy, but wouldn't go far enough for others because if b does not
exist, a.b would return None, meaning a.b.c would be back to
AttributeError, since None is a singleton with no "c" attribute. More
on that later.
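
The a.b.c problem is easy to reproduce in plain Python 3. In the sketch
below the toy classes are invented for illustration and have nothing to
do with Amara's node classes. The second half shows a falsy placeholder
that absorbs further attribute access, which is only one possible way
around the problem.

```python
class NoneOnMissing:
    """Toy node: every unknown attribute reads as None."""

    def __getattr__(self, name):
        return None


n = NoneOnMissing()
assert n.b is None
try:
    n.b.c
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'c'


class _Missing:
    """Falsy placeholder that returns itself for any attribute."""

    def __getattr__(self, name):
        return self

    def __bool__(self):
        return False

    def __repr__(self):
        return "MISSING"


MISSING = _Missing()


class MissingOnMissing:
    """Toy node: every unknown attribute reads as MISSING."""

    def __getattr__(self, name):
        return MISSING


m = MissingOnMissing()
assert m.b.c is MISSING  # no AttributeError one level down
assert not m.b.c         # still tests as false, like None
```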

> I've been struggling with knowing if 'x.y' represents following the
> attribute axis or the child element axis of a document,

There are methods and properties right now to help you here:
node.xml_attributes and node.xml_child_elements (see
http://wiki.xml3k.org/Amara/QuickRef). I'd like to focus on use cases
not covered by these, so please post any thoughts along those lines.

> and to be able
> to direct it to one or the other on demand.

Yes, there is no way to do this in Amara 1.x.

> It is also a struggle to
> deal with XML documents with namespaces, or at least the SVG documents
> generated by Microsoft :-).

Again, to be sure you at least know what you have at your present
disposal, the mapping protocol on Amara 1.x nodes is how I generally
deal with psychotic and neurotic XML docs, such as those MS likes to
produce. See the "Namespaces" section of the quick ref. I see the
quick ref omits the fact that you can also use this mapping protocol
to force it to check attributes versus elements.

> I propose a new component of bindery object, 'xml_schema'. The
> xml_schema attribute has isAttribute, isElement, and isUndefined functions:

Your suggested "xml_content_model" is a better name, but it's more
than just content, so I'd just say "xml_model". This sounds like what
I've been thinking about for a way to capture XML model info from
RELAX NG, or to allow users to create it on the fly. Well you do go
farther than I would :-) But let's explore...

> x = amara.parse("<x/>")
> assert x.xml_schema.isUndefined('y')
>
> x = amara.parse("<x y='z'/>")
> assert x.xml_schema.isAttribute('y')
>
> x = amara.parse("<x><y>z</y></x>")
> assert x.xml_schema.isElement('y')
>
> Note that isAttribute and isElement can both be true at the same
> time, and if that is the case, 'x.y' raises AttributeError, "ambiguous
> attribute 'y'", unless the programmer has described how it should be
> resolved.

I think the isElement vs isAttribute can be approximated well enough
by existing facilities, and that isUndefined can be taken care of by
Python's hasattr(). If we do need an xml_model object for more
sophisticated needs, though, I guess it's OK to also implement the
above, but I wonder about TIOOWTDI.
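
To illustrate the hasattr() point without depending on Amara itself,
here is a small sketch in Python 3. Binding is a made-up stand-in for a
bindery node, wrapping an xml.etree.ElementTree element. hasattr()
works because __getattr__ raises AttributeError for unknown names, and
the attribute-versus-element question is answered with the wrapped
element's own API.

```python
import xml.etree.ElementTree as ET


class Binding:
    """Toy stand-in for a bindery node; not Amara's implementation."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        element = self._element
        if name in element.attrib:
            return element.attrib[name]
        child = element.find(name)
        if child is not None:
            return Binding(child)
        raise AttributeError(name)


root = ET.fromstring("<x y='z'><w/></x>")
x = Binding(root)

print(hasattr(x, "y"))  # True
print(hasattr(x, "w"))  # True
print(hasattr(x, "q"))  # False

# Telling the two axes apart with the underlying element:
print("y" in root.attrib)           # True, y is an XML attribute
print(root.find("w") is not None)   # True, w is a child element
```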

> Attributes can be defined, both when they don't already exist, and when
> they do:
>
> x = Element()
> print x.y # raises AttributeError
>
> x.xml_schema.asAttribute('y')
> assert x.y is None # no attribute yet
> x.y = 'z'
> assert x.xml() == """<x y='z'/>"""

You can use set_attribute for this now.

> Continuing as a child element:
>
> x.xml_schema.asElement('y') # redefined as an element
> assert x.y is None # no child element yet

Why is it not just an empty element at this point?

> x.y = 'z'
> assert x.xml() == """<x y='z'><y>z</y></x>"""

If I wanted a certain attribute converted to an element that is
probably something I'd want to do with a parse rule. Can you give an
example use-case for such an on-demand conversion as you suggest?

> There would be a similar interface for dealing with qnames,
> isAttributeNS, isElementNS, etc:
>
> x = amara.parse("<x/>"
> , prefixes={u'spam': u'http://example.com/spam'}
> )
> x.xml_schema.asElementNS(
> u'http://example.com/spam'
> , 'eggs'
> )

You can also do this by creating a new element normally. And you can
change a local name on an existing element by simply setting
x.eggs.localName = u'neweggs'

> x.spam = u'yum'
> assert x.xml() == """<x xmlns:spam='http://example.com/spam'>
> <spam:eggs>yum</spam:eggs>
> </x>"""
>
> The asAttributeNS and asElementNS have a third optional parameter where
> there would be a simple name conflict:
>
> x.xml_schema.asElementNS(
> u'http://example.com/spam'
> , 'eggs'
> , 'spam_eggs'
> )
>
> x.xml_schema.asElementNS(
> u'http://example.com/bacon'
> , 'eggs'
> , 'bacon_eggs'
> )
>
> Now I can have both x.spam_eggs and x.bacon_eggs on the same (or
> different) axis, and leave x.eggs undefined.

Not sure I fully follow the above. Sounds like you're advocating a
way to specify the disambiguation process. Again isn't this better
done upon parse? Yes I agree Amara 1.x does not provide such parse
rules, but I think we can remedy that for 2.x :-)

> The xml_schema has an 'auto_chain' option which can be turned on or off.
> If it is turned on, 'x.y' no longer raises AttributeError if it is
> undefined, but returns an instance of a ChainElement object that builds
> child element nodes on demand and can be chained to other nodes.
>
> x = amara.parse("<x/>")
> x.xml_schema.auto_chain = True
> x.y.z = 'eggs'
> assert x.xml() == """<x><y><z>eggs</z></y></x>"""

Nice line of thinking. Again I kind of think that specifying such
"chaining" behavior makes more sense at parse than at runtime.

I think what I'd prefer is a way to register a factory for what is
created if a user tries to access a non-existent attribute. That way
you can specify anything you want, even something you decide to call
"ChainElement" :-)

I'll mull this over some more.
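
A rough sketch of what such a factory hook might look like, in plain
Python 3. The missing_factory class attribute and the toy Node class
are invented names for illustration and not a proposed Amara API.

```python
class Node:
    # Hypothetical hook: a callable taking (parent, name), or None.
    missing_factory = None

    def __init__(self, name):
        self.name = name

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        factory = type(self).missing_factory
        if factory is None:
            raise AttributeError(name)
        return factory(self, name)


class ChainElement(Node):
    """One possible product of the factory: a placeholder child."""

    def __init__(self, parent, name):
        super().__init__(name)
        self.parent = parent


x = Node("x")
assert not hasattr(x, "y")  # default: AttributeError as before

Node.missing_factory = ChainElement  # register the factory
placeholder = x.y.z
print(placeholder.name)         # z
print(placeholder.parent.name)  # y
```

Once a factory is registered, hasattr() answers True for every name, so
callers would need another test (such as isinstance) to tell real nodes
from manufactured ones.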

> Since x.y in this mode no longer returns None, to test to see if the
> component is already there, check the chain for being an instance of a
> special class:
>
> if isinstance(x.w.p, amara.ChainElement):
>     print "not there yet"

Again hasattr() should be good enough for this case.

Again thanks for these thoughtful contributions. One way or another I
want to be sure I address your end use-cases. Let's keep this
discussion going.

Joel Bender

Mar 27, 2008, 9:07:32 AM

to amara...@googlegroups.com

A minor point, when I wrote:

> x = amara.parse("<x/>")
> assert x.xml_schema.isUndefined('y')

I realized after a few more thought experiments that this should be:

doc = amara.parse("<x/>")
assert doc.x.xml_content_model.isUndefined('y')

Joel

Joel Bender

Apr 2, 2008, 12:07:29 PM

to amara...@googlegroups.com

Uche,

> I'm still reluctant to use None to indicate it does not exist, because
> None is so prevalent in Python, and I think using it as a blanket
> return for all attributes could interfere with specialization (i.e.
> subclassing, delegation, etc.)

I think it should only return None when it has built some idea that it
is an attribute or element that is appropriate in the context, otherwise
raise AttributeError. This would be impossible to know without a model,
so the default behavior could only be prescribed by the instance being
parsed.

> Side note: I'll start another thread on how I think people should be
> specializing Amara nodes. It's a tricky topic.

I don't think they should be, and I say that with a wry smile :-). Once
you've started crossing the bridge between object model and
serialization you should go all the way across and not stand in the
middle and jump up and down.

> If I change my mind on None, and I'm not all the way there yet :-),
> you wouldn't get AttributeError any more for any access to a node
> object. You would always get None.

For my Python applications, 'x.y is None' means there is a chance that
x.y is meaningful, but hasn't been specified. I would be delighted if
it raised a ModelContextError error, derived from AttributeError if
you'd like, to make it clear that 'y' can't be resolved in x.xml_model.
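
Something like the following is what I have in mind, sketched in plain
Python 3 with invented names. ModelledNode and its 'declared' argument
are assumptions for illustration; only ModelContextError is the name
suggested above.

```python
class ModelContextError(AttributeError):
    """The name cannot be resolved in the node's model."""


class ModelledNode:
    """Toy node that knows which component names its model allows."""

    def __init__(self, declared):
        self._declared = set(declared)  # names the model knows about
        self._values = {}               # names actually present

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._values:
            return self._values[name]
        if name in self._declared:
            return None  # meaningful here, but not specified
        raise ModelContextError("%r is not in the model" % name)


x = ModelledNode(declared=["y"])
assert x.y is None          # in the model, not there yet
x._values["y"] = ""
assert x.y == ""            # there, and empty
assert not hasattr(x, "q")  # hasattr() still works via AttributeError

try:
    x.q
except ModelContextError as err:
    print(err)  # 'q' is not in the model
```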

> This would make some people happy, but wouldn't go far enough for
> others because if b does not exist, a.b would return None, meaning
> a.b.c would be back to AttributeError...

Make it easy for those that need it to plug just enough information into
the model to clear up the ambiguity:

a.xml_model.b = amara.OptionalElement()
a.xml_model.b.xml_model.c = amara.OptionalAttribute()

Or make it not element-specific:

doc.xml_model.OptionalElement('b')

> Again, to be sure you at least know what you have at your present
> disposal, the mapping protocol on Amara 1.x nodes is how I generally
> deal with psychotic and neurotic XML docs, such as those MS likes to
> produce.

I've been grinding my way through the functions; perhaps if I switched
to the testing framework for sample use cases I'd have better luck.

> I think the isElement vs isAttribute can be approximated well enough
> by existing facilities, and that isUndefined can be taken care of by
> Python's hasattr(). If we do need an xml_model object for more
> sophisticated needs, though, I guess it's OK to also implement the
> above, but I wonder about TIOOWTDI.

I wouldn't mind using hasattr if I knew I was dealing with attributes.
For my own XML library, x.y is only used for attributes and x[n] is only
used for content, but that's because I've been working with markup and
not XML serializations of data. That's changing, since I'm going to be
dealing with the XML dumps of Access tables. (gag)

>> Continuing as a child element:
>>
>> x.xml_schema.asElement('y') # redefined as an element
>> assert x.y is None # no child element yet
>
> Why is it not just an empty element at this point?

What do you mean by empty? len(x.y) == 0?

> If I wanted a certain attribute converted to an element that is
> probably something I'd want to do with a parse rule. Can you give an
> example use-case for such an on-demand conversion as you suggest?

I've been working with a standards subcommittee on an XML format for
describing "objects". And here is a morass of overlapping terminology
because the original standard uses "objects" and "properties", which was
then mapped into a collection of Web Services, and is now being
re-purposed into a document. The usual "attributes of attributes" and
"properties of attributes" discussions ensue.

And I've been trying to get them to adopt the term "ontology" and
structure the content in RDF, so they can use XML or N-triples or N3,
but they're pretty stuck on the charter, "to produce an XML document..."

Sorry for the venting, I feel better now :-).

The document has things like this:

<AnalogInput display-name="Outside Air Temp">...

But if you wanted to say more about the display name, like the fact that
it was writable, you make a child element:

<AnalogInput>
<display-name value="Outside Air Temp" writable="true">
...

And there are alternative forms that allow for multiple display names in
different languages. I think the committee is moving towards changing
the labels, so 'displayName' is the attribute form and 'DisplayName' is
the child element form.
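
For the record, this is roughly what an application has to do today to
read the display name from either form. The sketch uses the standard
library's xml.etree.ElementTree in Python 3; the display_name() helper
is a made-up name, and the two sample documents are completed by hand
from the fragments above, so the closing tags are my assumption.

```python
import xml.etree.ElementTree as ET


def display_name(element):
    """Return the display name from either form, or None if absent."""
    value = element.get("display-name")   # attribute form
    if value is not None:
        return value
    child = element.find("display-name")  # child element form
    if child is not None:
        return child.get("value")
    return None


short_form = ET.fromstring(
    '<AnalogInput display-name="Outside Air Temp"/>'
)
long_form = ET.fromstring(
    '<AnalogInput>'
    '<display-name value="Outside Air Temp" writable="true"/>'
    '</AnalogInput>'
)

print(display_name(short_form))  # Outside Air Temp
print(display_name(long_form))   # Outside Air Temp
print(long_form.find("display-name").get("writable"))  # true
```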

> Sounds like you're advocating a way to specify the disambiguation
> process. Again isn't this better done upon parse? Yes I agree
> Amara 1.x does not provide such parse rules, but I think we can
> remedy that for 2.x :-)

I agree that not having to provide that is a good thing, if it can be
done. When there's a conflict, which is clearly only in the mind of the
application developer :-), the library should (a) provide some feedback
of how it was resolved, (b) provide a hook to resolve the difference, or
(c) both.

> I think what I'd prefer is a way to register a factory for what is
> created if a user tries to access a non-existent attribute.

Factory is a much better term.

> I'll mull this over some more.

While you're mulling, I've been experimenting with overloading other
operators to make things that look like path expressions:

SVG = Namespace('...') # create a namespace

g = SVG['g'] # map element names into local names
rect = SVG['rect']
x = SVG['x'] # map attr name

doc/g/rect.x = 12

It should be a little clearer that g is a parent of rect and x is an
attribute of rect. Likewise len(doc/g/rect) is the number of rect
elements that are children of all g's in the document,
len(doc/g[0]/rect) is the number of rect's in the first group, etc.

I've been inspired by overloading '|' to be pipeline processing and
<<op>> to build search strings. Fun stuff, even if it is considered too
much like C++ for some.
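
A toy version of the '/' idea, sketched in Python 3 on top of
xml.etree.ElementTree. The Path class and its set() method are invented
for illustration, and plain strings stand in for the namespace-mapped
names. One wrinkle: Python binds subscription and attribute access more
tightly than '/', so the sketch writes (doc/g)[0]/rect with explicit
parentheses and uses a method call for assignment.

```python
import xml.etree.ElementTree as ET


class Path:
    """Hypothetical node set produced by a path expression."""

    def __init__(self, elements):
        self.elements = list(elements)

    def __truediv__(self, tag):
        # 'path / tag' selects the matching children of every element.
        return Path(
            child
            for parent in self.elements
            for child in parent.findall(tag)
        )

    def __getitem__(self, index):
        # Narrow the set to a single element, still wrapped as a Path.
        return Path([self.elements[index]])

    def __len__(self):
        return len(self.elements)

    def set(self, name, value):
        # Set an XML attribute on every element in the set.
        for element in self.elements:
            element.set(name, value)


root = ET.fromstring("<svg><g><rect/><rect/></g><g><rect/></g></svg>")
doc = Path([root])
g, rect = "g", "rect"

print(len(doc/g/rect))       # 3
print(len((doc/g)[0]/rect))  # 2

(doc/g/rect).set("x", "12")
print(root.find("g").find("rect").get("x"))  # 12
```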