Before ranting and rambling begin, let me say thank you for an
outstanding toolkit and everything else you've contributed to the
XML/Python/RDF community.
If I can be so bold as to summarize the problem, the expression 'x.y'
needs to return something that distinguishes between "not a component",
"not there", and the empty string. By "component" I mean an
attribute or element in the model, to distinguish it from XML
attributes and Python attributes.
As things stand now, if 'y' is a recognizable component of x, then x.y
should be None to indicate it does not exist and the empty string if it
exists and is empty. If 'y' is not a recognizable component of x, x.y
raises AttributeError.
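Here is a minimal sketch of those three outcomes. It is written against the standard library's xml.etree.ElementTree (Python 3) rather than Amara, and ModelNode and its `known` set are hypothetical. `known` stands in for whatever knowledge of the model the binding would have.

```python
import xml.etree.ElementTree as ET

class ModelNode(object):
    """Hypothetical wrapper: x.y distinguishes 'not a component',
    'not there' and 'empty string'."""

    def __init__(self, elem, known):
        self._elem = elem          # the wrapped ElementTree element
        self._known = set(known)   # component names the model recognizes

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith('_') or name not in self._known:
            raise AttributeError(name)           # not a component
        if name in self._elem.attrib:
            return self._elem.attrib[name]       # attribute value, possibly ''
        child = self._elem.find(name)
        if child is not None:
            return child.text or ''              # element text, possibly ''
        return None                              # recognized, but not there

x = ModelNode(ET.fromstring("<x a='' b='1'/>"), known=['a', 'b', 'c'])
```

With this, x.a is '', x.b is '1', x.c is None, and x.d raises AttributeError.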
I've been struggling with knowing if 'x.y' represents following the
attribute axis or the child element axis of a document, and to be able
to direct it to one or the other on demand. It is also a struggle to
deal with XML documents with namespaces, or at least the SVG documents
generated by Microsoft :-).
I've also been experimenting with building derived classes of Element
that provide a list of components, describing each as an attribute or an
element, with namespace details where necessary. I'll start a new thread
on that, because this one is too long already :-).
I propose a new component of bindery objects, 'xml_schema'. The
xml_schema attribute has isAttribute, isElement, and isUndefined methods:
x = amara.parse("<x/>")
assert x.xml_schema.isUndefined('y')
x = amara.parse("<x y='z'/>")
assert x.xml_schema.isAttribute('y')
x = amara.parse("<x><y>z</y></x>")
assert x.xml_schema.isElement('y')
Note that isAttribute and isElement can both be true at the same
time. In that case 'x.y' raises AttributeError("ambiguous attribute
'y'") unless the programmer has described how it should be resolved.
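A rough sketch of the three predicates and the ambiguity rule, again over xml.etree.ElementTree rather than Amara. ContentModel and its resolve() method are hypothetical names. resolve() plays the role of the 'x.y' lookup.

```python
import xml.etree.ElementTree as ET

class ContentModel(object):
    """Hypothetical sketch of the proposed xml_schema object."""

    def __init__(self, elem):
        self.elem = elem

    def isAttribute(self, name):
        return name in self.elem.attrib

    def isElement(self, name):
        return self.elem.find(name) is not None

    def isUndefined(self, name):
        return not (self.isAttribute(name) or self.isElement(name))

    def resolve(self, name):
        # Stand-in for x.y: refuse to guess when both axes match.
        if self.isAttribute(name) and self.isElement(name):
            raise AttributeError("ambiguous attribute %r" % name)
        if self.isAttribute(name):
            return self.elem.attrib[name]
        if self.isElement(name):
            return self.elem.find(name)
        raise AttributeError(name)

both = ContentModel(ET.fromstring("<x y='a'><y>b</y></x>"))
```

Here both.resolve('y') raises AttributeError, because 'y' is both an XML attribute and a child element.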
Attributes can be defined, both when they don't already exist, and when
they do:
x = Element()
print x.y # raises AttributeError
x.xml_schema.asAttribute('y')
assert x.y is None # no attribute yet
x.y = 'z'
assert x.xml() == """<x y='z'/>"""
Continuing as a child element:
x.xml_schema.asElement('y') # redefined as an element
assert x.y is None # no child element yet
x.y = 'z'
assert x.xml() == """<x y='z'><y>z</y></x>"""
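A sketch of asAttribute/asElement declarations, assuming a hypothetical DeclaredNode class built on xml.etree.ElementTree (Python 3). Reads and assignments are only allowed for declared names, and the declared axis decides where a value goes.

```python
import xml.etree.ElementTree as ET

class DeclaredNode(object):
    """Hypothetical sketch of asAttribute/asElement: a name must be
    declared on one axis before x.name can be read or assigned."""

    def __init__(self, tag):
        # Bypass our own __setattr__ for internal state.
        object.__setattr__(self, '_elem', ET.Element(tag))
        object.__setattr__(self, '_axis', {})  # name -> 'attribute' or 'element'

    def asAttribute(self, name):
        self._axis[name] = 'attribute'

    def asElement(self, name):
        self._axis[name] = 'element'  # redeclaring replaces the old axis

    def __getattr__(self, name):
        axis = self.__dict__.get('_axis', {}).get(name)
        if axis == 'attribute':
            return self._elem.get(name)              # None until assigned
        if axis == 'element':
            child = self._elem.find(name)
            return None if child is None else child.text
        raise AttributeError(name)                   # undeclared

    def __setattr__(self, name, value):
        axis = self._axis.get(name)
        if axis == 'attribute':
            self._elem.set(name, value)
        elif axis == 'element':
            child = self._elem.find(name)
            if child is None:
                child = ET.SubElement(self._elem, name)
            child.text = value
        else:
            raise AttributeError('undeclared component %r' % name)

    def xml(self):
        return ET.tostring(self._elem, encoding='unicode')

x = DeclaredNode('x')
x.asAttribute('y')
x.y = 'z'
x.asElement('y')   # 'y' now refers to a child element
x.y = 'z'
```

ElementTree serializes with double quotes, so x.xml() here gives <x y="z"><y>z</y></x> rather than the single-quoted form above.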
There would be a similar interface for dealing with qnames,
isAttributeNS, isElementNS, etc:
x = amara.parse("<x/>"
, prefixes={u'spam': u'http://example.com/spam'}
)
x.xml_schema.asElementNS(
u'http://example.com/spam'
, 'eggs'
)
x.eggs = u'yum'
assert x.xml() == """<x xmlns:spam='http://example.com/spam'>
<spam:eggs>yum</spam:eggs>
</x>"""
asAttributeNS and asElementNS take an optional third parameter, the
Python name to bind, for cases where local names would otherwise conflict:
x.xml_schema.asElementNS(
u'http://example.com/spam'
, 'eggs'
, 'spam_eggs'
)
x.xml_schema.asElementNS(
u'http://example.com/bacon'
, 'eggs'
, 'bacon_eggs'
)
Now I can have both x.spam_eggs and x.bacon_eggs on the same (or
different) axis, and leave x.eggs undefined.
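One way the Python-name mapping could work is sketched below, assuming a hypothetical NSModel class. Each Python name is bound to ElementTree's '{uri}local' (Clark) notation, and get()/set() methods stand in for attribute syntax to keep it short. Binding the same Python name to a different qualified name is treated as a conflict.

```python
import xml.etree.ElementTree as ET

class NSModel(object):
    """Hypothetical sketch of asElementNS with an optional Python name."""

    def __init__(self, elem):
        self.elem = elem
        self.names = {}   # Python name -> '{uri}local' (ElementTree's Clark notation)

    def asElementNS(self, uri, local, pyname=None):
        pyname = pyname or local            # default Python name is the local name
        qname = '{%s}%s' % (uri, local)
        if self.names.get(pyname, qname) != qname:
            raise ValueError('%r is already bound to %s' % (pyname, self.names[pyname]))
        self.names[pyname] = qname

    def set(self, pyname, text):
        qname = self.names[pyname]          # KeyError if never declared
        child = self.elem.find(qname)
        if child is None:
            child = ET.SubElement(self.elem, qname)
        child.text = text

    def get(self, pyname):
        child = self.elem.find(self.names[pyname])
        return None if child is None else child.text

m = NSModel(ET.Element('x'))
m.asElementNS('http://example.com/spam', 'eggs', 'spam_eggs')
m.asElementNS('http://example.com/bacon', 'eggs', 'bacon_eggs')
m.set('spam_eggs', 'yum')
m.set('bacon_eggs', 'crispy')
```

Without the third argument, the second asElementNS call would try to bind 'eggs' a second time and raise ValueError.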
The xml_schema has an 'auto_chain' option which can be turned on or off.
If it is turned on, 'x.y' no longer raises AttributeError if it is
undefined, but returns an instance of a ChainElement object that builds
child element nodes on demand and can be chained to other nodes.
x = amara.parse("<x/>")
x.xml_schema.auto_chain = True
x.y.z = 'eggs'
assert x.xml() == """<x><y><z>eggs</z></y></x>"""
Since x.y in this mode no longer returns None, to test to see if the
component is already there, check the chain for being an instance of a
special class:
if isinstance(x.w.p, amara.ChainElement):
print "not there yet"
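A sketch of auto_chain behavior in plain Python 3 with xml.etree.ElementTree. The ChainElement name comes from the proposal, but this class and the Node wrapper are hypothetical, not Amara code. Reading a missing name returns a placeholder and builds nothing. Assigning through the placeholder creates the missing elements.

```python
import xml.etree.ElementTree as ET

class Node(object):
    """Hypothetical bound element with auto_chain switched on."""

    def __init__(self, elem):
        object.__setattr__(self, '_elem', elem)

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        child = self._elem.find(name)
        if child is not None:
            return Node(child)
        return ChainElement(self, name)    # placeholder; nothing is built yet

    def __setattr__(self, name, value):
        child = self._elem.find(name)
        if child is None:
            child = ET.SubElement(self._elem, name)
        child.text = value

    def _materialize(self):
        return self                        # a real node is already in the tree

class ChainElement(object):
    """Stands in for a missing element; creates it (and any missing
    ancestors) only when something is assigned through the chain."""

    def __init__(self, parent, name):
        object.__setattr__(self, '_parent', parent)
        object.__setattr__(self, '_name', name)

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        return ChainElement(self, name)

    def __setattr__(self, name, value):
        setattr(self._materialize(), name, value)

    def _materialize(self):
        parent = self._parent._materialize()
        child = parent._elem.find(self._name)
        if child is None:                  # another chain may have built it already
            child = ET.SubElement(parent._elem, self._name)
        return Node(child)

x = Node(ET.fromstring('<x/>'))
x.y.z = 'eggs'
if isinstance(x.w.p, ChainElement):
    print('not there yet')
```

The isinstance check does not create <w> or <p>. Only the assignment to x.y.z adds elements to the tree.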
Comments?
Joel
I would like to subtly change this to xml_content_model. 'Schema' is
too overloaded.
Joel
> x = amara.parse("<x/>")
> assert x.xml_schema.isUndefined('y')
I realized after a few more thought experiments that this should be:
doc = amara.parse("<x/>")
assert doc.x.xml_content_model.isUndefined('y')
Joel
> I'm still reluctant to use None to indicate it does not exist, because
> None is so prevalent in Python, and I think using it as a blanket
> return for all attributes could interfere with specialization (i.e.
> subclassing, delegation, etc.)
I think it should return None only when it has some idea that 'y' is an
attribute or element appropriate in that context, and raise
AttributeError otherwise. That is impossible to know without a model,
so by default the behavior could only be prescribed by the instance
being parsed.
> Side note: I'll start another thread on how I think people should be
> specializing Amara nodes. It's a tricky topic.
I don't think they should be, and I say that with a wry smile :-). Once
you've started crossing the bridge between object model and
serialization you should go all the way across and not stand in the
middle and jump up and down.
> If I change my mind on None, and I'm not all the way there yet :-),
> you wouldn't get AttributeError any more for any access to a node
> object. You would always get None.
For my Python applications, 'x.y is None' means there is a chance that
x.y is meaningful but hasn't been specified. I would be delighted if
it raised a ModelContextError exception, derived from AttributeError if
you'd like, to make it clear that 'y' can't be resolved in x.xml_model.
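Deriving the exception from AttributeError means existing idioms keep working. A small sketch, assuming a hypothetical ModeledNode whose model is just a fixed set of names:

```python
class ModelContextError(AttributeError):
    """Hypothetical: 'name' cannot be resolved in this node's model.
    Deriving from AttributeError keeps hasattr() and
    getattr(obj, name, default) working as usual."""

class ModeledNode(object):
    # Hypothetical model: names that are meaningful on this kind of node.
    _model = frozenset(['title', 'author'])

    def __init__(self, **values):
        self.__dict__.update(values)

    def __getattr__(self, name):
        # Only reached when 'name' was not set on the instance.
        if name in self._model:
            return None                    # meaningful, just not specified
        raise ModelContextError('%r cannot be resolved in the %s model'
                                % (name, type(self).__name__))

book = ModeledNode(title='Amara')
```

book.author is None because the model allows it, while book.isbn raises ModelContextError, which hasattr() reports as False.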
> This would make some people happy, but wouldn't go far enough for
> others because if b does not exist, a.b would return None, meaning
> a.b.c would be back to AttributeError...
Make it easy for those that need it to plug just enough information into
the model to clear up the ambiguity:
a.xml_model.b = amara.OptionalElement()
a.xml_model.b.xml_model.c = amara.OptionalAttribute()
Or don't make it element-specific:
doc.xml_model.OptionalElement('b')
> Again, to be sure you at least know what you have at your present
> disposal, the mapping protocol on Amara 1.x nodes is how I generally
> deal with psychotic and neurotic XML docs, such as those MS likes to
> produce.
I've been grinding my way through the functions; perhaps I'd have better
luck if I looked at the test suite for sample use cases.
> I think the isElement vs isAttribute can be approximated well enough
> by existing facilities, and that isUndefined can be taken care of by
> Python's hasattr(). If we do need an xml_model object for more
> sophisticated needs, though, I guess it's OK to also implement the
> above, but I wonder about TIOOWTDI.
I wouldn't mind using hasattr if I knew I was dealing with attributes.
In my own XML library, x.y is only used for attributes and x[n] is only
used for content, but that's because I've been working with markup and
not XML serializations of data. That's changing, since I'm going to be
dealing with XML dumps of Access tables. (gag)
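The x.y / x[n] split can be sketched as a thin, hypothetical wrapper over xml.etree.ElementTree. In this sketch "content" means child elements only. Text, which ElementTree keeps in .text and .tail, is ignored.

```python
import xml.etree.ElementTree as ET

class Markup(object):
    """Hypothetical wrapper: x.name reads only XML attributes and
    x[n] reads only child elements."""

    def __init__(self, elem):
        self._elem = elem

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self._elem.attrib[name]
        except KeyError:
            raise AttributeError(name)   # child elements are never reached this way

    def __getitem__(self, n):
        return Markup(self._elem[n])     # n-th child element

    def __len__(self):
        return len(self._elem)

p = Markup(ET.fromstring("<p id='n1'><b>bold</b> and <i>italic</i></p>"))
```

p.id is 'n1' and p[1] wraps the <i> element, but p.b raises AttributeError: a child element is never mistaken for an attribute.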
>> Continuing as a child element:
>>
>> x.xml_schema.asElement('y') # redefined as an element
>> assert x.y is None # no child element yet
>
> Why is it not just an empty element at this point?
What do you mean empty? len(x.y) == 0?
> If I wanted a certain attribute converted to an element that is
> probably something I'd want to do with a parse rule. Can you give an
> example use-case for such an on-demand conversion as you suggest?
I've been working with a standards subcommittee on an XML format for
describing "objects". There is a morass of overlapping terminology,
because the original standard, which uses "objects" and "properties",
was then mapped into a collection of Web Services and is now being
re-purposed into a document. The usual "attributes of attributes" and
"properties of attributes" discussions ensue.
And I've been trying to get them to adopt the term "ontology" and
structure the content in RDF, so they can use XML or N-triples or N3,
but they're pretty stuck on the charter, "to produce an XML document..."
Sorry for the venting, I feel better now :-).
The document has things like this:
<AnalogInput display-name="Outside Air Temp">...
But if you wanted to say more about the display name, like the fact that
it is writable, you would make a child element:
<AnalogInput>
<display-name value="Outside Air Temp" writable="true">
...
And there are alternative forms that allow for multiple display names in
different languages. I think the committee is moving towards changing
the labels, so 'displayName' is the attribute form and 'DisplayName' is
the child element form.
> Sounds like you're advocating a way to specify the disambiguation
> process. Again isn't this better done upon parse? Yes I agree
> Amara 1.x does not provide such parse rules, but I think we can
> remedy that for 2.x :-)
I agree that not having to provide that is a good thing, if it can be
done. When there's a conflict, which is clearly only in the mind of the
application developer :-), the library should (a) provide some feedback
of how it was resolved, (b) provide a hook to resolve the difference, or
(c) both.
> I think what I'd prefer is a way to register a factory for what is
> created if a user tries to access a non-existent attribute.
Factory is a much better term.
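A registered factory could be as simple as a class-level hook consulted from __getattr__. The sketch below is hypothetical and assumes nothing about how Amara 2.x will actually do it. Each subclass picks its own policy for missing names.

```python
class FactoryNode(object):
    """Hypothetical base: what x.name returns for a missing name is
    decided by a registered factory rather than hard-coded."""
    missing_factory = None   # callable(node, name) -> value; None means raise

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. the name is missing.
        factory = type(self).missing_factory
        if factory is None or name.startswith('_'):
            raise AttributeError(name)
        return factory(self, name)

class StrictNode(FactoryNode):
    pass                       # no factory: missing names raise AttributeError

class LenientNode(FactoryNode):
    missing_factory = staticmethod(lambda node, name: None)   # missing -> None
```

A chain-building factory like the auto_chain idea would simply be another missing_factory, returning a placeholder instead of None.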
> I'll mull this over some more.
While you're mulling, I've been experimenting with overloading other
operators to make things that look like path expressions:
SVG = Namespace('...') # create a namespace
g = SVG['g'] # map element names into local names
rect = SVG['rect']
x = SVG['x']                     # map attr name
(doc/g/rect).x = 12              # parentheses needed: Python can't assign to doc/g/rect.x
It should be a little clearer that g is a parent of rect and x is an
attribute of rect. Likewise len(doc/g/rect) is the number of rect
elements that are children of all g's in the document,
len(doc/g[0]/rect) is the number of rects in the first group, etc.
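The '/' style can be prototyped on top of xml.etree.ElementTree (Python 3). Namespace, Step and Path are hypothetical classes, and the sketch uses the standard SVG namespace URI. It differs from the snippet above in two ways. First, `doc` wraps the root <svg> element and each '/' selects direct children only. Second, the attribute is set with (path)['x'] = 12 using a plain name, because unprefixed SVG attributes such as x are in no namespace.

```python
import xml.etree.ElementTree as ET

class Step(object):
    """One location step: a qualified tag, optionally restricted to
    the i-th matching child of each context node."""
    def __init__(self, tag, index=None):
        self.tag, self.index = tag, index

    def __getitem__(self, i):
        return Step(self.tag, i)           # g[0] -> first g child only

class Namespace(object):
    def __init__(self, uri):
        self.uri = uri

    def __getitem__(self, local):
        return Step('{%s}%s' % (self.uri, local))

class Path(object):
    """A node set; '/' selects matching direct children of every node."""
    def __init__(self, elems):
        self.elems = list(elems)

    def __truediv__(self, step):
        out = []
        for e in self.elems:
            kids = e.findall(step.tag)
            if step.index is None:
                out.extend(kids)
            elif -len(kids) <= step.index < len(kids):
                out.append(kids[step.index])
        return Path(out)

    __div__ = __truediv__                  # Python 2 name for '/'

    def __len__(self):
        return len(self.elems)

    def __setitem__(self, attr, value):
        for e in self.elems:               # set the attribute on every node
            e.set(attr, str(value))

SVG = Namespace('http://www.w3.org/2000/svg')
g, rect = SVG['g'], SVG['rect']
svg = ET.fromstring('<svg xmlns="http://www.w3.org/2000/svg">'
                    '<g><rect/><rect/></g><g><rect/></g></svg>')
doc = Path([svg])
(doc/g/rect)['x'] = 12                     # SVG's x attribute has no namespace
```

Because subscription binds tighter than '/', doc/g[0]/rect parses as (doc / g[0]) / rect, which is what makes the g[0] spelling work.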
I've been inspired by overloading '|' for pipeline processing and
<<op>> for building search strings. Fun stuff, even if some consider it
too much like C++.
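A sketch of the '|' pipeline idea in plain Python 3, where Stage is a hypothetical class. It relies on __ror__, which Python calls when the left operand has no '|' of its own for a Stage. That is true of ElementTree elements, generators and lists, but not of every type (NumPy arrays, for instance, overload '|' themselves).

```python
import xml.etree.ElementTree as ET

class Stage(object):
    """Hypothetical pipeline stage: 'data | stage' runs the stage on
    data, and 'stage | stage' composes two stages into one."""
    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, data):
        # Used when the left operand (data) has no '|' for a Stage.
        return self.fn(data)

    def __or__(self, other):
        return Stage(lambda data: other.fn(self.fn(data)))

rects = Stage(lambda root: root.iter('rect'))
xs = Stage(lambda elems: [float(e.get('x', 0)) for e in elems])
total = Stage(sum)

root = ET.fromstring("<svg><rect x='1'/><g><rect x='2.5'/></g></svg>")
result = root | rects | xs | total
```

Here result is 3.5, and rects | xs | total on its own builds a reusable Stage that gives the same answer when applied to root.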
Joel