Why is there no functional xml?

Rustom Mody

Jan 20, 2018, 11:06:29 AM

Looking around for how to create (l)xml one sees typical tutorials like this:

https://www.blog.pythonlibrary.org/2013/04/30/python-101-intro-to-xml-parsing-with-elementtree/

Given the requirement to build up this xml:
<zAppointments reminder="15">
  <appointment>
    <begin>1181251680</begin>
    <uid>040000008200E000</uid>
    <alarmTime>1181572063</alarmTime>
    <state></state>
    <location></location>
    <duration>1800</duration>
    <subject>Bring pizza home</subject>
  </appointment>
</zAppointments>

the way I would rather do it is thus:

[Note: in actual practice the 'contents' such as 1181251680 etc. would come
from suitable program variables/function calls.]

ex = Ea("zAppointments", {'reminder':'15'},
        E("appointment",
          En("begin", 1181251680),
          Et("uid", "040000008200E000"),
          En("alarmTime", 1181572063),
          E("state"),
          E("location"),
          En("duration", 1800),
          Et("subject", "Bring pizza home")))

with the following obvious definitions:

[The function names are short so that the above becomes correspondingly readable]

from lxml.etree import Element

def Ea(tag, attrib=None, *subnodes):
    "xml node constructor"
    root = Element(tag, attrib)
    for n in subnodes:
        root.append(n)
    return root

def E(tag, *subnodes):
    "Like Ea but without attributes"
    root = Element(tag)
    for n in subnodes:
        root.append(n)
    return root

def Et(tag, text):
    "A pure text node"
    root = E(tag)
    root.text = text
    return root

def En(tag, text):
    "A node containing an integer"
    root = E(tag)
    root.text = str(text)
    return root

This approach seems so obvious that I find it hard to believe it's not there somewhere…
Am I missing something??

Lawrence D’Oliveiro

Jan 20, 2018, 7:08:43 PM

On Sunday, January 21, 2018 at 5:06:29 AM UTC+13, Rustom Mody wrote:
> This approach seems so obvious that I find it hard to believe its not
> there somewhere…

Even better, here is a convenience routine I wrote to help with generating invoices as ODF files (using odfpy) in my own time-and-billing system:

def link_element(construct, attrs = {}, parent = None, children = None) :
    """convenience routine for building ODF structures. construct is the
    constructor to call, and attrs is a dictionary of keyword arguments
    to be supplied to it. children is a tuple of elements to be attached
    to the resulting object by calling its addElement method, and parent
    if not None, will have its addElement method called to add the new
    object as a child. In any case, the new object is returned as the
    function result."""
    new_elt = construct(**attrs)
    if children != None :
        for child in children :
            new_elt.addElement(child)
        #end for
    #end if
    if parent != None :
        parent.addElement(new_elt)
    #end if
    return new_elt
#end link_element

This lets me write calls that closely mirror the actual XML structure I am trying to create, e.g.

link_element \
  (
    construct = odf.text.P,
    attrs = dict
      (
        stylename = link_element
          (
            construct = odf.style.Style,
            attrs = dict(name = "work header", family = "paragraph"),
            parent = self.doc.automaticstyles,
            children =
                (
                    odf.style.TextProperties(fontweight = "bold"),
                    link_element
                      (
                        construct = odf.style.ParagraphProperties,
                        children =
                            (
                                make_tab_stops((dict(position = "15.1cm"),)),
                            )
                      ),
                )
          )
      ),
    parent = self.doc.text,
    children =
        (
            odf.text.Span(text = "Description of Work"),
            odf.text.Tab(),
            odf.text.Span(text = "Charge"),
        )
  )

Peter Otten

Jan 21, 2018, 6:21:34 AM

lxml.objectify?

>>> from lxml import etree
>>> from lxml.objectify import E
>>> appointments = E.appointments(
...     E.appointment(
...         E.begin(1181251680),
...         E.uid("040000008200E000"),
...         E.alarmTime(1181572063),
...         E.state(),
...         E.location(),
...         E.duration(1800),
...         E.subject("Bring pizza home")
...     ),
...     reminder="15"
... )
>>> print(etree.tostring(appointments, pretty_print=True, encoding=str))
<appointments xmlns:py="http://codespeak.net/lxml/objectify/pytype"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" reminder="15">
  <appointment>
    <begin py:pytype="int">1181251680</begin>
    <uid py:pytype="str">040000008200E000</uid>
    <alarmTime py:pytype="int">1181572063</alarmTime>
    <state/>
    <location/>
    <duration py:pytype="int">1800</duration>
    <subject py:pytype="str">Bring pizza home</subject>
  </appointment>
</appointments>

>>>

Personally I'd probably avoid the extra layer and write a function that
directly maps dataclasses or database records to xml using the conventional
elementtree API.

Rustom Mody

Jan 21, 2018, 10:24:57 AM

Nice!
I almost liked it… Then I noticed that the attribute dict is in the wrong place.
Here's the same without any namespace malarkey:

>>> from lxml import etree as et, objectify
>>> E = objectify.ElementMaker(annotate=False, namespace=None, nsmap=None)
>>> appointments = E.appointments(
...     E.appointment(
...         E.begin(1181251680),
...         E.uid("040000008200E000"),
...         E.alarmTime(1181572063),
...         E.state(),
...         E.location(),
...         E.duration(1800),
...         E.subject("Bring pizza home")
...     ),
...     reminder="15"
... )
>>> print(et.tostring(appointments, pretty_print=True).decode('ascii'))
<appointments reminder="15">
  <appointment>
    <begin>1181251680</begin>
    <uid>040000008200E000</uid>
    <alarmTime>1181572063</alarmTime>
    <state/>
    <location/>
    <duration>1800</duration>
    <subject>Bring pizza home</subject>
  </appointment>
</appointments>

>>>

It's obviously easier in Python to put optional/vararg parameters on the
right side rather than on the left of a parameter list.
But it's not impossible to get it in the desired order: one just has to
'hand-parse' the parameter list received as a *param.
Thusly:

appointments = E.appointments(
    {"reminder":"15"},
    E.appointment(
        E.begin(1181251680),
        E.uid("040000008200E000"),
        E.alarmTime(1181572063),
        E.state(),
        E.location(),
        E.duration(1800),
        E.subject("Bring pizza home")
    )
)
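
A minimal sketch of what such 'hand-parsing' might look like, using only the standard library's xml.etree.ElementTree rather than lxml. The helper name `node` and its dispatch rules are assumptions made for illustration, not part of any library: a dict among the positional arguments is taken as attributes, an element as a child, and anything else as text.

```python
import xml.etree.ElementTree as ET

def node(tag, *parts):
    """Hypothetical functional constructor: dispatch on the type of each part."""
    elem = ET.Element(tag)
    for part in parts:
        if isinstance(part, dict):
            # A dict anywhere in the argument list supplies attributes.
            elem.attrib.update(part)
        elif ET.iselement(part):
            # An already-built element becomes a child.
            elem.append(part)
        elif part is not None:
            # Anything else is text: before the first child it goes into
            # .text, afterwards into the .tail of the last child.
            text = str(part)
            if len(elem):
                last = elem[-1]
                last.tail = (last.tail or "") + text
            else:
                elem.text = (elem.text or "") + text
    return elem

appointments = node("zAppointments", {"reminder": "15"},
    node("appointment",
        node("begin", 1181251680),
        node("uid", "040000008200E000"),
        node("alarmTime", 1181572063),
        node("state"),
        node("location"),
        node("duration", 1800),
        node("subject", "Bring pizza home")))

print(ET.tostring(appointments, encoding="unicode"))
```

Because the dispatch is by type rather than by position, the attribute dict can sit on the left, next to the tag it belongs to, which is the ordering asked for above.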

>
> Personally I'd probably avoid the extra layer and write a function that
> directly maps dataclasses or database records to xml using the conventional
> elementtree API.

I don't understand…
[I find the OO/imperative style of making a half-done node and then throwing
piece-by-piece of contents in/at it highly aggravating]

Rustom Mody

Jan 22, 2018, 10:53:48 PM

On Sunday, January 21, 2018 at 4:51:34 PM UTC+5:30, Peter Otten wrote:

> Personally I'd probably avoid the extra layer and write a function that
> directly maps dataclasses or database records to xml using the conventional
> elementtree API.

Would appreciate your thoughts/comments Peter!

I find that you can get 'E' from lxml.objectify as well as lxml.builder.
builder seems better in that it's at least sparsely documented;
objectify seems to have almost nothing beyond David Mertz's original docs.

builder.E seems to do what objectify.E does, modulo namespaces.

builder.E and objectify.E produce types that are different and look backwards
(at least to me: ElementBase is less base than _Element).

You seem to have some reservation against objectify, preferring the default
Element. I'd like to know what it is.

Insofar as builder seems to produce the same type as Element, unlike objectify,
which seems to produce a grandchild type, do you have the same reservations
against builder.E?
--------------
$ python3
Python 3.5.3 (default, Nov 23 2017, 11:34:05)
[GCC 6.3.0 20170406] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from lxml.etree import Element, tostring
>>> from lxml.builder import E as Eb
>>> from lxml.objectify import E as Eo

>>> e = Element("tag")
>>> tostring(e)
b'<tag/>'
>>> o = Eb.tag()
>>> o
<Element tag at 0x7f058406ec08>
>>> tostring(o)
b'<tag/>'
>>> o = Eo.tag()
>>> tostring(o)
b'<tag xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>'
>>> b = Eb.tag()
>>> tostring(b)
b'<tag/>'
>>> type(b)
<class 'lxml.etree._Element'>
>>> type(b).__bases__
(<class 'object'>,)
>>> type(e)
<class 'lxml.etree._Element'>
>>> type(o)
<class 'lxml.objectify.ObjectifiedElement'>
>>> type(o).__bases__
(<class 'lxml.etree.ElementBase'>,)
>>> type(o).__bases__[0].__bases__
(<class 'lxml.etree._Element'>,)
>>>

--------------

Peter Otten

Jan 23, 2018, 9:53:43 AM

Rustom Mody wrote:

> Its obviously easier in python to put optional/vararg parameters on the
> right side rather than on the left of a parameter list.
> But its not impossible to get it in the desired order — one just has to
> 'hand-parse' the parameter list received as a *param
> Thusly:

> appointments = E.appointments(
> {"reminder":"15"},
> E.appointment(
> E.begin(1181251680),
> E.uid("040000008200E000"),
> E.alarmTime(1181572063),
> E.state(),
> E.location(),
> E.duration(1800),
> E.subject("Bring pizza home")
> )
> )

Let's not waste too much effort on minor aesthetic improvements ;)

>> Personally I'd probably avoid the extra layer and write a function that
>> directly maps dataclasses or database records to xml using the
>> conventional elementtree API.
>
> I dont understand…
> [I find the OO/imperative style of making a half-done node and then
> throwing piece-by-piece of contents in/at it highly aggravating]

What I meant is that once you throw a bit of introspection at it much of the
tedium vanishes. Here's what might become of the DOM-creation as part of an
actual script:

import xml.etree.ElementTree as xml
from collections import namedtuple

Appointment = namedtuple(
    "Appointment",
    "begin uid alarmTime state location duration subject"
)

appointments = [
    Appointment(
        begin=1181251680,
        uid="040000008200E000",
        alarmTime=1181572063,
        state=None,
        location=None,
        duration=1800,
        subject="Bring pizza home"
    )
]

def create_dom(appointments, reminder):
    node = xml.Element("zAppointments", reminder=str(reminder))
    for appointment in appointments:
        # one <appointment> wrapper per record, as in the target XML
        parent = xml.SubElement(node, "appointment")
        for name, value in zip(appointment._fields, appointment):
            child = xml.SubElement(parent, name)
            if value is not None:
                child.text = str(value)
    return node

with open("appt.xml", "wb") as outstream:
    root = create_dom(appointments, 15)
    xml.ElementTree(root).write(outstream)

To generalize that to handle arbitrarily nested lists and namedtuples a bit
more effort is needed, but I can't see where lxml.objectify could make that
much easier.
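
A rough sketch of how that generalization might go, again with the standard library only. The helper names (`is_namedtuple`, `to_element`, `add_child`), the sample `Calendar` record and the convention that a list becomes repeated sibling elements are all assumptions made for illustration; attributes are not handled at all here.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

def is_namedtuple(value):
    # Heuristic: namedtuple instances are tuples that carry a _fields attribute.
    return isinstance(value, tuple) and hasattr(value, "_fields")

def to_element(name, value):
    """Map a namedtuple or a scalar to an element called `name`."""
    elem = ET.Element(name)
    if is_namedtuple(value):
        # Each field becomes a child element (or several, for list fields).
        for field, item in zip(value._fields, value):
            add_child(elem, field, item)
    elif value is not None:
        elem.text = str(value)
    return elem

def add_child(parent, name, value):
    """Append `value` under `parent`; a list yields one element per item."""
    if isinstance(value, list):
        for item in value:
            parent.append(to_element(name, item))
    else:
        parent.append(to_element(name, value))

# Hypothetical nested records, loosely modelled on the example above.
Appointment = namedtuple("Appointment", "begin uid state subject")
Calendar = namedtuple("Calendar", "owner appointment")

calendar = Calendar(
    owner="someone",
    appointment=[
        Appointment(1181251680, "040000008200E000", None, "Bring pizza home"),
        Appointment(1181338080, "040000008200E001", None, "Return pizza box"),
    ],
)

root = to_element("zAppointments", calendar)
print(ET.tostring(root, encoding="unicode"))
```

The recursion follows the shape of the data, so the nesting depth of the XML is decided by the records rather than by hand-written constructor calls.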

Peter Otten

Jan 23, 2018, 10:04:27 AM

Rustom Mody wrote:

> On Sunday, January 21, 2018 at 4:51:34 PM UTC+5:30, Peter Otten wrote:
>> Personally I'd probably avoid the extra layer and write a function that
>> directly maps dataclasses or database records to xml using the
>> conventional elementtree API.
>
> Would appreciate your thoughts/comments Peter!
>
> I find that you can get 'E' from lxml.objectify as well as lxml.builder
> builder seems better in that its at least sparsely documented
> objectify seems to have almost nothing beyond the original David Mertz'
> docs
>
> builder.E seems to do what objectify.E does modulo namespaces
>
> builder.E and objectify.E produce types that are different and look
> backwards (at least to me — Elementbase is less base than _Element)
>
> You seem to have some reservation against objectify, preferring the
> default Element — I'd like to know what

While I don't have any actual experience with it, my gut feeling is that it
simplifies something that is superfluous to begin with.

> Insofar as builder seems to produce the same type as Element unlike
> objectify which seems to be producing a grandchild type, do you have the
> same reservations against builder.E?

If I understand you correctly you are talking about implementation details.
Unfortunately I cannot comment on these -- I really just remembered
objectify because of the catchy name...

Rustom Mody

Jan 23, 2018, 10:51:12 PM

On Tuesday, January 23, 2018 at 8:23:43 PM UTC+5:30, Peter Otten wrote:
> Rustom Mody wrote:

> > [I find the OO/imperative style of making a half-done node and then
> > throwing piece-by-piece of contents in/at it highly aggravating]
>
> What I meant is that once you throw a bit of introspection at it much of the
> tedium vanishes. Here's what might become of the DOM-creation as part of an
> actual script:

«snipped named-tuple magic»

> To generalize that to handle arbitrarily nested lists and namedtuples a bit
> more effort is needed, but I can't see where lxml.objectify could make that
> much easier.

You really mean that??
Well, sure, in the programming world, and even more so in the Python world,
“Flat is better than nested” is a maxim.

But equally, programmers need to satisfy requirements…

And right now I am seeing things like this:
-----------------------------------------------
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <locns:blobRetrieveResponse xmlns:locns="http://example.com/">
      <REQUEST>
        <TYPE>
          AGGREGATION_ANALYSIS
        </TYPE>
        <DATASTREAM_COLLECTION>
          <DATASTREAM ID="ad907z9e-982c-4491-bc26-75bf96c0d59d">
            <FIELDINFO FIELD="Date Stamp" FIELDTYPE="DATE" FIELDFORMAT="m/d/y" />
            <FIELDINFO FIELD="Transaction Amount" FIELDTYPE="MONEY" LOCALE="en-US"/>
          </DATASTREAM>
        </DATASTREAM_COLLECTION>
        <COMPUTATION_COLLECTION>
          <COMPUTATION ALGORITHM="VARIANCE"
                       DATASTREAM="ad907z9e-982c-4291-bc26-75bf96c0d59d"
                       FIELD="Transaction Amount"
                       RESULT="6ef0ce23-6637-4cb7-974c-3973d5a58942" />
          <COMPUTATION ALGORITHM="MEDIAN"
                       DATASTREAM="ad907z9e-982c-4291-bc26-75bf96c0d59d"
                       FIELD="Date Stamp"
                       RESULT="6ef0ce23-6637-4cb7-974c-3973d5a58942" />
          <COMPUTATION ALGORITHM="AVERAGE"
                       DATASTREAM="ad907z9e-982c-4291-bc26-75bf96c0d59d"
                       FIELD="Transaction Amount"
                       RESULT="6ef0ce23-6637-4cb7-974c-3973d5a58942" />
        </COMPUTATION_COLLECTION>
        <DATA_COLLECTION>
          <DATA DATASTREAM="ad907c9e-982c-4491-bc26-45af96c0d59d" MODE="BLOB" TYPE="CSV">
            «Big base64 blob»
          </DATA>
        </DATA_COLLECTION>
      </REQUEST>
    </locns:blobRetrieveResponse>
  </soap:Body>
</soap:Envelope>

-----------------------------------------------
That's 7 levels of nesting (assuming I can count right!)

Speaking of which another followup question:

With
# Read above xml
>>> from lxml import etree
>>> with open('soap_response.xml') as f: inp = etree.parse(f)
# namespace dict
>>> nsd = {'soap': "http://schemas.xmlsoap.org/soap/envelope/", 'locns': "http://example.com/"}

The following behavior is observed (actual responses elided in the interest of
brevity):

>>> inp.xpath('//soap:Body', namespaces = nsd)
finds/reaches the node

>>> inp.xpath('//locns:blobRetrieveResponse', namespaces = nsd)
finds

>>> inp.xpath('//locns:dtCreationDate', namespaces = nsd)
does not find

>>> inp.xpath('//dtCreationDate', namespaces = nsd)
finds

>>> inp.xpath('//dtCreationDate')
also finds

Doesn't this contradict the fact that dtCreationDate is under the locns namespace??

Any explanations??

Peter Otten

Jan 24, 2018, 3:43:45 AM

Rustom Mody wrote:

>> To generalize that to handle arbitrarily nested lists and namedtuples a
>> bit more effort is needed, but I can't see where lxml.objectify could
>> make that much easier.
>
> You really mean that??
> Well sure in the programming world and even more so in the python world
> “Flat is better than nested” is a maxim
>
> But equally programmers need to satisfy requirements…
>
> And right now I am seeing things like this
> -----------------------------------------------
> <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">

Hm, what happens if you throw a dedicated library at the problem?
Google found zeep

http://docs.python-zeep.org/en/master/datastructures.html

Peter Otten

Jan 24, 2018, 4:01:22 AM

Rustom Mody wrote:

> With
> # Read above xml
> >>> with open('soap_response.xml') as f: inp = etree.parse(f)
> # namespace dict
> >>> nsd = {'soap': "http://schemas.xmlsoap.org/soap/envelope/",
> ...        'locns': "http://example.com/"}
>
> The following behavior is observed (actual responses elided in the
> interest of brevity):
>
> >>> inp.xpath('//soap:Body', namespaces = nsd)
> finds/reaches the node
>
> >>> inp.xpath('//locns:blobRetrieveResponse', namespaces = nsd)
> finds
>
> >>> inp.xpath('//locns:dtCreationDate', namespaces = nsd)
> does not find
>
> >>> inp.xpath('//dtCreationDate', namespaces = nsd)
> finds
>
> >>> inp.xpath('//dtCreationDate')
> also finds
>
>
> Doesnt this contradict the fact that dtCreationDate is under the locns
> namespace??
>
> Any explanations??

Can you rewrite that question as a simple self-contained demo, similar to
the snippet shown under

http://lxml.de/xpathxslt.html#namespaces-and-prefixes

?

dieter

Jan 25, 2018, 2:13:57 AM

> Rustom Mody wrote:
>
>> With
>> # Read above xml
>> >>> with open('soap_response.xml') as f: inp = etree.parse(f)
>> # namespace dict
>> >>> nsd = {'soap': "http://schemas.xmlsoap.org/soap/envelope/",
>> ...        'locns': "http://example.com/"}
>>
>> The following behavior is observed (actual responses elided in the
>> interest of brevity):
>>
>> >>> inp.xpath('//soap:Body', namespaces = nsd)
>> finds/reaches the node
>>
>> >>> inp.xpath('//locns:blobRetrieveResponse', namespaces = nsd)
>> finds
>>
>> >>> inp.xpath('//locns:dtCreationDate', namespaces = nsd)
>> does not find
>>
>> >>> inp.xpath('//dtCreationDate', namespaces = nsd)
>> finds
>>
>> >>> inp.xpath('//dtCreationDate')
>> also finds
>>
>>
>> Doesnt this contradict the fact that dtCreationDate is under the locns
>> namespace??

Apparently, "dtCreationDate" is not associated with the
namespace corresponding to the "locns" prefix.

Note that the namespace association is not inherited by default
by child elements, at least not with standalone XML. Thus,
if you have e.g.

<nspref:parent><child/></nspref:parent>

then the element "child" does not belong to the namespace indicated
by "nspref" but to the "default namespace" (or to no namespace at all
if no default namespace is declared).

An XML schema can change this default. However, for such an
XML schema to take effect, you must request this when you
parse your XML document. Otherwise, your XML document is parsed
as "standalone" XML and the rules of the XML namespace standard
apply, which means that the namespace association is not inherited
by child elements.
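
A small self-contained check of this rule, sketched with the standard library's xml.etree.ElementTree instead of lxml (an assumption made so that it runs without extra packages; the element names and the http://example.com/ URI are made up for the demo). ElementTree spells a namespaced tag as "{uri}name", which makes it easy to see which elements carry a namespace.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://example.com/"}

# Prefixed parent, unprefixed child: the child is in no namespace.
prefixed = ET.fromstring(
    '<p:parent xmlns:p="http://example.com/"><child/></p:parent>')

# Default namespace declared on the parent: the unprefixed child
# does pick up that default namespace.
defaulted = ET.fromstring(
    '<parent xmlns="http://example.com/"><child/></parent>')

print(prefixed.tag, prefixed[0].tag)
print(defaulted.tag, defaulted[0].tag)

# Searching with the prefix only matches elements that are in the namespace.
print(prefixed.find("p:child", NS) is not None)    # prefix does not apply to child
print(prefixed.find("child") is not None)          # plain name matches
print(defaulted.find("p:child", NS) is not None)   # default namespace applies
```

This mirrors the observation in the question: a query with the "locns" prefix finds nothing for an unprefixed element nested inside a prefixed one, while the unqualified name matches.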

Rustom Mody

Jan 25, 2018, 6:40:49 AM

I guess Dieter has cleared up [thanks Dieter] that namespaces don't inherit to child
tags. I need to wrap my head around the concepts and the syntax.