Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

lxml.etree, namespaces and element insertion

385 views
Skip to first unread message

hein

unread,
Jan 27, 2011, 1:16:09 PM1/27/11
to

The other day i was processing an xml tree using lxml.etree. That tree
contained a namespace. During that processing i inserted an element.
Later on
i tried to find that element using xpath. And to my suprise that
element was
not found! Maybe my suprise is just the result of my marginal
knowledge about
xml namespaces.

After some examination i found out that the reason for not finding the
element
was that i did not supply a namespace when inserting the element.

This behavior can be reproduced be the folowing code:

<code>
from lxml import etree
import sys

print "Python: ", sys.version_info
print "lxml.etree:", etree.__version__


string_data = [
'<root xmlns="NameSpace.com"><sometag/></root>',
'<root xmlns="NameSpace.com"></root>',
'<root xmlns="NameSpace.com"></root>'
]

trees = map(etree.fromstring, string_data)

print "\n Before insertion:"
for t in trees:
print etree.tostring(t, pretty_print=True)

trees[1].insert(-1, etree.Element("sometag"))
trees[2].insert(-1, etree.Element("{NameSpace.com}sometag",
nsmap={None : "NameSpace.com"}))

print "\n After insertion:"
for t in trees:
print etree.tostring(t, pretty_print=True)

print "\n Using xpath:"
for t in trees:
elements = t.xpath("//ns:sometag", namespaces={'ns':
'NameSpace.com'})
print len(elements),
if elements:
print [e.tag for e in elements]
else:
print elements
</code>

Its output is:

<output>
Python: (2, 6, 6, 'final', 0)
lxml.etree: 2.2.8

Before insertion:
<root xmlns="NameSpace.com">
<sometag/>
</root>

<root xmlns="NameSpace.com"/>

<root xmlns="NameSpace.com"/>


After insertion:
<root xmlns="NameSpace.com">
<sometag/>
</root>

<root xmlns="NameSpace.com">
<sometag/>
</root>

<root xmlns="NameSpace.com">
<sometag/>
</root>


Using xpath:
1 ['{NameSpace.com}sometag']
0 []
1 ['{NameSpace.com}sometag']
</output>

So my suprise was that in the second case the xpath result is an empty
list.

I have two questions on this:

- Is what i am seeing expected behavior?
- How am i supposed to detect a missing namespace, if there are no
differences
in the serialized representation? (That's what i initially used to
debug the
problem.)

Stefan Behnel

unread,
Jan 27, 2011, 2:33:46 PM1/27/11
to pytho...@python.org
hein, 27.01.2011 19:16:

> The other day i was processing an xml tree using lxml.etree. That tree
> contained a namespace. During that processing i inserted an element.
> Later on
> i tried to find that element using xpath. And to my suprise that
> element was
> not found! Maybe my suprise is just the result of my marginal
> knowledge about xml namespaces.
>
> After some examination i found out that the reason for not finding the
> element
> was that i did not supply a namespace when inserting the element.
> [...]

> I have two questions on this:
>
> - Is what i am seeing expected behavior?

Yes. It's a common problem for new users, though.


> - How am i supposed to detect a missing namespace, if there are no
> differences
> in the serialized representation? (That's what i initially used to
> debug the problem.)

This is a known problem of XML namespaces, which were only designed as an
add-on to XML after the fact. The only advice I can give: be careful with
the default namespace.

Stefan

Neil Cerutti

unread,
Jan 27, 2011, 4:06:49 PM1/27/11
to
On 2011-01-27, hein <iwanttog...@gmx.net> wrote:
> - How am i supposed to detect a missing namespace, if there
> are no differences in the serialized representation? (That's
> what i initially used to debug the problem.)

lxml's pretty printer is at fault, as it emits unprefixed names
whenever possible while serializing. For debugging, try using
.dump instead. Hopefully that makes the error obvious.

--
Neil Cerutti

0 new messages