lxml namespace as an attribute

Skip Montanaro

Aug 15, 2018, 5:26:35 PM

Much of XML makes no sense to me. Namespaces are one thing. If I'm
parsing a document where namespaces are defined at the top level, then
adding namespaces=root.nsmap works when calling the xpath method. I
more-or-less get that.

What I don't understand is how I'm supposed to search for a tag when
the namespace appears to be defined as an attribute of the tag itself.
I have some SOAP XML I'm trying to parse. It looks roughly like this:

<s:Envelope xmlns:a="..." xmlns:s="...">
  <s:Header>
    ...
  </s:Header>
  <s:Body>
    <Tag xmlns="http://some/new/path">
      ...
    </Tag>
  </s:Body>
</s:Envelope>

If the document is "doc", I can find the body like so:

body = doc.xpath(".//s:Body", namespaces=doc.nsmap)

I don't understand how to find Tag, however. When I iterate over the
body's children, printing them out, I see that Tag's name is actually:

{http://some/new/path}Tag

yet that namespace is unknown to me until I find Tag. It seems I'm
stuck in a chicken-and-egg situation. Without knowing that
http://some/new/path namespace, is there a way to cleanly find all
instances of Tag?

Thx,

Skip

Skip Montanaro

Aug 15, 2018, 5:29:31 PM

Ack. Of course I meant the subject to be "XML namespace as an
attribute". I happen to be using lxml.etree. (Long day, I guess...)

S

Joseph L. Casale

Aug 15, 2018, 7:50:21 PM

See https://lxml.de/tutorial.html#namespaces and
https://lxml.de/2.1/FAQ.html#how-can-i-specify-a-default-namespace-for-xpath-expressions
for direction. I don't have Python at my current location but I trust that will point you straight.

jlc

Skip Montanaro

Aug 15, 2018, 8:52:28 PM

> See https://lxml.de/tutorial.html#namespaces and
> https://lxml.de/2.1/FAQ.html#how-can-i-specify-a-default-namespace-for-xpath-expressions
> for direction.

I had read at least the namespaces section of the tutorial. I could
see the namespace definition right there in the XML and figured
somehow, some way, it must be fully self-describing. I guess not.

S

dieter

Aug 16, 2018, 1:53:03 AM

Skip Montanaro <skip.mo...@gmail.com> writes:
> Much of XML makes no sense to me. Namespaces are one thing. If I'm
> parsing a document where namespaces are defined at the top level, then
> adding namespaces=root.nsmap works when calling the xpath method. I
> more-or-less get that.
>
> What I don't understand is how I'm supposed to search for a tag when
> the namespace appears to be defined as an attribute of the tag itself.

You seem to think that you need to take the namespace definitions
from the XML document itself. This is not the case: you can
provide them from whatever source you want.

The important part of a namespace is its URI; the namespace
prefix is just an abbreviation whose exact value is of no importance.
You can use whatever prefix you want (your choice need not
match the one used in the XML document).

"lxml" handles "xmlns" "attributes" differently from "normal" attributes.
"Normal" attributes are accessed via a mapping interface; "xmlns" attributes
via the "nsmap" attribute. I think (but I am not sure) that the "nsmap"
of an element contains all namespace definitions "active" at the element,
not just those defined on the element itself. Thus, if you are able
to locate an element, you can get its relevant namespace definitions
via its "nsmap" (as you did with "root").
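
A minimal sketch of the "nsmap" behaviour described above. The document and
its URIs are made-up placeholders, and the prefix "d" is an arbitrary choice.
The default namespace shows up in "nsmap" under the key None, which is
replaced here by a real prefix before the mapping is handed to xpath():

```python
from lxml import etree

# Hypothetical SOAP-like document; both URIs are placeholders.
XML = b"""\
<s:Envelope xmlns:s="http://example.com/soap">
  <s:Body>
    <Tag xmlns="http://some/new/path">payload</Tag>
  </s:Body>
</s:Envelope>
"""

root = etree.fromstring(XML)
body = root[0]   # the s:Body element
tag = body[0]    # the Tag element

# The root only knows the "s" prefix.
print(root.nsmap)

# The nested element sees the inherited "s" prefix plus its own
# default namespace, which is stored under the key None.
print(tag.nsmap)

# Replace the None key with a prefix of our own choosing ("d")
# so the mapping can be used in a prefixed XPath expression.
prefix_map = {prefix or "d": uri for prefix, uri in tag.nsmap.items()}
found = root.xpath(".//d:Tag", namespaces=prefix_map)
print([el.text for el in found])
```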

In my typical applications, I know the relevant namespace uris.
I define a namespace dict:

ns = dict(
    p1=uri1,
    p2=uri2,
    ...
)

with prefixes "p1", ... of my own choice and pass "ns"
as "namespaces" (e.g. for "xpath").

Note that in XPath, an unprefixed element name never matches
a namespace-qualified element (even if that qualification comes
from the default XML namespace). Such name tests must always use
a qualified path (i.e. one with a namespace prefix).
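
As a rough illustration of both points, assuming a made-up document with
placeholder URIs: the prefixes "soap" and "t" below are chosen freely and do
not appear in the XML at all, while the unprefixed search finds nothing.

```python
from lxml import etree

# Hypothetical document; the URIs are placeholders.
XML = b"""\
<s:Envelope xmlns:s="http://example.com/soap">
  <s:Body>
    <Tag xmlns="http://some/new/path">payload</Tag>
  </s:Body>
</s:Envelope>
"""

root = etree.fromstring(XML)

# Prefixes of our own choice; only the URIs have to match the document.
ns = dict(
    soap="http://example.com/soap",
    t="http://some/new/path",
)

# Qualified search: matches, although the document uses "s" and no
# prefix at all for these two namespaces.
qualified = root.xpath("//soap:Body/t:Tag", namespaces=ns)
print([el.tag for el in qualified])

# Unprefixed search: "Tag" here means "Tag in no namespace",
# so the namespaced element is not matched.
unqualified = root.xpath("//Tag")
print(unqualified)
```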

Skip Montanaro

Aug 16, 2018, 3:36:43 PM

> You seem to think that you need to take the namespace definitions
> from the XML document itself. This is not the case: you can
> provide them from whatever source you want.

I was under the impression that XML was a self-describing format. I've
been disabused of that notion.

Skip

Stefan Behnel

Aug 17, 2018, 3:38:44 AM

Skip Montanaro wrote on 15.08.2018 at 23:25:
> Much of XML makes no sense to me. Namespaces are one thing. If I'm
> parsing a document where namespaces are defined at the top level, then
> adding namespaces=root.nsmap works when calling the xpath method. I
> more-or-less get that.
>
> What I don't understand is how I'm supposed to search for a tag when
> the namespace appears to be defined as an attribute of the tag itself.

> I have some SOAP XML I'm trying to parse. It looks roughly like this:
>
> <s:Envelope xmlns:a="..." xmlns:s="...">
>   <s:Header>
>     ...
>   </s:Header>
>   <s:Body>
>     <Tag xmlns="http://some/new/path">
>       ...
>     </Tag>
>   </s:Body>
> </s:Envelope>
>
> If the document is "doc", I can find the body like so:
>
> body = doc.xpath(".//s:Body", namespaces=doc.nsmap)
>
> I don't understand how to find Tag, however. When I iterate over the
> body's children, printing them out, I see that Tag's name is actually:
>
> {http://some/new/path}Tag
>
> yet that namespace is unknown to me until I find Tag. It seems I'm
> stuck in a chicken-and-egg situation. Without knowing that
> http://some/new/path namespace, is there a way to cleanly find all
> instances of Tag?

In addition to what dieter said, let me mention that you do not need to
obey XPath's dictate to use namespace prefixes. lxml provides two ways
of expressing searches with qualified tag names (i.e. "{namespace}tag",
a.k.a. Clark notation).

1) You can use the .find*() methods, which implement a subset of what XPath
can express (the same that the xml.etree.ElementTree library supports,
improvements welcome), but are simpler to use and faster than the XPath
engine. If you need only the first occurrence of a tag, you can say

doc.find(".//{http://some/namespace}Body")

and there is also an .iterfind() method for incremental searches and
.findall() to return all matches as a list.

2) You can use the XPath subclass "ETXPath", which internally translates
qualified tag names to a prefix mapping for you and passes them on into the
normal XPath engine. So this gives you the expressiveness of XPath without
having to care about prefixes.
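
A small sketch of both options, assuming a made-up document; the two URIs
and the names SOAP_NS and TAG_NS are placeholders for illustration only:

```python
from lxml import etree

# Placeholder namespace URIs for this sketch.
SOAP_NS = "http://example.com/soap"
TAG_NS = "http://some/new/path"

XML = b"""\
<s:Envelope xmlns:s="http://example.com/soap">
  <s:Body>
    <Tag xmlns="http://some/new/path">first</Tag>
    <Tag xmlns="http://some/new/path">second</Tag>
  </s:Body>
</s:Envelope>
"""

root = etree.fromstring(XML)

# 1) The .find*() methods accept Clark notation directly.
body = root.find(".//{%s}Body" % SOAP_NS)          # first match or None
first_tag = root.find(".//{%s}Tag" % TAG_NS)       # first match or None
all_tags = root.findall(".//{%s}Tag" % TAG_NS)     # list of all matches
iter_texts = [el.text for el in root.iterfind(".//{%s}Tag" % TAG_NS)]

# 2) ETXPath compiles an XPath expression written in Clark notation;
#    the compiled object is then called on an element or tree.
find_tag_texts = etree.ETXPath("//{%s}Tag/text()" % TAG_NS)
xpath_texts = find_tag_texts(root)

print(body.tag)
print(first_tag.text, len(all_tags), iter_texts)
print(xpath_texts)
```

Neither variant needs a prefix mapping; the namespace URI itself still has
to be known.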

Stefan