
lxml and xpath(?)


Doug OLeary

Oct 24, 2016, 5:41:16 PM
Hey;

I'm reasonably new to Python and incredibly new to XML, never mind parsing it. I need to identify cluster nodes from a series of WebLogic XML configuration files. I've figured out how to get 75% of them; now I'm going after the edge case, and I'm unsure how to proceed.

WebLogic XML config files start with namespace definitions, followed by a number of child elements, some of which have children of their own.

The element that I'm interested in is <server> which will usually have a subelement called <listen-address> containing the hostname that I'm looking for.

Following the paradigm of "we love standards, we got lots of them", this model doesn't work everywhere. Where it doesn't work, I need to look for a subelement of <server> called <machine>. That element contains an alias which is expanded in a different root child, at the same level as <server>.

So, picture worth a 1000 words:

<?xml version='1.0' encoding='UTF-8'?>
< [[ heinous namespace xml snipped ]] >
  <name>[[text]]</name>
  ...
  <server>
    <name>EDIServices_MS1</name>
    ...
    <machine>EDIServices_MC1</machine>
    ...
  </server>
  <server>
    <name>EDIServices_MS2</name>
    ...
    <machine>EDIServices_MC2</machine>
    ...
  </server>
  <machine xsi:type="unix-machineType">
    <name>EDIServices_MC1</name>
    <node-manager>
      <name>EDIServices_MC1</name>
      <nm-type>SSL</nm-type>
      <listen-address>host001</listen-address>
      <listen-port>7001</listen-port>
    </node-manager>
  </machine>
  <machine xsi:type="unix-machineType">
    <name>EDIServices_MC2</name>
    <node-manager>
      <name>EDIServices_MC2</name>
      <listen-address>host002</listen-address>
      <listen-port>7001</listen-port>
    </node-manager>
  </machine>
</domain>

So, running it on a 'normal' config, I get:

$ ./lxml configs/EntsvcSoa_Domain_config.xml
EntsvcSoa_CS => host003.myco.com
EntsvcSoa_CS => host004.myco.com

Running it against the abi-normal config, I'm currently getting:

$ ./lxml configs/EDIServices_Domain_config.xml
EDIServices_CS => EDIServices_MC1
EDIServices_CS => EDIServices_MC2

Using the examples above, I would like to translate EDIServices_MC1 and EDIServices_MC2 to host001 and host002 respectively.

The primary loop is:

for server in root.findall('ns:server', namespaces):
    cs = server.find('ns:cluster', namespaces)
    if cs is None:
        continue
    # cluster_name = server.find('ns:cluster', namespaces).text
    cluster_name = cs.text
    listen_address = server.find('ns:listen-address', namespaces)
    server_name = listen_address.text
    if server_name is None:
        machine = server.find('ns:machine', namespaces)
        if machine is None:
            continue
        else:
            server_name = machine.text

    print("%-15s => %s" % (cluster_name, server_name))

(it's taken me days to write 12 lines of code... good thing I don't do this for a living :) )

Rephrased, I need to find the <listen-address> under the <machine> child whose name matches the name under the corresponding <server> child. From some of the examples on the web, I believe XPath might help, but I've not been able to get even the simple examples working. Go figure, I just figured out what a namespace is...

Any hints/tips/suggestions greatly appreciated especially with complete noob tutorials for xpath.

Thanks for your time.

Doug O'Leary
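
The lookup described in the post above (resolve a <machine> alias to the <listen-address> of the matching top-level <machine>) can be sketched without XPath by building a dictionary first. This is only an illustration under stated assumptions: the namespace URI, the sample document, and the names NS, SAMPLE, cluster_hosts and alias_to_host are all made up here, and the position of <cluster> inside <server> is a guess because the original snippet elides it.

```python
from lxml import etree

# Hypothetical namespace URI for this sketch; use the one your <domain> declares.
NS = {"ns": "http://example.com/domain"}

# Made-up sample that mixes a "normal" server and an alias-only server.
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<domain xmlns="http://example.com/domain">
  <server>
    <name>EDIServices_MS1</name>
    <machine>EDIServices_MC1</machine>
    <cluster>EDIServices_CS</cluster>
    <listen-address/>
  </server>
  <server>
    <name>EntsvcSoa_MS1</name>
    <cluster>EntsvcSoa_CS</cluster>
    <listen-address>host003.myco.com</listen-address>
  </server>
  <machine>
    <name>EDIServices_MC1</name>
    <node-manager>
      <name>EDIServices_MC1</name>
      <listen-address>host001</listen-address>
    </node-manager>
  </machine>
</domain>
"""


def cluster_hosts(root):
    """Return (cluster, host) pairs, resolving machine aliases where needed."""
    # Pass 1: map each machine alias to its node manager's listen address.
    alias_to_host = {}
    for machine in root.findall("ns:machine", NS):
        alias = machine.findtext("ns:name", namespaces=NS)
        host = machine.findtext(
            "ns:node-manager/ns:listen-address", namespaces=NS
        )
        if alias and host:
            alias_to_host[alias] = host

    # Pass 2: prefer the server's own listen address, else resolve the alias.
    pairs = []
    for server in root.findall("ns:server", NS):
        cluster = server.findtext("ns:cluster", namespaces=NS)
        if cluster is None:
            continue  # not a clustered server
        host = server.findtext("ns:listen-address", namespaces=NS)
        if not host:  # element missing (None) or empty ('')
            alias = server.findtext("ns:machine", namespaces=NS)
            # Fall back to the raw alias if no matching <machine> exists.
            host = alias_to_host.get(alias, alias)
        if host:
            pairs.append((cluster, host))
    return pairs


if __name__ == "__main__":
    for cluster, host in cluster_hosts(etree.fromstring(SAMPLE)):
        print("%-15s => %s" % (cluster, host))
```

Using findtext() instead of find(...).text means a missing <listen-address> simply yields None rather than raising AttributeError, so both the "element absent" and "element empty" cases fall through to the alias lookup.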

Peter Otten

Oct 26, 2016, 9:57:45 AM
You tend to get more efficient when you read the tutorial before you start
writing code. Hard-won advice that I still don't always follow myself ;)

>
> Rephrased, I need to find the <listen-address> under the <machine> child
> whose name matches the name under the corresponding <server> child. From
> some of the examples on the web, I believe xpath might help but I've not
> been able to get even the simple examples working. Go figure, I just
> figured out what a namespace is...
>
> Any hints/tips/suggestions greatly appreciated especially with complete
> noob tutorials for xpath.

Use your favourite search engine. One advantage of XPath is that it's not
limited to Python.

I did not completely follow your question, so the example below is my
interpretation of what you are asking for. It may still help you get
started...

$ cat lxml_translate_host.py
from lxml import etree

s = """\
<?xml version='1.0' encoding='UTF-8'?>
<domain>
<name>text</name>
<server>
<name>EDIServices_MS1</name>
<machine>EDIServices_MC1</machine>
</server>
<server>
<name>EDIServices_MS2</name>
<machine>EDIServices_MC2</machine>
</server>
<machine type="unix-machineType">
<name>EDIServices_MC1</name>
<node-manager>
<name>EDIServices_MC1</name>
<nm-type>SSL</nm-type>
<listen-address>host001</listen-address>
<listen-port>7001</listen-port>
</node-manager>
</machine>
<machine type="unix-machineType">
<name>EDIServices_MC2</name>
<node-manager>
<name>EDIServices_MC2</name>
<listen-address>host002</listen-address>
<listen-port>7001</listen-port>
</node-manager>
</machine>
</domain>
""".encode()

root = etree.fromstring(s)
for server in root.xpath("./server"):
    servername = server.xpath("./name/text()")[0]
    print("server", servername)
    if not servername.isidentifier():
        raise ValueError("Kind regards to Bobby Tables' Mom")
    machine = server.xpath("./machine/text()")[0]
    print("machine", machine)
    path = ("../machine[name='{}']/node-manager/"
            "listen-address/text()").format(machine)
    host = server.xpath(path)[0]
    print("host", host)
    print()
$ python3 lxml_translate_host.py
server EDIServices_MS1
machine EDIServices_MC1
host host001

server EDIServices_MS2
machine EDIServices_MC2
host host002

$
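
The example above drops the namespace declarations and builds the XPath expression with str.format(). A possible variation, sketched here under assumptions, keeps a default namespace on <domain> and passes the machine alias as an XPath variable, which lxml's xpath() accepts as a keyword argument. The namespace URI, the prefix "ns", the variable name $alias and the sample document are all invented for illustration; the real config's namespace would have to be substituted.

```python
from lxml import etree

# Hypothetical namespace URI; replace with the one declared on <domain>.
NS = {"ns": "http://example.com/domain"}

SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<domain xmlns="http://example.com/domain">
  <server>
    <name>EDIServices_MS1</name>
    <machine>EDIServices_MC1</machine>
  </server>
  <server>
    <name>EDIServices_MS2</name>
    <machine>EDIServices_MC2</machine>
  </server>
  <machine>
    <name>EDIServices_MC1</name>
    <node-manager>
      <listen-address>host001</listen-address>
    </node-manager>
  </machine>
  <machine>
    <name>EDIServices_MC2</name>
    <node-manager>
      <listen-address>host002</listen-address>
    </node-manager>
  </machine>
</domain>
"""

root = etree.fromstring(SAMPLE)
hosts = {}
for server in root.xpath("./ns:server", namespaces=NS):
    servername = server.xpath("./ns:name/text()", namespaces=NS)[0]
    machine = server.xpath("./ns:machine/text()", namespaces=NS)[0]
    # $alias is an XPath variable bound from the keyword argument below,
    # so the alias text is never spliced into the expression itself.
    host = server.xpath(
        "../ns:machine[ns:name=$alias]"
        "/ns:node-manager/ns:listen-address/text()",
        namespaces=NS,
        alias=machine,
    )[0]
    hosts[str(servername)] = str(host)

for servername, host in hosts.items():
    print(servername, "=>", host)
```

XPath 1.0 has no notion of a default namespace, so every element step needs the prefix even though the document itself uses none. Binding the value as a variable also removes the need to validate it before interpolation.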


Pete Forman

Oct 27, 2016, 2:56:57 AM
Peter Otten <__pet...@web.de> writes:

> root = etree.fromstring(s)
> for server in root.xpath("./server"):
>     servername = server.xpath("./name/text()")[0]

When working with lxml I prefer to use this Python idiom.

servername, = server.xpath("./name/text()")

That enforces a single result. The original code will detect a lack of
results but if the query returns multiple results when only one is
expected then it silently returns the first.

--
Pete Forman

Peter Otten

Oct 27, 2016, 3:19:03 AM
Good suggestion, and for those who find the trailing comma easy to overlook:

[servername] = server.xpath("./name/text()")
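
To make the difference between indexing and unpacking concrete, here is a small sketch. The <server> document with two <name> children is made up purely to trigger the multiple-result case.

```python
from lxml import etree

# Made-up input with a duplicated <name> to show the multiple-result case.
root = etree.fromstring(
    b"<server><name>MS1</name><name>MS1_dup</name></server>"
)
names = root.xpath("./name/text()")

# Indexing silently takes the first match and ignores the duplicate.
first = names[0]

# Unpacking demands exactly one match and raises ValueError otherwise.
try:
    [only] = names
    error = None
except ValueError as exc:
    error = str(exc)

# An empty result fails the same way, just with a different message.
try:
    [missing] = root.xpath("./machine/text()")
    empty_error = None
except ValueError as exc:
    empty_error = str(exc)

print(first)
print(error)
print(empty_error)
```

Both failure modes surface as ValueError, so a single except clause can report "expected exactly one element" for either of them.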


dieter

Nov 2, 2016, 4:27:59 AM
Doug OLeary <dkol...@olearycomputers.com> writes:
> ...
> Any hints/tips/suggestions greatly appreciated especially with complete noob tutorials for xpath.

You can certainly do it with "XPath" (look for the "following-sibling" axis).

You can also use Python (with "lxml"). If you have an element "e", then
"e.getnext()" gives you the sibling following "e" (or `None`).
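
A minimal sketch of both suggestions, using a made-up three-element document. Note that getnext() returns only the immediately following sibling, while the following-sibling axis selects every later sibling that matches the node test; in a layout like the one in the original post, the <machine> for a given <server> is not necessarily its immediate neighbour.

```python
from lxml import etree

# Made-up document: two servers followed by one machine.
root = etree.fromstring(
    b"<domain>"
    b"<server><name>MS1</name></server>"
    b"<server><name>MS2</name></server>"
    b"<machine><name>MC1</name></machine>"
    b"</domain>"
)

first_server = root[0]

# getnext(): the very next sibling element, whatever its tag.
next_sibling = first_server.getnext()

# following-sibling axis: all later siblings named <machine>.
later_machines = first_server.xpath("following-sibling::machine")

# The last child has no next sibling, so getnext() returns None.
after_last = root[-1].getnext()

print(next_sibling.tag, next_sibling.findtext("name"))
print([m.findtext("name") for m in later_machines])
print(after_last)
```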
