How to get the parent tag while using XPath in xml.etree.ElementTree


PBLN RAO

Nov 1, 2012, 3:23:11 AM
to python_in...@googlegroups.com
 

I am trying to retrieve data from XML using xml.etree.cElementTree.

I have the following code:


import xml.etree.cElementTree as ET

xmldata ="""
<pipeline>
    <ep_150>
        <stage name="lay" longname="layout" department="layout" process="production">
            <review name="R1" reviewer="sridhar reddy" role="supervisor" id="p1234">
            </review>
        </stage>
        <stage name="lip" longname="lipsync" department="lipsync" process="production">
            <review name="R2" reviewer="someone" role="supervisor" id="p2345">
            </review>
        </stage>
        <stage name="blk" longname="blocking" department="animation" process="production">
            <review name="R3" reviewer="sudeepth" role="supervisor" id="p4645" dependson='R1'>
            </review>
            <review name="R4" reviewer="chandu" role="director" id="p5678">
            </review>
        </stage>
        <stage name="pri" longname="primary" department="animation" process="production">
            <review name="R5" reviewer="sudeepth" role="supervisor" id="p4645" style="dep" >
            </review>
            <review name="R6" reviewer="sudeepth" role="bld_supervisor" id="p2556" style="dep">
            </review>
        </stage>
        <stage name="sec" longname="secondary" department="animation" process="production">
            <review name="R7" reviewer="sha" role="supervisor" id="p1234" style="dep">
            </review>
            <review name="R8" reviewer="chandu" role="director" id="p5678">
            </review>
        </stage>
    </ep_150>
</pipeline>
"""
root = ET.fromstring(xmldata)

review = root.findall("./ep_150/stage/review")        

print '\n\nreviews for id=p4645\n'

for rev in review:

    if rev.attrib['id']=='p4645':
        print (rev.attrib['name'])

With the above code I get the following result:

reviews for id=p4645

R3

R5

But I need the output to be:

reviews for id=p4645

blk - R3

pri - R5

That is, I need the name of the parent element (the stage) along with each review, using XPath.

I know it is possible if I iterate through the entire tree, but that needs more loops.

damon shelton

Nov 1, 2012, 12:28:53 PM
to python_in...@googlegroups.com

Perhaps loop through each stage and check its children:


root = ET.fromstring(xmldata)

stages = root.findall("./ep_150/stage")

print '\n\nreviews for id=p4645\n'

for stage in stages:
    children = stage.getchildren()
    for child in children:
        if child.attrib['id']=='p4645':
            print('%s - %s' % (stage.attrib['name'], child.attrib['name']))
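
For anyone reading this on Python 3, here is a rough sketch of the same stage-by-stage loop. It assumes xml.etree.ElementTree (getchildren() was removed in Python 3.9, so it iterates the element directly). The shortened XML and the helper name reviews_by_stage are made up for illustration and are not from the original post.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same shape as the XML in the question.
XML = """
<pipeline>
    <ep_150>
        <stage name="lay">
            <review name="R1" id="p1234"/>
        </stage>
        <stage name="blk">
            <review name="R3" id="p4645"/>
            <review name="R4" id="p5678"/>
        </stage>
        <stage name="pri">
            <review name="R5" id="p4645"/>
        </stage>
    </ep_150>
</pipeline>
"""


def reviews_by_stage(root, review_id):
    """Return 'stage - review' strings for every review with the given id."""
    lines = []
    for stage in root.findall("./ep_150/stage"):
        # Iterating an Element yields its direct children.
        for child in stage:
            if child.get("id") == review_id:
                lines.append("%s - %s" % (stage.get("name"), child.get("name")))
    return lines


root = ET.fromstring(XML)
print("\n".join(reviews_by_stage(root, "p4645")))
```

Because the outer loop already holds the stage, no separate parent lookup is needed; the cost is one nested loop.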



Justin Israel

Nov 1, 2012, 1:18:55 PM
to python_in...@googlegroups.com
I'm not very experienced with xpath, but this link suggests a way to build a child-to-parent map once and then reuse it:

root = ET.fromstring(xmldata)
parent_map = dict((c, p) for p in root.getiterator() for c in p)
...
for rev in review:
    parent = parent_map[rev]
    ...
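
A fuller, runnable sketch of the parent-map idea, written for Python 3 with xml.etree.ElementTree. It uses root.iter() instead of getiterator(), which was removed in Python 3.9. The shortened XML and the helper name build_parent_map are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same shape as the XML in the question.
XML = """
<pipeline>
    <ep_150>
        <stage name="lay">
            <review name="R1" id="p1234"/>
        </stage>
        <stage name="blk">
            <review name="R3" id="p4645"/>
        </stage>
        <stage name="pri">
            <review name="R5" id="p4645"/>
        </stage>
    </ep_150>
</pipeline>
"""


def build_parent_map(root):
    """Map every element below root to its parent element."""
    # root.iter() walks the whole tree; iterating an element yields its children.
    return {child: parent for parent in root.iter() for child in parent}


root = ET.fromstring(XML)
parent_map = build_parent_map(root)

for rev in root.findall("./ep_150/stage/review"):
    if rev.get("id") == "p4645":
        print("%s - %s" % (parent_map[rev].get("name"), rev.get("name")))
```

The map is built in one pass over the tree, so it only pays off when several parent lookups are needed; it also goes stale if the tree is modified afterwards.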

If you are using the newest cElementTree from pypi then you can also modify your xpath to directly filter the id attribute and save the loop:

review = root.findall('./ep_150/stage/review[@id="p4645"]')
[" - ".join((parent_map[r].get('name'), r.get('name'))) for r in review]
# ['blk - R3', 'pri - R5']
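
As a side note for readers on Python 2.7 or any Python 3: the standard library's xml.etree.ElementTree there is ElementTree 1.3, whose limited XPath support accepts [@attrib='value'] predicates and the ".." parent step, so nothing extra should need installing. A minimal sketch under that assumption, using a shortened, made-up sample:

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same shape as the XML in the question.
XML = """
<pipeline>
    <ep_150>
        <stage name="lay">
            <review name="R1" id="p1234"/>
        </stage>
        <stage name="blk">
            <review name="R3" id="p4645"/>
        </stage>
        <stage name="pri">
            <review name="R5" id="p4645"/>
        </stage>
    </ep_150>
</pipeline>
"""

root = ET.fromstring(XML)

# Child-to-parent map, built once for the whole tree.
parent_map = {child: parent for parent in root.iter() for child in parent}

# The predicate filters on the id attribute, so no manual 'if' is needed.
reviews = root.findall("./ep_150/stage/review[@id='p4645']")
labels = [" - ".join((parent_map[r].get("name"), r.get("name"))) for r in reviews]
print(labels)

# '..' steps from each matching review back up to its parent stage.
stages = root.findall("./ep_150/stage/review[@id='p4645']/..")
print([s.get("name") for s in stages])
```

The ".." form returns only the parent stages, not stage/review pairs, so the parent map is still the simpler route when both names are needed.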


And if you have lxml installed, it seems the Element objects let you directly access the parent:

review = root.xpath('./ep_150/stage/review[@id="p4645"]')
[" - ".join((r.getparent().get('name'), r.get('name'))) for r in review]
# ['blk - R3', 'pri - R5']


-- justin

PBLN RAO

Nov 2, 2012, 5:09:39 AM
to python_in...@googlegroups.com
Thanks Justin,

The parent map worked for me.

But I think your second option (the findall with the [@id="p4645"] filter) doesn't work in Python 2.6. The link you gave is for cElementTree 1.0.2, and the site says cElementTree 1.0.5 ships with Python 2.5 onwards. I am using Python 2.6.4, and even there XPath could not filter on attributes directly.

I have posted a question about this on Stack Overflow.


Justin Israel

Nov 2, 2012, 12:20:36 PM
to python_in...@googlegroups.com
Ya, I think I wasn't clear about that part. The cElementTree that is shipped with python isn't the one that has the full xpath support I mentioned. You just need to install the newer one (or lxml if you want that).

You can use easy_install or pip to get them from pypi

easy_install cElementTree
pip install cElementTree
pip install lxml

If you don't have easy_install you can get it by downloading this:

And doing:
python ez_setup.py

Then you will have easy_install. You could further get pip and just use that if you want:
easy_install pip
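
Before installing anything, it may be worth checking whether the ElementTree you already have understands attribute predicates. The sketch below is one way to probe for that; the helper name supports_attribute_predicates is made up, and the assumption that older versions reject the path with SyntaxError is mine, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET


def supports_attribute_predicates():
    """Return True if findall() accepts [@attrib='value'] predicates."""
    probe = ET.fromstring('<a><b id="x"/><b id="y"/></a>')
    try:
        found = probe.findall("./b[@id='x']")
    except SyntaxError:
        # Assumption: versions without predicate support raise SyntaxError
        # for path syntax they do not understand.
        return False
    # Exactly one <b> carries id="x" in the probe document.
    return len(found) == 1


# ET.VERSION reports the ElementTree API version bundled with this Python.
print(ET.VERSION, supports_attribute_predicates())
```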