xml aggregator

kepioo

Jul 9, 2006, 5:39:46 AM

Hi all,

I am trying to write an XML aggregator, but so far I've been failing
miserably.

What I want to do:

I have entries in a list format: [[[key1,value],[key2,value],[key3,value]], value]

Example:
[[["route","23"],["equip","jr2"],["time","3pm"]],"my first value"]
[[["route","23"],["equip","jr1"],["time","3pm"]],"my second value"]
[[["route","23"],["equip","jr2"],["time","3pm"]],"my third value"]
[[["route","24"],["equip","jr2"],["time","3pm"]],"my fourth value"]
[[["route","25"],["equip","jr2"],["time","3pm"]],"my fifth value"]

The tree I want in the end would be:
<results>
  <route id="23">
    <equip id="jr2">
      <time id="3pm">
        <data>my first value</data>
        <data>my third value</data>
      </time>
    </equip>
    <equip id="jr1">
      <time id="3pm">
        <data>my second value</data>
      </time>
    </equip>
  </route>
  <route id="24">
    <equip id="jr2">
      <time id="3pm">
        <data>my fourth value</data>
      </time>
    </equip>
  </route>
  <route id="25">
    <equip id="jr2">
      <time id="3pm">
        <data>my fifth value</data>
      </time>
    </equip>
  </route>
</results>

Does anyone have an idea for an implementation, or any code? (I was trying
with ElementTree...)

Thank you so much.

Gerard Flanagan

Jul 9, 2006, 10:55:54 AM

kepioo wrote:
> Hi all,
>
> I am trying to write an XML aggregator, but so far I've been failing
> miserably.
>
> What I want to do:
>
> I have entries in a list format: [[[key1,value],[key2,value],[key3,value]], value]
>
> Example:
> [[["route","23"],["equip","jr2"],["time","3pm"]],"my first value"]
> [[["route","23"],["equip","jr1"],["time","3pm"]],"my second value"]
> [[["route","23"],["equip","jr2"],["time","3pm"]],"my third value"]
> [[["route","24"],["equip","jr2"],["time","3pm"]],"my fourth value"]
> [[["route","25"],["equip","jr2"],["time","3pm"]],"my fifth value"]
>

[snip example data]

>
>
> Does anyone have an idea for an implementation, or any code? (I was trying
> with ElementTree...)
>

(You should have posted the code you tried)

The code below might help (though you should test it more than I have).
The 'findall' function comes from here:

http://gflanagan.net/site/python/elementfilter/elementfilter.py

it's not the elementtree one.

Gerard

----------------------------------

X = [[[["route","23"],["equip","jr2"],["time","3pm"]],"my first value"],
     [[["route","23"],["equip","jr1"],["time","3pm"]],"my second value"],
     [[["route","23"],["equip","jr2"],["time","3pm"]],"my third value"],
     [[["route","24"],["equip","jr2"],["time","3pm"]],"my fourth value"],
     [[["route","25"],["equip","jr2"],["time","3pm"]],"my fifth value"],
     [[["route","25"],["equip","jr2"],["time","4pm"]],"my sixth value"]]

# reshape the data
records = []
for info, data in X:
    record = []
    for attr, val in info:
        record.append(val)
    record.append( data )
    records.append( record )

for r in records:
    print r

from elementtree.ElementTree import Element, SubElement, tostring
from elementfilter import findall

results = Element('results')

for r in records:
    routeid, equipid, timeid, data = r
    route, equip, time = None, None, None
    existing_route = findall(results, "route[@id=='%s']" % routeid)
    if existing_route:
        route = existing_route[0]
        existing_equip = findall(route, "equip[@id=='%s']" % equipid)
        if existing_equip:
            equip = existing_equip[0]
            existing_time = findall(equip, "time[@id=='%s']" % timeid)
            if existing_time:
                time = existing_time[0]
    route = route or SubElement(results, 'route', id=routeid)
    equip = equip or SubElement(route, 'equip', id=equipid)
    time = time or SubElement(equip, 'time', id=timeid)
    data = SubElement(time,'data')
    data.text = item

print tostring(results)

-------------------------------------------
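For readers without elementfilter.py, here is a minimal sketch of the same find-or-create idea using only the standard library's xml.etree.ElementTree (Python 3). It relies on the `[@attrib='value']` predicate that ElementTree's own `find` supports. The helper names `get_or_create` and `aggregate` are made up for this illustration, and the sketch assumes ids never contain a single quote.

```python
import xml.etree.ElementTree as ET

# Same sample records as in the post above: ([[tag, id], ...], text)
X = [
    ([["route", "23"], ["equip", "jr2"], ["time", "3pm"]], "my first value"),
    ([["route", "23"], ["equip", "jr1"], ["time", "3pm"]], "my second value"),
    ([["route", "23"], ["equip", "jr2"], ["time", "3pm"]], "my third value"),
    ([["route", "24"], ["equip", "jr2"], ["time", "3pm"]], "my fourth value"),
    ([["route", "25"], ["equip", "jr2"], ["time", "3pm"]], "my fifth value"),
    ([["route", "25"], ["equip", "jr2"], ["time", "4pm"]], "my sixth value"),
]


def get_or_create(parent, tag, ident):
    """Return the direct child <tag id=ident> of parent, creating it if absent.

    Hypothetical helper; assumes ident contains no single quote.
    """
    child = parent.find("%s[@id='%s']" % (tag, ident))
    # Compare against None explicitly: an Element's truth value reflects
    # its number of children, not whether it was found.
    if child is None:
        child = ET.SubElement(parent, tag, id=ident)
    return child


def aggregate(entries):
    """Build a <results> tree in which entries sharing a path share nodes."""
    results = ET.Element("results")
    for path, text in entries:
        node = results
        for tag, ident in path:
            node = get_or_create(node, tag, ident)
        ET.SubElement(node, "data").text = text
    return results


results = aggregate(X)
print(ET.tostring(results, encoding="unicode"))
```

Because the path is walked one level at a time, this version does not depend on every record having exactly three fields, and it keeps routes in first-seen order.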

Gerard Flanagan

Jul 9, 2006, 12:21:29 PM

Gerard Flanagan wrote:
> (You should have posted the code you tried)
>
> The code below might help (though you should test it more than I have).
> The 'findall' function comes from here:
>
> http://gflanagan.net/site/python/elementfilter/elementfilter.py
>
> it's not the elementtree one.
>

Sorry, elementfilter.py was a bit broken - fixed now. Use the current
one and change the code I posted to:

[...]
    existing_route = findall(results, "route[@id==%s]" % routeid)  # changed line
    if existing_route:
        route = existing_route[0]
        existing_equip = findall(route, "equip[@id=='%s']" % equipid)
        if existing_equip:
[...]

ie. don't quote the route id since it's numeric.

Gerard

kepioo

Jul 10, 2006, 8:18:41 AM

Thanks a lot for the code.

It did not work the first time (it did not recognize item and
existing_time --> I replaced item with r[-1] and existing_time with
existing_equip).

However, it is not producing the result I expected: it doesn't group
elements of the same category together, it creates a new block of XML
for every entry:

[...]
        <data>my first value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="23">
    <EQUIPMENT id="jr1">
      <TIMESTAMP id="3pm">
        <data>my second value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="23">
    <EQUIPMENT id="jr2">
      <TIMESTAMP id="3pm">
        <data>my third value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="24">
    <EQUIPMENT id="jr2">
      <TIMESTAMP id="3pm">
        <data>my fourth value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="25">
    <EQUIPMENT id="jr2">
      <TIMESTAMP id="3pm">
        <data>my fifth value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="25">
    <EQUIPMENT id="jr2">
      <TIMESTAMP id="4pm">
        <data>my sixth value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
</results>

The idea behind all that is:

I want to create an XML file that carries XSL instructions.

The XSL will sort the entries and display something like:

Route 23:
  *jr1
    *3pm
      value
      value
      value
    *5pm
      value
      value
  *jr2
    *3pm
      value
      value
      value
    *5pm
      value
      value
Route 29:
  *jr1
    *3pm
      value
      value
      value
    *5pm
      value
      value
  *jr2
    *3pm
      value
      value
      value
    *5pm
      value
      value

I know this is feasible with XSL 2, but I need something compatible
with quite old browsers, and XSL 2 is not even working on my computer (I
could upgrade, but I cannot ask all the users to do so). That's why I
thought rearranging the XML would do it.

Do you have any other ideas? Do you think this is the best choice?
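A rough sketch of how the "rearrange the XML first" idea could also take care of the sorting, so that the stylesheet only has to walk the tree in document order: once the entries are grouped, the children of each node can be reordered by their id attribute in Python. The function name `sort_by_id` and the small sample document are made up for this illustration. Ids are compared as plain strings here, so a real feed might need a smarter key (for example, "10am" would sort before "3pm").

```python
import xml.etree.ElementTree as ET


def sort_by_id(elem):
    """Recursively reorder children by their 'id' attribute (hypothetical helper).

    sorted() is stable, so children without an id (the <data> leaves all
    get the key "") keep their original relative order.
    """
    elem[:] = sorted(elem, key=lambda child: child.get("id", ""))
    for child in elem:
        sort_by_id(child)


# Small already-grouped sample, deliberately out of order.
doc = ET.fromstring(
    '<results>'
    '<route id="25"><equip id="jr2">'
    '<time id="4pm"><data>b</data></time>'
    '<time id="3pm"><data>a</data></time>'
    '</equip></route>'
    '<route id="23">'
    '<equip id="jr2"><time id="3pm"><data>c</data></time></equip>'
    '<equip id="jr1"><time id="3pm"><data>d</data></time></equip>'
    '</route>'
    '</results>'
)

sort_by_id(doc)
print(ET.tostring(doc, encoding="unicode"))
```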

More information about the application I am writing: I am parsing a
feed, extracting some data and producing reports. The application
runs on Spyce, so I don't have to produce an output file; I just
print the XML to the screen and it is automatically written to the HTML
page we view.

Thanks again for your help.

Gerard Flanagan

Jul 10, 2006, 9:20:58 AM

kepioo wrote:
> thanks a lot for the code.
>
> It was not working the first time (do not recognize item and
> existing_time --

Apologies, I ran the code from PythonWin which remembers names that
were previously declared though deleted - should have run it as a
script.

> i changed item by r[-1] and existing_time by
> existing_equip).
>

'item' was wrong but not the other two. (I'm assuming your data is
regular - ie. all the records have the same number of fields)

change the for loop to the following:

8<------------------------------------------------------

for routeid, equipid, timeid, data in records:

    route, equip, time = None, None, None

    existing_route = findall(results, "route[@id==%s]" % routeid)

    if existing_route:
        route = existing_route[0]

        existing_equip = findall(route, "equip[@id==%s]" % equipid)

        if existing_equip:
            equip = existing_equip[0]

            existing_time = findall(equip, "time[@id==%s]" % timeid)

            if existing_time:
                time = existing_time[0]

    route = route or SubElement(results, 'route', id=routeid)
    equip = equip or SubElement(route, 'equip', id=equipid)
    time = time or SubElement(equip, 'time', id=timeid)

    dataitem = SubElement(time,'data')
    dataitem.text = data

8<------------------------------------------------------

> however, it is not producing the result i expected, as in it doesn't
> group by same category the elements, it creates a new block of xml
>

[...]

the changes above should give you what you want - remember, as I wrote
in the previous post, it should be:

"[@id==%s]"

not

"[@id=='%s']"

ie. no single quotes needed.

With the above amended code I get:

[...]
        <data>my first value</data>
        <data>my third value</data>
      </time>
    </equip>
    <equip id="jr1">
      <time id="3pm">
        <data>my second value</data>
      </time>
    </equip>
  </route>
  <route id="24">
    <equip id="jr2">
      <time id="3pm">
        <data>my fourth value</data>
      </time>
    </equip>
  </route>
  <route id="25">
    <equip id="jr2">
      <time id="3pm">
        <data>my fifth value</data>
      </time>
      <time id="4pm">
        <data>my sixth value</data>
      </time>
    </equip>
  </route>
</results>
------------------------------------

all the best

Gerard

ps. this newsgroup prefers that you don't top-post.

kepioo

Jul 10, 2006, 11:44:31 AM

Thank you so much, it works and it rocks!

The bad thing I need to figure out is why Mozilla cannot parse my XSL
sheet, while it works in IE (most of my users are using IE).

So the module you wrote is meant to top up ElementTree with XPath
capabilities, is that right? Does the new ElementTree do that? Which one
is the most appropriate?

By the way, are you French?

Regards,

Nassim

Gerard Flanagan

Jul 10, 2006, 2:42:46 PM

kepioo wrote:
> Thank you so much, it works and it rocks !
>

Great! Glad I could help.

> bad thing i need to figure out is why mozilla cannot parse my xsl
> sheet, but it works in IE ( most of my users are using IE)
>

You could try transforming the XML on the server and sending straight HTML
to the client. If you were to use CherryPy (http://www.cherrypy.org),
there is a filter called Picket which does this:

http://www.cherrypy.org/wiki/Picket

You would also need to install 4Suite (http://4suite.org).
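If pulling in an XSLT engine is more than the report needs, a simpler variant of the "send straight HTML" idea is to skip the stylesheet and build the markup directly from the grouped tree. The following is only a sketch of that substitute technique, not of Picket or 4Suite. It uses the standard library's xml.etree.ElementTree (Python 3), and the function name `to_html_list` and the sample document are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def to_html_list(elem):
    """Convert a grouped element into nested <ul>/<li> markup (hypothetical helper)."""
    ul = ET.Element("ul")
    for child in elem:
        li = ET.SubElement(ul, "li")
        if child.tag == "data":
            # Leaf: show the value itself.
            li.text = child.text
        else:
            # Grouping node: show "tag id", then recurse into its children.
            li.text = "%s %s" % (child.tag, child.get("id"))
            li.append(to_html_list(child))
    return ul


results = ET.fromstring(
    '<results><route id="23"><equip id="jr2"><time id="3pm">'
    '<data>my first value</data><data>my third value</data>'
    '</time></equip></route></results>'
)

html = ET.tostring(to_html_list(results), encoding="unicode", method="html")
print(html)
```

Since the browser only ever receives HTML, the differences between the Mozilla and IE XSL processors no longer matter.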

> so the module u wrote is to top up element tree with Xpath
> capabilities, is it?

it minimally extends the functionality of elementtree's existing
'findall' function - and it hasn't been put to much use, so let me know
if you run into problems.

the idea came from reading about the 'Specification Pattern':

pdf - http://www.martinfowler.com/apsupp/spec.pdf

good luck!

Gerard

Fredrik Lundh

Jul 10, 2006, 5:53:01 PM

to pytho...@python.org

kepioo wrote:
> Hi all,
>
> I am trying to write an XML aggregator, but so far I've been failing
> miserably.
>
> What I want to do:
>
> I have entries in a list format: [[[key1,value],[key2,value],[key3,value]], value]
>
> Example:
> [[["route","23"],["equip","jr2"],["time","3pm"]],"my first value"]
> [[["route","23"],["equip","jr1"],["time","3pm"]],"my second value"]
> [[["route","23"],["equip","jr2"],["time","3pm"]],"my third value"]
> [[["route","24"],["equip","jr2"],["time","3pm"]],"my fourth value"]
> [[["route","25"],["equip","jr2"],["time","3pm"]],"my fifth value"]
>
> The tree I want in the end would be:

assuming that the actual order of the subelements doesn't matter, you
could simply sort the array, and use groupby to group related tags:

import elementtree.ElementTree as ET
import itertools, operator

data = [
    ([["route","23"],["equip","jr2"],["time","3pm"]],"my first value"),
    ([["route","23"],["equip","jr1"],["time","3pm"]],"my second value"),
    ([["route","23"],["equip","jr2"],["time","3pm"]],"my third value"),
    ([["route","24"],["equip","jr2"],["time","3pm"]],"my fourth value"),
    ([["route","25"],["equip","jr2"],["time","3pm"]],"my fifth value")
]

def group(data, index):
    return itertools.groupby(sorted(data), lambda x: x[0][index])

root = ET.Element("result")

for key, items in group(data, 0):
    route = ET.SubElement(root, key[0], id=key[1])
    for key, items in group(items, 1):
        equip = ET.SubElement(route, key[0], id=key[1])
        for key, items in group(items, 2):
            time = ET.SubElement(equip, key[0], id=key[1])
            for data in items:
                ET.SubElement(time, "data").text = data[1]

ET.dump(root)

if you want prettyprinted output, use this function

http://effbot.org/zone/element-lib.htm#prettyprint

on the resulting tree.
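For anyone running this on Python 3, the same sort-and-groupby approach might look roughly like the sketch below, using the standard library's xml.etree.ElementTree. Three things differ from the code above and are choices made for this illustration: the list is named `entries`, each level is sorted by its own key only (the sort is stable, so values keep their original order within a group), and xml.dom.minidom stands in for the prettyprint function linked above.

```python
import itertools
import xml.etree.ElementTree as ET
from xml.dom import minidom

entries = [
    ([["route", "23"], ["equip", "jr2"], ["time", "3pm"]], "my first value"),
    ([["route", "23"], ["equip", "jr1"], ["time", "3pm"]], "my second value"),
    ([["route", "23"], ["equip", "jr2"], ["time", "3pm"]], "my third value"),
    ([["route", "24"], ["equip", "jr2"], ["time", "3pm"]], "my fourth value"),
    ([["route", "25"], ["equip", "jr2"], ["time", "3pm"]], "my fifth value"),
]


def group(rows, index):
    """Group rows by the [tag, id] pair found at position index of their path."""
    def key(row):
        return row[0][index]
    # Sort first: groupby only merges adjacent rows with equal keys.
    return itertools.groupby(sorted(rows, key=key), key=key)


root = ET.Element("results")
for (tag, ident), by_route in group(entries, 0):
    route = ET.SubElement(root, tag, id=ident)
    for (tag, ident), by_equip in group(by_route, 1):
        equip = ET.SubElement(route, tag, id=ident)
        for (tag, ident), by_time in group(by_equip, 2):
            time = ET.SubElement(equip, tag, id=ident)
            for _path, text in by_time:
                ET.SubElement(time, "data").text = text

# Pretty-print by round-tripping through minidom.
pretty = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")
print(pretty)
```

As in the original, the subelements come out in sorted order rather than first-seen order, so "jr1" precedes "jr2" under route 23.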

</F>