I am trying to write an XML aggregator, but so far I've been failing
miserably.
What I want to do:
I have entries in a list format:
[[[key1,value],[key2,value],[key3,value]], value]
Example:
[[["route","23"],["equip","jr2"],["time","3pm"]],"my first value"]
[[["route","23"],["equip","jr1"],["time","3pm"]],"my second value"]
[[["route","23"],["equip","jr2"],["time","3pm"]],"my third value"]
[[["route","24"],["equip","jr2"],["time","3pm"]],"my fourth value"]
[[["route","25"],["equip","jr2"],["time","3pm"]],"my fifth value"]
The tree I want in the end would be:
<results>
  <route id="23">
    <equip id="jr2">
      <time id="3pm">
        <data>my first value</data>
        <data>my third value</data>
      </time>
    </equip>
    <equip id="jr1">
      <time id="3pm">
        <data>my second value</data>
      </time>
    </equip>
  </route>
  <route id="24">
    <equip id="jr2">
      <time id="3pm">
        <data>my fourth value</data>
      </time>
    </equip>
  </route>
  <route id="25">
    <equip id="jr2">
      <time id="3pm">
        <data>my fifth value</data>
      </time>
    </equip>
  </route>
</results>
If anyone has an idea for an implementation, or any code, I would be
grateful (I was trying with ElementTree...).
Thank you so much.
[snip example data]
>
>
> If anyone has an idea of implemetation or any code ( i was trying with
> ElementTree...
>
(You should have posted the code you tried)
The code below might help (though you should test it more than I have).
The 'findall' function comes from here:
http://gflanagan.net/site/python/elementfilter/elementfilter.py
it's not the elementtree one.
Gerard
----------------------------------
X = [[[["route","23"],["equip","jr2"],["time","3pm"]],"my first value"],
     [[["route","23"],["equip","jr1"],["time","3pm"]],"my second value"],
     [[["route","23"],["equip","jr2"],["time","3pm"]],"my third value"],
     [[["route","24"],["equip","jr2"],["time","3pm"]],"my fourth value"],
     [[["route","25"],["equip","jr2"],["time","3pm"]],"my fifth value"],
     [[["route","25"],["equip","jr2"],["time","4pm"]],"my sixth value"]]

# reshape the data: each record becomes [routeid, equipid, timeid, data]
records = []
for info, data in X:
    record = []
    for attr, val in info:
        record.append(val)
    record.append( data )
    records.append( record )

for r in records:
    print(r)
from elementtree.ElementTree import Element, SubElement, tostring
from elementfilter import findall

results = Element('results')
for r in records:
    routeid, equipid, timeid, data = r
    route, equip, time = None, None, None
    # look for existing elements, one level at a time
    existing_route = findall(results, "route[@id=='%s']" % routeid)
    if existing_route:
        route = existing_route[0]
        existing_equip = findall(route, "equip[@id=='%s']" % equipid)
        if existing_equip:
            equip = existing_equip[0]
            existing_time = findall(equip, "time[@id=='%s']" % timeid)
            if existing_time:
                time = existing_time[0]
    # create whichever levels were not found
    route = route or SubElement(results, 'route', id=routeid)
    equip = equip or SubElement(route, 'equip', id=equipid)
    time = time or SubElement(equip, 'time', id=timeid)
    data = SubElement(time,'data')
    data.text = item  # 'item' is undefined as posted; see the corrected loop later in the thread
print(tostring(results))
-------------------------------------------
Sorry, elementfilter.py was a bit broken - fixed now. Use the current
one and change the code I posted to:
[...]
    existing_route = findall(results, "route[@id==%s]" % routeid)  # changed line
    if existing_route:
        route = existing_route[0]
        existing_equip = findall(route, "equip[@id=='%s']" % equipid)
        if existing_equip:
[...]
ie. don't quote the route id since it's numeric.
Gerard
It was not working the first time (it did not recognize item and
existing_time, so I replaced item with r[-1] and existing_time with
existing_equip).
However, it is not producing the result I expected: it doesn't group
elements of the same category together, it creates a new block of XML
for each entry:
<results>
  <LINK id="23">
    <EQUIPMENT id="jr2">
      <TIMESTAMP id="3pm">
        <data>my first value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="23">
    <EQUIPMENT id="jr1">
      <TIMESTAMP id="3pm">
        <data>my second value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="23">
    <EQUIPMENT id="jr2">
      <TIMESTAMP id="3pm">
        <data>my third value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="24">
    <EQUIPMENT id="jr2">
      <TIMESTAMP id="3pm">
        <data>my fourth value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="25">
    <EQUIPMENT id="jr2">
      <TIMESTAMP id="3pm">
        <data>my fifth value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
  <LINK id="25">
    <EQUIPMENT id="jr2">
      <TIMESTAMP id="4pm">
        <data>my sixth value</data>
      </TIMESTAMP>
    </EQUIPMENT>
  </LINK>
</results>
The idea behind all that is:
I want to create an XML file that will carry XSL instructions.
The XSL will sort the entries and display something like:
Route 23:
  *jr1
    *3pm
      value
      value
      value
    *5pm
      value
      value
  *jr2
    *3pm
      value
      value
      value
    *5pm
      value
      value
Route 29:
  *jr1
    *3pm
      value
      value
      value
    *5pm
      value
      value
  *jr2
    *3pm
      value
      value
      value
    *5pm
      value
      value
I know this is feasible with XSLT 2.0, but I need something compatible
with quite old browsers, and XSLT 2.0 is not even working on my computer
(I could upgrade, but I cannot ask all the users to do so). That's why I
thought rearranging the XML would do it.
Do you have any other ideas? Do you think this is the best choice?
More information about the application I am writing: I am parsing a
feed, extracting some data and producing reports. The application is
running on Spyce, so I don't have to produce an output file; I just
print the XML to the screen and it is automatically written to the HTML
page we view.
Thanks again for your help.
kepioo wrote:
> thanks a lot for the code.
>
> It was not working the first time (do not recognize item and
> existing_time --
Apologies, I ran the code from PythonWin which remembers names that
were previously declared though deleted - should have run it as a
script.
> i changed item by r[-1] and existing_time by
> existing_equip).
>
'item' was wrong but not the other two. (I'm assuming your data is
regular - ie. all the records have the same number of fields)
change the for loop to the following:
8<------------------------------------------------------
for routeid, equipid, timeid, data in records:
    route, equip, time = None, None, None
    existing_route = findall(results, "route[@id==%s]" % routeid)
    if existing_route:
        route = existing_route[0]
        existing_equip = findall(route, "equip[@id==%s]" % equipid)
        if existing_equip:
            equip = existing_equip[0]
            existing_time = findall(equip, "time[@id==%s]" % timeid)
            if existing_time:
                time = existing_time[0]
    route = route or SubElement(results, 'route', id=routeid)
    equip = equip or SubElement(route, 'equip', id=equipid)
    time = time or SubElement(equip, 'time', id=timeid)
    dataitem = SubElement(time,'data')
    dataitem.text = data
8<------------------------------------------------------
> however, it is not producing the result i expected, as in it doesn't
> group by same category the elements, it creates a new block of xml
>
[...]
the changes above should give you what you want - remember, as I wrote
in the previous post, it should be:
"[@id==%s]"
not
"[@id=='%s']"
ie. no single quotes needed.
With the above amended code I get:
<results>
  <route id="23">
    <equip id="jr2">
      <time id="3pm">
        <data>my first value</data>
        <data>my third value</data>
      </time>
    </equip>
    <equip id="jr1">
      <time id="3pm">
        <data>my second value</data>
      </time>
    </equip>
  </route>
  <route id="24">
    <equip id="jr2">
      <time id="3pm">
        <data>my fourth value</data>
      </time>
    </equip>
  </route>
  <route id="25">
    <equip id="jr2">
      <time id="3pm">
        <data>my fifth value</data>
      </time>
      <time id="4pm">
        <data>my sixth value</data>
      </time>
    </equip>
  </route>
</results>
------------------------------------
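For comparison, the same grouping can be done without any path lookups, by remembering every element already created in a dictionary keyed by its path of (tag, id) pairs. This is only a sketch and not part of the code above: it assumes Python 3 and the standard library's xml.etree.ElementTree (not the standalone elementtree package), and the names build_tree and cache are made up for illustration.

```python
import xml.etree.ElementTree as ET

X = [
    [[["route", "23"], ["equip", "jr2"], ["time", "3pm"]], "my first value"],
    [[["route", "23"], ["equip", "jr1"], ["time", "3pm"]], "my second value"],
    [[["route", "23"], ["equip", "jr2"], ["time", "3pm"]], "my third value"],
    [[["route", "24"], ["equip", "jr2"], ["time", "3pm"]], "my fourth value"],
    [[["route", "25"], ["equip", "jr2"], ["time", "3pm"]], "my fifth value"],
    [[["route", "25"], ["equip", "jr2"], ["time", "4pm"]], "my sixth value"],
]


def build_tree(entries):
    """Group entries under nested elements, reusing existing ones (hypothetical helper)."""
    results = ET.Element("results")
    # Maps a path such as (("route", "23"), ("equip", "jr2")) to its element.
    cache = {(): results}
    for info, value in entries:
        path = ()
        for tag, ident in info:
            parent = cache[path]
            path = path + ((tag, ident),)
            if path not in cache:
                cache[path] = ET.SubElement(parent, tag, id=ident)
        # The last path element is the innermost group; the value goes there.
        ET.SubElement(cache[path], "data").text = value
    return results


results = build_tree(X)
print(ET.tostring(results, encoding="unicode"))
```

Elements appear in order of first occurrence, as in the output above, and the number of key levels is not fixed at three.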
all the best
Gerard
ps. this newsgroup prefers that you don't top-post.
The bad thing is that I need to figure out why Mozilla cannot parse my
XSL sheet, although it works in IE (most of my users are using IE).
So the module you wrote is meant to top up ElementTree with XPath
capabilities, is that right? Does the new ElementTree do that? Which one
is the most appropriate?
By the way, are you French?
Regards,
Nassim
Great! Glad I could help.
> bad thing i need ot figure out is why mozilla cannot parse my xsl
> sheet, but it works in IE ( most of my users are using IE)
>
you could try transforming the XML on the server and sending straight
HTML to the client - if you were to use CherryPy (http://www.cherrypy.org),
there is a filter which does this called picket:
http://www.cherrypy.org/wiki/Picket
you would also need to install 4Suite (http://4suite.org)
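If the only job of the stylesheet is to sort and display the grouped entries, the server-side route can also be sketched in plain Python, with no XSLT at all. This is a rough illustration and not what Picket or 4Suite do: it assumes Python 3, the standard library's xml.etree.ElementTree, and a tree shaped like the <results> output earlier in the thread; render_outline and by_id are made-up names.

```python
import xml.etree.ElementTree as ET

XML = """<results>
<route id="25"><equip id="jr2"><time id="4pm"><data>my sixth value</data></time>
<time id="3pm"><data>my fifth value</data></time></equip></route>
<route id="23"><equip id="jr2"><time id="3pm"><data>my first value</data>
<data>my third value</data></time></equip>
<equip id="jr1"><time id="3pm"><data>my second value</data></time></equip></route>
</results>"""


def by_id(element):
    """Sort key: the element's id attribute, compared as a plain string."""
    return element.get("id")


def render_outline(results):
    """Return the sorted, indented report as a list of text lines (hypothetical helper)."""
    lines = []
    for route in sorted(results.findall("route"), key=by_id):
        lines.append("Route %s:" % route.get("id"))
        for equip in sorted(route.findall("equip"), key=by_id):
            lines.append("  *%s" % equip.get("id"))
            for time in sorted(equip.findall("time"), key=by_id):
                lines.append("    *%s" % time.get("id"))
                for data in time.findall("data"):
                    lines.append("      %s" % data.text)
    return lines


outline = render_outline(ET.fromstring(XML))
print("\n".join(outline))
```

The ids are sorted as strings, so a time such as "10am" would sort before "3pm"; a real report would probably want a proper time key. Wrapping the lines in HTML is left out.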
> so the module u wrote is to top up element tree with Xpath
> capabilities, is it?
it minimally extends the functionality of elementtree's existing
'findall' function - and it hasn't been put to much use, so let me know
if you run into problems.
the idea came from reading about the 'Specification Pattern':
pdf - http://www.martinfowler.com/apsupp/spec.pdf
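On the question of whether newer ElementTree versions can do this by themselves: the ElementTree bundled with current Python (xml.etree.ElementTree) accepts simple attribute predicates such as route[@id='23'] in find and findall. A minimal sketch, assuming Python 3 and the standard library only; get_or_create is a hypothetical helper name, and the predicate is built with plain string formatting, so it assumes ids that contain no quote characters.

```python
import xml.etree.ElementTree as ET


def get_or_create(parent, tag, ident):
    """Return the child <tag id=ident> of parent, creating it if missing (hypothetical helper)."""
    # [@id='...'] is part of ElementTree's limited XPath subset.
    # Assumption: ident contains no single quote.
    child = parent.find("%s[@id='%s']" % (tag, ident))
    if child is None:
        child = ET.SubElement(parent, tag, id=ident)
    return child


results = ET.Element("results")
for routeid, equipid, timeid, text in [
    ("23", "jr2", "3pm", "my first value"),
    ("23", "jr1", "3pm", "my second value"),
    ("23", "jr2", "3pm", "my third value"),
]:
    route = get_or_create(results, "route", routeid)
    equip = get_or_create(route, "equip", equipid)
    time = get_or_create(equip, "time", timeid)
    ET.SubElement(time, "data").text = text

print(ET.tostring(results, encoding="unicode"))
```

The helper tests "is None" rather than relying on truthiness, because an Element that has no children tests as false.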
good luck!
Gerard
assuming that the actual order of the subelements doesn't matter, you
could simply sort the array, and use groupby to group related tags:
import elementtree.ElementTree as ET
import itertools, operator

data = [
    ([["route","23"],["equip","jr2"],["time","3pm"]],"my first value"),
    ([["route","23"],["equip","jr1"],["time","3pm"]],"my second value"),
    ([["route","23"],["equip","jr2"],["time","3pm"]],"my third value"),
    ([["route","24"],["equip","jr2"],["time","3pm"]],"my fourth value"),
    ([["route","25"],["equip","jr2"],["time","3pm"]],"my fifth value")
]

def group(data, index):
    return itertools.groupby(sorted(data), lambda x: x[0][index])

root = ET.Element("result")

for key, items in group(data, 0):
    route = ET.SubElement(root, key[0], id=key[1])
    for key, items in group(items, 1):
        equip = ET.SubElement(route, key[0], id=key[1])
        for key, items in group(items, 2):
            time = ET.SubElement(equip, key[0], id=key[1])
            for data in items:
                ET.SubElement(time, "data").text = data[1]

ET.dump(root)
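The three nested loops repeat the same step at each level, so the idea extends to any number of keys with a small recursive function. A sketch under assumptions: Python 3 with the standard library's xml.etree.ElementTree, every entry carrying the same number of keys, and helper names (add_groups, key_at) invented here. As with the code above, sorting means the elements come out in sorted order, not in order of first appearance.

```python
import itertools
import xml.etree.ElementTree as ET

data = [
    ([["route", "23"], ["equip", "jr2"], ["time", "3pm"]], "my first value"),
    ([["route", "23"], ["equip", "jr1"], ["time", "3pm"]], "my second value"),
    ([["route", "23"], ["equip", "jr2"], ["time", "3pm"]], "my third value"),
    ([["route", "24"], ["equip", "jr2"], ["time", "3pm"]], "my fourth value"),
    ([["route", "25"], ["equip", "jr2"], ["time", "3pm"]], "my fifth value"),
]


def add_groups(parent, entries, depth=0):
    """Attach entries to parent, grouping on the key at position depth (hypothetical helper)."""
    if depth == len(entries[0][0]):
        # No keys left: these entries are the values of this branch.
        for _keys, value in entries:
            ET.SubElement(parent, "data").text = value
        return

    def key_at(entry):
        # The (tag, id) pair at this level, as a hashable, sortable tuple.
        return tuple(entry[0][depth])

    # sorted() is stable, so values keep their original order within a group.
    for (tag, ident), group in itertools.groupby(sorted(entries, key=key_at), key_at):
        child = ET.SubElement(parent, tag, id=ident)
        add_groups(child, list(group), depth + 1)


root = ET.Element("results")
add_groups(root, data)
ET.dump(root)
```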
if you want prettyprinted output, use this function
http://effbot.org/zone/element-lib.htm#prettyprint
on the resulting tree.
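Helpers of this kind usually work by rewriting the whitespace (text and tail) of each element in place. The following sketch is written from that general idea and is not copied from the page above; it assumes Python 3 and the standard library's xml.etree.ElementTree. Python 3.9 and later also ship a built-in ET.indent() that does the same job.

```python
import xml.etree.ElementTree as ET


def indent(elem, level=0):
    """Add newlines and two-space indentation to elem and its descendants, in place."""
    pad = "\n" + level * "  "
    if len(elem):
        # Whitespace before the first child.
        if not elem.text or not elem.text.strip():
            elem.text = pad + "  "
        for child in elem:
            indent(child, level + 1)
        # The last child's tail precedes this element's closing tag.
        if not child.tail or not child.tail.strip():
            child.tail = pad
    # Whitespace after this element (skipped for the root).
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad


root = ET.fromstring(
    '<results><route id="23"><equip id="jr2"><time id="3pm">'
    "<data>my first value</data></time></equip></route></results>"
)
indent(root)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```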
</F>