
Problem parsing XML


Juge

Sep 12, 2019, 1:14:54 AM
I have the following short example:
set XML {<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='query.xsl'?>
<objList label="Key Result" numPages="1" page="1" pageSize="1000"
total="84" type="KeyResult" version="1.5.1">
<query>1;Project;\[name=='/Data/Project1'\].sdmObjects:Variant\[phase.name=='Early'\].models:inputFile\[simulationDef.scenario.label=='SimulationSet1'\].results.keyResults\[type.name=='PPTDocument'\]</query>
<view>
<field label="inputFile" list="false" name="inputFile" type="DbObject"/>
<field label="Files" list="false" name="files" type="Document"/>
</view>
<obj oid="lCcAAAAByA2xNM:TXFMiw" type="PPTDocument">
<attr name="inputFile" oid="AAAABwnTnv8:B1M">Dataset_v019_GFZ</attr>
<attr name="files">
<file
name="Report_Dataset_v018d_GFZ_vs_Dataset_v019_GFZ.pptx"
role="role_1" size="26035115"/>
</attr>
</obj>
<obj oid="ey8AAAAB6xzKp8:TXFMiw" type="PPTDocument">
<attr name="inputFile" oid="AAAAB6F0PcM:B1M">Dataset_F20EG040</attr>
<attr name="files">
<file
name="Report_Dataset_F20EG039_vs_Dataset_F20EG040.pptx"
role="role_1" size="34545819"/>
</attr>
</obj>
</objList>
}
set doc [dom parse $XML]
set root [$doc documentElement]

# Since there is more than one obj node, selectNodes returns a Tcl list of nodes.
set nodeList [$root selectNodes /objList/obj]

# Take the first node from the returned list.
set item [lindex $nodeList 0]

puts [$item asText]
set oid [lindex [$item selectNodes {@oid}] 0 1]
set deckName [[$item selectNodes {attr[@name='inputFile']}] asText]
set fileName [$item selectNodes ".attributes/files"]
set file [$fileName nodeValue]
puts "$oid $deckName $file"



I would like to get both oids (for the PPTDocument and for the inputFile), which I think I can manage. However, I have huge problems navigating down and getting the file name of the PowerPoint. I was able to get there directly by
starting with [$doc selectNodes {/objList/obj/attr/file}]
set name [lindex [$node selectNodes {@name}] 0 1]
set role [lindex [$node selectNodes {@role}] 0 1]

but starting earlier on, I do not know how to get down a branch. I really suck at this XML stuff.

heinrichmartin

Sep 12, 2019, 4:39:17 AM
On Thursday, September 12, 2019 at 7:14:54 AM UTC+2, Juge wrote:
> I would like to get both oids (for the PPTDocument and for the inputFile), which I think I can manage. However, I have huge problems navigating down and getting the file name of the PowerPoint. I was able to get there directly by
> starting with [$doc selectNodes {/objList/obj/attr/file}]
> set name [lindex [$node selectNodes {@name}] 0 1]
> set role [lindex [$node selectNodes {@role}] 0 1]

I am not sure exactly what you are trying to achieve or what the problem is. [$doc selectNodes {/objList/obj/attr/file}] returns two nodes; presumably the other commands are in a loop.

In my experience, this is the preferred way. While XPath is quite powerful, you can easily pick the wrong parts of an XML document, e.g. if the XML does not match your expectations. Therefore, it is a good idea to pin the context element and issue further queries from there.

selectNodes (assuming tdom here, which you did not mention) is a bit nasty, as its return format depends on the type of the XPath result (see typeVar). You can also use getAttribute, see the doc (e.g. http://www.tdom.org/index.html/artifact/6332872254ffbc69).
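To make the different return formats concrete, here is a minimal sketch, assuming tdom is installed. The XML is a hypothetical, trimmed-down version of the document from the first post, and the variable names are made up for illustration.

```tcl
package require tdom

# Hypothetical, trimmed-down version of the document from the first post.
set doc [dom parse {<objList>
  <obj oid="A1" type="PPTDocument">
    <attr name="files"><file name="report.pptx" role="role_1"/></attr>
  </obj>
</objList>}]
set root [$doc documentElement]

# Element query: the result is a list of node commands.
set fileNodes [$root selectNodes {obj/attr/file} elemType]

# Attribute query: the result is a list of {name value} pairs,
# which is why the "lindex ... 0 1" idiom is needed to get the value.
set attrPairs [$root selectNodes {obj/attr/file/@name} attrType]

# string() query: the result is a plain string.
set nameStr [$root selectNodes {string(obj/attr/file/@name)} strType]

# getAttribute on a pinned element avoids the pair unpacking entirely.
set fileNode [lindex $fileNodes 0]
set role [$fileNode getAttribute role]
# A default value can be supplied for attributes that may be missing.
set size [$fileNode getAttribute size "unknown"]

puts "result types: $elemType / $attrType / $strType"
puts "name=$nameStr role=$role size=$size"
```

The optional last argument of selectNodes names a variable that receives the result type, which can help when a query does not return what you expected.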

That said, are you looking for queries like this?

multiple attributes: $doc selectNodes {/objList/obj[@type='PPTDocument']/attr[@name='files']/file/@*[name()='name' or name()='role']}

extract values: lmap a [$doc selectNodes {/objList/obj[@type='PPTDocument']/attr[@name='files']/file/@name}] {lindex $a 1}

> but starting earlier on, I do not know how to get down a branch. I really suck at this XML stuff.

I like the XPath doc at https://www.w3schools.com/xml/xpath_intro.asp and following pages. Don't forget https://www.w3schools.com/xml/xsl_functions.asp.

Juge

Sep 12, 2019, 5:15:11 AM
I think I found the solution (yes, tdom). To go down the branch I can do the following:
set oid [lindex [$item selectNodes {@oid}] 0 1]
set deckName [[$item selectNodes {attr[@name='inputFile']}] asText]
set fileNd [$item selectNodes "./attr/file"]
set role [lindex [$fileNd selectNodes {@role}] 0 1]
set fileName [lindex [$fileNd selectNodes {@name}] 0 1]

Like I said, I could hook up directly to the name and role of the file I was looking for with
[$doc selectNodes {/objList/obj/attr/file}]

but I needed some of the information higher up under /objList/obj
but this seems to do the trick...

heinrichmartin

Sep 12, 2019, 5:18:12 AM
On Thursday, September 12, 2019 at 11:15:11 AM UTC+2, Juge wrote:
> but I needed some of the information higher up under /objList/obj

You can navigate "up" by using axes in XPath.
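As a rough sketch of what that could look like (assuming tdom, and using a hypothetical cut-down version of the XML from the first post): start at each file node and reach back up with the ancestor axis or with the abbreviated parent step "..".

```tcl
package require tdom

# Hypothetical cut-down version of the XML from the first post.
set doc [dom parse {<objList>
  <obj oid="A1" type="PPTDocument">
    <attr name="inputFile" oid="B1">Dataset_1</attr>
    <attr name="files"><file name="report_1.pptx" role="role_1"/></attr>
  </obj>
  <obj oid="A2" type="PPTDocument">
    <attr name="inputFile" oid="B2">Dataset_2</attr>
    <attr name="files"><file name="report_2.pptx" role="role_1"/></attr>
  </obj>
</objList>}]

set rows {}
foreach fileNode [$doc selectNodes {/objList/obj/attr/file}] {
    set name [$fileNode getAttribute name]
    # ancestor::obj walks up from <file> to the enclosing <obj>.
    set oid [$fileNode selectNodes {string(ancestor::obj/@oid)}]
    # ../.. is the abbreviated parent axis applied twice (file -> attr -> obj),
    # followed by a step down into the sibling attr element.
    set deck [$fileNode selectNodes {string(../../attr[@name='inputFile'])}]
    lappend rows [list $oid $deck $name]
    puts "$oid $deck $name"
}
```

Whether you start at obj and go down or start at file and go up is mostly a matter of taste; the point is that each query is relative to one pinned node.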

Rich

Sep 12, 2019, 6:39:56 AM
Juge <jyrki.m...@gmail.com> wrote:
> I would like to get both oid(s) for PPTDocument and inputFile which I
> think I can manage. I have, however, huge problems navigating down
> and getting the file name of the powerpoint.

I read the above to mean you want the oid of the input file, the oid of
the PPTDocument node, and the filename hanging out underneath the
PPTDocument node. So, starting just after your
"set root [$doc documentElement]" line, this code retrieves those three
data points:

foreach objNode [$root selectNodes /objList/obj] {
    set pptoid [$objNode getAttribute oid]
    set ipfoid [$objNode selectNodes {string(attr[1]/@oid)}]
    set name [$objNode selectNodes {string(attr[2]/file/@name)}]
    puts "ppt oid=$pptoid\nipf oid=$ipfoid\nname=$name\n"
}
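
The loop above picks the attr elements by position. If the order of the attr children is not guaranteed, a variant that selects them by their name attribute might be more robust. This is only a sketch under that assumption; the XML is a hypothetical cut-down sample in which the second obj has its attr children swapped on purpose.

```tcl
package require tdom

# Hypothetical sample; the second obj lists its attr children in reverse order.
set doc [dom parse {<objList>
  <obj oid="A1" type="PPTDocument">
    <attr name="inputFile" oid="B1">Dataset_1</attr>
    <attr name="files"><file name="report_1.pptx" role="role_1"/></attr>
  </obj>
  <obj oid="A2" type="PPTDocument">
    <attr name="files"><file name="report_2.pptx" role="role_1"/></attr>
    <attr name="inputFile" oid="B2">Dataset_2</attr>
  </obj>
</objList>}]
set root [$doc documentElement]

set result {}
foreach objNode [$root selectNodes {/objList/obj[@type='PPTDocument']}] {
    set pptoid [$objNode getAttribute oid]
    # Select by @name instead of by position, so child order does not matter.
    set ipfoid [$objNode selectNodes {string(attr[@name='inputFile']/@oid)}]
    set name [$objNode selectNodes {string(attr[@name='files']/file/@name)}]
    lappend result [dict create pptoid $pptoid ipfoid $ipfoid name $name]
    puts "ppt oid=$pptoid\nipf oid=$ipfoid\nname=$name\n"
}
```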