WJG <wjgid...@googlemail.com> writes:
> I have a simple XML file which I'm parsing in tdom and looking to
> retrieve node-ids using the xpath. But, I'm running into issues. I may
> have overlooked something simple here, any advice welcome. The version
> of tdom installed is 0.8.3-1
You'd better use 0.9.1 (released July 2018, more than a year ago).
> The following works fine..
>
> set xml {
> <office:document>
> <office:meta/>
> <office:settings/>
> <office:script/>
> <office:font-face-decls/>
> <office:styles/>
> <office:automatic-styles/>
> <office:body>
> <office:text/>
> </office:body>
> </office:document>
> }
>
> # valid XML, works...
> set doc [dom parse $xml]
[Just a terminology note, since I think I understand what you mean: you
mean well-formed. "Valid" in XML parlance means that the XML conforms to
some schema.]
Whether your document is well-formed actually depends. And for your use
case: no.
See, there are two different types of XML.
XML, as specified by the W3C XML recommendation, doesn't know anything
about XML namespaces. That spec treats the colon as an ordinary name
character in element names.
Nowadays, most people expect an XML document to follow the W3C XML _and_
the W3C XML Namespaces recommendation (without explicitly saying so).[1]
Your example document is well-formed according to the XML
recommendation. But it is not well-formed according to the combined
rules of the XML and the XML Namespaces recommendations, because the
office prefix is never bound to a namespace URI.
Since you want to use XPath expressions (which is what you do when you
use the selectNodes method), and since XPath is by its recommendation an
XML Namespaces aware query language, you need a document that is
well-formed according to both the XML and the XML Namespaces
recommendations if you want this to succeed.
So, correct your input and this should start to work. E.g.:
set xml {
<office:document xmlns:office="the_office_namespace_uri">
<office:meta/>
<office:settings/>
<office:script/>
<office:font-face-decls/>
<office:styles/>
<office:automatic-styles/>
<office:body>
<office:text/>
</office:body>
</office:document>
}
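As a minimal sketch of querying the corrected input (assuming tDOM is
installed as the tdom package): the namespace URI is just the placeholder
from the example above, the document is shortened, and the -namespaces
binding is spelled out explicitly so that the prefix in the XPath
expression is resolved no matter what.

```tcl
package require tdom

# Shortened version of the corrected document above;
# the_office_namespace_uri is only a placeholder URI.
set xml {
<office:document xmlns:office="the_office_namespace_uri">
<office:meta/>
<office:body>
<office:text/>
</office:body>
</office:document>
}

set doc  [dom parse $xml]
set root [$doc documentElement]

# Bind the prefix used in the XPath expression to the namespace URI.
set nodes [$root selectNodes \
    -namespaces {office the_office_namespace_uri} \
    /office:document/office:body]

# Collect name and namespace of every match before freeing the document.
set found {}
foreach n $nodes {
    lappend found [list [$n nodeName] [$n namespaceURI]]
}
puts $found

$doc delete
```

The prefix in the XPath expression does not have to be the one used in
the document; what has to match is the namespace URI.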
> # this works...
> set root [$doc documentElement]
>
> # this too, lists the nodes...
> foreach n [$root childNodes] { puts >[$n nodeName]< }
Yes. These methods are XML namespace agnostic and work on both types of
XML (probably in the way you would expect).
> But, the following line...
>
> set node [$root selectNodes /office:document/office:body]
>
> ..comes up with the error...
>
> Prefix doesn't resolve
> while executing
> "$root selectNodes /office:document/office:body"
> invoked from within
> "set node [$root selectNodes /office:document/office:body]"
Sure. XPath is XML namespace aware. That means that if tDOM's XPath
parser sees something that looks like an XML namespace prefix in the
XPath expression, it expects that prefix to be bound to an XML
namespace. Which it isn't with your original input.
BTW: It won't help to bind your prefix to a namespace URI with $doc
selectNodesNamespaces or the -namespaces {prefix URI ...} option of the
selectNodes method, as another poster suggested.
With that, the prefix will resolve (and the XPath parser no longer
raises an error), but you still won't get the result you want. The
engine will search for nodes in that namespace, but since the nodes of
your original XML input are not in any namespace after parsing, the call
will return an empty node set.
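A small sketch of that effect, assuming a tDOM version that has the
-ignorexmlns parse option (0.9.x), so that the original input parses at
all; the document is shortened and the namespace URI is a made-up
placeholder.

```tcl
package require tdom

# Shortened original input: the office prefix is never declared.
set xml {
<office:document>
<office:body>
<office:text/>
</office:body>
</office:document>
}

# Assumption: -ignorexmlns is available (tDOM 0.9.x).
set doc  [dom parse -ignorexmlns $xml]
set root [$doc documentElement]

# The prefix now resolves, so selectNodes does not raise an error ...
set nodes [$root selectNodes \
    -namespaces {office the_office_namespace_uri} \
    /office:document/office:body]

# ... but the parsed elements are in no namespace, so nothing matches.
puts "matches: [llength $nodes]"
```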
rolf
[1] Because of this, tDOM versions newer than 0.8.3 by default expect
XML input to comply with both the XML and the XML Namespaces
recommendations. So, with a more recent tDOM your
set doc [dom parse $xml]
will already raise the error:
Namespace prefix is not defined, referenced at line 2 character 18
If you really want to parse XML input that conforms (only) to the XML
but not to the XML Namespaces recommendation, use the -ignorexmlns option
of the parse method. But be warned: you will have all kinds of problems
selecting any prefixed element directly with selectNodes/XPath.
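For completeness, a rough sketch of handling both cases in a script:
catch the error from the default parse, fall back to -ignorexmlns, and
then walk the tree with the namespace-agnostic DOM methods instead of
XPath. The shortened document and the variable names are only
illustration, and the fallback assumes a tDOM that knows -ignorexmlns.

```tcl
package require tdom

# Shortened original input: the office prefix is never declared.
set xml {
<office:document>
<office:body>
<office:text/>
</office:body>
</office:document>
}

# A recent tDOM is expected to reject the undeclared prefix by default;
# an old one (such as 0.8.3) accepts the input and skips the fallback.
if {[catch {dom parse $xml} doc]} {
    puts "namespace-aware parse failed: $doc"
    set doc [dom parse -ignorexmlns $xml]
}

# documentElement, childNodes and nodeName do not care about namespaces.
set root [$doc documentElement]
set body ""
foreach child [$root childNodes] {
    if {[$child nodeName] eq "office:body"} {
        set body $child
    }
}
puts "root element: [$root nodeName]"
```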