I'm a C++ developer and would like to learn XML technology. Any advice on a
good book would be greatly appreciated.
Thanks,
Mourad
O'Reilly's XML in a Nutshell.
--
<Matt/>
/|| ** Founder and CTO ** ** http://axkit.com/ **
//|| ** AxKit.com Ltd ** ** XML Application Serving **
// || ** http://axkit.org ** ** XSLT, XPathScript, XSP **
// \\| // ** mod_perl news and resources: http://take23.org **
\\//
//\\
// \\
Be very careful with chapters 8, 9, 19 and 20 (at least, but not only:
chapter 11 also has some big bugs). There are many errors (I posted
something like 3 or 4 "serious technical mistake" errata, and as many minor
mistakes and suggestions, for the first three of the chapters above).
For example, the book doesn't distinguish between XPath expressions
and XSLT patterns. It says an unprefixed name node test in an XSLT pattern
matches nodes in the default namespace (whereas it actually matches nodes
that are not in any namespace, i.e. with a null namespace URI). It states
that XPath allows scientific notation for numbers, whereas that's "only" a
requirement for the future XPath 2.0. It says relative location paths in
XPointers are relative to the node containing the XPointer, whereas the
context node is actually set to the root node of the linked document; you
must use here() or origin() to refer to the XPointer's container node.
...among many others...
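To make the namespace point concrete, here is a minimal Python sketch (not taken from the book or from any XSLT processor; the helper names are hypothetical) that models expanded names as (namespace URI, local name) pairs and expands a node-test QName the way XPath 1.0 does: a prefixed name uses the in-scope binding, while an unprefixed name always gets a null namespace URI, even when a default namespace is declared.

```python
from typing import Dict, Optional, Tuple

# An expanded name: (namespace URI or None, local name).
ExpandedName = Tuple[Optional[str], str]


def expand_name_test(qname: str, prefixes: Dict[str, str]) -> ExpandedName:
    """Expand a node-test QName as XPath 1.0 does (hypothetical helper).

    Prefixed names are resolved through the in-scope prefix bindings.
    Unprefixed names get a null namespace URI: the default namespace,
    stored here under the key "", is deliberately ignored.
    """
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        return (prefixes[prefix], local)  # KeyError for an undeclared prefix
    return (None, qname)


def name_test_matches(qname: str, prefixes: Dict[str, str],
                      node_name: ExpandedName) -> bool:
    """True if an element with expanded name node_name passes the name test."""
    return expand_name_test(qname, prefixes) == node_name


XHTML = "http://www.w3.org/1999/xhtml"
# The stylesheet declares both xmlns="...xhtml" and xmlns:h="...xhtml".
in_scope = {"": XHTML, "h": XHTML}

p_in_xhtml = (XHTML, "p")  # <p> in a document whose default namespace is XHTML
p_no_ns = (None, "p")      # <p> in no namespace at all

print(name_test_matches("p", in_scope, p_in_xhtml))    # False
print(name_test_matches("p", in_scope, p_no_ns))       # True
print(name_test_matches("h:p", in_scope, p_in_xhtml))  # True
```

In other words, to match XHTML elements a stylesheet has to bind a prefix and use it in its patterns; declaring a default namespace is not enough.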
I'm currently translating these chapters and it's a real plague...
French customers, rejoice, all these errors (except those needing a
complete rewrite) will be corrected!
Apart from that, yes, this is a good, well-written book (chapters 6 and 7
are really, really interesting; I didn't look at parts 1 and 3 in detail).
Tom.
My preferred one is O'Reilly's XML Pocket Reference. However, it doesn't deal
with programming issues (SAX and DOM interfaces for instance).
Tom.
I'm sure that's not true, but I'd have to check.
> It says an XSLT pattern unprefixed name node test
> matches nodes in the default namespace (whereas it actually matches nodes
> not within any namespace - with a null namespace URI).
Grr, that's frustrating. I corrected this in the earlier drafts!
> It states XPath
> allows scientific notation for numbers whereas it's "only" a requirement
> for the future XPath 2.0.
True, although most XSLT and XPath engines allow sci notation today.
Matt (one of the two tech editors on the book).
For instance (among many others):
· page 133, Templates: "This element has a match attribute that contains
an XPath pattern identifying the input it matches"
· page 139, The Default Template Rule for Text and Attribute Nodes:
"The text() function is an XPath expression matching all text nodes"
Also note that node type tests are called "functions".
· page 150, The Root Location Path: "The simplest location path is the one
that selects the document's root node [...] For example, this XSLT
template uses the XPath pattern / to match the entire input document tree
and wrap it in an html element:
<xsl:template match="/">
<html><xsl:apply-templates/></html>
</xsl:template>
The forward slash / is an absolute location path because no matter what
the context node is, no matter where you were in the input tree when this
template was applied, it always means the same thing: the root node of
the document."
This is the worst example one could find (in the book) I guess.
· etc.
>> It states XPath allows scientific notation for numbers whereas it's
>> "only" a requirement for the future XPath 2.0.
>
> True, although most XSLT and XPath engines allow sci notation today.
That doesn't justify the statement. The book should at least say it's not
standard and will only be in the future XPath version.
Have a look at my message on the libxml list
<http://mail.gnome.org/archives/xml/2001-April/msg00086.html> and you'll
see that sci notation support isn't consistent at all across the different
implementations (tested ones were Xalan-J, Xalan-C, SAXON, Sablotron).
Users not subscribed to the list may be pleased to learn that sci notation
was only added to libxml a few days ago (April 21) and is subject to
change over the next weeks (most probably a flag for switching between
XPath 1.0 and future XPath 2.0 conformance).
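For reference, a minimal Python sketch of the difference: it transcribes XPath 1.0's Number production and the exponent-bearing DoubleLiteral form that XPath 2.0 eventually adopted into regular expressions, plus XPath 1.0's rule for number() on a string, under which any string that doesn't fit the grammar converts to NaN. The function names are hypothetical, and this is not how any of the engines mentioned above is implemented.

```python
import math
import re

# XPath 1.0, [30] Number ::= Digits ('.' Digits?)? | '.' Digits
XPATH1_NUMBER = r"(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"
# XPath 2.0 DoubleLiteral: the same mantissa followed by a mandatory exponent.
XPATH2_DOUBLE = XPATH1_NUMBER + r"[eE][+-]?[0-9]+"
# XML whitespace: space, tab, carriage return, line feed.
WS = r"[ \t\r\n]*"


def is_xpath1_number(token: str) -> bool:
    return re.fullmatch(XPATH1_NUMBER, token) is not None


def is_xpath2_double(token: str) -> bool:
    return re.fullmatch(XPATH2_DOUBLE, token) is not None


def xpath1_string_to_number(s: str) -> float:
    """number(s) for a string argument under XPath 1.0 rules."""
    m = re.fullmatch(WS + "(-?" + XPATH1_NUMBER + ")" + WS, s)
    return float(m.group(1)) if m else math.nan


for tok in ["42", "3.", ".5", "1.5e3", "2E-4"]:
    print(tok, is_xpath1_number(tok), is_xpath2_double(tok))
print(xpath1_string_to_number(" -12.5 "))  # -12.5
print(xpath1_string_to_number("1e3"))      # nan
```

So a strictly conforming XPath 1.0 engine should return NaN for number('1e3'), which is one reason engines that quietly accept the exponent form behave inconsistently with each other.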
> Matt (one of the two tech editors on the book).
So maybe you could tell me why the errata related to the above HUGE errors
are still in the "unconfirmed" list even though I submitted them almost a
month ago, why none of the authors contacted me if they needed more
explanation, and why they (the errors) haven't been corrected in the second
edition?
Tom.
It *is* an XPath "pattern". It's just not an XPath "expression". It
could be argued that it's not XPath because the XSLT spec doesn't refer
to it as such explicitly, even though it refers to the XPath spec in
hyperlinks, but it's a very fine line.
> · page 139, The Default Template Rule for Text and Attribute Nodes:
> "The text() function is an XPath expression matching all text nodes"
> Also note node type tests are called "functions"
That's correct too, apart from the use of "function". A correct phrase
would be "The text() node test is an XPath expression matching all text
nodes".
> · page 150, The Root Location Path: "The simplest location path is the one
> that selects the document's root node [...] For example, this XSLT
> template uses the XPath pattern / to match the entire input document tree
> and wrap it in an html element:
> <xsl:template match="/">
> <html><xsl:apply-templates/></html>
> </xsl:template>
> The forward slash / is an absolute location path because no matter what
> the context node is, no matter where you were in the input tree when this
> template was applied, it always means the same thing: the root node of
> the document."
Can you be more specific about what you see as wrong with that? Apart from
being a bit of a simplification, it seems accurate to me.
> This is the worst example one could find (in the book) I guess.
Doesn't seem too bad then :-)
I don't know how many books you've worked on, but every single one I
have worked on has a *lot* of errors. In fact I was speaking to an
O'Reilly editor just the other week, and she was saying the exact same
thing.
> >> It states XPath allows scientific notation for numbers whereas it's
> >> "only" a requirement for the future XPath 2.0.
> >
> > True, although most XSLT and XPath engines allow sci notation today.
>
> That doesn't justify the statement. The book should at least say it's not
> standard and will only be in the future XPath version.
I don't disagree.
> > Matt (one of the two tech editors on the book).
>
> So maybe you could tell me why the errata related to the above HUGE errors
> are still in the "unconfirmed" list since I submit them almost one month
> ago, why none of the authors contacted me if they needed some more
> explaination and why they (the errors) haven't been corrected in the second
> edition?
I don't have any editorial control over the book, nor any direct
contact with the authors (apart from seeing Elliotte at various XML
conferences), sorry.
Yes, it's a pattern, but there isn't any notion of pattern and matching in
XPath, only selecting nodes.
The pattern notion is introduced in XSLT.
> It's could be argued that it's not XPath because the XSLT spec doesn't
> refer to it as such explicitly, even though it refers to the XPath spec
> in hyperlinks, but it's a very fine line.
XSLT patterns and XPath expressions have similar syntax but totally
different semantics (every XSLT processor I know of makes this distinction:
libxslt uses 2 different parsers and in-memory representations, and SAXON
and Xalan use different classes).
Consider this expression: foo/bar[@baz]
When evaluating it in a "selection" context, you start by selecting all foo
child elements of the context node into a node-set. For each selected node,
you select its bar child elements, then for each of those you check whether
it has a baz attribute.
When evaluating it in a "matching" context, you start by checking whether
the current node has the expanded name "bar"; if it does, you check
whether it has a baz attribute, then whether its parent node is named "foo".
You can't treat XPath expressions and XSLT patterns the same way, and thus
can hardly describe them in a single chapter, at least not the way it's
done here.
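A minimal Python sketch of the two evaluation orders for foo/bar[@baz], hard-coded for that one path, ignoring namespaces, and using a toy node class (the class and function names are hypothetical, not the data structures of libxslt, SAXON or Xalan):

```python
class Node:
    """A bare-bones element node; the root node has name None."""

    def __init__(self, name=None, attrs=None, children=()):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def select_foo_bar_baz(context):
    """Select foo/bar[@baz]: evaluate the steps left to right from context."""
    foos = [c for c in context.children if c.name == "foo"]          # child::foo
    bars = [b for f in foos for b in f.children if b.name == "bar"]  # then child::bar
    return [b for b in bars if "baz" in b.attrs]                     # then [@baz]


def matches_foo_bar_baz(node):
    """Match foo/bar[@baz]: test one given node right to left; no context needed."""
    if node.name != "bar":        # is the node itself a bar?
        return False
    if "baz" not in node.attrs:   # does it carry a baz attribute?
        return False
    parent = node.parent          # is its parent a foo?
    return parent is not None and parent.name == "foo"


bar1 = Node("bar", {"baz": "1"})
bar2 = Node("bar")
bar3 = Node("bar", {"baz": "2"})
foo = Node("foo", children=[bar1, bar2])
doc = Node("doc", children=[foo, bar3])
root = Node(children=[doc])

print(select_foo_bar_baz(doc) == [bar1])  # True: selected from context doc
print(select_foo_bar_baz(root) == [])     # True: from the root, nothing is selected
print([matches_foo_bar_baz(b) for b in (bar1, bar2, bar3)])  # [True, False, False]
```

Selection depends on a context node and walks down the tree; matching needs only the candidate node and walks up through its parent, which is why processors tend to keep the two apart.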
>> · page 139, The Default Template Rule for Text and Attribute Nodes:
>> "The text() function is an XPath expression matching all text nodes"
>> Also note node type tests are called "functions"
>
> That's correct too, apart from the use of "function". A correct phrase
> would be "The text() node test is an XPath expression matching all text
> nodes".
Look above. An "XPath expression" selects nodes; it doesn't match them
(you said so yourself: «It *is* an XPath "pattern". It's just not an
XPath "expression"»).
>> · page 150, The Root Location Path: "The simplest location path is the
>> one
>> that selects the document's root node [...] For example, this XSLT
>> template uses the XPath pattern / to match the entire input document
>> tree and wrap it in an html element:
>> <xsl:template match="/">
>> <html><xsl:apply-templates/></html>
>> </xsl:template>
>> The forward slash / is an absolute location path because no matter
>> what the context node is, no matter where you were in the input tree
>> when this template was applied, it always means the same thing: the
>> root node of the document."
>
> Can you be more specific what you see as wrong with that, as apart from
> being a bit of a simplification, it seems accurate to me.
1. A location path _selects_ nodes; it doesn't match them (unless it is a
location path pattern).
2. In the last sentence, the issue is not when and where the template is
instantiated but when and where the root node is selected, e.g. with an
<xsl:apply-templates select="/" /> element (which will most probably
lead to an endless loop, but that's not what matters for now).
The above template cannot be applied without a previous selection of the
root node (either explicit or implicit); what matters is selection,
not matching.
3. The / location path is an absolute location path because it always
_selects_ the root node, independently of where and when it is
evaluated.
>> This is the worst example one could find (in the book) I guess.
>
> Doesn't seem too bad then :-)
It should have used an xsl:apply-templates or xsl:value-of element, not an
xsl:template one.
> I don't know how many books you've worked on, but every single one I
> have worked on has a *lot* of errors. In fact I was speaking to an
> O'Reilly editor just the other week, and she was saying the exact same
> thing.
That's not very reassuring ;o)
What annoys me here is that I sometimes really wonder whether the author
(Elliotte Rusty Harold?) understands what he's talking about.
Isn't such an author supposed to be an expert in the domain? Why does it
seem to me that I know more about these topics than E. R. Harold does?
Sometimes it's not a matter of knowledge but only of reading the
specifications (though all my knowledge is based on that). Aren't tech
editors (sorry) also supposed to check accuracy? For instance, the book
says somewhere that an XPointer is relative to the node containing it;
that shocked me, and it took 10 seconds to check the spec and be sure it
was a bug in the book. Who's supposed to do those checks: authors and
editors, who should be experts of some kind, or readers, who normally
don't know much about the topic?
What also annoys me a lot is that these confusions are quite common among
"newbies". I know of only a few XSLT tutorials/references that don't lead to
such confusions. When you teach something to someone, everything must be
clearly described and "packaged". If there is a way to be confused about
a notion or anything else, the reader (not all readers, but quite a large
part, I guess) will be confused.
Please forgive me if I appear a bit aggressive, I'm not.
>> >> It states XPath allows scientific notation for numbers whereas it's
>> >> "only" a requirement for the future XPath 2.0.
>> >
>> > True, although most XSLT and XPath engines allow sci notation today.
>>
>> That doesn't justify the statement. The book should at least say it's not
>> standard and will only be in the future XPath version.
>
> I don't disagree.
Phew! ;o)
Tom.
Yes, I'm aware of the difference. But you're arguing that "XPath pattern"
is invalid and that it should be "XSLT pattern", whereas I (and I'm guessing
Elliotte does too) think "XPath pattern" is clearer and easier to
understand, because to most people the match patterns are XPaths, if
only a subset (and yes, they are a valid subset of XPath).
> >> · page 139, The Default Template Rule for Text and Attribute Nodes:
> >> "The text() function is an XPath expression matching all text nodes"
> >> Also note node type tests are called "functions"
> >
> > That's correct too, apart from the use of "function". A correct phrase
> > would be "The text() node test is an XPath expression matching all text
> > nodes".
>
> Look above. An "XPath expression" selects nodes, it doesn't matches them
> (you said that yourself: «It *is* an XPath "pattern". It's just not an
> XPath "expression"»)
*shrug*. I say it's a fine line. Look at XML::XPath (my Perl
implementation of XPath). You can do match or find (select) with exactly
the same code. I think the term "pattern" was used in the correct
places, and certainly good enough to not be confusing. If you want it
down to the level of detail you're after you know where to find the
specs.
You seem to have focused on this issue of match vs select for your bugs
here. I'm sorry but I disagree with your bugs - "XPath pattern" is an OK
term by me, and I think the authors did a great job of fitting XSLT into
such a short space without confusing the reader.
> > I don't know how many books you've worked on, but every single one I
> > have worked on has a *lot* of errors. In fact I was speaking to an
> > O'Reilly editor just the other week, and she was saying the exact same
> > thing.
>
> That's not for reassuring me ;o)
Try writing a book. You'll soon find out ;-)
> What annoys me here is that I sometimes really wonder if the author
> (Elliott Rusty Harold?) understands what he speaks of.
> Isn't such an author supposed to be an expert in that domain? Why does it
> seem to me I know more about these topics that E. R. Harold does ?
> Sometimes that's not knowledge but only reading of specifications (though
> all my knowledge is based on that). Aren't tech editors (sorry) also
> supposed to check accuracy (it's said somewhere that an XPointer is
> relative to the node containing it ; that chocked me and it took 10 seconds
> to check the spec and become sure it was a bug in the book ; who's supposed
> to do some checks? authors and editors, who should be some kind of experts,
> or readers, who normally don't know many about the topic?)?
There are lots of trade-offs in writing a book, between the pressure of
getting it out on time, making it 100% accurate, and making it up to
date (especially a problem with XML). We can't be experts on everything,
much as I wish I could be. Not only that, but tech reviewing something
like XMLiaN takes *many* hours of careful reading (bear in mind that
you're reading the *corrected* version), and the compensation for a tech
reviewer certainly doesn't cover the hours put into it.
Not in my opinion, but let's go with "XPath pattern" anyway.
> (and yes, they are a valid subset of XPath).
XSLT pattern syntax is a subset of XPath syntax, but XPath isn't only a
syntax; it also has semantics defining its evaluation process. XSLT pattern
semantics are not the same as XPath ones at all.
> Look at XML::XPath (my Perl implementation of XPath). You can do match
> or find (select) with exactly the same code.
Of course it's possible, since only the evaluation step differs between
patterns and expressions. The problem's not there; the problem is that the
book doesn't give clear explanations of:
· what an expression is
· what a pattern is
· how and why they are different
· how an expression is evaluated, step by step (this is well done)
· how a pattern is evaluated, step by step
That's why I said it doesn't make any distinction between expressions and
patterns. Switching back and forth between expressions and patterns is
quite confusing (see items 2 and 3 in my previous post, and the last four
lines of the related book excerpt; to me, this sounds confusing).
So the question is: is it "logical" to use a single parsed XPath to select
or match? Wouldn't it be more "logical" to build a pattern *and* an
expression (two objects) from the same expression string, and then use one
or the other depending on whether you want to match or select?
> You seem to have focused on this issue of match vs select for your bugs
> here. I'm sorry but I disagree with your bugs - "XPath pattern" is an OK
> term by me, and I think the authors did a great job of fitting XSLT into
> such a short space without confusing the reader.
OK, forget that match vs. select issue. I didn't even report any errata
about it.
One important erratum was about unqualified names in patterns (still
"unconfirmed" since I posted it in early/mid March). You said it's a bug in
the "drafts versioning".
Another one was related to relative location paths in XPointers.
A third one about exponential notation of numbers.
Elliotte R. Harold seems to be convinced that the choice between abbreviated
and unabbreviated syntax applies to whole paths rather than being made step
by step. This leads to statements like "this axis cannot be used in an
abbreviated location path" (chapter 19, page 303), which is quite confusing
for the reader. I personally emphasized that the syntax is chosen step by step.
I also totally disagree with the presentation of the // "operator". On page
302, one can read:
"/name
The element or elements with the specified name that are descendants
of the context node or the context node itself"
At first, I didn't understand the "/name" notation, thinking it was an
absolute location path. This is inconsistent with chapter 9, which uses //
and writes it that way.
The text has a mistake as well: // is the abbreviation of
/descendant-or-self::node()/, thus //name is the abbreviation of
/descendant-or-self::node()/child::name. The first location step selects
the context node and all its descendants; the second step then follows the
child axis for each of these selected nodes. So //name *cannot* select the
context node, only its descendants. As long as you don't use predicates
(see the note in the XPath spec), //name is more or less equivalent to
/descendant::name, not /descendant-or-self::name (as is suggested on
page 303 in the descendant-or-self axis description: "This axis may be
abbreviated as the forward slash / (which is always preceded by another
forward slash representing the root node or separating this step from the
previous step)").
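The point is easiest to see by expanding the abbreviation literally. A minimal Python sketch over a toy tree (the node class and function names are hypothetical; only child-axis name tests and a single numeric predicate are supported), where the context node is itself an element named name:

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def descendants(node):
    """descendant axis, in document order (preorder traversal)."""
    out = []
    for child in node.children:
        out.append(child)
        out.extend(descendants(child))
    return out


def descendant_or_self(node):
    return [node] + descendants(node)


def apply_position(hits, position):
    """Apply an optional numeric predicate [position] (1-based) to one step."""
    return hits if position is None else hits[position - 1:position]


def slash_slash(ctx, name, position=None):
    """ctx//name[position], i.e. ctx/descendant-or-self::node()/child::name[position].

    The predicate belongs to the child step, so it is applied per parent.
    """
    result = []
    for n in descendant_or_self(ctx):                     # step 1: ctx and its descendants
        kids = [c for c in n.children if c.name == name]  # step 2: their name children
        result.extend(apply_position(kids, position))
    order = {id(n): i for i, n in enumerate(descendant_or_self(ctx))}
    return sorted(result, key=lambda n: order[id(n)])     # return in document order


def descendant_named(ctx, name, position=None):
    """ctx/descendant::name[position]"""
    return apply_position([d for d in descendants(ctx) if d.name == name], position)


def descendant_or_self_named(ctx, name):
    """ctx/descendant-or-self::name"""
    return [d for d in descendant_or_self(ctx) if d.name == name]


n1, n2, n3 = Node("name"), Node("name"), Node("name")
ctx = Node("name", [n1, Node("section", [n2, n3])])  # the context node is a <name> too

print(slash_slash(ctx, "name") == [n1, n2, n3])                    # True: ctx never selected
print(descendant_named(ctx, "name") == [n1, n2, n3])               # True: same node-set
print(descendant_or_self_named(ctx, "name") == [ctx, n1, n2, n3])  # True: includes ctx
print(slash_slash(ctx, "name", 1) == [n1, n2])    # True: first name child of each parent
print(descendant_named(ctx, "name", 1) == [n1])   # True: first name descendant overall
```

The last two lines show why the equivalence with descendant::name only holds without predicates.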
While translating, I simply removed every reference to abbreviated syntax on
pages 303 and 304, changed /name on page 302 into //, and added sentences
like "This is the abbreviated form of self::node()" to each of the five
abbreviated step descriptions.
This also fixes other "not-really-accurate" statements about the . and ..
abbreviated steps (the book says the self axis may be abbreviated by a dot,
but . actually stands for the complete step self::node(): self::name isn't
equivalent to ., for example, and likewise parent::name isn't equivalent to
..). These are details, but I really don't like such inaccuracy in the name
of understandability: there's always a way of teaching things to newbies
without losing accuracy.
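Because the abbreviated/unabbreviated choice is made per step, a path can be expanded one step at a time, and the two styles can be mixed freely within one path. A minimal Python sketch (the function names are hypothetical; it handles only predicate-free paths, since it naively splits on "/"):

```python
def expand_step(step: str) -> str:
    """Rewrite a single location step in unabbreviated form (XPath 1.0, section 2.5)."""
    if step == ".":
        return "self::node()"      # any node type and name, unlike self::name
    if step == "..":
        return "parent::node()"    # likewise, not parent::name
    if step == "":
        return "descendant-or-self::node()"  # the empty step between the slashes of //
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step:
        return step                # already unabbreviated: leave it alone
    return "child::" + step        # child is the default axis


def expand_path(path: str) -> str:
    """Expand a predicate-free location path one step at a time."""
    if path == "/":
        return path                # the root node on its own
    absolute = path.startswith("/")
    body = path[1:] if absolute else path
    expanded = "/".join(expand_step(s) for s in body.split("/"))
    return "/" + expanded if absolute else expanded


for p in ["//para", "chapter//para", "../@lang", "./following-sibling::*"]:
    print(p, "->", expand_path(p))
```

The last example mixes abbreviated and unabbreviated steps in a single path, which is perfectly legal XPath.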
I also found the alternate function signatures using "object" arguments on
pages 306 to 314 confusing, since the book has already said that XPath is
weakly typed and explained how an XPath processor converts arguments to the
appropriate, expected data type. Every function description repeats this
too ("Non-strings may be passed to this function as well, in which
case they're converted to strings automatically as if by the string()
function.").
>> > I don't know how many books you've worked on, but every single one I
>> > have worked on has a *lot* of errors. In fact I was speaking to an
>> > O'Reilly editor just the other week, and she was saying the exact same
>> > thing.
>>
>> That's not for reassuring me ;o)
>
> Try writing a book.
Chiche!? ;o)
(in French, something between "am I on?", "you're on!" and "I bet I'll do
it!")
> and making it up to date (especially a problem with XML).
Oh yeah! *sigh*
I proposed adding a paragraph or two about Sun's patent on XPointer
technologies and about XPointer's xmlns() scheme, added in the January
version of the spec.
> We can't be experts on everything,
Of course; that's why, IMO, you must double-check every tutorial or
reference you write against the official specs, and have someone else (a tech
reviewer) check it once more after that.
> tech reviewing [books]
That's something I'd like to do!
Thanks for this conversation, anyway.
Tom.
I read your online Perldoc and didn't find any mention of matching nodes.
Can you enlighten me and/or point me in the right direction?
Or maybe we don't have the same notion of matching nodes. In my mind,
"matching" means taking a node and testing it against a pattern: the result
is true if the node matches the pattern, false otherwise. Or, more generally,
given a node-set and a pattern, apply the above scheme to each node in the
node-set and return a new node-set containing each node that matches the
pattern.
Any clarification is welcome.
Tom.
It's not documented because I'm still unsure if the implementation is
correct :-)
You'll have to look at the source code.
So Matt said something like "matching nodes is implemented in XML::XPath
but not documented because I'm not sure the implementation is correct".
First, I don't quite understand what the third parameter, $context, of
XML::XPath->matches() is supposed to be; it doesn't seem to do anything (I
tried passing the node to match [how do I retrieve its parent node? I'm not
used to Perl and $node->[node_parent] raises errors], in vain).
I modified the "xpath" sample packaged with the module so that it selects
nodes in a document and then checks whether each one matches a pattern.
I call the script like this: "match filename query pattern".
Here is a snippet of code:
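#!/usr/bin/perl
# Usage: match filename query pattern
use strict;
use warnings;
use XML::XPath;

my $file  = shift @ARGV;
my $query = shift @ARGV;
my $xpath = XML::XPath->new(filename => $file);
my $nodes = $xpath->find($query);    # select the candidate nodes

if ($nodes->size) {
    my $num = 0;
    my $pattern = shift @ARGV;
    foreach my $node ($nodes->get_nodelist) {
        # test each selected node against the pattern
        if ($xpath->matches($node, $pattern)) {
            $num++;
            print "++ ", $node->toString, "\n";
        }
        else {
            print "-- ", $node->toString, "\n";
        }
    }
    print "Found ", $num, " nodes.\n";
}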
When using the file provided along with the "xpath" example, I get this
output:
$ ./match test.xml '//employee//*' 'name'
-- <name>
<forename>Matt</forename>
<surname>Sergeant</surname>
</name>
-- <forename>Matt</forename>
-- <surname>Sergeant</surname>
-- <department>Development IT</department>
Found 0 nodes.
Yet the first node (name) obviously matches the 'name' pattern!
I guess this would work if we passed a "correct" $context, but there's no
reason such an argument should even exist (or I've missed something).
However, if I use an absolute location path pattern, the check succeeds:
$ ./match test.xml '//employee//*' '//employee/name'
++ <name>
<forename>Matt</forename>
<surname>Sergeant</surname>
</name>
-- <forename>Matt</forename>
-- <surname>Sergeant</surname>
-- <department>Development IT</department>
Found 1 nodes.
I guess using a relative location path pattern (something like
'employee/name') would fail too, even with a "correct" $context.
Matt, I told you evaluation of a match pattern is totally different from
evaluation of a select expression, it appears here I was right ;o)
Well, actually you can use XPath (select) evaluation to match nodes: just
evaluate the pattern as a select expression with the context node being the
parent of the node to match, then its parent, and so on up to the root node,
and compare each selected node with the node to match. This is quite costly,
and an evaluation process like the one I described in a previous post would
be far more efficient.
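A minimal Python sketch of that fallback, on a toy tree modeled loosely on the employee output above (the element nesting, the class and the function names are assumptions, not XML::XPath's API or code). Following the XSLT 1.0 definition of matching, the candidate context nodes are the node itself and each of its ancestors; for these child-only patterns, trying the node itself first changes nothing, so it is the same upward walk.

```python
class Node:
    def __init__(self, name=None, children=()):
        self.name = name              # None stands for the root node
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def root_of(node):
    while node.parent is not None:
        node = node.parent
    return node


def select(path, ctx):
    """Select with a path of child name steps ('*' allowed), relative or absolute."""
    if path.startswith("/"):
        ctx, path = root_of(ctx), path[1:]
    nodes = [ctx]
    if path == "":
        return nodes                  # the path "/" selects just the root node
    for step in path.split("/"):
        nodes = [c for n in nodes for c in n.children if step in ("*", c.name)]
    return nodes


def matches(node, pattern):
    """Match by selection: try the node and then each ancestor as context node."""
    ctx = node
    while ctx is not None:
        if any(hit is node for hit in select(pattern, ctx)):
            return True
        ctx = ctx.parent
    return False


forename, surname = Node("forename"), Node("surname")
name_el = Node("name", [forename, surname])
department = Node("department")
root = Node(children=[Node("employees", [Node("employee", [name_el, department])])])

print([matches(n, "name") for n in (name_el, forename, surname, department)])
print(matches(name_el, "employee/name"))              # relative pattern
print(matches(name_el, "/employees/employee/name"))   # absolute pattern
```

Every ancestor tried costs a full downward evaluation of the pattern, which is why a dedicated right-to-left matcher is cheaper.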
Tom.