[Given that there was little Perl-specific matter in this
subthread, cross-posting back to news:comp.text.xml, and setting
Followup-To: there.]
>> However, AIUI, the code above implies that the XML tree is to be
>> traversed multiple times.
> First off, I'd suggest that you consider XSLT or XQuery, which are
> specifically designed for this kind of find-and-process operation.
I see little advantage in using XSLT for my task (and I'm not
familiar with XQuery), as XML is not the only data source I need
to interface with. (E.g., I'm also accessing an SQLite database.)
The usual benefits of XSLT -- the existence of browser-based
implementations and its "Lisp-like" nature (in that it uses the
same syntax for both the code and data) -- do not seem to apply.
> What you're looking for is a "streaming processor" -- one which
> rewrites the complete set of operations into a state machine which
> can produce its results in a single pass over the nodes.
Indeed, thanks for the clarification!
> There are XPath/XSLT/XQuery systems which attempt to do this for a
> subset of the query language -- I think Xerces
Is it Apache Xerces [1]? It doesn't seem to include either XSLT
or XQuery.
[1]
https://xerces.apache.org/
> and the IBM XML parser
Which is?
> have streaming-subset XPath evaluators, and I know the DataPower "xml
> appliance" machines have some limited XSLT streaming capability --
> but even as subsets, those are fairly rare, and while they may be
> able to reduce storage by not keeping the entire document model in
> memory they may not reduce computational load. If you're looking for
> something off-the-shelf, that's where I'd start.
ACK, thanks. My XML documents are rather small, so I'm more
interested in reducing computational load than memory usage. But
even that is not a priority right now. Rather, I'm looking for
ways to avoid a total code rewrite at some later point.
I guess I should check XML::Twig. Or, given that the conditions
that I currently need to consider are rather simple, a
straightforward ->childNodes ()-based, no-XPath implementation
may be possible.
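A minimal sketch of such a single-pass, no-XPath traversal might
look as follows. It is written in Python (xml.etree.ElementTree
from the standard library) rather than Perl, purely for
illustration. The walk_once function, the rule names, and the
sample document are all hypothetical; the point is only that
every element is visited once and all the simple conditions are
checked during that one visit, instead of running one tree
traversal per query.

```python
import xml.etree.ElementTree as ET


def walk_once(root, rules):
    """Visit every element exactly once, in document order.

    rules is a list of (name, matches, extract) triples
    (a hypothetical structure): matches(element) decides whether
    the rule applies, extract(element) produces the collected value.
    """
    results = {name: [] for name, _, _ in rules}
    stack = [root]
    while stack:
        node = stack.pop()
        # Check all conditions against this node during the same visit.
        for name, matches, extract in rules:
            if matches(node):
                results[name].append(extract(node))
        # list(node) gives the direct children; push them reversed
        # so that they are popped in document order.
        stack.extend(reversed(list(node)))
    return results


# Hypothetical sample document.
SAMPLE = """<library>
  <book id="b1"><title>First</title></book>
  <journal id="j1"><title>Second</title></journal>
  <book id="b2"><title>Third</title></book>
</library>"""

# Two simple conditions, both evaluated in the same pass.
RULES = [
    ("book_ids", lambda e: e.tag == "book", lambda e: e.get("id")),
    ("titles", lambda e: e.tag == "title", lambda e: e.text),
]

found = walk_once(ET.fromstring(SAMPLE), RULES)
print(found)
```

Adding another condition means adding another entry to RULES;
the number of tree traversals stays at one. This only works as
long as each condition can be decided by looking at a single
node (conditions depending on ancestors or siblings would need
extra state carried along the walk).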
[...]
> (I'm one of the authors of a patent on that topic, actually -- US
> 8,120,789 B2 -- but unfortunately our group didn't get the funding to
> finish a product-quality implementation of that logic so it isn't
> available for use. If someone wants to license the patent, I'm sure
> IBM would be delighted to talk to you...)
I believe that I may be under a jurisdiction which has no notion
of software patents. (Subject to the reading of TRIPS, though.)