XML parser (also I'm lazy)

Nils Oberg

Oct 7, 2000, 3:00:00 AM

Hello,

I'm looking for an XML parser that is fast, memory efficient (it will be
used under mod_perl), easy to use, and (obviously) written in Perl.

By 'easy,' I mean a module where I don't have to write any code to use
it. I need it to treat an XML file as a database and pull out 'records.'
Example:

I have

<dataset did="setid">
<setinfo>
<upper>75263</upper>
<lower>235</lower>
<limit from="45090" to="75263"/>
</setinfo>
<storyset sid="235">
<author>...
...

I would like to retrieve information like this (pseudocode):

get sid 235 as well as the contents of the author tag
get sid 70000 as well as the contents of the summary tag
...

Before you ask, no, I don't have the budget (or the means) to use a
professional-quality database (e.g. Oracle, mSQL, MySQL). And I
don't really want to use the CSV driver for DBI.

Any suggestions?

Many thanks,

Nils the Lazy

Martien Verbruggen

Oct 8, 2000, 1:19:19 AM

On Sat, 7 Oct 2000 23:06:43 -0500,
Nils Oberg <nob...@students.uiuc.edu> wrote:
> Hello,
>
> I'm looking for an XML parser that is fast, memory efficient (it will be
> used under mod_perl), easy to use, and (obviously) written in Perl.
>
> By 'easy,' I mean a module where I don't have to write any code to use
> it. I need it to treat an XML file as a database and pull out 'records.'
> Example:

You will have to write _some_ code to use it. I suspect you mean you
don't want to write code that handles stuff while parsing it. Instead
you'd like to have something that gives you a tree of nodes that you can
query, right?

Go to http://search.cpan.org/ and search for XML.

Have a look at XML::DOM, XML::Grove, XML::Twig, and maybe XML::Simple,
if you don't mind the uglyish data structures it exposes to the user.
XML::QL might be of interest, even though it is still immature.

If you are not interested in a tree, but want to just pull out elements
while parsing the file, try XML::Node, or XML::Parser directly.
You'd be surprised how easy it is to write some code for XML::Parser,
even if you're determined to be a totally lazy slob.
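
If you go the XML::Parser route, handler-based code for the "storyset plus author" lookup might look like the sketch below. It assumes the structure from Nils' example (storyset elements carrying a sid attribute and containing an author element); the sample document, the author names and the variable names are made up for illustration. It uses XML::Parser's Start, Char and End handlers.

```perl
use strict;
use warnings;
use XML::Parser;

# Sketch: stream through the document and record the text of the
# <author> element inside each <storyset>, keyed by its sid attribute.
# The sample document below is hypothetical.
my %author_for;      # sid => author text
my $current_sid;     # sid of the storyset we are inside, if any
my $in_author = 0;   # true while inside <author> within a storyset
my $text = '';       # accumulated character data of the current author

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element, %attr) = @_;
            if ($element eq 'storyset') {
                $current_sid = $attr{sid};
            }
            elsif ($element eq 'author' && defined $current_sid) {
                $in_author = 1;
                $text = '';
            }
        },
        Char => sub {
            my ($expat, $string) = @_;
            # Character data may arrive in several chunks, so append.
            $text .= $string if $in_author;
        },
        End => sub {
            my ($expat, $element) = @_;
            if ($element eq 'author' && $in_author) {
                $author_for{$current_sid} = $text;
                $in_author = 0;
            }
            elsif ($element eq 'storyset') {
                undef $current_sid;
            }
        },
    },
);

my $xml = <<'XML';
<dataset did="setid">
  <storyset sid="235"><author>Jane Doe</author></storyset>
  <storyset sid="70000"><author>John Roe</author></storyset>
</dataset>
XML

$parser->parse($xml);   # use $parser->parsefile($path) for a file on disk
print "sid 235: $author_for{235}\n";
```

No tree of the whole document is built. Only %author_for grows, which helps when memory matters under mod_perl.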

>
> I have
>
> <dataset did="setid">
> <setinfo>
> <upper>75263</upper>
> <lower>235</lower>
> <limit from="45090" to="75263"/>
> </setinfo>
> <storyset sid="235">
> <author>...
> ...
>
> I would like to retrieve information like this (pseudocode):
>
> get sid 235 as well as the contents of the author tag
> get sid 70000 as well as the contents of the summary tag

You mean, get the storyset with sid 235, right? There is no sid element.

> Nils the Lazy

While laziness can be good, you should at least take the trouble to do
some searching yourself. Learn to use CPAN. Learn to find code.

Even with all this information, you will still need to go out and try
some of the modules to find what suits you best. For simple tasks, I
normally just use XML::Parser directly, or some code I have written on
top of a C library that uses expat (sorry, I can't share it; it's
company IP), and do the rest myself.

Martien
--
Martien Verbruggen | Since light travels faster than
Interactive Media Division | sound, isn't that why some people
Commercial Dynamics Pty. Ltd. | appear bright until you hear them
NSW, Australia | speak?

Bart Lateur

Oct 8, 2000, 3:00:00 AM

Nils Oberg wrote:

>I'm looking for an XML parser that is fast, memory efficient (it will be
>used under mod_perl), easy to use, and (obviously) written in Perl.

Er... apart from that last requirement, I'd check out XML::Parser. That
is an XML Parser with core written in C, so it should be pretty damn
fast.

It has been ported to just about every platform; it's even part of the
basic installation under Win32. So I'd expect installation to be simple.

>By 'easy,' I mean a module where I don't have to write any code to use
>it. I need it to treat an XML file as a database and pull out 'records.'

>Before you ask, no, I don't have the budget (or the means) to use a
>professional-quality database (e.g. Oracle, mSQL, MySQL). And I
>don't really want to use the CSV driver for DBI.
>
>Any suggestions?

Well... DBD::RAM can use an XML file as a database (this functionality
is based on XML::Parser, see above). So I've read. I've not actually
tried it out, so I can't guarantee that it will do precisely what you
want. But as far as using an XML file as a database in an easy manner
goes, I think this module might come as close as you can get.

HTH,
Bart.

Jeff Zucker

Oct 8, 2000, 3:00:00 AM

Bart Lateur wrote:
>
> Nils Oberg wrote:
>
> >I'm looking for an XML parser that is fast, memory efficient (it will be
> >used under mod_perl), easy to use, and (obviously) written in Perl.
> ...
> >By 'easy,' I mean a module where I don't have to write any code to use
> >it. I need it to treat an XML file as a database and pull out 'records.'
>
> Well... DBD::RAM can use an XML file as a database (this functionality
> is based on XML::Parser, see above). So I've read. I've not actually
> tried it out, so I can't guarantee that it will do precisely what you
> want.

It will do what Nils asked for in his posting. Whether what he asked
for is synonymous with what he wants is another question. :-)

Nils, see the .sig for a simple example. If you do try it out and have
difficulties, let me know.

--
Jeff
perl -MDBI -e "$d=DBI->connect('dbi:RAM:');$d->func({data_type=>'XML',
data_source=>'<phrase><w1>Just</w1><w2>Another</w2><w3>Perl</w3><w4>
Hacker</w4></phrase>',record_tag=>'phrase',col_names=>'w1,w2,w3,w4'},
'import');print join ' ',$d->selectrow_array('SELECT * FROM table1')"

Matt Sergeant

Oct 9, 2000, 3:00:00 AM

Nils Oberg wrote:

> Hello,
>
> I'm looking for an XML parser that is fast, memory efficient (it will be
> used under mod_perl), easy to use, and (obviously) written in Perl.
>
> By 'easy,' I mean a module where I don't have to write any code to use
> it. I need it to treat an XML file as a database and pull out 'records.'
> Example:
>
> I have
>
> <dataset did="setid">
> <setinfo>
> <upper>75263</upper>
> <lower>235</lower>
> <limit from="45090" to="75263"/>
> </setinfo>
> <storyset sid="235">
> <author>...
> ...
>
> I would like to retrieve information like this (pseudocode):
>
> get sid 235 as well as the contents of the author tag
> get sid 70000 as well as the contents of the summary tag
> ...
>
> Before you ask, no, I don't have the budget (or the means) to use a
> professional-quality database (e.g. Oracle, mSQL, MySQL). And I
> don't really want to use the CSV driver for DBI.
>
> Any suggestions?

Given a document that starts with the following:

<?xml version="1.0"?>
<!DOCTYPE dataset [
<!ATTLIST dataset did ID #IMPLIED>
<!ATTLIST storyset sid ID #IMPLIED>
]>

You can use XML::XPath as follows:

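use strict;
use warnings;
use XML::XPath;

# 'stories.xml' is just an example name; point this at your own file.
my $xp = XML::XPath->new(filename => 'stories.xml');

# id() only finds the element because the DOCTYPE declares sid as an ID.
my ($storyset) = $xp->findnodes('id("235")');
die "No storyset with sid 235\n" unless $storyset;
my ($author) = $storyset->findnodes('author');

print "Author is: ", $author->string_value(), "\n";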

Hope that helps.

(and yes, I use XML::XPath under mod_perl every day, although it is an
in-memory model, so it does use a fair amount of memory).

Matt.

Thorbjørn Ravn Andersen

Oct 9, 2000, 3:00:00 AM

Bart Lateur wrote:

> Er... apart from that last requirement, I'd check out XML::Parser. That
> is an XML Parser with core written in C, so it should be pretty damn
> fast.

Nope. Java implementations beat Perl easily on this one.

--
Thorbjørn Ravn Andersen "...plus...Tubular Bells!"
http://bigfoot.com/~thunderbear

Martien Verbruggen

Oct 9, 2000, 3:00:00 AM

On Mon, 09 Oct 2000 20:45:13 +0200,
Thorbjørn Ravn Andersen <thund...@bigfoot.com> wrote:
> Bart Lateur wrote:
>
> > Er... apart from that last requirement, I'd check out XML::Parser. That
> > is an XML Parser with core written in C, so it should be pretty damn
> > fast.
>
> Nope. Java implementations beat Perl easily on this one.

For various values of 'easily'.

http://www.xml.com/pub/Benchmark/exec.html

As you can see, the Java versions only 'beat' XML::Parser for large
documents, and even then the difference isn't that large. You may also
have specific applications where the Perl implementation would be much
faster than a Java one; mod_perl may be one such environment.

The benchmark is also slightly flawed: in every case the startup time
of the program is included in the measurement, and the benchmark has
only been run on a single platform.
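
One way to keep startup time out of the numbers is to load the parser once and time only the parse calls. The following is a minimal sketch of that idea, not the xml.com benchmark's method. It uses the core Time::HiRes module. The generated document and the number of runs are arbitrary assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use XML::Parser;

# Sketch: time the parse alone, excluding interpreter start-up and
# module loading, which a whole-program benchmark would include.
# The document shape and $runs are arbitrary illustrative choices.
my $xml = '<dataset>'
        . join('', map { qq{<storyset sid="$_"><author>a$_</author></storyset>} } 1 .. 5000)
        . '</dataset>';

my $parser = XML::Parser->new;   # no handlers: just parse and check well-formedness
my $runs   = 10;

my $start = time();
$parser->parse($xml) for 1 .. $runs;
my $per_parse = (time() - $start) / $runs;

printf "average parse time: %.4f s\n", $per_parse;
```

Under mod_perl the same reasoning applies: the interpreter and modules are loaded once, so per-request cost is closer to the parse time measured here than to a whole-program timing.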

In my own experience, in many of the cases where I've compared the two,
parsing XML with Java is slower than parsing XML with XML::Parser.

Besides all that, I've found that the time spent parsing an XML
document is often dwarfed by the time spent doing things with the
results of the parse.

\begin{offtopic}

However, the C implementations beat the others by a really significant
margin.

Bottom line: If you need execution speed, use C (or maybe very well
written C++). And that is almost a general rule, not limited to XML
parsing.

\end{offtopic}

Do you have benchmarks that support your assertion that 'Java
implementations beat Perl easily' in general?

Martien
--
Martien Verbruggen |
Interactive Media Division | If at first you don't succeed,
Commercial Dynamics Pty. Ltd. | destroy all evidence that you tried.
NSW, Australia |