Get XML content using XML::Twig

alwaysonnet

Apr 21, 2010, 8:35:32 AM

Hello all,
I'm trying to parse the XML using the XML::Twig module, as my XML could be
too large to handle with XML::Simple. Please help me work out how to
print the following values:

- get the values of Sender and Receiver
- get the FileType. In this case, the possible values are
  InitTAP, FatalRAP and ReTxTAP

Here is the XML content:
<CODE>
<?xml version="1.0" encoding="UTF-8"?>
<Data>
  <ConnectionList>
    <Connection>
      <Sender>BRADD</Sender>
      <Receiver>SHANE</Receiver>
      <FileItemList>
        <FileItem>
          <FileID>378910</FileID>
          <Tmstp>2009-01-16T16:59:07+01:00</Tmstp>
          <FileType>
            <InitTAP>
              <TAPSeqNo>00083</TAPSeqNo>
              <NotifFileInd>false</NotifFileInd>
              <ChargeInfo>
                <TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
                <TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
                <TAPCurrency>XDR</TAPCurrency>
                <TotalNoOfCalls>39</TotalNoOfCalls>
                <TotalNetCharge>11.470</TotalNetCharge>
                <TotalTax>0.000</TotalTax>
              </ChargeInfo>
            </InitTAP>
          </FileType>
        </FileItem>
        <FileItem>
          <FileID>380582</FileID>
          <Tmstp>2009-01-20T18:00:00+01:00</Tmstp>
          <FileType>
            <ReTxTAP>
              <TAPSeqNo>00083</TAPSeqNo>
              <NotifFileInd>false</NotifFileInd>
              <RefRAPSeqNo>00044</RefRAPSeqNo>
              <RefRAPID>380573</RefRAPID>
              <ChargeInfo>
                <TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
                <TAPAvailTmstp>2009-01-20T18:00:00+01:00</TAPAvailTmstp>
                <TAPCurrency>XDR</TAPCurrency>
                <TotalNoOfCalls>39</TotalNoOfCalls>
                <TotalNetCharge>11.470</TotalNetCharge>
                <TotalTax>0.000</TotalTax>
              </ChargeInfo>
            </ReTxTAP>
          </FileType>
        </FileItem>
        <FileItem>
          <FileID>380573</FileID>
          <Tmstp>2009-01-16T20:34:45+01:00</Tmstp>
          <FileType>
            <FatalRAP>
              <RAPSeqNo>00044</RAPSeqNo>
              <RAPStatus>Exchanged</RAPStatus>
              <RefTAPSeqNo>00083</RefTAPSeqNo>
              <RefTAPID>378910</RefTAPID>
              <RAPCreatTmstp>2009-01-16T20:21:30+01:00</RAPCreatTmstp>
              <RAPAvailTmstp>2009-01-16T20:21:30+01:00</RAPAvailTmstp>
              <ChargeInfo>
                <TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
                <TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
                <TAPCurrency>XDR</TAPCurrency>
                <TotalNoOfCalls>-39</TotalNoOfCalls>
                <TotalNetCharge>-11.470</TotalNetCharge>
                <TotalTax>0.000</TotalTax>
              </ChargeInfo>
            </FatalRAP>
          </FileType>
        </FileItem>
      </FileItemList>
    </Connection>
  </ConnectionList>
</Data>
</CODE>
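A minimal XML::Twig sketch for these two questions, assuming the structure shown above (Sender and Receiver directly inside Connection, exactly one child element inside each FileType). The helper name connection_file_types is hypothetical, and the inline sample is a trimmed copy of the XML above. Twig handlers run as each matching element closes, and purge discards what has already been handled, so memory use should stay roughly flat on large files.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: one record per FileType, tagged with the
# Sender/Receiver of the enclosing Connection.
sub connection_file_types {
    my ($xml) = @_;
    my ($sender, $receiver, @rows);

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Handlers run when the closing tag of a matching element is parsed.
            'Connection/Sender'   => sub { $sender   = $_[1]->text },
            'Connection/Receiver' => sub { $receiver = $_[1]->text },
            'FileItem/FileType'   => sub {
                my ($t, $filetype) = @_;
                # '#ELT' skips text nodes; assumes exactly one child element.
                my $kind = $filetype->first_child('#ELT');
                push @rows, {
                    sender   => $sender,
                    receiver => $receiver,
                    type     => $kind ? $kind->tag : '???',
                };
                $t->purge;    # free everything parsed so far
            },
        },
    );
    $twig->parse($xml);
    return @rows;
}

# Trimmed copy of the XML above.
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<Data><ConnectionList><Connection>
  <Sender>BRADD</Sender>
  <Receiver>SHANE</Receiver>
  <FileItemList>
    <FileItem><FileID>378910</FileID><FileType><InitTAP><TAPSeqNo>00083</TAPSeqNo></InitTAP></FileType></FileItem>
    <FileItem><FileID>380582</FileID><FileType><ReTxTAP><TAPSeqNo>00083</TAPSeqNo></ReTxTAP></FileType></FileItem>
    <FileItem><FileID>380573</FileID><FileType><FatalRAP><RAPSeqNo>00044</RAPSeqNo></FatalRAP></FileType></FileItem>
  </FileItemList>
</Connection></ConnectionList></Data>
XML

printf "Sender: %-5s, Receiver: %-5s, Type: %s\n", @{$_}{qw(sender receiver type)}
    for connection_file_types($sample);
```

Against the trimmed sample this prints one Sender/Receiver/Type line per FileItem (InitTAP, ReTxTAP, FatalRAP). For a file on disk, parsefile($path) would replace parse($xml).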

Tad McClellan

Apr 21, 2010, 9:13:50 AM

alwaysonnet <kalyanr...@gmail.com> wrote:

> I'm trying to parse the XML using XML::Twig Module


What have you tried so far?

If you show us your broken code we will help you fix it.

Have you seen the Posting Guidelines that are posted here frequently yet?


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.

John Bokma

Apr 21, 2010, 9:49:55 AM

alwaysonnet <kalyanr...@gmail.com> writes:

> Hello all,
> I'm trying to parse the XML using XML::Twig Module as my XML could be
> very large to handle using XML::Simple. Please help me out of how to
> print the values based on the following...
> <B>get the values of Sender, Receiver</B>
> <B>get the FileType. In this case possible values are
> InitTAP,FatalRAP,ReTxTAP</B>

For very simple things like this I would (probably, based on what I just
read) use XML::SAX or (even) XML::Parser. Regarding the latter,
http://johnbokma.com/perl/ has some simple examples under "XML
Processing using Perl"
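For comparison, a minimal sketch of that event-driven style with XML::Parser, assuming the structure of the XML in the original post. The Start/Char/End handlers see the document as a stream of events and keep only a small stack of open tag names, so memory does not grow with the file size. scan_with_xml_parser is a hypothetical helper, not taken from the pages linked above.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns [sender, receiver, file type] triples.
sub scan_with_xml_parser {
    my ($xml) = @_;
    my (@open, $text, $sender, $receiver, @rows);

    my $parser = XML::Parser->new(Handlers => {
        Start => sub {
            my ($expat, $tag) = @_;
            # The first element directly inside FileType names the file type.
            push @rows, [$sender, $receiver, $tag]
                if @open && $open[-1] eq 'FileType';
            push @open, $tag;
            $text = '';
        },
        # Character data can arrive in several chunks, so accumulate it.
        Char => sub { $text .= $_[1] },
        End  => sub {
            my ($expat, $tag) = @_;
            pop @open;
            $sender   = $text if $tag eq 'Sender';
            $receiver = $text if $tag eq 'Receiver';
        },
    });
    $parser->parse($xml);
    return @rows;
}

# Trimmed copy of the XML from the original post.
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<Data><ConnectionList><Connection>
  <Sender>BRADD</Sender>
  <Receiver>SHANE</Receiver>
  <FileItemList>
    <FileItem><FileID>378910</FileID><FileType><InitTAP><TAPSeqNo>00083</TAPSeqNo></InitTAP></FileType></FileItem>
    <FileItem><FileID>380582</FileID><FileType><ReTxTAP><TAPSeqNo>00083</TAPSeqNo></ReTxTAP></FileType></FileItem>
    <FileItem><FileID>380573</FileID><FileType><FatalRAP><RAPSeqNo>00044</RAPSeqNo></FatalRAP></FileType></FileItem>
  </FileItemList>
</Connection></ConnectionList></Data>
XML

print join(', ', @$_), "\n" for scan_with_xml_parser($sample);
```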

--
John Bokma j3b

Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development

Klaus

Apr 21, 2010, 1:06:14 PM

On 21 avr, 14:35, alwaysonnet <kalyanrajsi...@gmail.com> wrote:
> Hello all,
> I'm trying to parse the XML using XML::Twig Module as my XML could be
> very large to handle using XML::Simple. Please help me out of how to
> print the values based on the following...
>  <B>get the values of Sender, Receiver</B>
>  <B>get the FileType. In this case possible values are
> InitTAP,FatalRAP,ReTxTAP</B>
>
> <CODE>
>  get the values of Sender, Receiver
>  get the FileType. In this case possible values are
> InitTAP,FatalRAP,ReTxTAP
> </CODE>

What Tad McClellan and John Bokma suggested should be your first path
of investigation.

However, let me bring in a shameless plug:

You could also use my module XML::Reader
http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm

This module is specifically designed to handle very big XML files: it
only uses the memory needed to hold one XML element at a time (plus a
small amount of buffer memory, which is independent of the size of the
XML file).

Here is a sample program:

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
    { root => '/Data/ConnectionList/Connection/Sender',   branch => [ '/' ] },
    { root => '/Data/ConnectionList/Connection/Receiver', branch => [ '/' ] },
    { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
      branch => [
          '/InitTAP/TAPSeqNo',
          '/ReTxTAP/TAPSeqNo',
          '/FatalRAP/RAPSeqNo',
      ] },
);

my ($sender, $receiver);

while ($rdr->iterate) {
    if    ($rdr->rx == 0) { $sender   = $rdr->rvalue->[0]; }
    elsif ($rdr->rx == 1) { $receiver = $rdr->rvalue->[0]; }
    else {
        my ($InitTAP, $ReTxTAP, $FatalRAP) = @{$rdr->rvalue};
        my ($type, $seqno) = defined $InitTAP  ? ('InitTAP',  $InitTAP)
                           : defined $ReTxTAP  ? ('ReTxTAP',  $ReTxTAP)
                           : defined $FatalRAP ? ('FatalRAP', $FatalRAP)
                           :                     ('???', '???');

        printf "Sender: %-5s, Receiver: %-5s, Type: %-8s, Seqno: %s\n",
            $sender, $receiver, $type, $seqno;
    }
}

# (the XML from the original post goes after __DATA__)
__DATA__

=======
Here is the output:

Sender: BRADD, Receiver: SHANE, Type: InitTAP , Seqno: 00083
Sender: BRADD, Receiver: SHANE, Type: ReTxTAP , Seqno: 00083
Sender: BRADD, Receiver: SHANE, Type: FatalRAP, Seqno: 00044

s...@netherlands.com

Apr 21, 2010, 2:07:55 PM

On Wed, 21 Apr 2010 10:06:14 -0700 (PDT), Klaus <kla...@gmail.com> wrote:

>On 21 avr, 14:35, alwaysonnet <kalyanrajsi...@gmail.com> wrote:
>> Hello all,
>> I'm trying to parse the XML using XML::Twig Module as my XML could be
>> very large to handle using XML::Simple. Please help me out of how to
>> print the values based on the following...
>>  <B>get the values of Sender, Receiver</B>
>>  <B>get the FileType. In this case possible values are
>> InitTAP,FatalRAP,ReTxTAP</B>
>>
>> <CODE>
>>  get the values of Sender, Receiver
>>  get the FileType. In this case possible values are
>> InitTAP,FatalRAP,ReTxTAP
>> </CODE>
>
>What Tad McClellan and John Bokma suggested should be your first path
>of investigation.
>
>However, let me bring in a shameless plug:
>
>You could also use my module XML::Reader
>http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm

Indeed shameless.


>
>This module is specifically designed to handle very big XML files, it
>only uses the memory it needs to have one XML element at a time in
>memory (plus a small additional memory for buffering, which is
>independent of the size of the XML file)

Is memory at a premium?


>
>Here is a sample program:
>
>use strict;
>use warnings;
>use XML::Reader;
>
>my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
> { root => '/Data/ConnectionList/Connection/Sender', branch => [ '/' ] },
> { root => '/Data/ConnectionList/Connection/Receiver', branch => [ '/' ] },
> { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType', branch => [
> '/InitTAP/TAPSeqNo',
> '/ReTxTAP/TAPSeqNo',
> '/FatalRAP/RAPSeqNo',

^^^^^^^^^^^^
What do these have to do with it?


> ] },
> );
>
>my ($sender, $receiver);
>
>while ($rdr->iterate) {
> if ($rdr->rx == 0) { $sender = $rdr->rvalue->[0]; }
> elsif ($rdr->rx == 1) { $receiver = $rdr->rvalue->[0]; }
> else {
> my ($InitTAP, $ReTxTAP, $FatalRAP) = @{$rdr->rvalue};

^^^^^^^^^^^^^^^^^^^^^^^^^^^
Again, what do these have to do with it?
[snip]


>=======
>Here is the output:
>
>Sender: BRADD, Receiver: SHANE, Type: InitTAP , Seqno: 00083
>Sender: BRADD, Receiver: SHANE, Type: ReTxTAP , Seqno: 00083
>Sender: BRADD, Receiver: SHANE, Type: FatalRAP, Seqno: 00044

That's nice. Let's say he generally meant "in this case it's:"
InitTAP ReTxTAP FatalRAP
Why? Because it's the file type.
Maybe he wants all the file types for each sender/receiver.
But it's hard to know what the OP wants, isn't it?

-sln

Klaus

Apr 21, 2010, 2:48:59 PM

On 21 avr, 20:07, s...@netherlands.com wrote:

> On Wed, 21 Apr 2010 10:06:14 -0700 (PDT), Klaus <klau...@gmail.com> wrote:
> >On 21 avr, 14:35, alwaysonnet <kalyanrajsi...@gmail.com> wrote:
> >> Hello all,
> >> I'm trying to parse the XML using XML::Twig Module as my XML could be
> >> very large to handle using XML::Simple. Please help me out of how to
> >> print the values based on the following...
> >>  <B>get the values of Sender, Receiver</B>
> >>  <B>get the FileType. In this case possible values are
> >> InitTAP,FatalRAP,ReTxTAP</B>

> That's nice. Let's say he generally meant "in this case it's:"
> InitTAP  ReTxTAP  FatalRAP
> Why? Because it's the file type.
> Maybe he wants all the file types for each sender/receiver.

In that case, use XML::Reader->newhd(... {filter => 2}):

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 2});

my ($sender, $receiver);

while ($rdr->iterate) {
    if ($rdr->path eq '/Data/ConnectionList/Connection/Sender') {
        $sender = $rdr->value;
    }
    elsif ($rdr->path eq '/Data/ConnectionList/Connection/Receiver') {
        $receiver = $rdr->value;
    }
    elsif ($rdr->is_start
           and $rdr->path =~ m{\A /Data/ConnectionList/Connection/FileItemList/FileItem/FileType/ (\w+) \z}xms) {
        printf "Sender: %-5s, Receiver: %-5s, Type: %s\n",
            $sender, $receiver, $1;
    }
}

Here is the output:

Sender: BRADD, Receiver: SHANE, Type: InitTAP
Sender: BRADD, Receiver: SHANE, Type: ReTxTAP
Sender: BRADD, Receiver: SHANE, Type: FatalRAP

s...@netherlands.com

Apr 21, 2010, 8:31:48 PM

This is pretty good. I assume it handles attributes/values as well.
It appears to involve a lot of regex work the more unknown the
elements become, but that's a tree stack.

It would be good, though, to have a capture mechanism, where
XML capture can be switched on/off by the user, later to
be regurgitated to the user (on demand), and given to an
XML::Simple-style mechanism to turn it into filtered records.

It wouldn't change the simple, low-memory stream parsing at all;
the source would just be captured (appended) to a named buffer,
switched on/off on demand.

It's not as easy as it seems, though: CaptureON/OFF (bufname, before/after),
nested captures, a single data pool. I think I've done this before.
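One way to approximate this capture idea with existing tools is XML::Twig: a handler receives each completed sub-tree, sprint serializes just that sub-tree, and XML::Simple's XMLin turns the small string into a record before the twig is purged. This is a sketch assuming both modules are installed; file_items_via_xmlin is a hypothetical name, and it is not the XML::Reader mechanism discussed in this thread.

```perl
use strict;
use warnings;
use XML::Twig;
use XML::Simple qw(XMLin);

# Hypothetical helper: hand each complete FileItem sub-tree to XML::Simple.
sub file_items_via_xmlin {
    my ($xml) = @_;
    my @records;
    my $twig = XML::Twig->new(
        twig_handlers => {
            FileItem => sub {
                my ($t, $item) = @_;
                # sprint() serializes only this element and its descendants,
                # so XMLin() only ever sees one small record at a time.
                push @records, XMLin($item->sprint);
                $t->purge;    # drop the captured sub-tree from memory
            },
        },
    );
    $twig->parse($xml);
    return @records;
}

# Trimmed copy of the XML from the original post.
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<Data><ConnectionList><Connection>
  <Sender>BRADD</Sender>
  <Receiver>SHANE</Receiver>
  <FileItemList>
    <FileItem><FileID>378910</FileID><FileType><InitTAP><TAPSeqNo>00083</TAPSeqNo></InitTAP></FileType></FileItem>
    <FileItem><FileID>380582</FileID><FileType><ReTxTAP><TAPSeqNo>00083</TAPSeqNo></ReTxTAP></FileType></FileItem>
    <FileItem><FileID>380573</FileID><FileType><FatalRAP><RAPSeqNo>00044</RAPSeqNo></FatalRAP></FileType></FileItem>
  </FileItemList>
</Connection></ConnectionList></Data>
XML

for my $rec (file_items_via_xmlin($sample)) {
    my ($type) = keys %{ $rec->{FileType} };
    print "$rec->{FileID}: $type\n";
}
```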

-sln

Klaus

Apr 22, 2010, 3:39:33 AM

On 22 avr, 02:31, s...@netherlands.com wrote:

> On Wed, 21 Apr 2010 11:48:59 -0700 (PDT), Klaus <klau...@gmail.com> wrote:
> >On 21 avr, 20:07, s...@netherlands.com wrote:
> >> On Wed, 21 Apr 2010 10:06:14 -0700 (PDT), Klaus <klau...@gmail.com> wrote:
> >> >On 21 avr, 14:35, alwaysonnet <kalyanrajsi...@gmail.com> wrote:
> >> >> Hello all,
> >> >> I'm trying to parse the XML using XML::Twig Module as my XML could be
> >> >> very large to handle using XML::Simple. Please help me out of how to
> >> >> print the values based on the following...
> >> >>  <B>get the values of Sender, Receiver</B>
> >> >>  <B>get the FileType. In this case possible values are
> >> >> InitTAP,FatalRAP,ReTxTAP</B>

> This is pretty good. I assume it does attribute/value as well.

Yes, it does. Just put an '@' symbol in the path, for example
'/InitTAP/ChargeInfo/@attrib1'

> It appears to involve a lot of regex work the more unknown the
> elements become, but that's a tree stack.
>
> It would be good though to have a capture mechanism, where
> xml capture can be triggered on/off by the user, later to
> be regurgitated to the user (on demand), and given to an
> xml::simple style mechanism to turn it into filtered records.

For simple structures where you know exactly what you are looking for,
you can use {filter => 5} like so

use strict;
use warnings;
use XML::Reader;
use Data::Dumper;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
    { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
      branch => [
          '/InitTAP/TAPSeqNo',
          '/ReTxTAP/TAPSeqNo',
          '/FatalRAP/RAPSeqNo',

          '/InitTAP/ChargeInfo/@attrib1',
          '/InitTAP/ChargeInfo/TAPCurrency',
          '/ReTxTAP/ChargeInfo/TAPCurrency',
          '/FatalRAP/ChargeInfo/TAPCurrency',
      ] },
);

while ($rdr->iterate) {
    print Dumper($rdr->rvalue), "\n";
}

> It wouldn't change the simple, low-memory stream parsing at all;
> the source would just be captured (appended) to a named buffer,
> switched on/off on demand.
> It's not as easy as it seems, though: CaptureON/OFF (bufname, before/after),
> nested captures, a single data pool. I think I've done this before.

For general capture into a buffer, you would use {filter => 3, using
=> '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'}

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 3,
    using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'});

my $buffer = '';

while ($rdr->iterate) {
    my $indentation = ' ' x ($rdr->level - 1);

    if ($rdr->path eq '/') {
        if ($rdr->is_start) {
            $buffer = '';
        }
        elsif ($rdr->is_end) {
            print "\n\n buffer ==>\n", $buffer, "\n\n";
        }
        next;
    }

    if ($rdr->is_start) {
        $buffer .= $indentation.'<'.$rdr->tag.
            join('', map { " $_='".$rdr->att_hash->{$_}."'" } sort keys %{$rdr->att_hash}).
            '>'."\n";
    }

    if ($rdr->type eq 'T' and $rdr->value ne '') {
        $buffer .= $indentation.' '.$rdr->value."\n";
    }

    if ($rdr->is_end) {
        $buffer .= $indentation.'</'.$rdr->tag.'>'."\n";
    }
}

alwaysonnet

Apr 22, 2010, 4:24:11 AM

My intention is to:

- get each sender and receiver
- get the file type (could be InitTAP, FatalRAP, etc.)
- for each file type, get the TAPSeqNo, TotalNoOfCalls, etc.

Basically, I want all the information in place for processing the
data.

Also, apart from XML::Twig, is there any module that can handle
larger XML files?

Any help or suggestions are appreciated.
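A minimal XML::Twig sketch of that goal, collecting one flat record per FileItem while streaming. It assumes the structure of the XML in the first post: one child element under FileType, a TAPSeqNo (TAP files) or RAPSeqNo (RAP files), and a ChargeInfo block. The helper name file_item_records and the record keys (seqno, calls, net) are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: one flat record per FileItem, built while streaming.
sub file_item_records {
    my ($xml) = @_;
    my ($sender, $receiver, @records);

    my $twig = XML::Twig->new(
        twig_handlers => {
            'Connection/Sender'   => sub { $sender   = $_[1]->text },
            'Connection/Receiver' => sub { $receiver = $_[1]->text },
            'FileItem'            => sub {
                my ($t, $item) = @_;
                # Assumes FileType holds exactly one element (InitTAP, ReTxTAP, ...).
                my $type = $item->first_child('FileType')->first_child('#ELT');
                # TAP files carry TAPSeqNo, RAP files carry RAPSeqNo;
                # first_child_text returns '' when the child is missing.
                my ($seqno) = grep { length } map { $type->first_child_text($_) } qw(TAPSeqNo RAPSeqNo);
                my $charge  = $type->first_child('ChargeInfo');
                push @records, {
                    sender   => $sender,
                    receiver => $receiver,
                    file_id  => $item->first_child_text('FileID'),
                    type     => $type->tag,
                    seqno    => $seqno,
                    calls    => $charge ? $charge->first_child_text('TotalNoOfCalls') : undef,
                    net      => $charge ? $charge->first_child_text('TotalNetCharge') : undef,
                };
                $t->purge;    # drop the finished FileItem to keep memory flat
            },
        },
    );
    $twig->parse($xml);
    return @records;
}

# Trimmed copy of the XML from the first post.
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<Data><ConnectionList><Connection>
  <Sender>BRADD</Sender>
  <Receiver>SHANE</Receiver>
  <FileItemList>
    <FileItem><FileID>378910</FileID><FileType><InitTAP><TAPSeqNo>00083</TAPSeqNo>
      <ChargeInfo><TotalNoOfCalls>39</TotalNoOfCalls><TotalNetCharge>11.470</TotalNetCharge></ChargeInfo>
    </InitTAP></FileType></FileItem>
    <FileItem><FileID>380582</FileID><FileType><ReTxTAP><TAPSeqNo>00083</TAPSeqNo>
      <ChargeInfo><TotalNoOfCalls>39</TotalNoOfCalls><TotalNetCharge>11.470</TotalNetCharge></ChargeInfo>
    </ReTxTAP></FileType></FileItem>
    <FileItem><FileID>380573</FileID><FileType><FatalRAP><RAPSeqNo>00044</RAPSeqNo>
      <ChargeInfo><TotalNoOfCalls>-39</TotalNoOfCalls><TotalNetCharge>-11.470</TotalNetCharge></ChargeInfo>
    </FatalRAP></FileType></FileItem>
  </FileItemList>
</Connection></ConnectionList></Data>
XML

for my $r (file_item_records($sample)) {
    printf "%s/%s %-8s file=%s seq=%s calls=%s net=%s\n",
        @{$r}{qw(sender receiver type file_id seqno calls net)};
}
```

Each record is independent of the others, so it could be written to a database or a file as soon as it is built instead of being kept in @records.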


Klaus

Apr 22, 2010, 4:29:42 AM

On 21 avr, 14:35, alwaysonnet <kalyanrajsi...@gmail.com> wrote:
> Hello all,
> I'm trying to parse the XML using XML::Twig Module as my XML could be
> very large to handle using XML::Simple.

Klaus <klau...@gmail.com> wrote:
> However, let me bring in a shameless plug:
> You could also use my module XML::Reader
> http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm

s...@netherlands.com wrote:
> > Indeed shameless.
> >
> > [...]


> >
> > It would be good though to have a capture mechanism, where
> > xml capture can be triggered on/off by the user, later to
> > be regurgitated to the user (on demand), and given to an
> > xml::simple style mechanism to turn it into filtered records.

Here is an example of how to use XML::Reader to capture sub-trees from
a (potentially very big) XML file into a buffer and pass that buffer
to XML::Simple:

use strict;
use warnings;
use XML::Reader;
use XML::Simple;
use Data::Dumper;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 3,
    using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'});

my $buffer = '';

while ($rdr->iterate) {

    if ($rdr->path eq '/') {
        if ($rdr->is_start) {
            # start a fresh document for each captured FileType sub-tree
            $buffer = qq{<?xml version="1.0" encoding="UTF-8"?><FileType>};
        }
        if ($rdr->is_end) {
            $buffer .= qq{</FileType>};

            # the buffer now holds one complete, small XML document
            my $ref = XMLin($buffer);
            print Dumper($ref), "\n\n";
        }
        next;
    }

    if ($rdr->is_start) {
        $buffer .= '<'.$rdr->tag.
            join('', map { " $_='".$rdr->att_hash->{$_}."'" } sort keys %{$rdr->att_hash}).
            '>';
    }

    if ($rdr->type eq 'T' and $rdr->value ne '') {
        $buffer .= $rdr->value;
    }

    if ($rdr->is_end) {
        $buffer .= '</'.$rdr->tag.'>';
    }
}

Klaus

Apr 22, 2010, 5:08:24 AM

On 21 avr, 14:35, alwaysonnet <kalyanrajsi...@gmail.com> wrote:
> Hello all,
> I'm trying to parse the XML using XML::Twig Module as my XML could be
> very large to handle using XML::Simple.

On Wed, 21 Apr 2010 10:06:14, Klaus <klau...@gmail.com> wrote:
> What Tad McClellan and John Bokma suggested should be your first
> path of investigation.
> However, let me bring in a shameless plug:
> You could also use my module XML::Reader
> http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm

On 21 avr, 20:07, s...@netherlands.com wrote:
> Indeed shameless.

On 22 avr, 10:24, alwaysonnet <kalyanrajsi...@gmail.com> wrote:
> My intention is to ~
> - Get each sender and receiver
> - Get the filetype ( could be InitTAP, FatalRAP etc )
> - For each of filetype get the TAPSeqNo, NoofCalls etc....
>
> Basically I want all the information in place for processing the
> data....
>
> Also, apart from XML::Twig, is there any module which can handle
> larger XML files..

As I said before, take the advice of Tad McClellan and John Bokma
first.

If, for whatever reason, you can't follow their advice (and, for
whatever reason, you can't use XML::Twig either), there is always my
"shameless plug", XML::Reader.

There are, in my opinion, two scenarios:

Scenario 1:
You already know how to parse your XML with XML::Simple, but the XML
file is too big to fit entirely into memory.
In that case, I suggest you follow my example (with XML::Reader) that
I gave in this thread today (where I said: "...Here is an example of
how to use XML::Reader to capture sub-trees...)
see http://groups.google.com/group/comp.lang.perl.misc/msg/4bb3a769d96c1b2e

Scenario 2:
You know the general rules of your XML parsing, but you don't know
which XML module to use (and you can't follow the advice from Tad
McClellan and from John Bokma).
In that case, I suggest you follow my example (with XML::Reader) that I
gave in this thread yesterday (where I said: "...use
XML::Reader->newhd(... {filter => 2})...")
see http://groups.google.com/group/comp.lang.perl.misc/msg/762534f342f939e6

RedGrittyBrick

Apr 22, 2010, 5:34:54 AM

On 22/04/2010 09:24, alwaysonnet wrote:
> On Apr 22, 12:39 pm, Klaus<klau...@gmail.com> wrote:
>>
>> [XML::Reader examples and discussion omitted]

>>
>
> My intention is to ~
>
> - Get each sender and receiver
> - Get the filetype ( could be InitTAP, FatalRAP etc )
> - For each of filetype get the TAPSeqNo, NoofCalls etc....
>
> Basically I want all the information in place for processing the
> data....
>
> Also, apart from XML::Twig, is there any module which can handle
> larger XML files..

Well there's the XML::Reader that Klaus has thoughtfully spent time
explaining and providing examples for. You didn't say whether there is
some reason you'd not use that.

>
> any help or suggestions are appreciated.
>

For very large XML files, the obvious approach to consider is any SAX
parser. Perl SAX modules I've used before include XML::Parser and XML::SAX.

Have you Googled for "Perl SAX" and searched CPAN for SAX?
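A minimal SAX sketch using XML::SAX, assuming the structure of the original XML. A handler class derived from XML::SAX::Base receives start_element/characters/end_element events, and XML::SAX::ParserFactory picks whichever SAX parser is installed. FileTypeHandler and scan_with_sax are hypothetical names.

```perl
use strict;
use warnings;

package FileTypeHandler;    # hypothetical handler class
use base 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    @{$self}{qw(open rows text)} = ([], [], '');
}

sub start_element {
    my ($self, $el) = @_;
    my $open = $self->{open};
    # The first element directly inside FileType names the file type.
    push @{ $self->{rows} }, [ $self->{sender}, $self->{receiver}, $el->{Name} ]
        if @$open && $open->[-1] eq 'FileType';
    push @$open, $el->{Name};
    $self->{text} = '';
}

sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};    # text may arrive in several chunks
}

sub end_element {
    my ($self, $el) = @_;
    pop @{ $self->{open} };
    $self->{sender}   = $self->{text} if $el->{Name} eq 'Sender';
    $self->{receiver} = $self->{text} if $el->{Name} eq 'Receiver';
}

package main;
use XML::SAX::ParserFactory;

sub scan_with_sax {
    my ($xml) = @_;
    my $handler = FileTypeHandler->new;
    XML::SAX::ParserFactory->parser(Handler => $handler)->parse_string($xml);
    return @{ $handler->{rows} };
}

# Trimmed copy of the XML from the original post.
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<Data><ConnectionList><Connection>
  <Sender>BRADD</Sender>
  <Receiver>SHANE</Receiver>
  <FileItemList>
    <FileItem><FileID>378910</FileID><FileType><InitTAP><TAPSeqNo>00083</TAPSeqNo></InitTAP></FileType></FileItem>
    <FileItem><FileID>380582</FileID><FileType><ReTxTAP><TAPSeqNo>00083</TAPSeqNo></ReTxTAP></FileType></FileItem>
    <FileItem><FileID>380573</FileID><FileType><FatalRAP><RAPSeqNo>00044</RAPSeqNo></FatalRAP></FileType></FileItem>
  </FileItemList>
</Connection></ConnectionList></Data>
XML

print join(', ', @$_), "\n" for scan_with_sax($sample);
```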

--
RGB

RedGrittyBrick

Apr 22, 2010, 5:44:16 AM

alwaysonnet

Apr 22, 2010, 7:28:25 AM

On Apr 22, 2:34 pm, RedGrittyBrick <RedGrittyBr...@spamweary.invalid>
wrote:

I do find XML::Reader quite helpful.

I'm comparing my existing code on a 40 MB XML file using XML::Simple
and XML::Reader to find out what fits my bill.

alwaysonnet

Apr 22, 2010, 12:00:18 PM

I'll post my observations on the timing comparison between the
XML::Simple and XML::Reader modules in my next post.

Also, is it a good idea to use the Storable module to store my data
structure on disk, or should I just use it directly? I know this
question is somewhat off-topic here, but I'm trying to understand the
possible ways of parsing the XML file.

Code I've tried so far:
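use strict;
use warnings;
use XML::Simple;
use Storable;
use Data::Dumper;

my $XML_FILE = "sample.xml";

my $mldata = XMLin($XML_FILE);

# store() needs a reference, and $mldata already is one (a hash ref);
# passing \$mldata would store a reference to a reference instead.
store $mldata, 'file';
my $hashref = retrieve('file');

#print Dumper($hashref);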
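Storable is a reasonable way to cache the parsed structure so the XML does not have to be re-parsed on every run. A small sketch using only core modules; the data is a hypothetical stand-in for an XMLin() result, and a temporary file replaces 'file'. It also shows why the argument to store/nstore matters: storing \$data gives back a reference to a reference.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Hypothetical stand-in for an XMLin() result.
my $data = {
    FileID   => '378910',
    FileType => { InitTAP => { TAPSeqNo => '00083' } },
};

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;    # Storable opens the file by name itself

# Pass the hash reference itself. nstore writes in network byte order,
# so the file can be read back on a machine with different endianness.
nstore($data, $path);
my $copy = retrieve($path);
print $copy->{FileType}{InitTAP}{TAPSeqNo}, "\n";    # prints 00083

# Storing \$data instead gives back a reference to a hash reference,
# which needs an extra dereference.
nstore(\$data, $path);
my $wrapped = retrieve($path);
print ${$wrapped}->{FileID}, "\n";                   # prints 378910
```

Note that retrieve returns a deep copy: changing $copy does not affect $data.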

Klaus

Apr 26, 2010, 4:13:24 PM

On 22 avr, 10:29, Klaus <klau...@gmail.com> wrote:
> On 21 avr, 14:35, alwaysonnet <kalyanrajsi...@gmail.com> wrote:
> > Hello all,
> > I'm trying to parse the XML using XML::Twig Module as my XML could be
> > very large to handle using XML::Simple.
> Klaus <klau...@gmail.com> wrote:
> > However, let me bring in a shameless plug:
> > You could also use my module XML::Reader
> >http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm
> s...@netherlands.com wrote:
> > > Indeed shameless.
>
> > > [...]
>
> > > It would be good though to have a capture mechanism, where
> > > xml capture can be triggered on/off by the user, later to
> > > be regurgitated to the user (on demand), and given to an
> > > xml::simple style mechanism to turn it into filtered records.
>
> use XML::Reader;
> my $rdr = XML::Reader->newhd(\*DATA, {filter => 3,
>     using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'});

I have now released XML::Reader 0.34
http://search.cpan.org/~keichner/XML-Reader-0.34/lib/XML/Reader.pm

This new version lets you write the same program (...the program that
uses XML::Reader to capture sub-trees from a potentially very big XML
file into a buffer and pass that buffer to XML::Simple...) even more
concisely:

use strict;
use warnings;
use XML::Reader 0.34;

use XML::Simple;
use Data::Dumper;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
    { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
      branch => '*' },
);

while ($rdr->iterate) {
    my $buffer = $rdr->rval;

s...@netherlands.com

Apr 26, 2010, 5:58:28 PM

Good job on this.

my $buffer = '';

while ($rdr->iterate) {
    $buffer .= $rdr->rval;
}

if (length $buffer) {
    my $ref = XMLin('<FileItem>'.$buffer.'</FileItem>');
    print Dumper($ref), "\n\n";
}

-sln

John Bokma

Apr 26, 2010, 8:01:43 PM

Klaus <kla...@gmail.com> writes:

> my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},

To me, "filter" is very unclear. I understand that these are options
to the program, but just "5" is very confusing. Maybe split "filter" into
several options which, combined, result in 1, 2, 3, 4, 5?

Why is the constructor called newhd?

Anyway, thanks for mentioning this module; I will check it out when I
have more time.

Klaus

Apr 27, 2010, 2:50:32 AM

On 26 avr, 23:58, s...@netherlands.com wrote:
> my $buffer = '';
>
> while ($rdr->iterate) {
>    $buffer .= $rdr->rval;
>
> }
>
> if (length $buffer) {
>    my $ref = XMLin('<FileItem>'.$buffer.'</FileItem>');
>    print Dumper($ref), "\n\n";
>
> }

If memory is not important, then you can use XML::Reader 0.34
qw(slurp_xml):

use strict;
use warnings;
use XML::Reader 0.34 qw(slurp_xml);

use XML::Simple;
use Data::Dumper;

my $root = '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType';
my $lref = slurp_xml(\*DATA, {root => $root, branch => '*'});
my $buffer = join '', map {$$_} @{$lref->[0]};
my $ref = XMLin("<Item>$buffer</Item>");

print Dumper($ref), "\n\n";

Klaus

Apr 27, 2010, 3:10:06 AM

On 27 avr, 02:01, John Bokma <j...@castleamber.com> wrote:

> Klaus <klau...@gmail.com> writes:
> > my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
>
> To me, "filter" is very unclear. I understand that these are options
> to the program, but just "5" is very confusing. Maybe split "filter" into
> several options which, combined, result in 1, 2, 3, 4, 5?

"filter => 2,3,4,5" is just a construction that has historically grown
inside XML::Reader.

But I agree very much with you, I also find that "filter => 2,3,4,5"
is not expressive at all. I will think of a better way to select the
mode of operation for XML::Reader.

> why is the constructor called newhd?

Thanks for the question.

That, again, is a historical accident. ==> Back in the old days of
XML::Reader ver 0.01, there used to be an option {filter => 1} and the
constructor back then was called new() and defaulted to {filter => 1}.

Then, in version 0.03 (or so) I decided to have the constructor
default to {filter => 2}, but I didn't want to break code that already
used the old default, so I came up with a second constructor called
newhd() that defaults to {filter => 2}.

At some point, {filter => 1} and its use of the constructor new()
disappeared from XML::Reader. It is therefore now possible to rename
newhd() back to new(). I think I will go back to the constructor
new() in a future version of XML::Reader.

Klaus

Apr 29, 2010, 3:47:50 PM

On 27 avr, 09:10, Klaus <klau...@gmail.com> wrote:
> On 27 avr, 02:01, John Bokma <j...@castleamber.com> wrote:
>
> > Klaus <klau...@gmail.com> writes:
> > > my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
>
> > To me, "filter" is very unclear. I understand that these are options
> > to the program, but just "5" is very confusing. Maybe split "filter" into
> > several options which, combined, result in 1, 2, 3, 4, 5?
>
> I will think of a better way to select the
> mode of operation for XML::Reader.
>
> > why is the constructor called newhd?
>
> [...] I think I will go back to constructor
> new() in a future version of XML::Reader.

I have now released a new version of XML::Reader (ver
0.35) with some bug fixes, warts removed, relicensing, etc...
http://search.cpan.org/~keichner/XML-Reader-0.35/lib/XML/Reader.pm

The line I wrote in my previous post (which was for XML::Reader ver
0.34) was:

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},

With the new version 0.35 of XML::Reader, the same line would be
spelled:

my $rdr = XML::Reader->new(\*DATA, {mode => 'branches'},
