tree frobbing facilities in Perl6?

Rich Morin

Dec 24, 2002, 3:02:09 AM

to perl6-l...@perl.org

I find myself frobbing trees a lot these days: read in some XML,
wander around in tree-land for a while, then output either more XML
or somesuch. And, quite frankly, it's a bit of a pain.

The issue, as I see it, is that Perl has no "power tools" for dealing
with trees. I will admit that I don't know what these should look
like, but if Perl has them, it's news to me. Here's an example:

Let's say that I've got a daemon which is running ps(1) on a regular
basis and logging the results. A brute force approach would be to
save the raw ASCII output, but these days I'm trying to use XML. So,
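I write out the output as (informal) XML:

  <log>
    <ps time=123456789>
      <process>
        <pid>123</>
        <pcpu>4.6</>
        <stat>SN+</>
        ...
      </process>
    </ps>
    ...
  </log>

A bit bulky, but nicely tagged and serialized. Now, I want to do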
something with it. OK, the first thing I do is read it in as a tree.
I use my own SAX handler, because I want a pure Perl way to load in
a tree, preserving order. It loads in something like this:

  [ 'log', {},
    [ 'ps', { time => 123456789 },
      [ 'process', {},
        [ 'pid',  {}, '123' ],
        [ 'pcpu', {}, '4.6' ],
        [ 'stat', {}, 'SN+' ],
        ...
      ],
    ],
    ...
  ]

The problem is that, although the data structure I've loaded in is a
tree, I generally want to use it as something else. For example, let's
say that I want to "boil down" these log files a bit. This means I
have to pick up the static values (e.g., pid), tally the distribution
of the flag values (e.g., stat), and average the numeric snapshots, as:

  foreach $time (sort(keys(%ps))) {
    $pid = $ps{$time}{pid} unless defined ($pid);
    $pcpu += $ps{$time}{pcpu};
    $stat{$ps{$time}{stat}}++;
    ...
  }

My approach to this, currently, is to walk the tree, creating the data
structure I'd _like_ to have, before I try to do the actual work. This
isn't TOO painful, but it isn't the sort of DWIMitude I'd like to see.

More to the point, let's say that I simply want to transform the data
into a different order. In a multiply subscripted array, this is just
a matter of swapping subscripts on the output loop(s). Turning the tree
above into something like:

  <process pid="123">
    <time>123456789,...</>
    <pcpu>4.6,...</>
    <stat>SN+,...</>
  </process>

is not something I want to try in XSLT. I can do it in Perl, of course,
but I end up writing a lot of code. Am I missing something? And, to
bring the posting back on topic, will Perl6 bring anything new to the
campfire?

-r
--
email: r...@cfcl.com; phone: +1 650-873-7841
http://www.cfcl.com/rdm - my home page, resume, etc.
http://www.cfcl.com/Meta - The FreeBSD Browser, Meta Project, etc.
http://www.ptf.com/dossier - Prime Time Freeware's DOSSIER series
http://www.ptf.com/tdc - Prime Time Freeware's Darwin Collection

Michael G Schwern

Dec 24, 2002, 4:29:42 AM

to Rich Morin, perl6-l...@perl.org

I'm going to take a left turn in replying and say that your approach to the
problem is causing the problem. This is diverging from the question of tree
manipulation, but I don't think that's what you really need.

Anyhow, on with the show...

On Tue, Dec 24, 2002 at 12:02:09AM -0800, Rich Morin wrote:
> Let's say that I've got a daemon which is running ps(1) on a regular
> basis and logging the results. A brute force approach would be to
> save the raw ASCII output, but these days I'm trying to use XML. So,
> I write out the output as (informal) XML:
>
>   <log>
>     <ps time=123456789>
>       <process>
>         <pid>123</>
>         <pcpu>4.6</>
>         <stat>SN+</>
>         ...
>       </process>
>     </ps>
>     ...
>   </log>

So with simple data like this, I'd just use YAML. This isn't really
important, just a YAML plug. :) But it does have a better resulting data
structure as we'll see below.

- time: 123456789
  processes:
    - pid: 123
      pcpu: 4.6
      stat: SN+
    - pid: 234
      pcpu: 2.3
      stat: R
- time: 234567890
  processes:
    - pid: 123
      pcpu: 2.4
      stat: R
    - pid: 456
      pcpu: 3.4
      stat: SN

(I've eliminated the redundant "log" and "ps" parts)

> A bit bulky, but nicely tagged and serialized. Now, I want to do
> something with it. OK, the first thing I do is read it in as a tree.
> I use my own SAX handler, because I want a pure Perl way to load in
> a tree, preserving order. It loads in something like this:
>
>   [ 'log', {},
>     [ 'ps', { time => 123456789 },
>       [ 'process', {},
>         [ 'pid',  {}, '123' ],
>         [ 'pcpu', {}, '4.6' ],
>         [ 'stat', {}, 'SN+' ],
>         ...
>       ],
>     ],
>     ...
>   ]
>
> The problem is that, although the data structure I've loaded in is a
> tree, I generally want to use it as something else.

And there's your problem. The data structure you've created above is not
really a comfortable one in Perl. You're trying to create a Tree-like
structure using array references as nodes. This is awkward. Instead, use
hashes. Here's how YAML dumps the structure:

my @ps_snapshots = [
    {
        'processes' => [
            {
                'stat' => 'SN+',
                'pcpu' => '4.6',
                'pid' => '123'
            },
            {
                'stat' => 'R',
                'pcpu' => '2.3',
                'pid' => '234'
            }
        ],
        'time' => '123456789'
    },
    {
        'processes' => [
            {
                'stat' => 'R',
                'pcpu' => '2.4',
                'pid' => '123'
            },
            {
                'stat' => 'SN',
                'pcpu' => '3.4',
                'pid' => '456'
            }
        ],
        'time' => '234567890'
    }
]

Since YAML itself is made up of hashes and arrays, it maps very well into
Perl. The XML tree structure comes off awkward because Perl has no native
tree handling.

At this point you've got a fairly straightforward list-of-hashes style
structure rather than the oddly put together set of array refs as tree
nodes.

> For example, let's
> say that I want to "boil down" these log files a bit. This means I
> have to pick up the static values (e.g., pid), tally the distribution
> of the flag values (e.g., stat), and average the numeric snapshots, as:
>
>   foreach $time (sort(keys(%ps))) {
>     $pid = $ps{$time}{pid} unless defined ($pid);
>     $pcpu += $ps{$time}{pcpu};
>     $stat{$ps{$time}{stat}}++;
>     ...
>   }

I'm not sure I follow the code above, but I'll do something similar. I'll
tally up all the flag values.

for @ps_snapshots -> $snap {
    for @$snap{processes} -> $proc {
        %stats{$proc{stat}}++;
    }
}

> My approach to this, currently, is to walk the tree, creating the data
> structure I'd _like_ to have, before I try to do the actual work. This
> isn't TOO painful, but it isn't the sort of DWIMitude I'd like to see.

Basically, we're just manipulating a straightforward list of hashes of
lists. Because YAML already loads into a naturally shaped structure, there's
no need to create the intermediate structure. Despite my use of Perl 6,
you can do the same in Perl 5.

That sort of loop I've written above can probably better be done using
hyper-operators, but I'll let someone else take a stab at that. I'm also
not sure what the slicing syntax is, so I made something up.
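
For comparison, here is a minimal Perl 5 sketch of the same tally. It is
only an illustration: the sub name tally_stats and the inline sample data
are assumptions, and the structure is built directly as a Perl list rather
than loaded from a YAML file.

```perl
use strict;
use warnings;

# Sample data in the list-of-hashes shape (values taken from the thread).
my @ps_snapshots = (
    { time => 123456789, processes => [
        { pid => 123, pcpu => 4.6, stat => 'SN+' },
        { pid => 234, pcpu => 2.3, stat => 'R'   },
    ] },
    { time => 234567890, processes => [
        { pid => 123, pcpu => 2.4, stat => 'R'  },
        { pid => 456, pcpu => 3.4, stat => 'SN' },
    ] },
);

# Count how often each stat flag string occurs across all snapshots.
# (tally_stats is a hypothetical helper name.)
sub tally_stats {
    my @snapshots = @_;
    my %stats;
    for my $snap (@snapshots) {
        for my $proc (@{ $snap->{processes} }) {
            $stats{ $proc->{stat} }++;
        }
    }
    return \%stats;
}

my $stats = tally_stats(@ps_snapshots);
print "$_: $stats->{$_}\n" for sort keys %$stats;
```

With the sample data, 'R' is counted twice and 'SN' and 'SN+' once each.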

> More to the point, let's say that I simply want to transform the data
> into a different order. In a multiply subscripted array, this is just
> a matter of swapping subscripts on the output loop(s). Turning the tree
> above into something like:
>
>   <process pid="123">
>     <time>123456789,...</>
>     <pcpu>4.6,...</>
>     <stat>SN+,...</>
>   </process>

Sort of an odd structure, but ok. Here's how I'd flip around the YAML
structure (again with the caveat about hyperoperators).

for @ps_snapshots -> $snapshot {
    my $time = $snapshot{time};

    for @$snapshot{processes} -> $proc {
        my $pid = $proc{pid};
        push @%procs{$pid}{time}, $time;

        for qw(stat pcpu) -> $key {
            push @%procs{$pid}{$key}, $proc{$key};
        }
    }
}

YAML::Dump(%procs);

This would produce something like:

123:
  time: [123456789, 234567890]
  pcpu: [4.6, 2.4]
  stat: [SN+, R]
234:
  time: [123456789]
  pcpu: [2.3]
  stat: [R]
456:
  time: [234567890]
  pcpu: [3.4]
  stat: [SN]
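
The same regrouping can be sketched in Perl 5 with plain references. This
is a hedged illustration, not the code behind the output above: the sub
name group_by_pid and the inline sample data are assumptions, and the
result is printed by hand instead of through YAML.pm.

```perl
use strict;
use warnings;

# Sample data in the list-of-hashes shape (values taken from the thread).
my @ps_snapshots = (
    { time => 123456789, processes => [
        { pid => 123, pcpu => 4.6, stat => 'SN+' },
        { pid => 234, pcpu => 2.3, stat => 'R'   },
    ] },
    { time => 234567890, processes => [
        { pid => 123, pcpu => 2.4, stat => 'R'  },
        { pid => 456, pcpu => 3.4, stat => 'SN' },
    ] },
);

# Regroup by pid: each pid maps to parallel lists of time, pcpu and stat.
# (group_by_pid is a hypothetical helper name.)
sub group_by_pid {
    my @snapshots = @_;
    my %procs;
    for my $snapshot (@snapshots) {
        for my $proc (@{ $snapshot->{processes} }) {
            my $entry = $procs{ $proc->{pid} } ||= {};
            push @{ $entry->{time} }, $snapshot->{time};
            for my $key (qw(pcpu stat)) {
                push @{ $entry->{$key} }, $proc->{$key};
            }
        }
    }
    return \%procs;
}

my $procs = group_by_pid(@ps_snapshots);
for my $pid (sort { $a <=> $b } keys %$procs) {
    print "$pid:\n";
    for my $key (qw(time pcpu stat)) {
        print "  $key: [", join(', ', @{ $procs->{$pid}{$key} }), "]\n";
    }
}
```

Autovivification does the work here: pushing onto @{ $entry->{time} }
creates the list on first use, so no explicit initialisation is needed.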

> is not something I want to try in XSLT. I can do it in Perl, of course,
> but I end up writing a lot of code. Am I missing something?

I think your external format (XML which is a tree) is not mapping well to
your internal format (Perl which uses hashes,arrays and scalars) causing you
to have to shuffle your awkward XML->tree structure into something more
Perlish. By picking an external format, YAML, which maps better to your
internal format you can avoid the intermediate step.

Alternatively, I'm sure you can rewrite your XML parser to produce a
structure similar to that which YAML produces. The point being to pull in
your data in a way which better fits Perl.
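
As a rough Perl 5 sketch of that alternative, the array-ref tree from the
original post can be converted after loading instead of rewriting the
parser. Everything named here (children_named, process_to_hash,
tree_to_snapshots) is a hypothetical helper, and the sketch assumes each
node has the shape [ tag, \%attrs, children... ] with text stored as
plain strings.

```perl
use strict;
use warnings;

# A small tree in the [ tag, \%attrs, children... ] shape from the thread.
my $tree =
    [ 'log', {},
        [ 'ps', { time => 123456789 },
            [ 'process', {},
                [ 'pid',  {}, '123' ],
                [ 'pcpu', {}, '4.6' ],
                [ 'stat', {}, 'SN+' ],
            ],
        ],
    ];

# Return the child nodes (array refs) of $node whose tag equals $tag.
sub children_named {
    my ($node, $tag) = @_;
    my (undef, undef, @kids) = @$node;
    return grep { ref($_) eq 'ARRAY' && $_->[0] eq $tag } @kids;
}

# Flatten a <process> node into a hash such as { pid => '123', ... }.
sub process_to_hash {
    my ($process) = @_;
    my (undef, undef, @kids) = @$process;
    my %fields;
    for my $field (grep { ref($_) eq 'ARRAY' } @kids) {
        $fields{ $field->[0] } = $field->[2];   # tag => first text child
    }
    return \%fields;
}

# Build the list of { time => ..., processes => [...] } hashes.
sub tree_to_snapshots {
    my ($log) = @_;
    my @snapshots;
    for my $ps (children_named($log, 'ps')) {
        my @processes = map { process_to_hash($_) } children_named($ps, 'process');
        push @snapshots, { time => $ps->[1]{time}, processes => \@processes };
    }
    return @snapshots;
}

for my $snap (tree_to_snapshots($tree)) {
    print "time $snap->{time}: pids ",
        join(', ', map { $_->{pid} } @{ $snap->{processes} }), "\n";
}
```

The conversion is specific to this log layout; a general XML-to-hash
mapping has to decide what to do with repeated tags and mixed content.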

> And, to bring the posting back on topic, will Perl6 bring anything
> new to the campfire?

Hyperoperators will help. A simplified slicing syntax, especially when
dealing with references, will help. A simplified reference syntax helps,
too.

And, of course, Perl 6 will hopefully ship with a YAML parser. ;)

--

Michael G. Schwern <sch...@pobox.com> http://www.pobox.com/~schwern/
Perl Quality Assurance <per...@perl.org> Kwalitee Is Job One
My enormous capacity for love is being WASTED on YOU guys
-- http://www.angryflower.com/497day.gif

Simon Cozens

Dec 24, 2002, 10:42:16 AM

to perl6-l...@perl.org

r...@cfcl.com (Rich Morin) writes:
> I find myself frobbing trees a lot these days

So that's where the ents came from.

--
Within a computer, natural language is unnatural.

Rich Morin

Dec 24, 2002, 12:51:07 PM

to perl6-l...@perl.org

At 1:29 AM -0800 12/24/02, Michael G Schwern wrote:
>I'm going to take a left turn in replying and say that your approach to the
>problem is causing the problem. This is diverging from the question of tree
>manipulation, but I don't think that's what you really need.

Well-meant suggestions are always welcome!

>So with simple data like this, I'd just use YAML. This isn't really
>important, just a YAML plug. :) But it does have a better resulting data
>structure as we'll see below.

I went to a talk on YAML and was quite impressed, overall. My main issue
with it is that it isn't "buzzword-compliant". As I'm hoping to have other
folks write programs to read my files at some point, this may be an issue.

> - time: 123456789
>   processes:
>     - pid: 123

This is definitely cleaner-looking than my XML!

>And there's your problem. The data structure you've created above is not
>really a comfortable one in Perl. You're trying to create a Tree-like
>structure using array references as nodes. This is awkward. Instead, use
>hashes. Here's how YAML dumps the structure:
>
>my @ps_snapshots = [
>    {
>        'processes' => [
>            {
>                'stat' => 'SN+',
>                'pcpu' => '4.6',
>                'pid' => '123'
>            },

...

I can see that this structure would be far easier to traverse.

>Sort of an odd structure, but ok. Here's how I'd flip around the YAML
>structure (again with the caveat about hyperoperators).

Again, this seems reasonable, largely because you're able to use arrays
and hashes.

>I think your external format (XML which is a tree) is not mapping well to
>your internal format (Perl which uses hashes,arrays and scalars) causing you
>to have to shuffle your awkward XML->tree structure into something more
>Perlish. By picking an external format, YAML, which maps better to your
>internal format you can avoid the intermediate step.

I can see this, but the issue of buzzword compliance is still there. Maybe
I should just emit XML upon request, like M$ (:-).

>Alternatively, I'm sure you can rewrite your XML parser to produce a
>structure similar to that which YAML produces. The point being to pull in
>your data in a way which better fits Perl.

Part of the problem in using a single example is that it can't show the
entire range of possibilities. However, because YAML is a closer match
than XML to Perl data structures, it should always be at least as
comfortable a fit to a given problem.

>Hyperoperators will help. A simplified slicing syntax, especially when
>dealing with references, will help. A simplified reference syntax helps,
>too.

I will be interested to see how all these turn out (:-).

>And, of course, Perl 6 will hopefully ship with a YAML parser. ;)

Cool!

Michael G Schwern

Dec 24, 2002, 4:54:29 PM

to Rich Morin, perl6-l...@perl.org

On Tue, Dec 24, 2002 at 09:51:07AM -0800, Rich Morin wrote:
> >So with simple data like this, I'd just use YAML. This isn't really
> >important, just a YAML plug. :) But it does have a better resulting data
> >structure as we'll see below.
>
> I went to a talk on YAML and was quite impressed, overall. My main issue
> with it is that it isn't "buzzword-compliant". As I'm hoping to have other
> folks write programs to read my files at some point, this may be an issue.

FWIW there are Perl, Ruby and Python implementations. A C library is in the
works and I think someone's doing a Java one.

And, of course, you can always just...

$ xyx ps.yml > ps.xml
$ cat ps.xml
<ps>
  <processes>
    <stat>SN+</stat>
    <pcpu>4.6</pcpu>
    <pid>123</pid>
  </processes>
  <processes>
    <stat>R</stat>
    <pcpu>2.3</pcpu>
    <pid>234</pid>
  </processes>
  <time>123456789</time>
</ps>
<ps>
  <processes>
    <stat>R</stat>
    <pcpu>2.4</pcpu>
    <pid>123</pid>
  </processes>
  <processes>
    <stat>SN</stat>
    <pcpu>3.4</pcpu>
    <pid>456</pid>
  </processes>
  <time>234567890</time>
</ps>

(For this example I put the top level "ps:" back into the YAML so it would
translate better into XML)

xyx is just a really thin wrapper around YAML.pm and XML::Simple.

--

Michael G. Schwern <sch...@pobox.com> http://www.pobox.com/~schwern/
Perl Quality Assurance <per...@perl.org> Kwalitee Is Job One

It wasn't false, just differently truthful.
-- Abhijit Menon-Sen in <2001110818...@lustre.dyn.wiw.org>

Dave Whipp

Dec 24, 2002, 6:30:09 PM

to perl6-l...@perl.org, Rich Morin

Rich Morin wrote:
> is not something I want to try in XSLT. I can do it in Perl, of course,
> but I end up writing a lot of code. Am I missing something? And, to
> bring the posting back on topic, will Perl6 bring anything new to the
> campfire?

I think that one of the things that Perl6 will bring is continuations.
This will enable you to treat a tree traversal in the same way as any
other list.

For example:

for $tree.depth_first_traversal("process") -> $node
{
    ...
}

There would be no need to obscure the client-code with the details of
hierarchical navigation. (Question: can I use C<yield> inside a
recursive implementation of the iterator?)
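
Until continuations arrive, a closure with an explicit stack gives a
similar effect in Perl 5. The following is a minimal sketch under stated
assumptions: depth_first_iterator is a hypothetical name, nodes are
assumed to have the [ tag, \%attrs, children... ] shape from the original
post, and the iterator is hand-rolled rather than built on yield.

```perl
use strict;
use warnings;

# Return a closure that walks $tree depth-first in document order and
# hands back, one call at a time, each node whose tag equals $tag.
# It returns undef once the tree is exhausted.
sub depth_first_iterator {
    my ($tree, $tag) = @_;
    my @stack = ($tree);                 # nodes still to visit
    return sub {
        while (@stack) {
            my $node = pop @stack;
            my (undef, undef, @kids) = @$node;
            # Push children in reverse so the leftmost is popped first;
            # plain strings (text) are skipped.
            push @stack, reverse grep { ref($_) eq 'ARRAY' } @kids;
            return $node if $node->[0] eq $tag;
        }
        return;
    };
}

# Sample tree (values taken from the thread).
my $tree =
    [ 'log', {},
        [ 'ps', { time => 123456789 },
            [ 'process', {}, [ 'pid', {}, '123' ], [ 'stat', {}, 'SN+' ] ],
            [ 'process', {}, [ 'pid', {}, '234' ], [ 'stat', {}, 'R'   ] ],
        ],
        [ 'ps', { time => 234567890 },
            [ 'process', {}, [ 'pid', {}, '123' ], [ 'stat', {}, 'R'  ] ],
            [ 'process', {}, [ 'pid', {}, '456' ], [ 'stat', {}, 'SN' ] ],
        ],
    ];

my $next = depth_first_iterator($tree, 'pid');
while (my $node = $next->()) {
    print "pid $node->[2]\n";
}
```

The client loop never sees the stack, which is the point being made: the
traversal details stay inside the iterator.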

Dave.