Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

a new Tree::Interval-like module on top of Tree::RB?

4 views
Skip to first unread message

Ivan Shmakov

unread,
Jun 13, 2013, 1:13:28 PM6/13/13
to
So, I've crafted the base for a Tree::Interval-like module,
implemented on top of Tree::RB, and I'm open to suggestions of
what'd be a suitable name for it?

The obvious candidates are Tree::RB::Interval and
Tree::Interval::RB, but I'm uncertain if it'd be nice to
infiltrate someone's else CPAN namespace in this case.

FWIW, Tree::Interval::RB seems more appropriate, as the code as
it is requires only the compare, delete, insert and lookup
methods from the "backend," and thus could probably be not that
hard to generalize. (Although the lookup method is expected to
allow for "imprecise" matching, and then it demands a way to
iterate onwards from the node found...)

A nice feature of this code is that it relies on the tree's own
"compare" function, which could be rather arbitrary. Thus, it's
possible to, say:

my $map
= XXX->new (...);
$map->set ("from", "to", "value");
my $a
= $map->lookup ("fuzzy");
## => "value"

(provided that the tree is built around the "cmp" function.)

TIA.

--
FSF associate member #7257 np. faceday.mod

Ivan Shmakov

unread,
Jun 27, 2013, 1:27:43 AM6/27/13
to
>>>>> Ivan Shmakov <onei...@gmail.com> writes:

> So, I've crafted the base for a Tree::Interval-like module,
> implemented on top of Tree::RB, and I'm open to suggestions of what'd
> be a suitable name for it?

[...]

> FWIW, Tree::Interval::RB seems more appropriate, as the code as it is
> requires only the compare, delete, insert and lookup methods from the
> "backend," and thus could probably be not that hard to generalize.

So, I've ended up with Tree::Range::RB, and Tree::Range::base
for the ->get_range () and ->range_set () methods. Both are now
released as Tree::Range 0.1. The CPAN page for the version is:

http://search.cpan.org/~onegray/Tree-Range-0.1/

[...]

PS. The namespace registration request is filed and pending approval.

Ben Morrow

unread,
Jun 27, 2013, 2:47:20 AM6/27/13
to

Quoth Ivan Shmakov <onei...@gmail.com>:
>
> So, I've ended up with Tree::Range::RB, and Tree::Range::base
> for the ->get_range () and ->range_set () methods. Both are now
> released as Tree::Range 0.1. The CPAN page for the version is:
>
> http://search.cpan.org/~onegray/Tree-Range-0.1/
>
> [...]
>
> PS. The namespace registration request is filed and pending approval.

The namespace registration process is pretty-much defunct at this point.
Both Tree::Range::RB and Tree::Range::base are already as 'properly
registered' as they need to be; that is, they are listed in
02packages.details.txt.gz on CPAN, so clients like CPAN.pm and cpanm
will find your distributions for those package names.

However, it goes against convention to have a distribution called
Tree-Range that does not contain a module Tree::Range. You should
consider either renaming Tree::Range::base or at least providing a
documentation-only Tree::Range explaining why there isn't a module
there.

Ben

Ivan Shmakov

unread,
Jun 27, 2013, 4:12:08 AM6/27/13
to
>>>>> Ben Morrow <b...@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <onei...@gmail.com>:

>> So, I've ended up with Tree::Range::RB, and Tree::Range::base for
>> the ->get_range () and ->range_set () methods. Both are now
>> released as Tree::Range 0.1. The CPAN page for the version is:

>> http://search.cpan.org/~onegray/Tree-Range-0.1/

>> PS. The namespace registration request is filed and pending
>> approval.

> The namespace registration process is pretty-much defunct at this
> point. Both Tree::Range::RB and Tree::Range::base are already as
> 'properly registered' as they need to be; that is, they are listed in
> 02packages.details.txt.gz on CPAN, so clients like CPAN.pm and cpanm
> will find your distributions for those package names.

Yes.

Still, it's recommended by [1] (though feel free to file a bug
report if you think it shouldn't be):

[...] A registration is not a prerequisite for uploading. It is
just recommended for better searchability of the CPAN and to avoid
namespace clashes. [...]

Besides, it'll add some useful bits of information to [2].

[1] https://pause.perl.org/pause/authenquery?ACTION=apply_mod
[2] http://search.cpan.org/~onegray/

> However, it goes against convention to have a distribution called
> Tree-Range that does not contain a module Tree::Range. You should
> consider either renaming Tree::Range::base

Won't this go against the example set by Digest::base? (which I
tend to think as a good enough to stick to.)

> or at least providing a documentation-only Tree::Range explaining why
> there isn't a module there.

I guess I'd consider the latter for 0.2. (Which I hope'll be
the release fixing only the documentation and the metadata.)

The other option would be to have Tree::Range->new () behave
like Digest->new ().

Ben Morrow

unread,
Jun 27, 2013, 6:24:35 AM6/27/13
to

Quoth Ivan Shmakov <onei...@gmail.com>:
> >>>>> Ben Morrow <b...@morrow.me.uk> writes:
>
> > The namespace registration process is pretty-much defunct at this
> > point. Both Tree::Range::RB and Tree::Range::base are already as
> > 'properly registered' as they need to be; that is, they are listed in
> > 02packages.details.txt.gz on CPAN, so clients like CPAN.pm and cpanm
> > will find your distributions for those package names.
>
> Yes.
>
> Still, it's recommended by [1] (though feel free to file a bug
> report if you think it shouldn't be):
>
> [...] A registration is not a prerequisite for uploading. It is
> just recommended for better searchability of the CPAN and to avoid
> namespace clashes. [...]

See <https://github.com/Perl-Toolchain-Gang/toolchain-site/blob/master/
lancaster-consensus.md#module-registration> (and indeed the rest of that
document). While this is a 'new' agreement, in the Perl world agreements
and specs usually come after implementation, so it is describing the
way things currently work in practice.

> Besides, it'll add some useful bits of information to [2].

...which noone but you will ever look at :).

> > However, it goes against convention to have a distribution called
> > Tree-Range that does not contain a module Tree::Range. You should
> > consider either renaming Tree::Range::base
>
> Won't this go against the example set by Digest::base? (which I
> tend to think as a good enough to stick to.)

It's not a convention I've seen before, but there's nothing wrong with
it. (It would be more usual to be ::Base, at least; the analogy with
base.pm is not really relevant any more.)

> > or at least providing a documentation-only Tree::Range explaining why
> > there isn't a module there.
>
> I guess I'd consider the latter for 0.2. (Which I hope'll be
> the release fixing only the documentation and the metadata.)
>
> The other option would be to have Tree::Range->new () behave
> like Digest->new ().

Yeah, perhaps. I haven't really got a sense of what this module's useful
for, so I can't comment.

Ben

Ivan Shmakov

unread,
Jun 27, 2013, 7:05:31 AM6/27/13
to
>>>>> Ben Morrow <b...@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <onei...@gmail.com>:
>>>>> Ben Morrow <b...@morrow.me.uk> writes:

[...]

>> Still, it's recommended by [1] (though feel free to file a bug
>> report if you think it shouldn't be):

>> [...] A registration is not a prerequisite for uploading. It is
>> just recommended for better searchability of the CPAN and to avoid
>> namespace clashes. [...]

> See <https://github.com/Perl-Toolchain-Gang/toolchain-site/blob/
> master/lancaster-consensus.md#module-registration> (and indeed the
> rest of that document). While this is a 'new' agreement, in the Perl
> world agreements and specs usually come after implementation, so it
> is describing the way things currently work in practice.

So, I guess [1] is going to be updated within a few months from
now. Thanks.

>> [1] https://pause.perl.org/pause/authenquery?ACTION=apply_mod

[...]

>>> However, it goes against convention to have a distribution called
>>> Tree-Range that does not contain a module Tree::Range. You should
>>> consider either renaming Tree::Range::base

>> Won't this go against the example set by Digest::base? (which I tend
>> to think as a good enough to stick to.)

> It's not a convention I've seen before, but there's nothing wrong
> with it. (It would be more usual to be ::Base, at least; the analogy
> with base.pm is not really relevant any more.)

AIUI, the "capital-case" names under Digest:: are reserved for
the actual message digest algorithms. For instance,
Digest->new ("SHA-1") ends up in a call to Digest::SHA->new (1).

>>> or at least providing a documentation-only Tree::Range explaining
>>> why there isn't a module there.

>> I guess I'd consider [it] for 0.2. (Which I hope'll be the release
>> fixing only the documentation and the metadata.)

>> The other option would be to have Tree::Range->new () behave like
>> Digest->new ().

> Yeah, perhaps. I haven't really got a sense of what this module's
> useful for, so I can't comment.

One of the possible uses of Tree::Range::RB specifically is to
operate on .newsrc data. Consider, e. g.:

### gxmj7tsetytsnbnp85yhxyjror.perl -*- Perl -*-

### Ivan Shmakov, 2013

### Code:

use common::sense;

require Tree::Range::RB;

sub ranges_parse {
my ($in, @ranges) = @_;
foreach my $range (@ranges) {
my ($a, $b)
= ($range =~ /^([0-9]+)(?:-([0-9]+))?$/)
or die ($range, ": is not a vaild range");
$in->range_set ($a, 1 + ($b // $a), 1);
}
## .
$in;
}

my @comp_lang_perl_misc_s = qw {
1-27551 27567 27575 27602-27606 27627-27629 27632 27634-27636
27638-27996
27998-28002 28004 28006-28008 28011-28013 28015-28027 28029 28032
28036-28093
};

my $comp_lang_perl_misc
= Tree::Range::RB->new ({ "cmp" => sub { $_[0] <=> $_[1]; },
"equal-p" => sub { $_[0] eq $_[1]; } });
ranges_parse ($comp_lang_perl_misc, @comp_lang_perl_misc_s);

{
local ($,, $\)
= (", ", "\n");

## check if 28007 was read? (i. e., what range 28007 belongs to?)
print ($comp_lang_perl_misc->get_range (28007));
## => 1, 28006, 28009

## now the user has marked some more articles as read
ranges_parse ($comp_lang_perl_misc, qw (28005 28009 28010));

## check if 28011 was read? (i. e., what range 28011 belongs to?)
print ($comp_lang_perl_misc->get_range (28011));
## => 1, 28004, 28014

## note how the range was extended as the gaps were filled
}

### gxmj7tsetytsnbnp85yhxyjror.perl ends here

Similarly, by utilizing the Tree::RB's own iterator, a simple
and efficient serializer for this kind of data can be built.

Ben Morrow

unread,
Jun 27, 2013, 7:50:48 AM6/27/13
to

Quoth Ivan Shmakov <onei...@gmail.com>:
>
> One of the possible uses of Tree::Range::RB specifically is to
> operate on .newsrc data. Consider, e. g.:
>
>
> my @comp_lang_perl_misc_s = qw {
> 1-27551 27567 27575 27602-27606 27627-27629 27632 27634-27636
> 27638-27996
> 27998-28002 28004 28006-28008 28011-28013 28015-28027 28029 28032
> 28036-28093
> };
>
> my $comp_lang_perl_misc
> = Tree::Range::RB->new ({ "cmp" => sub { $_[0] <=> $_[1]; },
> "equal-p" => sub { $_[0] eq $_[1]; } });
> ranges_parse ($comp_lang_perl_misc, @comp_lang_perl_misc_s);
>
> {
> local ($,, $\)
> = (", ", "\n");
>
> ## check if 28007 was read? (i. e., what range 28007 belongs to?)
> print ($comp_lang_perl_misc->get_range (28007));
> ## => 1, 28006, 28009
>
> ## now the user has marked some more articles as read
> ranges_parse ($comp_lang_perl_misc, qw (28005 28009 28010));

Is this importantly different from Set::IntSpan/::Fast/::XS?

Ben

Ivan Shmakov

unread,
Jun 27, 2013, 9:11:19 AM6/27/13
to
>>>>> Ben Morrow <b...@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <onei...@gmail.com>:

>> One of the possible uses of Tree::Range::RB specifically is to
>> operate on .newsrc data. Consider, e. g.:

[...]

>> my $comp_lang_perl_misc
>> = Tree::Range::RB->new ({ "cmp" => sub { $_[0] <=> $_[1]; },
>> "equal-p" => sub { $_[0] eq $_[1]; } });

[...]

>> print ($comp_lang_perl_misc->get_range (28007));
>> ## => 1, 28006, 28009

[...]

> Is this importantly different from Set::IntSpan/::Fast/::XS?

Yes, although not in the case of .newsrc.

First of all, each range (span, interval, etc.) may have an
associated value. It's "1" (or "undef", for the "complementary"
ranges) in the example above, but it needs not to be.

Moreover, unlike Set::IntSpan, Tree::Range::RB is applicable to
an arbitrary ordered set. (Note the "cmp" option above.)

Also, having Set::IntSpan (and Set::IntSpan::Fast, as per the
documentation) implemented on top of an ordered list seems to
imply the insertion and deletion times of O (N), while for a
red-black tree they're both O (log N). Though ::XS may still
outperform a pure-Perl tree-based implementation on reasonable
use cases.

Ivan Shmakov

unread,
Jul 2, 2013, 3:37:37 PM7/2/13
to
>>>>> Ivan Shmakov <onei...@gmail.com> writes:

[...]

> Also, having Set::IntSpan (and Set::IntSpan::Fast, as per the
> documentation) implemented on top of an ordered list seems to imply
> the insertion and deletion times of O (N), while for a red-black tree
> they're both O (log N). Though ::XS may still outperform a pure-Perl
> tree-based implementation on reasonable use cases.

So, I've ran a simplistic benchmark over the three, which is,
roughly (the exact code is as shown below):

# my ($size, $iter) = (4194304, 5);
my @vec
= (map { int (rand ($size)); } (1 .. $size));

my $set
= Set::IntSpan::Fast->new ();
$set->add ($_)
foreach (@vec);

It took awhile to complete, but the results seem to suggest that
Tree::Range::RB is indeed faster than both Set::IntSpan::Fast
and Set::IntSpan, should the problem size be sufficiently large.

Size: 4194304 Iterations: 5 (negative is time in seconds)
Tree::Range::RB: 8003 wallclock secs (7944.34 usr 5.88 sys + 0.00 cusr 0.00 csys = 7950.22 CPU) @ 0.00/s (n=5)
Set::IntSpan: 17759 wallclock secs (17556.78 usr 10.12 sys + 0.00 cusr 0.00 csys = 17566.90 CPU) @ 0.00/s (n=5)
Set::IntSpan::Fast: 19318 wallclock secs (19060.33 usr 9.25 sys + 0.00 cusr 0.00 csys = 19069.58 CPU) @ 0.00/s (n=5)
s/iter Set::IntSpan::Fast Set::IntSpan Tree::Range::RB
Set::IntSpan::Fast 3814 -- -8% -58%
Set::IntSpan 3513 9% -- -55%
Tree::Range::RB 1590 140% 121% --
44565.74user 26.12system 12:31:28elapsed 98%CPU (0avgtext+0avgdata 972212maxresident)k
224inputs+0outputs (1major+316600minor)pagefaults 0swaps

Or did I make some silly mistake with that?

TIA.

### Code:

use common::sense;

use Benchmark qw (cmpthese timethis);
use Scalar::Util qw (looks_like_number);

require Set::IntSpan;
require Set::IntSpan::Fast;
require Tree::Range::RB;

my ($size, $iter)
= (map { (/^0/ ? oct ($_) : $_); } (@ARGV));
$size
//= 0x100000;
$iter
//= -521;
looks_like_number ($size)
or die ($size, ": Size given is not a number\n");
looks_like_number ($iter)
or die ($iter, ": Iterations count given is not a number\n");
warn ("Size: ", 0 + $size, " Iterations: ", 0 + $iter,
" (negative is time in seconds)\n");
my @vec
= (map { int (rand ($size)); } (1 .. $size));

sub xcmp {
## .
$_[0] <=> $_[1];
}

sub xeq {
## .
$_[0] eq $_[1];
}

my $trr
= timethis ($iter,
sub {
my $rat_opt = {
"cmp" => \&xcmp,
"equal-p" => \&xeq
};
my $value
= [ "*value*" ];
my $rat
= Tree::Range::RB->new ($rat_opt);
$rat->range_set ($_, 1 + $_, $value)
foreach (@vec);
},
"Tree::Range::RB", "all");
my $sis
= timethis ($iter,
sub {
my $set
= Set::IntSpan->new ();
$set->insert ($_)
foreach (@vec);
},
"Set::IntSpan", "all");
my $sisf
= timethis ($iter,
sub {
my $set
= Set::IntSpan::Fast->new ();
$set->add ($_)
foreach (@vec);
},
"Set::IntSpan::Fast", "all");

cmpthese ({
"Set::IntSpan" => $sis,
"Set::IntSpan::Fast" => $sisf,
"Tree::Range::RB" => $trr
},
"all");

--
FSF associate member #7257 np. ybjsaf.it
0 new messages