On 2014-12-24 at 09:55:46 -0500, Chris Marshall wrote:
> Very cool! Thanks for expanding the space of perl and PDL computation! In
> your work, did you determine anything PDL3 would need to do a better job to
> support using R from perl?
>
Sure, there were a couple things that would have been nice to have:
For Data::Frame,
- It's a small thing, but a way to "plug in" to the stringification for
PDL subclasses would make implementing subclasses easier. Right
now, PDL's `string` method is a bit of a black box because it
stringifies all the elements at once, so I had to write my own
string1d function [^stringifiable].
- Make a hash-based PDL the default. While using the `initialize` function
combined with `FOREIGNBUILDARGS` is an easy way to get PDL working
with Moo[se], it is extra code [^moo-hash-pdl].
- It might be useful to have annotations of all functions that do not
change the values of elements. I need this for enum-like data,
where I want the levels (the possible values of the enum) to be copied
over to new enum-like PDLs. So I wrap the following methods:

    around qw(slice uniq dice) => sub { ... };

but I'm not sure if that covers everything [^around-enum].
My thoughts on this: perhaps the PDL class has too many methods by
default. There should be a way to pare that down using roles, but
deciding what goes in each role does not seem straightforward to me at
this time.
For Statistics::NiceR,
- R stores its data inside a SEXP C structure. You can reach inside and
get at the data by using a macro that returns a pointer to the
underlying memory, like:

    SEXP r_sexp_integer, r_sexp_real;
    INTEGER(r_sexp_integer)[ idx ]  /* access the int32_t value at idx */
    REAL(r_sexp_real)[ idx ]        /* access the double value at idx */

Currently, I'm just using memcpy() to get the R data into a PDL. I
haven't used pdl_wrap() on the R data yet, but I plan to soon. What
I'm wondering is: can I change the way PDL allocates data so that
it creates R's SEXP C structure in the background, perhaps
limited to a scope? This might be YAGNI, but it could have
implications for things like GPU support: instead of having to
explicitly create GPU arrays all the time, there should be a way of
indicating that a piece of code will use a different allocator
than usual.
- Speaking of different allocation types, it might be useful to look at
how other tools extend their built-in types. I'll give some R
examples:
- R's bigmemory
<http://cran.r-project.org/web/packages/bigmemory/index.html>,
<http://www.stat.yale.edu/~mjk56/temp/bigmemory-vignette.pdf>,
<http://2013.hpcs.ca/wp-content/uploads/2013/07/HPCS2013-Parallel-Work-with-R.pdf>.
Not only does this support mmap'ed files (like PDL::IO::{FastRaw,FlexRaw}),
but it also has associated packages with specialised versions of
things like linear regression (in biglm) and k-means clustering
(in biganalytics).
- R's GMP <http://cran.r-project.org/web/packages/gmp/index.html>.
It's a wrapper for the GMP library for big integers/rationals, but
it also lets you create matrices of big numbers which can be used
for solving a system of equations (solve.bigz).
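
The scoped-allocator idea from the first Statistics::NiceR point could be sketched roughly as follows. This is only an illustration in plain C under my own assumptions: `allocator_t`, `allocator_push`, `allocator_pop`, `array_t` and the "tracked" allocator are all hypothetical names, none of them exist in PDL or R, and the tracked allocator merely stands in for something that would hand out R- or GPU-backed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator interface; not part of PDL or R. */
typedef struct {
    const char *name;
    void *(*alloc)(size_t nbytes);
    void (*release)(void *ptr);
} allocator_t;

/* Default allocator: ordinary heap memory. */
static void *default_alloc(size_t nbytes) { return malloc(nbytes); }
static void default_release(void *ptr) { free(ptr); }
static const allocator_t default_allocator =
    { "default", default_alloc, default_release };

/* Stand-in for an R- or GPU-backed allocator: it only counts requests. */
static size_t tracked_allocs = 0;
static void *tracked_alloc(size_t nbytes) { tracked_allocs++; return malloc(nbytes); }
static void tracked_release(void *ptr) { free(ptr); }
static const allocator_t tracked_allocator =
    { "tracked", tracked_alloc, tracked_release };

/* Stack of active allocators; the top entry is used for new arrays. */
#define ALLOC_STACK_MAX 8
static const allocator_t *alloc_stack[ALLOC_STACK_MAX] = { &default_allocator };
static int alloc_depth = 0;

/* Enter a scope in which new arrays come from allocator a. */
static void allocator_push(const allocator_t *a) {
    assert(alloc_depth + 1 < ALLOC_STACK_MAX);
    alloc_stack[++alloc_depth] = a;
}

/* Leave the scope and go back to the previous allocator. */
static void allocator_pop(void) {
    assert(alloc_depth > 0);
    --alloc_depth;
}

static const allocator_t *allocator_current(void) {
    return alloc_stack[alloc_depth];
}

/* An array remembers its allocator so it can be released correctly
   even after the scope that created it has ended. */
typedef struct {
    double *data;
    size_t n;
    const allocator_t *owner;
} array_t;

/* Allocate n doubles from the current allocator (no NULL check here). */
static array_t array_new(size_t n) {
    const allocator_t *a = allocator_current();
    array_t arr = { a->alloc(n * sizeof(double)), n, a };
    return arr;
}

/* Release through the allocator that created the array. */
static void array_free(array_t *arr) {
    arr->owner->release(arr->data);
    arr->data = NULL;
    arr->n = 0;
}
```

Code between `allocator_push(&tracked_allocator)` and `allocator_pop()` gets its arrays from the alternative allocator without naming it at each call to `array_new`, which is the "limited to a scope" behaviour I have in mind. Keeping the owner in each array is one possible design choice; it avoids freeing memory with the wrong allocator once the scope is gone.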
[^stringifiable]: Role that lets elements stringify themselves
<https://github.com/zmughal/p5-Data-Frame/blob/master/lib/PDL/Role/Stringifiable.pm>.
[^moo-hash-pdl]: <https://github.com/zmughal/p5-Data-Frame/blob/master/lib/PDL/Factor.pm> has the following code:

    use Moo;
    extends 'PDL';

    around new => sub {
        my $orig = shift;
        my ($class, @args) = @_;
        # snip...
        unshift @args, _data => $enum;
        my $self = $orig->($class, @args);
        # snip...
    };

    # Pass only the data on to the PDL (non-Moo) constructor.
    sub FOREIGNBUILDARGS {
        my ($self, %args) = @_;
        ( $args{_data} );
    }

    # Hash-based PDL object: the actual piddle lives under the PDL key.
    sub initialize {
        bless { PDL => PDL::null() }, shift;
    }

[^around-enum]: <https://github.com/zmughal/p5-Data-Frame/blob/master/lib/PDL/Role/Enumerable.pm#L46>.
Cheers,
- Zaki Mughal
> --Chris