recent activity

A.J. Rossini

Feb 26, 2014, 7:26:15 AM
to lisp-stat
Dear all -

Here's a quick summary of lisp-stat related activities.

1. Thinking about modelframes. The reason I like R so much is that dataframes and model specifications make statistical modeling and analysis so "obvious and clear". The reason I am less than satisfied is that dataframes and R's models are so 1980...

On my CLS local branch (pushed to GitHub) I have a new file in src/data/, modelframe.lisp, which starts to implement this. Basically, a model frame is a column-typed array (or a collection of column-stores, or a....) which can be accessed and computed with through variable names and subject IDs. More later as I think through them.

The goals are to have a means to create it, and a means to compute row-wise (e.g. for computing likelihoods) and column-wise (for computing regression steps, i.e. projections onto subspaces :) :) ).
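A minimal sketch of the two access patterns named in item 1, written in Python only for illustration. `ModelFrame`, `normal_loglik`, `slope`, and the toy data are hypothetical and are not taken from modelframe.lisp; the sketch assumes scalar numeric columns and one row per subject.

```python
import math


class ModelFrame:
    """Hypothetical column store addressed by variable name and subject ID."""

    def __init__(self, subject_ids, columns):
        # columns maps variable name -> list of values, one per subject
        for name, values in columns.items():
            if len(values) != len(subject_ids):
                raise ValueError(f"column {name!r} has the wrong length")
        self.subject_ids = list(subject_ids)
        self.columns = {name: list(values) for name, values in columns.items()}
        self._row_index = {sid: i for i, sid in enumerate(self.subject_ids)}

    def value(self, subject_id, variable):
        """Look up a single cell by subject ID and variable name."""
        return self.columns[variable][self._row_index[subject_id]]

    def rows(self):
        """Yield one dict per subject (row-wise access)."""
        for i in range(len(self.subject_ids)):
            yield {name: col[i] for name, col in self.columns.items()}

    def column(self, variable):
        """Return a whole variable (column-wise access)."""
        return self.columns[variable]


def normal_loglik(frame, variable, mu, sigma):
    """Row-wise: add up each subject's Gaussian log-density contribution."""
    total = 0.0
    for row in frame.rows():
        z = (row[variable] - mu) / sigma
        total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return total


def slope(frame, y_name, x_name):
    """Column-wise: least-squares slope, i.e. centered y projected onto centered x."""
    x, y = frame.column(x_name), frame.column(y_name)
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    xc = [v - x_bar for v in x]
    yc = [v - y_bar for v in y]
    return sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)


# Made-up data for illustration only.
toy = ModelFrame(
    subject_ids=["s1", "s2", "s3", "s4"],
    columns={"dose": [1.0, 2.0, 3.0, 4.0], "response": [2.0, 4.0, 6.0, 8.0]},
)
```

Here `toy.value("s3", "response")` addresses a cell by subject and variable, `normal_loglik(toy, "response", 5.0, 1.0)` walks the rows, and `slope(toy, "response", "dose")` works on whole columns.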

2. CLUNIT testing. See initial code in src/unittests2, but it is completely useless.

3. Revisiting reproducible probability streams. This is critical. If we can have a means of associating a stream with a computational work unit, then reproducible statistical research will be possible. Platform independence is an eventual requirement (more critical than speed -- remember, I'm coming out of the clinical development domain!).

I've added 2 more files to the src/probability distributions, and really need to settle on a reproducible probability computation package (i.e. ONLY generation, density/mass functions, PDF, CDF, and quantile functions, ways to have a calculus for them (multivariate, combinations, slicing/barriers, mixtures, etc.), and ways to compute dens/mass, PDF, CDF, and quantile on the combinations).
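One possible way to tie a random stream to a computational work unit, as item 3 asks, is to derive the seed from the work unit's name. The sketch below is in Python for illustration only; `stream_for`, `draws`, the `master_seed` parameter, and the work-unit names are hypothetical, and hashing the name is an assumption about how the association could be made, not how CLS does it.

```python
import hashlib
import random


def stream_for(work_unit, master_seed=0):
    """Return a generator whose seed depends only on the work-unit name and master seed."""
    # SHA-256 is defined byte for byte, so the derived seed does not depend on
    # the platform or on Python's per-process string hash randomization.
    digest = hashlib.sha256(f"{master_seed}:{work_unit}".encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed)


def draws(work_unit, n, master_seed=0):
    """Draw n uniform variates from the stream that belongs to one work unit."""
    rng = stream_for(work_unit, master_seed)
    return [rng.random() for _ in range(n)]


# Re-running a work unit replays its stream; other work units get their own.
first_run = draws("bootstrap/replicate-17", 3)
second_run = draws("bootstrap/replicate-17", 3)
other_unit = draws("bootstrap/replicate-18", 3)
```

Because the seed comes from the name rather than from execution order, work units can be run in any order, or on different machines, and still see the same numbers. Reproducibility across platforms then rests on the underlying generator being specified exactly, which is the part a dedicated package would have to guarantee.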

4. Took a quick look at Julia. Didn't like it for what I want, though it looks like a nice technical language. I want a statistical system that is usable by research statisticians (and educated and educatable non-statisticians :) ).

More coding later!
 
best,
-tony

blind...@gmail.com
Muttenz, Switzerland.
"Commit early,commit often, and commit in a repository from which we can easily roll-back your mistakes" (AJR, 4Jan05).

Drink Coffee:  Do stupid things faster with more energy!

Tamas Papp

Feb 26, 2014, 7:50:28 AM
to lisp...@googlegroups.com
On Wed, Feb 26 2014, A.J. Rossini <blind...@gmail.com> wrote:

> Dear all -
>
> Here's a quick summary of lisp-stat related activities.
>
> 1. thinking about modelframes. So the reason I like R so much is that
> dataframes and model specifications make statistical modeling and analysis
> so "obvious and clear". The reason I am less than satisfied is that
> dataframes and R's models are so 1980...

I am working on a project now in R (basically because of Rstan, I have
some CL interface but there is a deadline and I don't have time to
fiddle with it). On the one hand data.frames are nice, but they have
painful limitations. For me, the most important one stems from R's
inability to have vectors of vectors (except a list of vectors, but
data.frame won't accept that). So I had to write a whole suite of
functions to work with posterior simulations in Rstan (which are lists
of arrays, not lists of vectors).

My ideal data.frame in CL would basically be a sequence of name-column
pairs. A column would have a type, which would guarantee that all
elements are of a subtype of that type. This could default to T, but
some types would get special treatment: e.g. (simple-array x n) would be
stored in a `(simple-array x ,(1+ n)), etc.
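A rough sketch of that design, in Python for illustration only. `Column` and `PackedVectorColumn` are hypothetical names and do not describe cl-data-frame; the packed column covers only the rank-1 case (fixed-length float vectors stored in one flat, row-major buffer, standing in for a rank-2 array).

```python
from array import array


class Column:
    """Hypothetical typed column: every element must be an instance of element_type."""

    def __init__(self, element_type=object, values=()):
        self.element_type = element_type  # `object` plays the role of CL's T
        self.values = []
        for v in values:
            self.append(v)

    def append(self, value):
        if not isinstance(value, self.element_type):
            raise TypeError(f"{value!r} is not a {self.element_type.__name__}")
        self.values.append(value)

    def __getitem__(self, i):
        return self.values[i]

    def __len__(self):
        return len(self.values)


class PackedVectorColumn:
    """Column of fixed-length float vectors kept in a single flat buffer."""

    def __init__(self, width):
        if width < 1:
            raise ValueError("width must be positive")
        self.width = width
        # Row-major layout: element i occupies buffer[i*width : (i+1)*width].
        self.buffer = array("d")

    def append(self, vector):
        if len(vector) != self.width:
            raise ValueError(f"expected {self.width} entries, got {len(vector)}")
        self.buffer.extend(float(x) for x in vector)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return list(self.buffer[i * self.width:(i + 1) * self.width])

    def __len__(self):
        return len(self.buffer) // self.width


# A data frame as a sequence of name-column pairs (made-up values).
posterior = PackedVectorColumn(width=3)
posterior.append([0.1, 0.2, 0.3])
posterior.append([1.0, 2.0, 3.0])
data_frame = [("chain", Column(int, [1, 2])), ("posterior", posterior)]
```

Each element of `posterior` still reads back as a vector (`posterior[1]` gives the second one), while the storage is a single contiguous buffer, so column-wise numeric code can work on it without unpacking.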

I have something usable (which does not implement the above
functionality yet) at https://github.com/tpapp/cl-data-frame . Examples
are in the unit tests.

Best,

Tamas

A.J. Rossini

Feb 26, 2014, 7:59:41 AM
to lisp-stat
Dear Tamas -

I am aware of your work (spent the last few days, and probably many more, reviewing your work while coding my own!).

Your issue with dataframes, which I really share, is basically the point (as you can see from the documentation comments in the modelframe local-branch commit): I should be able to have a column of time series, or networks, or other complex structures. So for example, if I have 5 subjects and want data about their gender, their PK trajectories (longitudinal/time series data, say 4 to 9 observations, varying per individual), and daily body temperature for 7 days, I should only need 3 columns: 2 columns of "temporally inhomogeneous short time series" and one column of factor-coded variables.

So that data should be a 5x3 array. The only constraint is that each column should have a Lisp-defined type, and its entries should be restricted to that type.
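To make the 5x3 shape concrete, here is a small Python sketch, for illustration only. `TimeSeries`, `make_frame`, `series`, and all values are hypothetical placeholders; gender is kept as a plain string rather than a proper factor coding, and the per-subject PK lengths are made up within the 4 to 9 range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeSeries:
    """Hypothetical short series: observation times and values of equal length."""
    times: tuple
    values: tuple

    def __post_init__(self):
        if len(self.times) != len(self.values):
            raise ValueError("times and values must have the same length")


def make_frame(column_types, rows):
    """Check every cell against its column's type and return the rows as tuples."""
    names = list(column_types)
    checked = []
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row width does not match the number of columns")
        for name, cell in zip(names, row):
            if not isinstance(cell, column_types[name]):
                raise TypeError(f"column {name!r} expects {column_types[name].__name__}")
        checked.append(tuple(row))
    return checked


def series(n):
    # Placeholder zeros only; real data would carry actual measurements.
    return TimeSeries(times=tuple(range(n)), values=tuple(0.0 for _ in range(n)))


column_types = {"gender": str, "pk": TimeSeries, "temperature": TimeSeries}
pk_lengths = [4, 9, 6, 5, 7]  # made-up lengths, varying per subject
subjects = make_frame(
    column_types,
    [("F" if i % 2 else "M", series(n), series(7)) for i, n in enumerate(pk_lengths)],
)
shape = (len(subjects), len(column_types))
```

The frame stays 5 rows by 3 columns even though the PK series differ in length, because each cell holds one whole series and the type check applies to the cell, not to its contents.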

(There is also the missing/censored/coarsened data coding issue, which is separate but related.)

This is what I'm looking at, and will be looking more at your code related to that.

best,
-tony



