status, 21 Jan 2013... (summary: merge done, not yet complete, more graphics to look at)


A.J. Rossini

Jan 21, 2013, 1:03:22 AM
to lisp...@googlegroups.com
Dear all -

So, I managed to merge David's branch into the mainline, but it's not quite all there. Current issues to tackle early this week:

1. dataframes: continue to merge and improve David's work and, as with David's, look at the current status of Mirko's and Tamas's work for integration and potentially game-changing ideas.

One thing that has changed: while originally I wanted to build statistical concepts into the data frame and its processing (independence, correlation, the notion of rows being conditionally independent given the dataset), David changed it back to "rows/columns" from "cases/variables". I'm rethinking this a bit, and think that I will probably just need to macro-ize what I want at the end, rather than having it built in from the beginning, as we explore how this system can be used and how the uptake and learning work.
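
To make "macro-ize" concrete, here is a minimal sketch of the kind of vocabulary layer I have in mind; df-row, df-column, *survey*, and correlation below are hypothetical names, not current CLS code:

;; Sketch: layer a "cases/variables" vocabulary over a plain
;; rows/columns dataframe with macros, instead of baking the
;; statistical concepts into the representation.  DF-ROW and DF-COLUMN
;; are assumed low-level accessors.
(defmacro case-ref (df i)
  "A case is just a row, renamed for the statistical reading."
  `(df-row ,df ,i))

(defmacro with-variables ((&rest names) df &body body)
  "Bind each NAME to the like-named column of DF."
  (let ((d (gensym "DF")))
    `(let* ((,d ,df)
            ,@(mapcar (lambda (name) `(,name (df-column ,d ',name)))
                      names))
       ,@body)))

;; e.g. (with-variables (age income) *survey* (correlation age income))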

2. graphics: David McClain posted a series of graphics and numerical tools on GitHub which are aimed at audio processing ("time series work") but are LispWorks-centric. However, they look like good study material (the graphics API) and independent comparators (numerics). See: https://github.com/dbmcclain

In addition, an older framework popped up on the Quicklisp list: http://common-lisp.net/project/clnuplot/

So, I'll continue to rectify David's work (little things like ensuring that the right packages are loaded for reading CSV files), study Tamas's and Mirko's work, and hopefully will have more progress by next Monday morning!

Another distraction last week was automatic differentiation (want, want....) and Radford Neal's chapter on MCMC with Hamiltonian dynamics (I am evaluating Andrew Gelman's Stan MCMC system at work). There's code for autodiff on GitHub, and at some point I want a package which does calculus for probability functions (compiling and hyper-optimizing a final probability/likelihood for use in Bayesian computations, but NOT before, since I need flexible subjective-prior specification -- we are being semi-subjective Bayesians at work :-). Thus, frameworks (UNOPTIMIZED) for quick Bayesian explorations. This I need to think through a bit.
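
As a rough sketch of what I mean by an UNOPTIMIZED framework (all names below are hypothetical), the log-posterior can be composed from ordinary closures, so the subjective prior stays swappable during exploration:

(defun make-log-posterior (log-prior log-likelihood data)
  "Compose an unoptimized log-posterior closure -- fine for exploration."
  (lambda (theta)
    (+ (funcall log-prior theta)
       (funcall log-likelihood theta data))))

;; The prior is a plain closure, so it can be swapped at the REPL:
;;   (make-log-posterior (lambda (th) (- (* th th)))  ; subjective prior
;;                       #'my-log-likelihood          ; hypothetical model
;;                       *data*)
;; Only once the prior is settled would the composed function be
;; compiled and hyper-optimized.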

best,
-tony

David Hodge

Jan 27, 2013, 10:17:15 PM
to lisp...@googlegroups.com
On Mon, Jan 21, 2013 at 2:03 PM, A.J. Rossini <blind...@gmail.com> wrote:
Dear all -

So, I managed to merge David's branch into the mainline, but it's not quite all there. Current issues to tackle early this week:

1. dataframes: continue to merge and improve David's work and, as with David's, look at the current status of Mirko's and Tamas's work for integration and potentially game-changing ideas.

One thing that has changed: while originally I wanted to build statistical concepts into the data frame and its processing (independence, correlation, the notion of rows being conditionally independent given the dataset), David changed it back to "rows/columns" from "cases/variables". I'm rethinking this a bit, and think that I will probably just need to macro-ize what I want at the end, rather than having it built in from the beginning, as we explore how this system can be used and how the uptake and learning work.

So, there I am not so sure about that: I personally think in terms of variables and observations. The lower-level routines I wrote do operate mainly on columns, but at a high level I agree that we should be thinking in terms of independent variables, for sure.

And this leads to the observation that the current dataframe implementation is just not that useful. I actually tried to use it for some small tasks and it got in the way more than it helped. That was the driver for me to explore alternative ideas, and Mirko, Tamas, and I all had somewhat similar ideas.

So, it's timely to have a conversation about dataframes, then. There is no doubt that columns should be typed in some way. The PCL approach is quite attractive, in that it provides for interning variables, saving a little memory (important with several million rows), and giving a nice base for extension. Tamas's approach is also nice and simple. It's probable that the final incarnation of CLS might actually have pluggable dataframes, each suited to meeting specific requirements (e.g. huge datasets, optimized querying, etc.).

The several-million-rows case is also driving me to think about databases as well; having a dataframe backed by SQL is in the long-term plan as far as I can see. That might be one of the things to look at once I get back to looking at infrastructure.
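
To sketch what "pluggable" could mean (hypothetical names; none of this exists in CLS today), a small generic-function protocol that an in-memory and a SQL-backed dataframe could both specialize:

(defgeneric df-nrows (df)
  (:documentation "Number of rows (cases) in DF."))

(defgeneric df-column (df name)
  (:documentation "Return column NAME of DF as a vector."))

(defgeneric df-select (df predicate)
  (:documentation "Return a new dataframe of the rows satisfying PREDICATE."))

;; An in-memory backend: typed column vectors held in a hash table.
(defclass memory-df ()
  ((columns :initarg :columns :reader df-columns) ; name -> typed vector
   (nrows   :initarg :nrows   :reader df-nrows)))

(defmethod df-column ((df memory-df) name)
  (gethash name (df-columns df)))

;; A SQL-backed implementation would specialize the same protocol;
;; DF-SELECT, say, could translate PREDICATE into a WHERE clause instead
;; of scanning vectors, without callers changing a line.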


2. graphics: David McClain posted a series of graphics and numerical tools on GitHub which are aimed at audio processing ("time series work") but are LispWorks-centric. However, they look like good study material (the graphics API) and independent comparators (numerics). See: https://github.com/dbmcclain

I looked at this briefly; it's very LW-specific and a bit hard to figure out how it all hangs together.

In addition, an older framework popped up on the Quicklisp list: http://common-lisp.net/project/clnuplot/

That's actually very easy to use and seems to do the job.

There is also something called afterquery, for browser-based delivery, which builds on d3 and has a nice, clean interface and approach.

Not so useful for explorations, but very useful for presentation.

I have actually been spending time writing models and thinking less about infrastructure, so I am happy to wait till Tony gets things into a clean state and we can bless that as CLS-V0.1b or something.


 

So, I'll continue to rectify David's work (little things like ensure that the right packages are loaded for reading CSV files), study Tamas and Mirko's work, and hopefully will have more progress next Monday morning!

rectify! :) I look forward to seeing that.

 
Another distraction last week was automatic differentiation (want, want....) and Radford Neal's chapter on MCMC with Hamiltonian Dynamics (am evaluating Andrew Gelman's STAN MCMC system at work).  There's code for autodiff on github, and at somepoint, I want a package which does calculus for probability functions (compiling and hyperoptimizing a final prob/likelihood for use in bayesian computations, but NOT before, since I need flexible subjective-prior specification, since we are being semi-subjective Bayesians at work :-).    Thus, frameworks (UNOPTIMIZED) for quick bayesian explorations.   This I need to think through a bit.

Bayesian is extremely interesting to me too. Let me know how that turns out.

Cheers


best,
-tony


Mirko Vukovic

Feb 3, 2013, 8:39:46 AM
to lisp...@googlegroups.com


On Monday, January 21, 2013 1:03:22 AM UTC-5, A.J. Rossini wrote:
Dear all -

So, I managed to merge David's branch into the mainline, but it's not quite all there. Current issues to tackle early this week:

1. dataframes: continue to merge and improve David's work and, as with David's, look at the current status of Mirko's and Tamas's work for integration and potentially game-changing ideas.


A few comments on data-frames. I have been using my code for about a month now, and here are some experiences:
  • data-frames for storing other data-frames: I am using one data-frame as a master index table for an experiment, with one or more of its columns containing data-frames with the raw or processed data (a sketch follows this list)
  • data-frames (including the nested ones from the previous point) as the basic data structure for the Grammar of Graphics; this follows from their query and selection capabilities
  • storing non-numeric values: I already mentioned a data-frame column storing other data-frames, but I also have an instance of a column storing functions
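
A toy sketch of that master-index pattern (make-df here is a placeholder constructor, not my actual API):

(defparameter *experiment*
  (make-df :run-id      #(1 2 3)
           :temperature #(300 350 400)
           ;; a column whose elements are nested data-frames of raw data
           :raw-data    (vector (make-df :time #(0 1 2) :v #(0.1 0.4 0.9))
                                (make-df :time #(0 1 2) :v #(0.2 0.5 1.1))
                                (make-df :time #(0 1 2) :v #(0.3 0.7 1.4)))
           ;; a column of functions, e.g. per-run calibration curves
           :calibration (vector (lambda (v) (* 1.00 v))
                                (lambda (v) (* 1.02 v))
                                (lambda (v) (* 0.97 v)))))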

I wish for the ability to save and load a data-frame. Using an existing underlying structure with such capabilities is appealing -- a few of you mentioned databases. But how do we save objects such as functions? Save the source code for the function and recompile it during load?
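
One possible approach, sketched with hypothetical names: carry the source form alongside the compiled function, write the form readably, and recompile on load. Closures over run-time state would not survive this, of course.

(defstruct stored-fn source function)

(defmacro stored-lambda (args &body body)
  "Like LAMBDA, but remembers its own source form for serialization."
  `(make-stored-fn :source '(lambda ,args ,@body)
                   :function (lambda ,args ,@body)))

(defun save-fn (sf stream)
  "Write the source form readably; only source-carrying functions work."
  (write (stored-fn-source sf) :stream stream :readably t))

(defun load-fn (stream)
  (let ((source (read stream)))
    (make-stored-fn :source source :function (compile nil source))))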



2. graphics: David McClain posted a series of graphics and numerical tools on GitHub which are aimed at audio processing ("time series work") but are LispWorks-centric. However, they look like good study material (the graphics API) and independent comparators (numerics). See: https://github.com/dbmcclain

In addition, an older framework popped up on the Quicklisp list: http://common-lisp.net/project/clnuplot/

So, I'll continue to rectify David's work (little things like ensuring that the right packages are loaded for reading CSV files), study Tamas's and Mirko's work, and hopefully will have more progress by next Monday morning!

Another distraction last week was automatic differentiation (want, want....) and Radford Neal's chapter on MCMC with Hamiltonian dynamics (I am evaluating Andrew Gelman's Stan MCMC system at work). There's code for autodiff on GitHub, and at some point I want a package which does calculus for probability functions (compiling and hyper-optimizing a final probability/likelihood for use in Bayesian computations, but NOT before, since I need flexible subjective-prior specification -- we are being semi-subjective Bayesians at work :-). Thus, frameworks (UNOPTIMIZED) for quick Bayesian explorations. This I need to think through a bit.


I think Macsyma is ASDF-loadable. Would that be a solution?
best,
-tony

Mirko

A.J. Rossini

Feb 4, 2013, 2:06:15 AM
to lisp...@googlegroups.com
On Sun, Feb 3, 2013 at 2:39 PM, Mirko Vukovic <mirko....@gmail.com> wrote:
>
>
> On Monday, January 21, 2013 1:03:22 AM UTC-5, A.J. Rossini wrote:
>>
>> Dear all -
>>
>> So, I managed to merge David's branch into the mainline, but it's not
>> quite all there. Current issues to tackle early this week:
>>
>> 1. dataframes: continue to merge and improve David's work and, as with
>> David's, look at the current status of Mirko's and Tamas's work for
>> integration and potentially game-changing ideas.
>>
>
> A few comments on data-frames. I have been using my code for about a
> month now, and here are some experiences:
>
> data-frames for storing other data-frames: I am using one data-frame as a
> master index table for an experiment, with one or more of its columns
> containing data-frames with the raw or processed data
> data-frames (including the nested ones from the previous point) as the
> basic data structure for the Grammar of Graphics; this follows from their
> query and selection capabilities
> storing non-numeric values: I already mentioned a data-frame column
> storing other data-frames, but I also have an instance of a column
> storing functions
>
> I wish for the ability to save and load a data-frame. Using an existing
> underlying structure with such capabilities is appealing -- a few of you
> mentioned databases. But how do we save objects such as functions? Save
> the source code for the function and recompile it during load?
>

Good points. I think I need to expound, in a clear way, on the
differences and activities surrounding what I'm looking for in a
dataframe. The "nesting" part is critical (though I'd prefer to think
in terms of more specialized structures from the data-analysis domain),
and I have a solid justification in terms of getting the assumptions
surrounding the data locked into the analytics.

I'm assuming the code is checked into the same place that you had it
last time? (I can't recall, other than that I can pull from a bunch of
places, one of which ought to have it if checked in).



>> So, I'll continue to rectify David's work (little things like ensuring
>> that the right packages are loaded for reading CSV files), study Tamas's
>> and Mirko's work, and hopefully will have more progress by next Monday
>> morning!

There is progress, but not where I want it to be, sigh.

>> Another distraction last week was automatic differentiation (want,
>> want....) and Radford Neal's chapter on MCMC with Hamiltonian dynamics
>> (I am evaluating Andrew Gelman's Stan MCMC system at work). There's
>> code for autodiff on GitHub, and at some point I want a package which
>> does calculus for probability functions (compiling and hyper-optimizing
>> a final probability/likelihood for use in Bayesian computations, but
>> NOT before, since I need flexible subjective-prior specification -- we
>> are being semi-subjective Bayesians at work :-). Thus, frameworks
>> (UNOPTIMIZED) for quick Bayesian explorations. This I need to think
>> through a bit.
>>
>
> I think Macsyma is ASDF-loadable. Would that be a solution?

That gives symbolic tools, and I do want to use it as part of a
demonstration of feasibility for prototyping, but that is a different
point. Automatic differentiation is a different beast (i.e. it should
be able to differentiate through code which uses a crazy LOOP macro,
for example).
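
For illustration, a toy forward-mode AD kernel using dual numbers (not
the github code) differentiates straight through control flow such as
LOOP:

(defstruct (dual (:constructor dual (val der))) val der)

(defun d+ (a b)
  (dual (+ (dual-val a) (dual-val b))
        (+ (dual-der a) (dual-der b))))

(defun d* (a b)
  (dual (* (dual-val a) (dual-val b))
        (+ (* (dual-val a) (dual-der b))
           (* (dual-der a) (dual-val b)))))

(defun deriv (f x)
  "Derivative at X of F, written in terms of D+ and D*."
  (dual-der (funcall f (dual x 1))))

;; Differentiating through a LOOP: f(x) = x^5, so f'(2) = 80.
(deriv (lambda (x)
         (loop with acc = (dual 1 0)
               repeat 5
               do (setf acc (d* acc x))
               finally (return acc)))
       2)
;; => 80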


best,
-tony

blind...@gmail.com
Muttenz, Switzerland.
"Commit early,commit often, and commit in a repository from which we
can easily roll-back your mistakes" (AJR, 4Jan05).

Drink Coffee: Do stupid things faster with more energy!

Mirko Vukovic

Feb 4, 2013, 10:17:16 AM
to lisp...@googlegroups.com
 
It would be nice to have a set of sample problems with pseudo-code that would illustrate desired behavior.
 
I'm assuming the code is checked into the same place that you had it
last time? (I can't recall, other than that I can pull from a bunch of
places, one of which ought to have it if checked in).



It is, but it is an ugly mess.  Organic code growth does not result in a pretty garden.

I refactored the vector-of-vectors out into a separate package (also on GitHub), nested-vectors. One neat thing in it (though Tamas's might have similar capability) is a row-accessor object that can be used to inspect and modify row contents, as well as iterate over them, in a transparent manner. It uses spooky-action-at-a-distance mechanisms (closures) that would give Einstein a fit (EPR paradox).
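
Roughly, the idea is something like this (a simplified sketch, not the actual nested-vectors code):

(defun make-row-accessor (columns)
  "COLUMNS is a vector of equal-length column vectors.  Returns three
closures sharing one hidden row index: a reader, a writer, and a mover.
Writing through the writer mutates the original columns -- the spooky
action at a distance."
  (let ((row 0))
    (values (lambda (j) (aref (aref columns j) row))                ; read
            (lambda (new j) (setf (aref (aref columns j) row) new)) ; write
            (lambda (i) (setf row i)))))                            ; move

;; (multiple-value-bind (rd wr seek)
;;     (make-row-accessor (vector #(1 2 3) #(10 20 30)))
;;   (funcall seek 1)
;;   (funcall wr 99 0)   ; first column is now #(1 99 3)
;;   (funcall rd 1))     ; => 20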



>> So, I'll continue to rectify David's work (little things like ensuring
>> that the right packages are loaded for reading CSV files), study Tamas's
>> and Mirko's work, and hopefully will have more progress by next Monday
>> morning!

There is progress, but not where I want it to be, sigh.

>> Another distraction last week was automatic differentiation (want,
>> want....) and Radford Neal's chapter on MCMC with Hamiltonian dynamics
>> (I am evaluating Andrew Gelman's Stan MCMC system at work). There's
>> code for autodiff on GitHub, and at some point I want a package which
>> does calculus for probability functions (compiling and hyper-optimizing
>> a final probability/likelihood for use in Bayesian computations, but
>> NOT before, since I need flexible subjective-prior specification -- we
>> are being semi-subjective Bayesians at work :-). Thus, frameworks
>> (UNOPTIMIZED) for quick Bayesian explorations. This I need to think
>> through a bit.
>>
>
> I think Macsyma is ASDF-loadable. Would that be a solution?

That gives symbolic tools, and I do want to use it as part of a
demonstration of feasibility for prototyping, but that is a different
point. Automatic differentiation is a different beast (i.e. it should
be able to differentiate through code which uses a crazy LOOP macro,
for example).


You are not modest in your goals, are you :-)

Tamas Papp

Feb 4, 2013, 10:27:47 AM
to lisp...@googlegroups.com
On Mon, Feb 04 2013, Mirko Vukovic <mirko....@gmail.com> wrote:

> I refactored the vector-of-vectors out into a separate package (also on
> GitHub), nested-vectors. One neat thing in it (though Tamas's might have
> similar capability) is a row-accessor object that can be used to inspect
> and modify row contents, as well as iterate over them, in a transparent
> manner. It uses spooky-action-at-a-distance mechanisms (closures) that
> would give Einstein a fit (EPR paradox).

Nope, CL-DATA-FRAME has nothing like that. IMO these features sound
nifty, but they can lead to very obfuscated code and subtle bugs, so I
am unlikely to include anything like this.

However, I also feel the constant lure of `clever' solutions and
features, but lately I have been learning to discipline myself. BTW, I
am currently rewriting CL-NUM-UTILS and LLA, weeding out a lot of the
`too clever' stuff; reviewing my code it turns out that

1. I rarely ever need these features,

2. they do not provide the advantages I initially imagine (clarity or
speed),

3. they are a nightmare to maintain.

My 2 cents,

Tamas

Mirko Vukovic

Feb 4, 2013, 1:31:14 PM
to lisp...@googlegroups.com


On Monday, February 4, 2013 10:27:47 AM UTC-5, Tamas Papp wrote:
On Mon, Feb 04 2013, Mirko Vukovic <mirko....@gmail.com> wrote:

> I refactored the vector-of-vectors out into a separate package (also on
> GitHub), nested-vectors. One neat thing in it (though Tamas's might have
> similar capability) is a row-accessor object that can be used to inspect
> and modify row contents, as well as iterate over them, in a transparent
> manner. It uses spooky-action-at-a-distance mechanisms (closures) that
> would give Einstein a fit (EPR paradox).

Nope, CL-DATA-FRAME has nothing like that.  IMO these features sound
nifty, but they can lead to very obfuscated code and subtle bugs, so I
am unlikely to include anything like this.

However, I also feel the constant lure of `clever' solutions and
features, but lately I have been learning to discipline myself.  BTW, I
am currently rewriting CL-NUM-UTILS and LLA, weeding out a lot of the
`too clever' stuff; reviewing my code it turns out that

True. Although I tried to document the code and the external interface (using slot accessors only, not slot-value), the system is not bullet-proof. A few well-placed (setf (slot-value ...) ...) calls would definitely lead to "undefined behavior".

But regarding bugs ... this object allowed me to considerably simplify the data-frame queries and selections code. Whereas before I was passing row indices and then explicitly extracting nested-vector elements, now I pass the row object and use its external interface to access values in that row. The queries and selections part is now cleaner, and hopefully that code is easier to follow and debug. So it is a trade-off.

But admittedly, the main reason I did it was to see if it could be done :-)
 

1. I rarely ever need these features,

2. they do not provide the advantages I initially imagine (clarity or
speed),

3. they are a nightmare to maintain.


 
My 2 cents,

Tamas
 
I assume that you are referring to euro cents. How would that compare to my two US cents? At current exchange rates, your 2 cents carry more weight.

Tamas Papp

Feb 4, 2013, 1:43:37 PM
to lisp...@googlegroups.com
On Mon, Feb 04 2013, Mirko Vukovic <mirko....@gmail.com> wrote:

> On Monday, February 4, 2013 10:27:47 AM UTC-5, Tamas Papp wrote:
>>
>> However, I also feel the constant lure of `clever' solutions and
>> features, but lately I have been learning to discipline myself. BTW, I
>> am currently rewriting CL-NUM-UTILS and LLA, weeding out a lot of the
>> `too clever' stuff; reviewing my code it turns out that
>>
> But regarding bugs ... this object allowed me to considerably simplify
> the data-frame queries and selections code. Whereas before I was passing
> row indices and then explicitly extracting nested-vector elements, now I
> pass the row object and use its external interface to access values in
> that row. The queries and selections part is now cleaner, and hopefully
> that code is easier to follow and debug. So it is a trade-off.

I am happy that it works for you; my comments were not meant to imply
anything about your code, they were only a reflection on my experience
with my own projects. Usually when I implement something like this I am
happy when I start using it (I usually write these things with a purpose
in mind), but 6-12 months later I end up wondering why I did what I did,
especially if I am rewriting it. But hey, I am an amateur programmer
with no formal CS training, so I am not the best benchmark.

>> My 2 cents,
>
> I assume that you are referring to euro cents. How would that compare to
> my two US cents? At current exchange rates, your 2 cents carry more weight.

A US cent weighs 2.5 g while a euro cent weighs 2.3 g, so it is best
not to jump to conclusions :-)

Best,

Tamas