lisp-matrix & generic math vs antik

Harvey Stein

May 28, 2015, 12:26:20 AM
to lisp...@googlegroups.com
I'm going through code I want to run in common-lisp-statistics and collecting & implementing the bits & pieces that were in xlispstat & aren't yet in cls.  Before I start trying to stick these things into cls, I'm hoping someone can fill me in a little on the architecture of the package.  In particular, cls has matrix handling from lisp-matrix & implements vectorized arithmetic internally.  However, it also uses antik, and antik also implements this functionality.  So what's the rationale for not just pulling this functionality from antik?  Is it just because it started pulling in antik at a later date, or is there another reason?  In any case, is there any reason not to replace all of that with the corresponding functionality from antik?

Thanks,
harvey

A.J. Rossini

May 28, 2015, 12:59:01 AM
to lisp-stat

Hi Harvey! 

These are all experiments at this point.

One proposal - would you have time for a Skype or Google Hangouts discussion (latter preferred, former possible)?  Live might be faster to convey how I think about it.

lisp-matrix tries to do things "right", in that heavy optimization is possible (not yet implemented) thanks to the dispatch design. But it does not fully work.

Antik just "works".  But it is a bit old fashioned (which is NOT a flaw) and not as elegant as I would like (perhaps an issue with me, not with it).

So I am caught in general between wanting things that work now, and a vision for how I think data analysis programming, with results providing optimal context-based decision support, should be done. 

Had I been 9 years older I could have taken early retirement last year and made this my second career :(.  And based on bad past predictions, current workload and family load, I am hesitant to say when I will again be making massive progress on the whole thing (currently trying to get R-style model.matrix behavior for rho and lisp-matrix working, to experiment with general data analysis modeling specifications - infrastructure, but not terribly useful in the next few months for "getting it done").

If you can work in the examples directory or a subdirectory with experiments, I would love to pull the work you are willing to share and take a look at anything that could be brought into core code/systems.

Best,
-tony

--
You received this message because you are subscribed to the Google Groups "Common Lisp Statistics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lisp-stat+...@googlegroups.com.
To post to this group, send email to lisp...@googlegroups.com.
Visit this group at http://groups.google.com/group/lisp-stat.
For more options, visit https://groups.google.com/d/optout.

David Hodge

May 28, 2015, 6:00:58 AM
to lisp...@googlegroups.com
Hi Tony,

I would be inclined to start working again on CLS if we could agree a concrete plan. My house is nearly finished and I am starting to have time to think about fun stuff again.

Part of that plan would have to include a cleanup of the core code base - Harvey's note touches on some of the issues. Cleaning up the overlapping dependencies (lisp-matrix vs antik for one) and straightening out naming, packaging, and such would go a long way to making CLS usable and hackable, I think.

I would be happy to help work on documenting the architecture as-is and fleshing out the to-be vision so it can be implemented. One of the things that stalled me previously was not really having a clear vision in my mind of where things are headed. I am pretty sure you have that, and it's kind of scattered around the code base in comments, but it's hard to turn that into something concrete.

Irrespective of the experiments with data frames etc., let's try to get the core into a serviceable state so it can be relied on for serious work.

Cheers

A.J. Rossini

May 28, 2015, 8:27:45 AM
to lisp...@googlegroups.com
Hi David - 

This is a clear example: for me, the dataframes and modeling specifications, and the conversion of these into numerical linear algebra and optimization problems, are key.  So that is what I am working on right now, because I have a passion for it.

The rest is just experiments to decide how things should evolve (agile/extreme approach, in a way).

So rather than cleaning the code base, I would prefer to have you put experiments (designs, packages) into directories in examples and, when ready, migrate them into src and then delete the experiment.  (Of course, if you do the work, you get to decide, but see below re distributed VCs.)

If you have a piece that you are passionate about (time series? etc.?), put it in, either with a second asdf file or as an addition to the current one, and let's see.

The beauty of distributed VCs like git is that we can be subjective Bayesian about the source code, and each have our own belief of the truth, acknowledging that others might be slightly off.  

Until someone has passion, time and skill simultaneously, we are stuck with herding cats and agreeing to disagree while trying to support each other.

I actually think that a video chat could be fruitful to hammer out if there is a way forward aligning those here with some time and interest. I do not think we are too far from each other's views, but email is a poor and lossy form of communication.

So are you up for it?

Best,
-tony

David Hodge

May 28, 2015, 3:34:23 PM
to lisp...@googlegroups.com
tony,

More than happy to have a video chat. Time zones make it a bit awkward, but my morning is your evening so we could work something out.

I don't have any problems with experiments per se, but experience shows that integrating such things done over an extended period of time is a challenge.

And I kept stumbling over truly annoying issues when trying to use cls for actual work, to the point that it was just easier to roll my own stuff.

I am sure there is a way to manage all this though. 

We can aim for a call next week - it's a holiday weekend here and we will be traveling.

Cheers

Harvey Stein

May 28, 2015, 9:43:50 PM
to lisp...@googlegroups.com
My primary goal for CLS is to have, with minimal effort, a replacement for xlispstat.  As such, I'm inclined to use whatever exists and is working (Antik, GSLL, ...) to put together such a replacement.  My close second goal is for it to perform as well as xlispstat, and with a little effort (careful use of typed arrays, etc), to  perform much better - say close to machine speed.  By machine speed, I mean like optimized C except possibly for poor GC performance (which seems to occur in SBCL's conservative GC when creating and throwing away a lot of large arrays).  A third goal would be to include the semantics necessary to minimize the creation of such garbage (optional arguments for scratch space, etc).
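
A minimal sketch of what "optional arguments for scratch space" could look like in Common Lisp. The function name `vec*-into` and its signature are hypothetical, chosen only for illustration; nothing here is taken from CLS, Antik, or xlispstat.

```lisp
;; Hypothetical helper for illustration; not part of CLS, Antik, or xlispstat.
(defun vec*-into (a b &optional (out (make-array (length a)
                                                 :element-type 'double-float)))
  "Store the elementwise product of A and B in OUT and return OUT.
OUT is freshly allocated only when the caller does not supply it."
  (dotimes (i (length a) out)
    (setf (aref out i) (* (aref a i) (aref b i)))))

;; Reuse one scratch vector across many calls instead of allocating each time.
(let ((a (make-array 3 :element-type 'double-float
                       :initial-contents '(1d0 2d0 3d0)))
      (scratch (make-array 3 :element-type 'double-float)))
  (dotimes (k 1000)
    (vec*-into a a scratch))
  scratch)
```

Callers that do not care about garbage simply omit the third argument; callers in a hot loop pass a preallocated vector, so the loop itself creates no new arrays.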

I've demonstrated (at least in the case of repeatedly multiplying arrays using vectorized arithmetic), that SBCL is capable of producing code that performs like optimized C code except for the GC, and I can even achieve that just by using typed arrays and making a small tweak to antik:*.  I also have been able to run some numerical code under untweaked CLS which, by adding declarations and replacing unnecessarily generic cls operations with the cl version, performs almost 30x faster than xlispstat, although that's for unvectorized xlispstat code (vectorizing it would speed it up for xlispstat, but not for CLS).  Note though that this is with SBCL.  CCL was about 2x slower, although on the plus side, unlike SBCL, CCL doesn't crash with an out of memory error when creating lots of large garbage arrays.
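
To make the "typed arrays plus declarations" point concrete, here is a small sketch using only standard Common Lisp. The function `dot` is a hypothetical example, not code from CLS or Antik, and it is not the benchmark referred to above. With declarations like these, a compiler such as SBCL can typically use unboxed double-float arithmetic in the loop rather than generic dispatch.

```lisp
;; Hypothetical function for illustration; not taken from CLS or Antik.
(defun dot (a b)
  "Inner product of two double-float vectors."
  (declare (type (simple-array double-float (*)) a b)
           (optimize (speed 3) (safety 1)))
  (let ((sum 0d0))
    (declare (type double-float sum))
    ;; Iterate over the shorter length so every index is in bounds.
    (dotimes (i (min (length a) (length b)) sum)
      (incf sum (* (aref a i) (aref b i))))))

(dot (make-array 3 :element-type 'double-float
                   :initial-contents '(1d0 2d0 3d0))
     (make-array 3 :element-type 'double-float
                   :initial-contents '(4d0 5d0 6d0)))
;; => 32.0d0
```

The arguments must really be specialized double-float vectors; passing a general vector such as `#(1 2 3)` violates the declaration.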

I don't care so much about different paradigms for data analysis.  My opinion is that once the above goals are achieved, it'd be easy to start investigating such things, and that without the above foundation, it'd be hard.

As for trying things out in the examples subdirectory, I think that approach would be reasonable for adding bits and pieces in an attempt to fill out compatibility if we currently had a reasonably solid working foundation.  But given the current state of the project, I don't get the impression that this would be an efficient way to build out the code.  My inclination would be to prune everything that's in existing packages, and then work towards filling in the remaining xlispstat compatibility.  The existing experiments/features/approaches can always be brought back if desired.

As for a call, it'd be great to chat & hangouts is fine, but I'm out of town most of next week, so it'd have to be the week after.

-- Harvey

Harvey Stein

May 29, 2015, 12:04:32 AM
to lisp...@googlegroups.com
Also, I just noticed that the GSLL stuff expects to be passed foreign arrays made by Antik's grid package.  If we don't use Antik's vectorized math, then we'll need to extend CLS's to cover Antik grids, and if we don't want to keep converting and copying data, we'll need to make everything work efficiently with foreign arrays.

I know that at least SBCL can create fast code for arrays of doubles.  It remains to be seen if it can do so with these grid foreign arrays.

-- Harvey

Harvey Stein

May 29, 2015, 12:08:04 AM
to lisp...@googlegroups.com
Although, on the other hand, it'll take work to make all of this stuff work like it does in xlispstat (i.e. - without resorting to grid:this, antik:that, ...).

David Hodge

May 29, 2015, 12:21:53 AM
to lisp...@googlegroups.com
Hi Harvey,  
 
This is the sort of stuff I mean: Antik and lisp-matrix have lots of overlap and it needs to be rationalized.


There is lots to like in both packages, but I think that Antik/gsll wins from a completeness and maturity standpoint. Either way there needs to be a clear decision one way or the other so we can get reliable results and reliable performance. 

This is all infrastructure stuff and is largely separate from the end-user interface where Tony is focused. Both are important of course, but my opinion is that even if there is a nice modelling interface, if the results are rubbish or take 3 weeks to generate, I will look elsewhere.

Most of my interest is around moderate-sized time series (a million or so elements), and performance is critical for some of the larger calculations.


I look forward to meeting you, even if virtually.

My apologies in advance for any autocorrect misspelling . remember autocorrect is my enema!