Francogrex <fra...@grex.org> wrote:
> This below is an abstract of a seemingly very interesting article from
> Ross Ihaka the developer of the very popular and powerful software for
> statistical computing (R). He's going "back to the future" and using
> lisp to improve things. It's a pity that we can't have the full text
> article, it would be a great read.
> Back to the Future: Lisp as a Base for a Statistical Computing System
> Ross Ihaka and Duncan Temple Lang
>
> Abstract
> The application of cutting-edge statistical methodology is limited by
> the capabilities of the systems in which it is implemented. In
> particular, the limitations of R mean that applications developed
> there do not scale to the larger problems of interest in practice. We
> identify some of the limitations of the computational model of the R
> language that reduces its effectiveness for dealing with large data
> efficiently in the modern era.
>
> We propose developing an R-like language on top of a Lisp-based engine
> for statistical computing that provides a paradigm for modern
> challenges and which leverages the work of a wider community. At its
> simplest, this provides a convenient, high-level language with support
> for compiling code to machine instructions for very significant
> improvements in computational performance. But we also propose to
> provide a framework which supports more computationally intensive
> approaches for dealing with large datasets and position ourselves for
> dealing with future directions in high-performance computing.
>
> We discuss some of the trade-offs and describe our efforts to
> realizing this approach. More abstractly, we feel that it is important
> that our community explore more ambitious, experimental and risky
> research to explore computational innovation for modern data analyses.
This is an AMAZING paper. It discusses both the performance and the expressive advantages of using Common Lisp as the base of a custom DSL instead of writing one's own interpreter. Over the past several years I've also been doing something very similar, although substitute MATLAB for R and engineering simulations for statistics.
I often lie awake in the middle of the night weighing the pros and cons of a CL-based approach, like the one in this article and in my own work, versus something like SciPy (the scientific and numerical substrate built on top of Python; a very nice package if you like Python, by the way). But CL's expressive advantages over Python, its raw speed advantage and, even better, the continuum of optimization mentioned at the end of Section 4 convince me that CL is still the right approach for something like this. Better still, you never need to drop into C except in the rarest of specialized cases.
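As a rough illustration of what such a continuum can look like in CL (this is not taken from the paper), you can start with generic code and add declarations only where profiling says they matter. The function names below are hypothetical. How much speed the declarations actually buy depends on the implementation; native-code compilers such as SBCL and CMUCL make heavy use of them.

```lisp
;; Stage 1: plain, generic code. Works for any numbers (integers,
;; ratios, floats) in a list or vector; the arithmetic dispatches on
;; the argument types at run time.
(defun sum-of-squares (xs)
  (let ((acc 0))
    (map nil (lambda (x) (incf acc (* x x))) xs)
    acc))

;; Stage 2: the same loop after profiling has flagged it as a hot spot.
;; The declarations promise a simple vector of double-floats, which
;; allows a native-code compiler such as SBCL to use unboxed float
;; arithmetic. (SAFETY 1 keeps the argument type check in place.)
(defun sum-of-squares/double (xs)
  (declare (type (simple-array double-float (*)) xs)
           (optimize (speed 3) (safety 1)))
  (let ((acc 0d0))
    (declare (type double-float acc))
    (loop for x across xs
          do (incf acc (* x x)))
    acc))
```

The generic version still accepts exact arithmetic, e.g. `(sum-of-squares '(1/2 2))` returns `17/4`. The declared version only accepts a double-float vector such as `(make-array 3 :element-type 'double-float :initial-contents '(1d0 2d0 3d0))`, for which it returns `14.0d0`.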
In my work, I've developed a library for discrete event simulation using the DEVS formalism. Normally, using it involves subclassing the framework's base model class and then specializing the small number of methods needed for the simulation behavior. This year, I developed a very expressive macro language that lets me express models directly in a very compact notation that doesn't look like ordinary subclass-the-framework code. For example, defining a block that implements 'plus' functionality takes roughly 30 lines of defclass and defmethod forms; with the macro, it's 6 lines. And it's all still blindingly fast, because the compact notation is expanded at macro-expansion time. In Python, to do something like that you'd have to use an external parser or accept some sort of run-time hit.
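Here is a minimal sketch of the idea. It is not the poster's actual library. The names `atomic-model`, `output-function` and `define-block` are hypothetical. A real DEVS atomic model also needs transition and time-advance functions, which are omitted here. The macro expands a compact block definition into the `defclass` and `defmethod` forms at macroexpansion time, so the compact notation adds no run-time cost.

```lisp
;; Hypothetical stand-ins for the framework: a base model class holding
;; input-port values, and one generic function that blocks specialize.
(defclass atomic-model ()
  ((inputs :initform (make-hash-table) :reader model-inputs)))

(defgeneric output-function (model)
  (:documentation "Compute MODEL's output from its current input values."))

;; The macro turns a compact block definition into the DEFCLASS and
;; DEFMETHOD forms one would otherwise write by hand. Expansion happens
;; once, at macroexpansion time, so BODY is compiled as ordinary code.
(defmacro define-block (name ports &body body)
  "Define block class NAME. Inside BODY, each symbol in PORTS is bound to
the value currently on that input port (0 if nothing has arrived)."
  (let ((model (gensym "MODEL")))
    `(progn
       (defclass ,name (atomic-model) ())
       (defmethod output-function ((,model ,name))
         (let ,(loop for port in ports
                     collect `(,port (gethash ',port (model-inputs ,model) 0)))
           ,@body))
       ',name)))

;; The compact notation: a 'plus' block with two input ports.
(define-block plus-block (a b)
  (+ a b))
```

After this, `(make-instance 'plus-block)` is an ordinary CLOS instance. Suppose 2 and 3 are stored under the port names `a` and `b` in its `model-inputs` table. Then `output-function` on it returns 5. Calling `macroexpand-1` on the `define-block` form shows the generated `defclass`/`defmethod` code.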
These reasons, plus those given in the article, make CL my secret weapon.
Summary: for anyone interested in numerical computing with CL, or looking for more evidence (or fodder) for advocating CL, this is a great article.
I used R before coming to CL. I still remember the painful hours spent debugging the C code I had to write to speed things up. I had to do that often, since not a lot of what I write vectorizes nicely (e.g. MCMC methods).
Common Lisp has been a blessing: it is extremely fast. When I started using it, I stressed about optimizing my code, but now I only do that very rarely. Most of the time it is fast enough without extra tweaking; when it isn't, I profile and optimize the bottlenecks.
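MCMC is a good example of code that does not vectorize. In a random-walk Metropolis sampler, each draw depends on the one before it. In R the work therefore ends up in an interpreted loop (or in C), while in CL it is just a compiled loop. Below is a minimal sketch of the standard random-walk Metropolis algorithm. `metropolis` and its keyword arguments are hypothetical names. The target is supplied as an unnormalized log density.

```lisp
(defun metropolis (log-density n &key (start 0d0) (step 1d0)
                                      (state (make-random-state t)))
  "Random-walk Metropolis sampler. Return a vector of N draws from the
distribution whose unnormalized log density is the function LOG-DENSITY."
  (let* ((draws (make-array n :element-type 'double-float))
         (x (float start 1d0))
         (lx (funcall log-density x)))
    (dotimes (i n draws)
      ;; Propose a symmetric move, uniform on [x - step, x + step).
      (let* ((proposal (+ x (* step (- (random 2d0 state) 1d0))))
             (lp (funcall log-density proposal)))
        ;; Accept with probability min(1, p(proposal) / p(x)), compared on
        ;; the log scale; U = 1 - (random 1d0) lies in (0, 1], so (log U)
        ;; is always finite.
        (when (< (log (- 1d0 (random 1d0 state))) (- lp lx))
          (setf x proposal
                lx lp)))
      ;; Record the current state (repeated if the proposal was rejected).
      ;; Draw i depends on draw i-1, which is why this loop cannot be
      ;; replaced by one whole-vector operation.
      (setf (aref draws i) x))))
```

For example, `(metropolis (lambda (x) (* -0.5d0 x x)) 10000 :step 2.5d0)` returns 10000 correlated draws from a standard normal distribution.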
I feel that the authors of the article are moving in the right direction, but they are also trying to sugar-coat Lisp for R users with thin syntax layers and semantic extensions. I see no point in doing that: CL is here, and people can start programming in it today if they want to.
The only thing I miss in CL is certain libraries, but the language is flexible enough that they are easy to develop in most cases. The libraries I miss at the moment are:
- a nice, robust multivariate optimization/root-finding library, with one set of methods based on csolve and another based on trust-region methods
- a B-spline library (GSLL works for some things but lacks important features)
- common mathematical functions (distributions, gamma, etc.)
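Items like the last one are often short enough to write directly in CL. Standard CL has no gamma function. Below is a minimal sketch of a log-gamma function. The name `log-gamma` is hypothetical and does not come from GSLL or any other library. It uses the recurrence Γ(x + 1) = xΓ(x) to shift the argument up to at least 10, then applies the Stirling series. For x ≥ 10 the truncation error is below about 1e-12. It handles only x > 0. The Lanczos approximation is a common alternative.

```lisp
(defun log-gamma (x)
  "Natural logarithm of Gamma(X) for real X > 0."
  (check-type x (real (0)))
  (let ((x (float x 1d0))
        (shift 0d0))
    ;; Gamma(x + 1) = x Gamma(x), so log Gamma(x) = log Gamma(x + 1) - log x.
    ;; Step x up to at least 10, where the series below is very accurate.
    (loop while (< x 10d0)
          do (decf shift (log x))
             (incf x 1d0))
    ;; Stirling series: (x - 1/2) log x - x + (1/2) log(2 pi)
    ;;   + 1/(12x) - 1/(360x^3) + 1/(1260x^5) - 1/(1680x^7),
    ;; truncated after four correction terms (error < 1/(1188 x^9)).
    (let* ((inv (/ 1d0 x))
           (inv2 (* inv inv)))
      (+ shift
         (* (- x 0.5d0) (log x))
         (- x)
         (* 0.5d0 (log (* 2d0 (coerce pi 'double-float))))
         (* inv (+ (/ 1d0 12d0)
                   (* inv2 (+ (/ -1d0 360d0)
                              (* inv2 (+ (/ 1d0 1260d0)
                                         (* inv2 (/ -1d0 1680d0))))))))))))
```

As a check, `(log-gamma 5)` agrees with `(log 24d0)` and `(log-gamma 1/2)` agrees with log √π to well within 1e-10.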
I know GSLL is quite good, but interfacing with foreign code is still clunky, especially when I need foreign code to call a Lisp function. Over the summer, I plan to develop libraries for the items above.
the more-or-less defunct statistical package? Are not Ihaka and Lang re-inventing a wheel that has already fallen off? Might efforts be better directed at reviving it?
Disclaimer: I know little about Lisp-Stat other than that it is/was Lisp-based.
The software is XLISP-STAT, which is based on XLisp (a mostly abandoned dialect of Lisp with its own implementation).
XLisp was one of those Lisp implementations written in C: an interpreter (IIRC), and not very fast. It also implemented its own Lisp dialect (with some Common Lisp influence). XLisp was quite successful MAAAAAANY years ago; now it is mostly dead. AutoLisp was originally based on XLisp. David Betz later created a new XLisp implementation based on Scheme, which is a different language.
These guys moved away from XLisp to languages like R. Now they feel that R does not have a bright future as an implementation of an efficient programming language. Going back to XLisp is not an option, since it is no longer used or maintained and is also relatively 'slow'.
The authors of the paper argue that by using Common Lisp:
* they can write more code in Lisp and less in C, since Common Lisp implementations offer a wider range of performance (they provide optimizing, incremental, native-code compilers) and are mostly written in Common Lisp themselves.
* they can choose among several maintained Common Lisp implementations, so they don't have to maintain the language implementation themselves.
* they get a standard language with features and extensions that XLisp either lacks or provides only in a more primitive form. They don't have to maintain or invent their own programming language.
Their goal is to get an interactive system and reasonable performance at the same time - something that several Common Lisp implementations can provide.
Tamas> - a nice robust multivariate optimization/rootfinding library, with a set
Tamas> of methods based on csolve, the other on trust region methods,
I don't know whether this meets your requirement of nice and robust, but I have a translation of DONLP2 for multivariate optimization. The translation was done with f2cl, but it seems to work, and most of the DONLP2 examples pass.
Tamas> - common mathematical functions (distributions, gamma, etc)
Maxima has Lisp implementations of some of these, and CLOCC also has some code for this.
Tamas> I know GSLL is quite good, but interfacing with foreign code is still
Tamas> clunky, especially when I want to have a Lisp function called by foreign
Tamas> code. During the summer, I plan to develop some libraries for the above.
I have also wanted to do this, but I have never really gotten very far.
And Francogrex writes:
> This below is an abstract of a seemingly very interesting article from
> Ross Ihaka the developer of the very popular and powerful software for
> statistical computing (R). He's going "back to the future" and using
> lisp to improve things.
In many ways, JavaScript is a much closer match to R, Octave, etc.
* Rewriting existing user code ain't gonna happen. Much of the code is written by reluctant programmers, and companies and funding agencies don't seem interested in directing them to rewrite everything. So don't expect any solution that requires rewriting working, tested code.
* The basic JS type and conversion systems are closer, and the object system suffices for both R and Octave.
* A non-trivial amount of existing code really *does* associate variables with values of different types within a loop. TraceMonkey-style JIT compilation likely will work very well. Building the same atop CL is possible, but if others already have done the work...
So I'd expect the first successful reimplementation project to be based on JS. It's not that CL can't handle these things, but the JS implementations already seem to be heading in the same direction as the scientific/statistical computing languages. And while CL may give better structure to new code, it won't help with existing code (or existing texts, existing course work, etc.).
I would like to see the plots package's heuristics preserved, while making it easier to prototype new plots beyond the base ones. In short, a new statistics language should use multiple paradigms to design a better graphics package than R's current one. Even though R's package uses Cleveland's excellent graphing heuristics, its design feels very much as if it were drafted by someone with no formal knowledge of program design.
Arguably, Lisp by itself offers no raw materials for visual programming; Lisp is not a visual environment. However, Lisp could be used to prototype a textual command language for shaping data into graphical form, and to reason about a metaobject protocol for such data shaping and report generation.
On Dec 5, 12:05 pm, Rainer Joswig <jos...@lisp.de> wrote:
> Their goal is to get an interactive system and reasonable
> performance at the same time - something that
> several Common Lisp implementations can provide.
See also Common Lisp Stat, which is an attempt to fast-forward LispStat to Common Lisp, replace the hacky matrix stuff with a more modern approach (lisp-matrix, plus rif's and Tamas's extensions to that approach), and generally be a sensible platform for research in statistical computing (as opposed to a sensible platform for statistical computing, which is what R is right now; but R is a bit hellish for research in the area).
There is an older set/mirror on repo.or.cz that I need to update.
If anyone is interested, I'd be happy to talk more. I'm also maintaining an R-within-CL bridge (rif did the development) and am generally keen on Common Lisp statistical software.
A bit more about Common LispStat: it's not entirely LispStat-like, as there are key problems with the LispStat approach. The goal is to create a DSL within Lisp for statistical analysis (i.e. machine learning, hypothesis discovery and testing, inference, regression modeling, etc.) that is reasonably general, plays well with other CL packages, tries to be a toolkit rather than a complete solution, and keeps the syntax as much as possible (as well as playing well with CL data structures when possible).
It's slowly making progress, but it doesn't do much yet (again, my goal is to create a research platform, not a functional tool that works tomorrow; R does the latter for me). For me that means enjoying the combination of "what could I do if..." and "let's debug to see what isn't there yet", slowly replicating what I do at work in a Common Lisp environment, as an ongoing gap analysis.
Anyway, I'd be happy to collaborate with others; it's just my tram-commute hobby right now...
On Sun, 07 Dec 2008 22:23:29 -0800, AJ Rossini wrote:
> It's slowly making progress, it doesn't get much done today (again, the
> goal for me is to create a research platform, not a functional tool
> which works tommorow -- R does the latter for me), and that means
I find that development goes faster when I need working tools, so I switched to CL entirely for my statistical analysis. Since I mostly do Bayesian work, I don't need many libraries anyway. MCMC can be dead slow in R, since it doesn't vectorize.
> Anyway, I'd be happy to collab with others, it's just my tram-ride
> commute hobby right now...
I have an almost-complete 2D graphing library based on cl-cairo2 in the works. I plan to clean it up and post it by February (I will be super busy until then).
On 8 Dec 2008 14:17:42 GMT, <tkp...@gmail.com> wrote:
> I have an almost-complete 2d graphing library based on cl-cairo2 in the
> works. I plan to clean it up and post it by February (I will be
> superbusy until then).
Wonderful news! I recently stumbled across Cairo as an answer to my plotting needs. It's also one of my laptop projects for 'away-from-office' time.
Need an alpha tester?
-- 
"Most programmers use this on-line documentation nearly all of the time, and thereby avoid the need to handle bulky manuals and perform the translation from barbarous tongues."
CMU CL User Manual