
Re: Back to the Future: Lisp as a Base for a Statistical Computing System


Rainer Joswig

Dec 4, 2008, 4:11:54 PM
In article
<c368c75f-a30c-47d7...@s20g2000yqh.googlegroups.com>,
Francogrex <fra...@grex.org> wrote:

> Below is the abstract of a seemingly very interesting article by
> Ross Ihaka, the developer of the very popular and powerful software
> for statistical computing, R. He's going "back to the future" and
> using Lisp to improve things. It's a pity that we can't get the
> full-text article; it would be a great read.
>
> Back to the Future: Lisp as a Base for a Statistical Computing System
> Ross Ihaka and Duncan Temple Lang
> Abstract
> The application of cutting-edge statistical methodology is limited by
> the capabilities of the systems in which it is implemented. In
> particular, the limitations of R mean that applications developed
> there do not scale to the larger problems of interest in practice. We
> identify some of the limitations of the computational model of the R
> language that reduce its effectiveness for dealing with large data
> efficiently in the modern era.
> We propose developing an R-like language on top of a Lisp-based engine
> for statistical computing that provides a paradigm for modern
> challenges and which leverages the work of a wider community. At its
> simplest, this provides a convenient, high-level language with support
> for compiling code to machine instructions for very significant
> improvements in computational performance. But we also propose to
> provide a framework which supports more computationally intensive
> approaches for dealing with large datasets and position ourselves for
> dealing with future directions in high-performance computing.
> We discuss some of the trade-offs and describe our efforts to
> realize this approach. More abstractly, we feel that it is important
> that our community explore more ambitious, experimental and risky
> research to pursue computational innovation for modern data analyses.

I could read that full article here:

http://books.google.com/books?id=8Cf16JkKz30C&pg=PA21&lpg=PA21

--
http://lispm.dyndns.org/

glenn....@gmail.com

Dec 4, 2008, 4:30:08 PM
Ahh! Beat me to it; I found it as well. It's nice that the full
article is readable this way.

This is an AMAZING

glenn....@gmail.com

Dec 4, 2008, 4:54:26 PM
Oops, a finger fumble made me submit too early.

This is an AMAZING paper. It discusses the performance as well as the
expressive advantages of using Common Lisp as the base of a custom DSL
instead of writing one's own interpreter. Over the past several years
I've also been doing something very similar, although substitute
MATLAB for R and substitute engineering simulations for statistics.

I often lie awake in the middle of the night weighing the pros and
cons of a CL-based approach, like the one in this article and in my
work, versus using something like SciPy (the scientific and numerical
substrate built on top of Python -- a very nice package if you like
Python, btw). But the expressive advantages that CL has over Python,
the raw speed advantages, and, even better, the continuum of
optimization mentioned at the end of section 4 convince me that CL is
still the right approach for something like this. Better yet, you
never need to drop into C except in the rarest of specialized cases.

In my work, I've developed a library for doing discrete-event
simulations using the DEVS formalism. Normally, this involves
subclassing the base model class in the framework and then
specializing the small number of methods necessary for the simulation
behavior. This year, I developed a very expressive macro language
that lets me express models directly in a very compact notation
without it looking like typical subclassed framework usage. For
example, defining a block that implements 'plus' functionality takes
roughly 30 lines of defclass and defmethod forms. In the macro
language, it's 6 lines. And it's still all blindingly fast, because
the compact notation is expanded at macro-expansion time. In Python,
to do something like that you'd have to go to an external parser or
suffer some sort of run-time hit.
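
A stripped-down sketch of the idea (define-block, atomic-model, and
fire are invented names for this post; my actual API is richer):

;; Stand-in for the framework's base model class.
(defclass atomic-model () ())

(defmacro define-block (name (&rest inputs) &body behavior)
  "Expand a compact block definition into the usual
defclass/defmethod boilerplate at macro-expansion time."
  `(progn
     (defclass ,name (atomic-model) ())
     (defmethod fire ((model ,name) ,@inputs)
       ,@behavior)))

;; The 'plus' block in a handful of lines:
(define-block plus (a b)
  (+ a b))

;; (fire (make-instance 'plus) 1 2) => 3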

These reasons and the reasons mentioned in the article make CL my
secret weapon.

Summary: for those interested in numerical computing with CL as well
as wanting additional evidence/fodder for advocating CL use, this is a
great article.

Glenn

Tamas K Papp

Dec 4, 2008, 9:01:20 PM
On Thu, 04 Dec 2008 22:11:54 +0100, Rainer Joswig wrote:

> I could read that full article here:
>
> http://books.google.com/books?id=8Cf16JkKz30C&pg=PA21&lpg=PA21

Thanks for the link, it was an interesting read.

I used R before coming to CL. I still remember the painful hours of
debugging C code that I had to write to speed things up. I had to do
that often, since not a lot of the stuff I write vectorizes nicely
(e.g. MCMC methods).

Common Lisp has been a blessing; it is extremely fast. When I started
using it, I stressed about optimizing my code, but I only do that very
rarely now. It is fast enough most of the time without extra tweaking,
and otherwise I just profile and optimize the bottlenecks.
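
For example (assuming SBCL here; other implementations ship their own
profilers), finding the bottleneck is nearly a one-liner with the
bundled statistical profiler:

(require :sb-sprof)

;; Hypothetical stand-in for the code being tuned.
(defun simulate ()
  (loop repeat 1000000 sum (random 1.0d0)))

;; See where the time actually goes before touching any code:
(sb-sprof:with-profiling (:report :flat :max-samples 10000)
  (simulate))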

I feel that the authors of the article are moving in the right direction,
but they are also trying to sugar-coat Lisp for R users with thin syntax
layers and semantic extensions. I see no purpose in doing that; CL is
here, and people can start programming in it today if they want to.

The only thing I miss in CL is some libraries. But the language is quite
flexible, so they are easy to develop in most cases. The libraries I
miss at the moment include the following:

- a nice robust multivariate optimization/rootfinding library, with one
set of methods based on csolve and another on trust-region methods,

- a B-spline library (GSLL works for some of this, but lacks important
things)

- common mathematical functions (distributions, the gamma function, etc.)

I know GSLL is quite good, but interfacing with foreign code is still
clunky, especially when I want to have a Lisp function called by foreign
code. During the summer, I plan to develop some libraries for the above.
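
To show what I mean by the callback case, a sketch using CFFI (this
assumes CFFI is loaded; the C-side shape, a double (*)(double, void *)
function pointer as in GSL's integration routines, is only assumed for
illustration, and the GSL wiring itself is omitted):

;; A Lisp function exposed as a C function pointer.
(cffi:defcallback integrand :double ((x :double) (params :pointer))
  (declare (ignore params))
  (exp (- (* x x))))

;; The foreign pointer to hand across the FFI boundary:
;; (cffi:callback integrand)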

Tamas

Mirko

Dec 4, 2008, 9:19:51 PM

There are a couple of interesting numerical projects going on:

- The NLISP project aims to implement MATLAB's (and IDL's) vector
  functionality
- GSLL is a library that links Lisp with the GNU Scientific Library

I have used both of these on SBCL with good success and good rapport
with their authors.

Mirko

Prop...@gmx.net

Dec 5, 2008, 5:36:51 AM
What about Lisp-Stat (http://lambda-the-ultimate.org/node/617), the
more-or-less defunct statistical package? Are not Ihaka and Lang
re-inventing a wheel that has already fallen off? Might efforts be
better directed at reviving it?

Disclaimer: I know little about Lisp-Stat other than that it is/was
Lisp-based.

Rainer Joswig

Dec 5, 2008, 6:05:10 AM
In article
<7f50c3ef-ec8f-4cc1...@j35g2000yqh.googlegroups.com>,
Prop...@gmx.net wrote:

The software is XLISP-STAT, which is based on XLisp
(a mostly abandoned dialect of Lisp with its own implementation).

XLisp was one of those Lisp implementations written in C:
an interpreter (IIRC) and not very fast. It also implemented
its own Lisp language (with some Common Lisp influence).
XLisp was quite successful MAAAAAANY years ago.
Now it is mostly dead. AutoLisp was once created based
on XLisp. David Betz later created a new XLisp implementation
based on Scheme, a different language.

These guys moved away from XLisp to languages like R.
Now they feel that R does not have a bright future as
the implementation of an efficient programming language.
Going back to XLisp is not an option, since it is no longer
used or maintained and is also relatively 'slow'.

The authors of the paper argue that by using Common Lisp:

* they can write more code in Lisp and less code in C,
  since Common Lisp implementations cover a wider
  range of performance (due to providing
  optimizing, incremental, native-code compilers)
  and the implementations are mostly written in Common Lisp
  themselves;

* they get a choice of several Common Lisp implementations that
  are actively maintained, so they don't have to maintain
  the language implementation;

* they get a standard language with features and extensions
  that XLisp does not provide, or provides only in more
  primitive form, so they don't have to maintain/invent
  their own programming language.

Their goal is to get an interactive system and reasonable
performance at the same time - something that
several Common Lisp implementations can provide.

--
http://lispm.dyndns.org/

Raymond Toy

Dec 5, 2008, 11:40:27 AM
>>>>> "Tamas" == Tamas K Papp <Tamas> writes:

Tamas> - a nice robust multivariate optimization/rootfinding library, with one
Tamas> set of methods based on csolve and another on trust-region methods,

Don't know if this satisfies your requirement for nice and robust, but
I have a translation of DONLP2 for multivariate optimization. The
translation was done by f2cl, but it seems to work, and most of the
DONLP2 examples pass.

Tamas> - common mathematical functions (distributions, gamma, etc)

Maxima has implementations of some of these in Lisp. Clocc has some
code for this.

Tamas> I know GSLL is quite good, but interfacing with foreign code is still
Tamas> clunky, especially when I want to have a Lisp function called by foreign
Tamas> code. During the summer, I plan to develop some libraries for the above.

I have also wanted to do this, but I have never really gotten very
far.

Ray


Jason Riedy

Dec 5, 2008, 5:27:38 PM
And Francogrex writes:
> Below is the abstract of a seemingly very interesting article by
> Ross Ihaka, the developer of the very popular and powerful software
> for statistical computing, R. He's going "back to the future" and
> using Lisp to improve things.

In many ways, JavaScript is a much closer match to R, Octave, etc.

* Rewriting existing user code ain't gonna happen. Much of the
code is written by reluctant programmers, and companies /
funding agencies don't seem interested in directing them to
rewrite everything. So don't expect any solutions that
require rewriting working, tested code.

* The basic JS type system and conversion rules are closer, and
the object system suffices for both R and Octave.

* A non-trivial amount of existing code really *does* associate
variables with values of different types within a loop.
TraceMonkey-style JIT compilation likely will work very well.
Building the same atop CL is possible, but if others already
have done the work...

So I'd expect the first successful reimplementation project to be
based on JS. It's not that CL can't handle these things, but the
JS implementations already seem to be going in a direction similar
to that of the scientific/statistical computing languages. And
while CL may give better structure to new code, it won't help
with existing code (or existing texts, existing course work,
etc.).

Jason

John "Z-Bo" Zabroski

Dec 7, 2008, 1:39:46 PM
On Dec 4, 4:11 pm, Rainer Joswig <jos...@lisp.de> wrote:
> In article
> <c368c75f-a30c-47d7-86ec-1a73f107b...@s20g2000yqh.googlegroups.com>,


I would like to see the plots package's heuristics preserved, but make
it easier to prototype new plots apart from the base ones. In short,
a new statistics language should use multiple paradigms to design
a better graphics package than R's present one. Even though it uses
Cleveland's excellent graphing heuristics, the design feels very much
like it was drafted by someone with no formal background in program
design.

Arguably, Lisp by itself offers no raw materials for visual
programming. Lisp is not a visual environment. However, Lisp could
be used to prototype a textual command language for shaping data into
graphical form, as well as to reason about a meta-object protocol
for such data shaping and report generation.
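
As a toy illustration (every name here is invented), such a command
language could start as a macro that normalizes a declarative plot
description into plain data that a renderer, or a meta-level tool,
can inspect and interpret:

(defmacro defplot (name &body layers)
  "Normalize a declarative plot spec into an ordinary list."
  `(defparameter ,name
     (list :plot ',name
           :layers (list ,@(loop for (geom . options) in layers
                                 collect `(list :geom ',geom ,@options))))))

(defplot scatter-with-fit
  (points :x 'height :y 'weight)
  (smoother :method :loess))

;; scatter-with-fit is now plain data that graphics code -- or code
;; that reasons about graphics -- can walk.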

AJ Rossini

Dec 8, 2008, 1:16:49 AM
On Dec 5, 12:05 pm, Rainer Joswig <jos...@lisp.de> wrote:

> Their goal is to get an interactive system and reasonable
> performance at the same time - something that
> several Common Lisp implementations can provide.
>
> --http://lispm.dyndns.org/

See also Common Lisp Stat, which is an attempt to fast-forward
LispStat to Common Lisp, replace the hacky matrix stuff with a more
modern approach (lisp-matrix, as well as rif's and Tamas's extensions
to that approach), and generally be a sensible platform for research
in statistical computing (as opposed to a sensible platform for
statistical computing, which is what R is right now -- though R is a
bit hellish for research in the area).

See http://github.com/blindglobe for a list of the current git repos
and dependencies (at least some of them).

There is an older set/mirror on repo.or.cz that I need to update.

If anyone is interested, I'd be happy to talk more. I'm also
maintaining an R-within-CL bridge (rif did the development) and am
generally keen on Common Lisp statistical stuff.

later...

AJ Rossini

Dec 8, 2008, 1:23:29 AM

A bit more about Common LispStat -- it's not entirely LispStat-like,
as there are key problems with the LispStat approach, but the goal is
to create a DSL within Lisp for statistical analysis (i.e., machine
learning, hypothesis discovery and testing, inference, regression
modeling, etc.) which is reasonably general, plays well with other CL
packages, tries to be a toolkit and not a complete solution, and keeps
the LispStat syntax as much as possible (as well as playing well with
CL data structures when possible).

It's slowly making progress, though it doesn't get much done today
(again, the goal for me is to create a research platform, not a
functional tool which works tomorrow -- R does the latter for me).
That means enjoying the combination of "what could I do if..." and
"let's debug to see what isn't there yet", slowly replicating what I
do at work in a Common Lisp environment, as an ongoing gap analysis.

Anyway, I'd be happy to collaborate with others; it's just my
tram-ride commute hobby right now...

Tamas K Papp

Dec 8, 2008, 9:17:42 AM
On Sun, 07 Dec 2008 22:23:29 -0800, AJ Rossini wrote:

> It's slowly making progress, though it doesn't get much done today
> (again, the goal for me is to create a research platform, not a
> functional tool which works tomorrow -- R does the latter for me).

I find that development is more rapid when I want functional tools, so I
switched to CL entirely for my statistical analysis. But as I mostly do
Bayesian stuff, I don't need a lot of libraries anyway. MCMC can be dead
slow in R, since it doesn't vectorize.
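
For a flavor of why, a minimal random-walk Metropolis sketch (an
illustration, not code from my actual projects); the accept/reject
loop is unavoidably sequential, which crawls in R but compiles to
fast code in CL:

(defun metropolis (log-density start n &key (scale 1.0d0))
  "Draw N samples from LOG-DENSITY, starting the chain at START."
  (let ((x start)
        (fx (funcall log-density start))
        (draws (make-array n :element-type 'double-float)))
    (dotimes (i n draws)
      (let* ((y (+ x (* scale (- (random 2.0d0) 1.0d0)))) ; uniform step
             (fy (funcall log-density y)))
        ;; Accept with probability min(1, exp(fy - fx)).
        (when (< (random 1.0d0) (exp (min 0.0d0 (- fy fx))))
          (setf x y
                fx fy)))
      (setf (aref draws i) x))))

;; Example: 10000 draws from a standard normal.
;; (metropolis (lambda (x) (* -0.5d0 x x)) 0.0d0 10000)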

> Anyway, I'd be happy to collab with others, it's just my tram-ride
> commute hobby right now...

I have an almost-complete 2d graphing library based on cl-cairo2 in the
works. I plan to clean it up and post it by February (I will be
superbusy until then).

Cheers,

Tamas

GP lisper

Dec 13, 2008, 3:40:22 AM
On 8 Dec 2008 14:17:42 GMT, <tkp...@gmail.com> wrote:
>
> I have an almost-complete 2d graphing library based on cl-cairo2 in the
> works. I plan to clean it up and post it by February (I will be
> superbusy until then).

Wonderful news! I recently stumbled across Cairo as an answer to my
plotting needs. It is also one of those laptop projects for
'away-from-office' time.

Need an alpha tester?


--
"Most programmers use this on-line documentation nearly all of the
time, and thereby avoid the need to handle bulky manuals and perform
the translation from barbarous tongues." CMU CL User Manual
