A Firm Target


Steven Núñez

Oct 31, 2012, 12:11:27 AM
to lisp...@googlegroups.com
Gentlemen,

I've been following the discussions on numeric libraries and matrix-handling routines with great interest. We're going to need all of that in order to get launched. My lack of comments hasn't been from lack of interest, but because of a truly punishing travel schedule that, if it continues, will exceed 150K miles/year.

I, like David, am not an academic at the moment, and I tend to look at things through a very practical lens. We do the same at the company I run, moving in a straight line between what we have and what the customer requests (some people call this 'agile', but that term has been so misapplied recently that I've stopped using it). I find that this put-it-to-use approach quickly weeds out items that are theoretically nice but don't really add much value.

Recently, one of our guys who is learning predictive analytics sent me a 'blog post series on the subject. In it the author uses R to walk the reader through the workflow required to produce a number of predictive models, all using the Iris data set. I think this would make a great 'stake in the ground' way to prove that CLS can do useful work, and give us a way to focus efforts on achieving a practical outcome that can hopefully drive some of the other sub-projects in CLS.

So, can any of you who are more familiar with CLS tell me: could we recreate the models in the 'blog post using CLS? If not, what remains to be done to get us there? I'd be happy to write a CLS version of that article, which might be a good way to tell the world that CLS has risen from the grave.

Regards,
- Steve Nunez

Peter Schmiedeskamp

Oct 31, 2012, 1:48:03 PM
to lisp...@googlegroups.com, steven...@illation.com
I personally love the idea of a "rosetta code" series of blog posts.

One area where I would personally benefit from seeing CLS in action is in the data I/O, data formatting, and data cleaning tasks. So much of what I do in R is reshaping, filtering, cleaning, applying aggregating functions across margins, etc.
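
To make that concrete, the sort of thing I lean on in R is apply(X, 2, max) and friends. Purely as a sketch of the interface I'd hope for (margin-apply is a name I just made up; as far as I know nothing like it exists in CLS yet):

;; Hypothetical sketch of R's apply(X, margin, fn) over a plain 2D CL array.
(defun margin-apply (fn data margin)
  "Apply FN to each row (MARGIN = 1) or column (MARGIN = 2) of DATA."
  (ecase margin
    (1 (loop for i below (array-dimension data 0)
             collect (funcall fn (loop for j below (array-dimension data 1)
                                       collect (aref data i j)))))
    (2 (loop for j below (array-dimension data 1)
             collect (funcall fn (loop for i below (array-dimension data 0)
                                       collect (aref data i j)))))))

;; per-column maxima, like R's apply(x, 2, max):
;; (margin-apply (lambda (col) (reduce #'max col)) data 2)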

I'm probably not alone in that I'm pretty R savvy, but very novice to CL. I think I'm also not alone in being interested in moving more of my statistical analysis to lisp.

There's an r-bloggers blog which occasionally has useful content. Perhaps we could consider something similar? In terms of *learning* lisp-stat, I'd be happy to bash my head in on a few articles.

Thinking further down the line, this could be the basis for creating content for a book. Along similar lines, Peter Seibel recently talked about wanting to write a book about statistics for programmers: http://lisp-univ-etc.blogspot.com.au/2012/07/lisp-hackers-peter-seibel.html

Cheers,
Peter

A.J. Rossini

Nov 2, 2012, 1:55:12 AM
to lisp...@googlegroups.com, steven...@illation.com
As I've mentioned before, see the "examples" directory for such examples.  We need more.  Many more. 

At some point we'll outgrow that, but right now, we are about 400+ examples from that point :-).

Steven Núñez

Nov 2, 2012, 7:00:28 AM
to lisp...@googlegroups.com
I think you're missing the point and not seeing the forest for the trees. We need to build a community, not get 2-3 people hacking on a few modules. I know it's a lot more fun to write code, and who needs documentation, unit tests and comments in the source code? Those are for sissies. However, building a community means spreading the message, making it easy to build/compile/install and get help, and providing a roadmap and direction.

We (the company I work for) build complex software systems on a daily basis, and I can say with certainty that the state of the art in distributed software development has come a long way from adding to-dos to a file. Name a single successful open source project of any size that doesn't have at least the basics in place (basics meaning wiki, bug tracking, roadmap, blog). These projects engage other people and motivate them to use and contribute; they don't just point them at a bunch of files and hope they sort it out. Even someone like myself, motivated to learn and contribute to CLS, is not going to sift through a bunch of files trying to figure out what's working, who's doing what, and what bugs exist, and I doubt many other people are either, especially when the bar for open source projects has been raised so much higher.

We need some direction here. I floated the idea that we focus efforts on a simple example: the 'blog post series around predictive analytics. If someone has a better suggestion, I'm all ears. I don't care what it is, but wearing my project-manager hat, this project seems to lack any sort of organization, and my suggestion is that before we all jump in and start coding in 10 different directions, we put a stake in the ground and put together something that will:
  • Interest other people (predictive analytics seems to be a hot topic right now)
  • Allow others to easily contribute
  • Provide modern tools to collaborate efficiently
Github may be different, but the collaboration tools we use would require at most 2-3 days to configure; a small price to pay for the return on investment.

I don't mean to sound negative, but after 2-3 weeks of discussion, I still have little clarity about what's going on here. For example, let me ask a few simple questions:
  • Where is the current status page? A few messages say we're quicklisp-able, but it just failed for me. Who is working on making it quicklisp-able, and what remains to be done?
  • Is there a 'blog to trumpet our capabilities? I suggested that we recreate the 'blog series from the 'Firm Target' post. This would:
    • Get our 'blog on the map
    • Give us something to coordinate people, resources, tools and practice in working toward a common goal, a small goal, but it's a start at being organized as a group
I don't think this is the same as 'hope someone stumbles across an example that's useful and becomes interested in CLS'. The point here is that unless we organize ourselves as a project and make it easy for others to join, we're probably never going to reach the critical mass necessary to get CLS off the ground -- and 1980s-style 'dig around in the files' is really not going to work for anyone. We need to lower the barriers to entry.




A.J. Rossini

Nov 2, 2012, 8:27:40 AM
to lisp...@googlegroups.com
Hi Steve -

You've got a good point regarding community. However, we are in a bootstrapping stage -- I'd have no problem with having such tools and such support, but someone has to do it. For me, in order to write basic user requirements for this project, we need some preliminary data and experimentation. This is closer to a research project than a working system -- as I mentioned before, if you want to get things done, use R.

(Truth in advertising -- I was one of the first 20 folks to compile and run the first version of R, back in the mid-90s, and contributed the first systems and prototypes for many capabilities that are considered "critical" to R's success.)

What I don't want at this stage is bureaucracy -- while I'm a bit more
into project infrastructure and coordination than Tamas, I still feel
that we need experimentation to move forward.

I've provided some possible directions -- and clearly, we are all
volunteers, unlike any commissioned or commercial projects. And at
this point, while I have a vision that this is the right direction to
go (using common lisp as a basis for a new data analysis system which
provides a different and more modern approach than the S/R
languages....), I still need data in the form of "this is what I'd
like to solve".

For me, the contributed examples provide a means to identify low-hanging fruit as well as commonality between people -- I've never seen an open source project work which didn't let people do what they had passions for. (And I guided the ESS project for over 10 years; it's the oldest (circa 1989) continuously running open-source statistical-computing project still going.) I did similar with the XEmacs project back in 1995, and have watched numerous collaborative projects die -- and have rescued others (the lisp-matrix suite comes to mind, as well as the Common LispStat pre-alpha system that I started this from).

So for the blog posts on predictive analytics -- we are so far before that, that it isn't even funny. The basic regression routines need to be rewritten using BLAS (they were part of liblispstat and not written efficiently from a stat-theory perspective: they use the sweep-operator approach to linear regression rather than SVD-based algorithms, though someone could translate the code from lispstat -- which brings up a point: is there a good C-to-CL converter?). We need to get dataset management going -- the dataframe routines are a joke, and require lots more test cases and even something as basic as what I proposed (column typing).

TODO.org is basically a simple way to fill in needs as either I see them or others see them. If we are going to use a different system, someone needs to own it and take charge of the legacy data. Since I'm offline more often than not (my primary hacking windows are early mornings, my tram ride to work, and my tram ride back -- about 1.5 hours/day total on good days), I pick tools which support that (i.e. distributed VC like git, and keeping things there).

Since we don't yet have a formalised manifesto, I hesitate to open up a collaborative wiki (again, the legacy-data issue).

Since there are only 23 folks on the mailing list, of whom we've heard from 5 on a regular basis, it's going to be an uphill climb to create a community, but it's clearly feasible.

I'd like to have a basic system in place before advertising too much -- i.e. dataframes capability, basic visuals, descriptive numerics, some data management functionality (integration, splitting, reshaping), and probability calculations and calculus (Bayesian, resampling, etc.).

"I'd like".

Not required, of course.

Remember, this mailing list has been around since 2009, and only now that I'm making some progress (updating, cruft removal, new features and approaches based on the many changes since then) and can promise with high probability that I'll be working on this for the next year or two, am I willing to commit to setting some goals.


And right now, as you point out, we need some. How about taking a first stab at laying out what you think we should do and how we'll implement it? The only thing I'd ask is that you take into consideration our limited resources; right now I'm committed to:

1. coding up the data import, dataframes, and illustrative examples, so that I can start to decide which analytics to implement (and how they should be implemented)

2. providing input and feedback to queries on this mailing list.

So take a stab at what you'd like to do and how you'd like to do it. However, the one thing I'd ask is that if you are going to propose a tool or system that requires maintenance, you find a means to supply that effort. E.g. on the XEmacs project a few years ago, we had an issue tracker, but the first one proved useless because no one claimed ownership of it. I've claimed "ownership" of TODO.org, which should morph into project planning (i.e. the way org-mode will let one do it), but there are other tools available, and I'd gladly use them as a client or user (not supporter or owner) if someone else owns and supports them.

But my suggestion is to hold off on the blog posts until we've "written" the content for them. I'm working on the content, but you can help as well; that was my point with the "Examples". In particular, things like David's examples need to be worked into a formal example (which I'll do, unless someone beats me to it).

What would be wonderful for me? To have, by the end of 2013: data management, auditable objects, metadata coded regarding assumptions for statistical and data-analytic procedures (almost a project in itself), dynamic and interactive graphics (neq dynamic-interactive), and, from the analytics side, basic regression infrastructure and resampling infrastructure. If we could implement accelerators and macros for MCMC and similar posterior likelihood calculations, I'd be in heaven. Should I write down a project plan based on that as a strawman?

best,
-tony
--

blind...@gmail.com
Muttenz, Switzerland.
"Commit early,commit often, and commit in a repository from which we
can easily roll-back your mistakes" (AJR, 4Jan05).

Drink Coffee: Do stupid things faster with more energy!

Tamas Papp

Nov 2, 2012, 9:45:00 AM
to lisp...@googlegroups.com

On Fri, Nov 02 2012, A.J. Rossini <blind...@gmail.com> wrote:

> that, that it isn't even funny. The basic regression routines need
> to be rewritten using BLAS (they were part of liblispstat and not

FYI, LLA has least squares using QR decomposition (it would be trivial to enable SVD; I had that running but commented it out for the time being, since I plan to redo the SVD API).

Just try

(lla:least-squares y X)

CL-RANDOM builds on this to provide LINEAR-REGRESSION, which returns an object you can DRAW from if you are into Bayesian analysis, e.g.

(let ((regression (linear-regression y x)))
  (list (mean regression)    ; posterior mean
        (draw regression)))  ; random draw from the posterior

> On Fri, Nov 2, 2012 at 12:00 PM, Steven Núñez <steven...@illation.com> wrote:
>> I think you're missing the point and not seeing the forest for the trees. We
>> need to build a community, not get 2-3 people hacking on a few modules. I
>> know its a lot more fun to write code, and who needs documentation, unit
>> tests and comments in the source code? Those are for sissies. However
>> building a community means spreading the message, making it easy to
>> build/compile/install, get help, and a roadmap and direction.

I agree with Tony here. It is not that a wiki/blog/roadmap is useless; on the contrary, these things can be very useful. It is just that at the moment the marginal returns to other endeavours are higher, IMO.

Best,

Tamas

Peter Schmiedeskamp

Nov 2, 2012, 10:23:13 AM
to lisp...@googlegroups.com
Having personally not contributed line one of code to this endeavor so far, I'm the last person who should opine, but here goes.

My intuition on open source projects falls toward "more community is better"; however, Tony raises some great points about the phasing of that community development. My willingness to jump in and work on blog postings is rooted in a desire to help out and to force myself to learn the system -- my Common Lisp AND statistics chops are going to take some development before I'd consider myself qualified to contribute in any meaningful manner to the direction of this project.

From my perspective, I get to help out and learn something by contributing examples, either in github or as part of a blog. The question, then, is when is the right time to start trumpeting the return of CLS. Blogging now potentially generates interest now, but perhaps it generates the wrong kind of interest, i.e. interest in a finished, ready-to-roll system.

Given that things are a good long way from being done, I am persuaded by Tony's approach of holding back on advertising CLS to end-users. We can continue to build content in our examples in github and then, when CLS is ready, have plenty of content to feed blogs, manuals, and anything else. Then, presumably, we'll have a flood of pleasantly surprised new CLS users, as opposed to a bunch of people who walk away disappointed.

I'm only just now getting around to cloning a copy of this and loading it with Quicklisp. Once I figure out how to load things up and do some basics, I'll dedicate my meager talents to example development.

Cheers,
Peter

A.J. Rossini

Nov 2, 2012, 12:39:40 PM
to lisp...@googlegroups.com
Dropped this into examples/60-regressionExamples.lisp, where it looks basically like the following. My point stands despite the gaps: it's a particular Bayesian regression, not a general one; there needs to be a data-to-numerical "model.matrix" function to convert dataframe-like objects into a numerical model matrix for the computation, and to convert the numerical fit back into a model-oriented regression; i.e. you want to be able to use strings, symbols, etc. in the data frame, not just doubles, ints, and complexes...

best,
-tony



(in-package :cls-examples)

;;; Example from Tamas Papp, BUT SIMPLE COPY, DOESN'T WORK
(use-package :lla)
(use-package :cl-random)

;; FYI, LLA has least squares using QR decomposition (it would be
;; trivial to enable SVD, I had that running but commented it out for
;; the time being since I plan to redo the SVD API).

;; Just try

(lla:least-squares y X)

;; CL-RANDOM builds on this to provide LINEAR-REGRESSION, which
;; returns an object you can DRAW from if you are into Bayesian
;; analysis, e.g.

(let ((regression (linear-regression y x)))
  (list (mean regression)    ; posterior mean
        (draw regression)))  ; random draw from the posterior

;;; Tony remarks:

;; Great start! Issues to be addressed in the future: need to clarify
;; that the so-called Bayesian linear regression is probably using the
;; standard non-informative/Jeffreys prior, and to clarify that in the
;; medium-term future (and extend with a properly pre-specified
;; prior); and we need a means to go from a data frame containing
;; non-numerical variables to support using this.

A.J. Rossini

Nov 2, 2012, 12:47:33 PM
to lisp...@googlegroups.com
Actually, what I need to do is make the README.org file pertinent and up to date regarding where we are and what we lack, and modify the guides for getting started and trying things out. Will work on this over the weekend... That will at least get the front page of the github page "correct". Then I need to professionalize the google-group welcome page, replacing my sarcastic intro (i.e. put on my group-head manager hat from work), and we'll be presentable.

Then, perhaps, we can identify things to tackle explicitly (Steve's point, and some of mine), work out how to get them off the ground (see Tamas' post and my suggestions for improvement on linear regression), and make sure that we keep the front end and backends separated during birthing pains (of which there will be plenty).

And make sure it's in the repository until we move the legacy info in
the code and artifact-documents into a real system....

best,
-tony

Steven Núñez

Nov 2, 2012, 7:02:37 PM
to lisp...@googlegroups.com
Hi Tony,

That's a great start; thanks. I'm traveling today, so it might take me a while to properly digest it and provide useful comments, but I'm glad to see we're beginning to organize as a group.

I will suggest one goal that I think we should pursue as the number one, immediate priority: be 100% Quicklisp-able, i.e. (ql:quickload "common-lisp-stat") with no more required. These days (well, at least in the majority of cases), if you can't get a package via Quicklisp, people will look elsewhere.

While we may need to get a lot of groundwork done before really making efforts to 'market' CLS, I think quicklisp-ability will go a long way toward getting some of the other 23 on the list involved, even if it's just kicking the tires -- myself included. And if someone randomly runs into CLS, they'll only be a one-liner away from being able to experiment.

Since I'm interested in predictive analytics, I'll take on the original 'blog post (Peter -- want to help?), one step at a time: trying code, filing bug reports, putting in feature requests, etc.

Regards,
- Steve

David Hodge

Nov 3, 2012, 11:00:19 AM
to lisp...@googlegroups.com

I thought, as an exercise, I would extend my quick hacks of the other day and make them more generally useful.

After looking at Tony's examples more thoroughly (now that I understand them), I thought I would implement the summarise function, at least for basic statistics a la R: print out five-number summaries (R's fivenum) etc. for numeric columns, frequency counts for factors, and so on. This is for dataframe-array types.


And I found myself having to make a decision: I just recently wrote some things for matrix-like (reducing columns, rows, etc.) and would have to reimplement them for arrays. I tried to use dataframe-matrixlike, but I recall Tony saying that it was not fully there at the moment. As a hack it's not much effort for this one issue, but it will get out of hand as I start to look at filtering datasets etc. (As an aside, I am thinking of something like (summarise (select *df* where $sepal.length > 5)); a rough sketch follows below.)
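
To make the idea concrete, here is a rough sketch of the select half over a plain 2D CL array standing in for a dataframe (select-rows and its calling convention are invented purely for illustration; the real thing would dispatch on dataframe-like):

;; Sketch only: DATA is a 2D array, COLUMN-NAMES a list of names.
(defun select-rows (data column-names name predicate)
  "Rows of DATA for which PREDICATE is true on the column called NAME."
  (let* ((col (position name column-names :test #'string-equal))
         (keep (loop for i below (array-dimension data 0)
                     when (funcall predicate (aref data i col))
                       collect i))
         (out (make-array (list (length keep) (array-dimension data 1)))))
    (loop for i in keep
          for r from 0
          do (dotimes (j (array-dimension data 1))
               (setf (aref out r j) (aref data i j))))
    out))

;; e.g. the rows where sepal.length > 5:
;; (select-rows *iris* '("sepal.length" "sepal.width" "petal.length"
;;                       "petal.width" "species")
;;              "sepal.length" (lambda (x) (> x 5)))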

And then I remembered that lisp-matrix only accommodates floats at the moment, whereas arrays are normal lisp arrays. I think the right idea, for the moment, is to try to generalise things as much as possible: flesh out dataframe:matrix-like, and try to paper over the seams between types as much as possible so things can move forward from a development perspective.

This then kind of assumes that lisp-matrix stays in, possibly underpinned by Antik later. Which then, of course, leads to the question of what to do about factors/categorical variables, as right now lisp-matrix only supports floats.

It's a bit of a conundrum for me.

Advice and options sought. We probably want to have a bit of actual discussion on the topic before too much time passes, I think.

Cheers




Tamas Papp

Nov 3, 2012, 11:18:24 AM
to lisp...@googlegroups.com
Hi Tony,

On Fri, Nov 02 2012, A.J. Rossini <blind...@gmail.com> wrote:

> Dropped this in to examples/60-regressionExamples.lisp , where it
> looks basically like the following. My point is that despite seeing
> the gaps (it's a particular bayesian regression, not general, and
> there needs to be a data to numerical "model.matrix" function to
> convert dataframe-like objects to a numerical model-matrix to use in
> the computation, and convert back to the model-oriented regression
> back from the numerical fit. i.e. you want to be able to use strings,
> symbols, etc in the data frame, not just doubles, ints, and
> complexes...
>
> (lla:least-squares y X)
>
> ;; CL-RANDOM builds on this to provide LINEAR-REGRESSION, which
> ;; returns an object you can DRAW from if you are into Bayesian
> ;; analysis. eg
>
> (let ((regression (linear-regression y x)))
> (list (mean regression) ; posterior mean
> (draw regression))) ; random draw from the posterior
>
> ;;; Tony remarks:
>
> ;; Great Start! Issues to be addressed in the future: Need to clarify
> ;; that the so-called Bayesian linear regression is probably using the
> ;; standard non-informative/Jeffrey's prior, and so needs to clarify
> ;; that in the medium-term future (and extend with a property
> ;; pre-spec'd prior, and we need a means to go from a data frame
> ;; containing non-numerical variables to support using this.

Yes, the regression is using the `reference' prior. You can also give it conjugate priors, which can be obtained, for example, from previous regressions:

(let* ((regression (linear-regression y1 x2))
       (regression-with-more-data (linear-regression y2 x2 :prior regression)))
  ...)

I think that we are talking about two different things: for me, this is
a tool to do a quick & dirty regression before I move on to a more
elaborate model, whereas you want a very R-like regression framework
that is a full-fledged DSL for regressions. As you are probably aware,
writing a library like that takes a lot of effort and time, which is
fine if someone wants to invest that, but IMO that is very unlikely
given the current size/focus of the CL numerical community.

BTW, I am now using Stan (http://mc-stan.org/) for inference -- I wrote
a very basic CL wrapper for Stan and can now do Bayesian inference
relatively quickly.

Best,

Tamas

A.J. Rossini

Nov 3, 2012, 11:40:13 AM
to lisp...@googlegroups.com
Quick comments:

You should be using generics when possible, so make it just dispatch on arrays -- there is a dataframe-like subclass that uses CL arrays as a backing store, so if you code up the generic and the method to dispatch appropriately, we can write the other versions. In fact, if you do arrays, we can always do object conversion or copying just to make it work.

Optimisation (of code) is for later.

So again, write the code to "get the job done", and then we can
refactor that as needed to get the job done right...

The one thing I'd ask is that you use a higher-level data access
package, i.e. xarray or similar. Then it should be easy to let the
dispatch handle it, and as we fix the dispatch, we solve the general
problem...
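
Concretely, a minimal sketch of what I mean (summarize here is a placeholder name for illustration, not an existing CLS API):

;; One generic, one method per backing store; other representations
;; can convert to a CL array and reuse this method.
(defgeneric summarize (data)
  (:documentation "Return descriptive statistics for DATA."))

(defmethod summarize ((data array))
  ;; column means of a 2D array, as the simplest possible summary
  (let ((n (array-dimension data 0)))
    (loop for j below (array-dimension data 1)
          collect (/ (loop for i below n sum (aref data i j)) n))))

;; A dataframe-like method would just convert and reuse the array
;; method, e.g. (defmethod summarize ((df dataframe-array))
;;                (summarize (store df)))
;; where STORE is a hypothetical accessor for the backing array.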

(dataframes are close enough, just use them and we'll fix them as you
find features or needs....).

See src/data/dataframe-array.lisp for the infrastructure you need,
src/data/dataframe.lisp for the generics you can use.

For descriptive statistics, keep it in the CL array structure. The
only reason to use lisp-matrix backend is for BLAS activities
(numerical crunching) and yes, at some point we'll get there.

And CL arrays can also be determined to be "matrix-like" with a bit of work (i.e. if the contents are numerical, there is a means to convert).

lisp-matrix will NEVER support non-numerical arrays.

Dataframe-like objects support general data, and given a model and
assumptions, map to a matrix-like object for computation. One of the
deep secrets that you need to know about going between data and data
analysis is that there is a mapping, sometimes the identity mapping,
that goes on between the dataframe-like and the matrix-like that will
be used for computing quantities, and then the results are attributed
back as metadata onto the dataframe-like.

We've got the general infrastructure there: lisp-matrix introduces the matrix-like object, which uses different storage types depending on the system, with the criterion that the values of the cells in the matrix are numbers that can be computed with (and that there is a common class that holds all the values in the matrix-like). dataframe-like is similar, but with column-enforced typing (planned future feature :-).

So repeat after me:

Dataframe-like + model/assumptions -> matrix-like which can be
computed with -> metadata and derived data based on model/assumptions
for dataframe-like
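
As a toy illustration of the first arrow (encode-column is hypothetical; a real model-matrix builder also involves contrasts, interactions, and the model formula):

;; dataframe-like -> matrix-like, in miniature: numeric columns pass
;; through as doubles; a categorical column becomes 0/1 indicator
;; columns, one per level (dummy coding).
(defun encode-column (column)
  "Return a list of numeric columns encoding COLUMN (a list of values)."
  (if (every #'numberp column)
      (list (mapcar (lambda (x) (coerce x 'double-float)) column))
      (loop for level in (remove-duplicates column :test #'equal :from-end t)
            collect (mapcar (lambda (x) (if (equal x level) 1d0 0d0))
                            column))))

;; (encode-column '(:setosa :virginica :setosa))
;; => ((1d0 0d0 1d0) (0d0 1d0 0d0))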

However, we'll expand to that AFTER we get the dataframe-like (i.e. CL-array-backed) functions computed with.

David, does this clear up a bit of your questions?

best,
-tony

A.J. Rossini

Nov 3, 2012, 11:50:44 AM
to lisp...@googlegroups.com
Excellent! Except that I'd not have put this into CL-RANDOM, but elsewhere (keeping a more modular organization -- but this is of course a case of "where should the design stop and the work get done").

> I think that we are talking about two different things: for me, this is
> a tool to do a quick & dirty regression before I move on to a more
> elaborate model, whereas you want a very R-like regression framework
> that is a full-fledged DSL for regressions. As you are probably aware,
> writing a library like that takes a lot of effort and time, which is
> fine if someone wants to invest that, but IMO that is very unlikely
> given the current size/focus of the CL numerical community.

It's actually my goal, i.e. this is what I meant by this being academic for me: I _WANT_ the full-fledged, general-purpose, kicking-and-screaming, bells-and-whistles DSL, and I want it modern and optimal. By that I mean I'd like a language for specifying regressions that reflects modern beliefs and coherency across the range of regression variants and super-algorithms (i.e. regularized estimates, bootstrapped estimates, etc.). My research area, back when I was a prof, was computing systems for data analysis. With the advent of R, this subject has disappeared as a research area, to be replaced with applied research in the field along with development (not research), which for me means using the system, not innovating on the core system.

And this is why I differentiate between your goals and mine. But it's also why I want you to not get distracted by some aspects of this project, since clearly your work has contributed, and is contributing, heavily to what I end up using.

> BTW, I am now using Stan (http://mc-stan.org/) for inference -- I wrote
> a very basic CL wrapper for Stan and can now do Bayesian inference
> relatively quickly.

From what I've been told by Andrew Gelman and co., it's the thing to use now. This is actually one of the few exceptions to the above rule (the dearth of research in computing systems), thanks to the cool research happening on accelerating MCMC sampling algorithms... But I'd like to figure out a reasonably comprehensive modeling language that covers the range of regression approaches. I believe it can be done, but it's also "research", not development. And there are many intermediate steps that need to be taken (i.e. the stuff you are doing) before I can get there.

A.J. Rossini

Nov 3, 2012, 6:37:18 PM
to lisp...@googlegroups.com, steven...@illation.com, st...@nunez.org

Basically, we just have to do it (and then make sure that we describe the correct branch to pull, perhaps a "release" branch or "master"?). But I've been hesitating until it does something, since Xach doesn't update too often.

Chicken-and-egg problem (i.e. perhaps a version-controlled pull for the main repository, with the stable sub-repos like lisp-matrix/fnv/ffa and xarray etc. coming from quicklisp, or?)

So my conundrum: being able to patch problems for the user quickly, versus ease of access. With the current development state of affairs (lots of small fixes, mostly, and lots of docs, etc., being added), I've been tempted not to push a mostly needing-fixes repo there, just to have folks complain that nothing works and that it isn't simple to fix.

The alternative is that they can git-clone into their quicklisp local-projects directory (which I've tried) along with a few other projects from my github pages (I think there were 3 others?), and quicklisp pulls in the rest from more stable and functional projects (lift, etc...).

So that was the document I'm planning to add to the readme: an updated "here's how to quickly start". And I still lean towards "and then formally quicklisp it once it settles down".

I'd listen to other points of view, provided they describe why a different user experience is handy right now given the state of affairs, and given the general ease of pulling from github through the windows application or a simple command-line incantation of 5 lines which we can provide ("cd XXXX ; git clone XXX1 ; git clone XXX2 ....").


best,
-tony

On Saturday, November 3, 2012 11:17:43 PM UTC+1, st...@nunez.org wrote:
I guess I should have clarified this -- I mean a one-line quickload without having the source installed. Has any progress been made in getting Xach to include this in the main quicklisp archives?

On Saturday, November 3, 2012 10:02:39 AM UTC+11, Steven Núñez wrote:
> Hi Tony,

A.J. Rossini

Nov 3, 2012, 6:41:23 PM
to lisp...@googlegroups.com, steven...@illation.com, st...@nunez.org
Another possibility for user expansion -- post the current state of affairs on comp.lang.lisp: "Here's announcing the revitalized..."

Since the only people that know about this either:

1. caught my google+ post
2. know someone who did
3. follow my common-lisp-stat GitHub archive

Maybe there is another way, but I can't think of one.

Here, I'd welcome suggestions for what to write and include; I'm a bit too familiar with all the problems to write a decent "Come try and participate" post... :-).

best,
-tony

David Hodge

Nov 3, 2012, 6:51:26 PM
to lisp...@googlegroups.com, lisp...@googlegroups.com, steven...@illation.com, st...@nunez.org
Tony,

I think that the git clone into quicklisp local-projects, plus using quicklisp for stable libraries, is a good approach for now.

One thing to do is minimise the dependencies and move them all into the "external" directory, so you can get a smooth install experience. And no, there are about a dozen dependencies in the original repo, I'm afraid, which made installation a nightmare.


I have not tested it for a while, but for my repo the install procedure is:
     1. git clone
     2. (ql:register-local-projects)
     3. (ql:quickload :cls)

Which is pretty painless.

Sent from my iPad

A.J. Rossini

Nov 3, 2012, 6:56:16 PM
to lisp...@googlegroups.com, steven...@illation.com, st...@nunez.org
I agree about the dependency hell -- which is why I worked it down to, I think, 3-4 outside of quicklisp, and that was absolutely critical.

If you can provide exact code for step #2, that would be useful (I know #3 :-)) -- or we can just have people use the quicklisp/local-projects directory (unless they are doing something fancy, which one person mentioned...).

best,
-tony

David Hodge

Nov 3, 2012, 6:58:58 PM
to lisp...@googlegroups.com
Some responses



Sent from my iPad

On 3 Nov, 2012, at 11:40 PM, "A.J. Rossini" <blind...@gmail.com> wrote:

> Quick comments:
>
> You should be using generics when possible, so make it just dispatch
> on arrays -- there is a dataframe-like subclass that uses CL-arrays as
> a backing store, and so if you code up the generic and the method to
> dispatch appropriately, we can write the other versions. In fact, if
> you do arrays, we can always do object conversion or copying just to
> make it work.
>
> Optimisation (of code) is for later.

Agree. I will make it work first.

>
> So again, write the code to "get the job done", and then we can
> refactor that as needed to get the job done right...
>
> The one thing I'd ask is that you use a higher-level data access
> package, i.e. xarray or similar. Then it should be easy to let the
> dispatch handle it, and as we fix the dispatch, we solve the general
> problem...
>
Xarray it is, then -- unless there is a clear need for some other package, of course, but I don't see one.

> (dataframes are close enough, just use them and we'll fix them as you
> find features or needs....).
>
> See src/data/dataframe-array.lisp for the infrastructure you need,
> src/data/dataframe.lisp for the generics you can use.
Yup.
>
> For descriptive statistics, keep it in the CL array structure. The
> only reason to use lisp-matrix backend is for BLAS activities
> (numerical crunching) and yes, at some point we'll get there.
>
> And CL arrays can be also determined to be "matrix-like" with a bit of
> work (i.e. if it's numerical, there is a means to convert).
>
> lisp-matrix will NEVER support non-numerical arrays.

So this is the bit below that is quite important to understand.......
>
> Dataframe-like objects support general data, and given a model and
> assumptions, map to a matrix-like object for computation. One of the
> deep secrets that you need to know about going between data and data
> analysis is that there is a mapping, sometimes the identity mapping,
> that goes on between the dataframe-like and the matrix-like that will
> be used for computing quantities, and then the results are attributed
> back as metadata onto the dataframe-like.
>
> We've got the general infrastructure there, lisp-matrix introduces the
> matrix-like object which uses different storage types depending on the
> system, with the criteria that the values of the cells in the matrix
> are numbers that can be computed with (and that there is a common
> class that holds all the values in the matrix-like. dataframe-like
> is similar, but with column-enforced typing (planned future feature
> :-).
>
> So repeat after me:
>
> Dataframe-like + model/assumptions -> matrix-like which can be
> computed with -> metadata and derived data based on model/assumptions
> for dataframe-like

Consider it repeated.

Dataframe metadata needs to be enforced now for types. And given the scheme outlined above, we do need to take an initial stab at missing data so that the automagical transport between data frames and lisp matrices can happen reliably.
>
> However, we'll expand to that AFTER we get the dataframe-like (i.e.
> CL array) function computed with.
>
> David, does this clear up a bit of your questions?

Indeed.

David Hodge

Nov 3, 2012, 7:07:17 PM
to lisp...@googlegroups.com
Actually, I sent you the instructions a while ago.

I will resend after my first coffee and turn on my mac.

The joys of a 4-year-old who wants to get up early!



Sent from my iPad

A.J. Rossini

Nov 3, 2012, 7:07:49 PM
to lisp...@googlegroups.com
Quickly - for now, for missing data, go with the 'nil symbol and a missing? or missing-p function (or generic) which works basically like nilp.
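
A minimal sketch, purely illustrative:

;; NIL marks a missing value for now; making it a generic lets richer
;; representations (sentinel objects, masked columns) plug in later.
(defgeneric missing-p (value)
  (:documentation "True if VALUE represents missing data."))

(defmethod missing-p ((value t))
  (null value))

;; (missing-p nil)  => T
;; (missing-p 3.14) => NIL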

It'll work until we build the real system over it...

best,
-tony

A.J. Rossini

Nov 3, 2012, 7:08:55 PM
to lisp...@googlegroups.com
I know those joys. I'm dealing with a 15-year-old who wants to stay out late :-).

We might need to update those instructions -- please start a new thread with them? "Install instructions" or similar?

David Hodge

Nov 3, 2012, 7:18:42 PM
to lisp...@googlegroups.com, lisp...@googlegroups.com
A generic will work best I think.

It's on the todo.

Sent from my iPad

Peter Schmiedeskamp

Nov 3, 2012, 8:43:09 PM
to lisp...@googlegroups.com
Perhaps germane to the discussion of how to set up a local working
copy of CLS to hack on, I've managed to fork and clone several repos
from Tony's github. I found the github page on forking useful for
configuring git to know about upstream repositories (i.e. Tony's repo,
not my fork): https://help.github.com/articles/fork-a-repo

Just because it was easy, I created a shell script to clone what I
believe are the correct four repositories into my local-projects
directory. It appears (ql:quickload :cls) barfs on lisp-matrix, but
I'm guessing I might not have the right branch:

Error while invoking #<COMPILE-OP (:VERBOSE NIL) {1008DAA3D3}>
on #<CL-SOURCE-FILE "lisp-matrix" "testing" "unittests">
[Condition of type ASDF:COMPILE-ERROR]

I'm in slightly new territory here, so if anyone can tell from the
attached shell script what I've done wrong, I'd appreciate it!

Cheers,
Peter
setup-cls.sh

David Hodge

Nov 3, 2012, 8:56:12 PM
to lisp...@googlegroups.com
In another email I listed the known external dependencies.

Here is a shell script to create them as submodules.

So the easiest way to make it work is to have your cls repo in the quicklisp local-projects directory:

1. Clone all the git submodules (see my instructions in the installation thread).

2. Check the libraries for cl-blapack (read my instructions in the installation thread). Actually, I just realised it's the old cl-blapack; you will have to hack the library locations manually: go to cl-blapack/load-foreign-libraries.lisp and follow the instructions in there. I will push my changes to Tony later today.

3. Register the local projects in quicklisp (read my instructions again).

And then just (ql:quickload :cls).

It should work; let me know if you run into any bumps.



create-submodules.sh

Peter Schmiedeskamp

Nov 3, 2012, 9:07:03 PM
to lisp...@googlegroups.com
Oh, drat. Looks like I failed reading comprehension! I'll go back and
track that down and review.

Thank you for the helpful pointers on getting started!

Cheers,
Peter
>> <setup-cls.sh>

David Hodge

Nov 3, 2012, 9:18:43 PM
to lisp...@googlegroups.com
Actually, I literally just sent it before your email arrived, so no problem with your comprehension :)

I take it you are somewhere near my side of the world (Singapore?), as it would be very late in either Europe or the US?

Cheers

Peter Schmiedeskamp

Nov 3, 2012, 9:20:10 PM
to lisp...@googlegroups.com
Seattle here, just about 6:00 p.m. Getting this set up is a good rainy-day / nurse-a-cold activity. Singapore sounds more exciting :-)

-p