Have these changes been pushed into the master repo? I just tried following David's directions and got this:
best,
-tony
On 06/11/2012, at 4:11 PM, "A.J. Rossini" <blind...@gmail.com> wrote:

> Missing is a function in src/data/import.lisp which will check a CL array and convert strings into appropriate numbers or symbols, or just leave them as strings. Then for numbers and symbols, mark as appropriate (precision, etc. for numbers; special uses for symbols such as nil or T or NA or missingType3 or whatever). And for strings, leave as strings.
>
> Once this function is written, and it can be hinted or take a "best guess" at the class for the column, we'll be back in shape, with a much improved (and supported) data-slurping mechanism, a

I am looking forward to that, as I was expecting to write that myself.
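The conversion function described above might be sketched roughly like this (a guess at the shape of the idea, not the actual src/data/import.lisp code; the special-token table and all names are illustrative):

```lisp
;; Illustrative sketch: try to read each string as a number, map a few
;; special tokens to symbols, and otherwise leave the string alone.
(defparameter *special-tokens*
  '(("NIL" . nil) ("T" . t) ("NA" . :na))
  "Tokens that become symbols rather than strings; purely illustrative.")

(defun classify-entry (string)
  "Return STRING converted to a number, a marker symbol, or itself."
  (let ((special (assoc string *special-tokens* :test #'string-equal)))
    (if special
        (cdr special)
        (let* ((*read-eval* nil)            ; never evaluate while reading
               (value (ignore-errors (read-from-string string))))
          (if (numberp value) value string)))))

(defun convert-column (strings)
  "Convert a list of raw strings; a \"best guess\" at the column's class
could then be taken from the converted values."
  (mapcar #'classify-entry strings))
```

For example, `(convert-column '("1" "2.5" "NA" "foo"))` yields `(1 2.5 :NA "foo")`, and hinting could override the per-entry guesses.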
The data frame summarise function I am writing allows the user to specify specific summary metrics (e.g. fivenum, variance, etc.) according to the column type. If I can depend on that being set, or at least call the type assessment functions when I need them, it's a win!
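That per-column-type dispatch could be sketched as follows (hypothetical names, not the real summarise interface; the quantile rule is a crude nearest-index approximation of R's fivenum):

```lisp
(defun five-number-summary (numbers)
  "Min, rough quartiles, median, and max of a list of numbers."
  (let ((sorted (sort (copy-list numbers) #'<)))
    (flet ((q (p) (nth (round (* p (1- (length sorted)))) sorted)))
      (list (first sorted) (q 1/4) (q 1/2) (q 3/4) (car (last sorted))))))

(defun summarise-column (column)
  "Dispatch on column type: numeric columns get a five-number summary,
anything else gets level counts, as an alist of (VALUE . COUNT)."
  (if (every #'numberp column)
      (five-number-summary column)
      (let ((counts (make-hash-table :test #'equal)))
        (dolist (x column) (incf (gethash x counts 0)))
        (let (result)
          (maphash (lambda (k v) (push (cons k v) result)) counts)
          result))))
```

So `(summarise-column '(5 1 3 2 4))` gives `(1 2 3 4 5)`, while a column of strings falls through to the level-count branch.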
While we are talking import, most of the data I am interested in comes in fairly complex structured formats (either your legacy Fortran stuff or netCDF). Now in the current scheme I think I saw a function that could be called (I assume for each row?), but I am not sure where it's at in the current mini-hackathon.
I have evolved a fairly simple scheme, though at the moment it depends on yet another library for parsing and validation of fields. My plan was to provide the user a wizard to build the record description in a simple list structure. The parsing function is actually pretty simple and generic, so once I understand the import interface a bit more, it should be easy to hook this up.

Which leads to the possible inclusion of data-format-validation, the library that does all the parsing for me at the moment. It's in Quicklisp, and for me it seems relatively efficient (more an impression than anything formal; most of my datasets are 100k records or so, and I don't appear to wait too long for them to be parsed).
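The record-description-as-a-simple-list idea might look something like this (an illustration of the scheme only, not data-format-validation's actual interface; the spec format and names are invented for the example):

```lisp
(defun parse-record (spec fields)
  "SPEC is a list of (NAME . PARSER) pairs; FIELDS is a list of raw
strings for one record. Returns an alist of (NAME . PARSED-VALUE)."
  (loop for (name . parser) in spec
        for field in fields
        collect (cons name (funcall parser field))))

;; Example record description: an integer id and a string label.
(defparameter *example-spec*
  (list (cons :id (lambda (s) (parse-integer s)))
        (cons :label #'identity)))
```

For instance, `(parse-record *example-spec* '("42" "sample"))` returns `((:ID . 42) (:LABEL . "sample"))`; a wizard would just build the spec list interactively.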
Not a decision to be made right now, but something to think about.
On Tuesday, November 6, 2012 12:58:13 PM UTC+1, David Hodge wrote:

> On 06/11/2012, at 4:11 PM, "A.J. Rossini" <blind...@gmail.com> wrote:
>
>> Missing is a function in src/data/import.lisp which will check a CL array and convert strings into appropriate numbers or symbols, or just leave them as strings. Then for numbers and symbols, mark as appropriate (precision, etc. for numbers; special uses for symbols such as nil or T or NA or missingType3 or whatever). And for strings, leave as strings.
>>
>> Once this function is written, and it can be hinted or take a "best guess" at the class for the column, we'll be back in shape, with a much improved (and supported) data-slurping mechanism, a
>
> I am looking forward to that, as I was expecting to write that myself.
You still can :-).
> The data frame summarise function I am writing allows the user to specify specific summary metrics (e.g. fivenum, variance, etc.) according to the column type. If I can depend on that being set, or at least call the type assessment functions when I need them, it's a win!
Exactly. And critically, I'd like to be able to drop complicated datatypes into a slot -- e.g. a kinetic-class or time-course-class, which would be a set of time-marked values (think x-y plot, with x = time).
Eventually, I'd like to enforce the statistical assumption that rows are weakly conditionally independent within a dataset. We can do that if we can have multiple time-course classes per observation; i.e., in a clinical trial of 10 people, measured daily over a year, there should be 10 rows, with a column for each measurement. If you wanted to get a set of common times between the measurements, you would have to do an intersection on the set of times for each measure, i.e.
(intersection (times var1) (times var2) (times var3))
or something similar to that. Which means that we write the appropriate infrastructure so that if you want to summarize a column of trajectories, it does the right thing!
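That infrastructure could start from something like the following sketch (the class and accessor names are hypothetical, not anything in the repository):

```lisp
;; A time-course: a variable carrying its own measurement times.
(defclass time-course ()
  ((times        :initarg :times        :accessor times)
   (measurements :initarg :measurements :accessor measurements)))

(defun common-times (&rest courses)
  "Times at which every course was measured, i.e. the idea behind
\(intersection (times var1) (times var2) (times var3))."
  (reduce (lambda (acc course) (intersection acc (times course)))
          (rest courses)
          :initial-value (times (first courses))))
```

A summarise method specialised on columns of time-course objects could then restrict itself to these common times before computing anything. Note that `intersection` returns its result in unspecified order, so callers should sort if order matters.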
(this would be cooler than R, actually...)

> While we are talking import, most of the data I am interested in comes in fairly complex structured formats (either your legacy Fortran stuff or netCDF). Now in the current scheme I think I saw a function that could be called (I assume for each row?), but I am not sure where it's at in the current mini-hackathon.
I think I started coding things similar to R's "apply" functions, which will zoom down a margin in a multidimensional array -- but "started" doesn't mean that there is any merit or completion of the actual code.
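An R-style apply over one margin of a 2-D array might be sketched like this (only a sketch of the idea, not the partially written code mentioned above):

```lisp
(defun margin-apply (fn array margin)
  "Apply FN to each row (MARGIN 1) or each column (MARGIN 2) of a
2-D ARRAY, collecting the results into a list."
  (destructuring-bind (rows cols) (array-dimensions array)
    (ecase margin
      (1 (loop for i below rows
               collect (funcall fn (loop for j below cols
                                         collect (aref array i j)))))
      (2 (loop for j below cols
               collect (funcall fn (loop for i below rows
                                         collect (aref array i j))))))))
```

For example, `(margin-apply (lambda (xs) (reduce #'+ xs)) #2A((1 2) (3 4)) 1)` gives the row sums `(3 7)`; margin 2 gives the column sums `(4 6)`. Generalising to n-dimensional arrays is the part that takes the real work.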
> I have evolved a fairly simple scheme, though at the moment it depends on yet another library for parsing and validation of fields. My plan was to provide the user a wizard to build the record description in a simple list structure. The parsing function is actually pretty simple and generic, so once I understand the import interface a bit more, it should be easy to hook this up.
>
> Which leads to the possible inclusion of data-format-validation, the library that does all the parsing for me at the moment. It's in Quicklisp, and for me it seems relatively efficient (more an impression than anything formal; most of my datasets are 100k records or so, and I don't appear to wait too long for them to be parsed).
I like that idea -- and had it on my list of things to do in 2010...
> Not a decision to be made right now, but something to think about.
Well, it's something to experiment with right now!