Sorry for being out for the last few days. Family issues: I'm a
single parent, still resolving issues with my late wife's inheritance,
etc.
At this point, I'd rather see code and examples than worry about infrastructure.
Set up a github account, clone the repo, download locally, and start
playing in the examples directory, which is where I've been doing the
"blog" stuff, using either org-mode as a literate programming tool or
just lisp-plus-comments.
The general principle:
1. write what you think you want to write, and then
2. write what you need to write,
3. and then we'll do a gap analysis of the difference.
So for example, I'm trying to get the t-test working :-). I've got
code (not checked in) that does it, but it's #2, not yet #1.
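To make the #2 flavor concrete, here is a minimal sketch of a one-sample t statistic on a plain list of numbers. This is not the unchecked-in code mentioned above; every function name here is made up for illustration, and it stops at the statistic and degrees of freedom (no p-value, since that needs a t-distribution CDF).

```lisp
;; Hypothetical "#2" (done right now) sketch. None of these names come
;; from the code base; they are assumptions for illustration only.

(defun sample-mean (xs)
  "Arithmetic mean of a non-empty list of numbers."
  (/ (reduce #'+ xs) (length xs)))

(defun sample-variance (xs)
  "Unbiased sample variance (divides by n - 1); needs at least 2 values."
  (let ((m (sample-mean xs)))
    (/ (reduce #'+ (mapcar (lambda (x) (expt (- x m) 2)) xs))
       (1- (length xs)))))

(defun one-sample-t (xs mu0)
  "Return the t statistic for H0: mean = MU0, and the degrees of freedom,
as two values."
  (let ((n (length xs)))
    (values (/ (- (sample-mean xs) mu0)
               (sqrt (/ (sample-variance xs) n)))
            (1- n))))
```

A #1 version would presumably take a dataframe column and return a result object rather than bare numbers; the gap between the two is what we'd analyze.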
For tables -- #2 looks like what David put together, but what do we
want for #1? Well, we need a table- or tabular- or
cross-classification- class (CC-class), which represents counts, and
is completely different from a dataframe. Ideally, in that CC-class
structure, we want to store how it got there (i.e. whether it came
from a file, in which case a quick metadata record, or from a
dataframe, etc.), so that we can audit the path.
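As a starting point for discussion, a CC-class along these lines could be sketched in CLOS as below. The class name, slot names, the TABULATE helper, and the plist format of the provenance record are all assumptions, not anything that exists in the repo; the sketch only handles two-way tables built from two lists of labels.

```lisp
;; Hypothetical sketch of a cross-classification (CC) class. All names
;; are assumptions made for illustration.

(defclass cross-classification ()
  ((factors :initarg :factors :reader cc-factors
            :documentation "List of factor names, one per dimension.")
   (levels :initarg :levels :reader cc-levels
           :documentation "List of level lists, one per factor.")
   (counts :initarg :counts :reader cc-counts
           :documentation "Array of non-negative integer counts.")
   (provenance :initarg :provenance :initform nil :reader cc-provenance
               :documentation "Plist describing how the table was built.")))

(defun tabulate (xs ys &key (factors '(x y)) provenance)
  "Cross-tabulate two equal-length lists of category labels.
Levels are kept in order of first appearance."
  (let* ((x-levels (remove-duplicates xs :test #'equal :from-end t))
         (y-levels (remove-duplicates ys :test #'equal :from-end t))
         (counts (make-array (list (length x-levels) (length y-levels))
                             :initial-element 0)))
    ;; Walk the paired observations and bump the matching cell.
    (loop for x in xs
          for y in ys
          do (incf (aref counts
                         (position x x-levels :test #'equal)
                         (position y y-levels :test #'equal))))
    (make-instance 'cross-classification
                   :factors factors
                   :levels (list x-levels y-levels)
                   :counts counts
                   :provenance provenance)))
```

Usage would look something like (tabulate sexes outcomes :provenance '(:source :data-frame)), and the audit trail is then just (cc-provenance table). Whether a plist is rich enough for the reproducibility story is an open question.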
This will be important later; see the comments on reproducibility
where I wax philosophical in the source code.
For missing data -- the same. We need extensions of the numbers to
include infinity, missingness-categories, as well as nominal and
ordinal categorical variable storage structures.
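One possible shape for the missingness part, sketched with CLOS so that the categories form a small class family. The class names, the singleton variables, and MEAN-OF-OBSERVED are hypothetical; infinity and the nominal/ordinal storage structures are deliberately left out of this sketch.

```lisp
;; Hypothetical sketch: missingness states as a small class family.
;; Names are assumptions made for illustration.

(defclass missing-value () ()
  (:documentation "Root of the missingness family."))

(defclass missing-completely-at-random (missing-value) ())
(defclass missing-at-random (missing-value) ())
(defclass non-ignorable-missing (missing-value) ())

;; One shared marker object per state, so data can hold them directly.
(defparameter *mcar* (make-instance 'missing-completely-at-random))
(defparameter *mar* (make-instance 'missing-at-random))

(defun missing-p (x)
  "True if X is any kind of missingness marker."
  (typep x 'missing-value))

(defun mean-of-observed (xs)
  "Mean of the non-missing entries of XS (NIL if there are none), and
the number of entries that were dropped, as two values."
  (let ((observed (remove-if #'missing-p xs)))
    (values (if observed
                (/ (reduce #'+ observed) (length observed))
                nil)
            (- (length xs) (length observed)))))
```

Because the states are classes, generic functions can later specialize on a particular kind of missingness while code that doesn't care just tests MISSING-P.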
Again, the TODO.org file in the main directory needs to be updated
with these tasks; I'll see about that during this afternoon's coffee
break...
Check "experiments in how we could do things" into the examples
directory, and just make sure (if possible, if you want) to
distinguish between #1 (done right) and #2 (done right now...).
If you are looking for ways to contribute (these will get dumped
into TODO.org later):
1. look at Tamas' package which includes infinity, and think about
how to add a few objects which represent various missingness states
(just do a single one, and then we can get the whole semi-hierarchical
family of categories: MAR, MCAR, CAR, non-ignorable, etc.)
2. put David's code for generating declt code into the doc directory
with a makefile so others can work on modifying docs based on the
resulting output. I've got a quick hack for making it work in
Quicklisp's local-projects directory, which I'll share and put into
the documentation subdirectory as a note
3. convince Tamas to release his graphics package and see if Mirko's
grammar of graphics is in the right direction, and see if they can be
pieced together as an example (okay, this is huge)
4. figure out how to enforce typing in dataframe columns (it should be
an error if someone adds or modifies a value which is not the right type).
Figure out how to type the various data forms (comp-sci types,
statistical types, the overlap)
5. write examples, and get 'em working so we can refactor and migrate
code into the code base...
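For item 4 above, a rough sketch of what enforced column typing might look like, using a standalone column object rather than the real dataframe class. TYPED-COLUMN, COLUMN-REF, COLUMN-APPEND, the COLUMN-TYPE-ERROR condition, and the PROPORTION type are all hypothetical names; the PROPORTION deftype is only meant to hint at a "statistical type" layered on a comp-sci type.

```lisp
;; Hypothetical sketch of a type-checked column. All names are
;; assumptions; this is not the existing dataframe API.

(deftype proportion ()
  "A statistical type expressed as a Lisp type: a real in [0, 1]."
  '(real 0 1))

(define-condition column-type-error (type-error) ())

(defun check-value (value element-type)
  "Return VALUE if it is of ELEMENT-TYPE, otherwise signal an error."
  (unless (typep value element-type)
    (error 'column-type-error :datum value :expected-type element-type))
  value)

(defclass typed-column ()
  ((element-type :initarg :element-type :reader column-element-type)
   (cells :initarg :cells :reader column-cells)))

(defun make-typed-column (element-type initial-values)
  "Build a column, checking every initial value against ELEMENT-TYPE."
  (dolist (v initial-values)
    (check-value v element-type))
  (make-instance 'typed-column
                 :element-type element-type
                 :cells (make-array (length initial-values)
                                    :initial-contents initial-values
                                    :adjustable t
                                    :fill-pointer t)))

(defun column-ref (column index)
  (aref (column-cells column) index))

(defun (setf column-ref) (new-value column index)
  "Modify a cell; the check runs before anything is stored."
  (setf (aref (column-cells column) index)
        (check-value new-value (column-element-type column))))

(defun column-append (column value)
  "Add a cell at the end; signals COLUMN-TYPE-ERROR on a bad value."
  (vector-push-extend (check-value value (column-element-type column))
                      (column-cells column))
  column)
```

With this, (setf (column-ref col 0) 1.5) on a PROPORTION column signals an error and leaves the old value in place. How missingness markers should interact with the element type is left open here.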
I'll start a new thread on the various topics later, maybe tomorrow.