Hi Steve -
You've got a good point regarding community. However, we are in a
bootstrapping stage -- I'd have no problems with having such tools,
and such support, but someone has to do this. For me, I believe that
in order to write basic user requirements for this project, we need to
have some preliminary data and experimentation. This is closer to a
research project than a working system -- as I mentioned before, if
you want to get things done, use R.
(truth in advertising -- I was one of the first 20 folks to compile
and run the first version of R, back in the mid 90s, and contributed
the first systems and prototypes for many capabilities that are
considered "critical" to R's success)
What I don't want at this stage is bureaucracy -- while I'm a bit more
into project infrastructure and coordination than Tamas, I still feel
that we need experimentation to move forward.
I've provided some possible directions -- and clearly, we are all
volunteers, unlike any commissioned or commercial projects. And at
this point, while I have a vision that this is the right direction to
go (using common lisp as a basis for a new data analysis system which
provides a different and more modern approach than the S/R
languages....), I still need data in the form of "this is what I'd
like to solve".
For me, the examples that are contributed provide a means to identify
low-hanging fruit as well as commonality between people -- I've never
seen an open source project succeed that didn't let people work on what
they were passionate about. (I guided the ESS project for over 10
years; it's the oldest (circa 1989) continuously running open source
project oriented toward statistical computing.) I did similar work
with the XEmacs project back in 1995, and have watched numerous
collaborative projects die -- and have rescued others (the lisp-matrix
suite comes to mind, as well as the Common LispStat pre-alpha system
that I started this from).
So for the blog posts on predictive analytics -- we are so far short of
that point that it isn't even funny. The basic regression routines need
to be rewritten using BLAS. They were part of liblispstat and, from a
stat-theory perspective, were not written efficiently: they use the
sweep operator approach to linear regression rather than SVD-based
algorithms (though someone could translate the code from lispstat --
which brings up a point, is there a good C2CL converter?). We also need
to get dataset management going -- the dataframe routines are a joke,
and require lots more test cases and even something as basic as the
column-typing I proposed.
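
For anyone who hasn't met the sweep operator mentioned above, here is a
minimal sketch of the idea. It is written in Python purely for brevity
and is not the liblispstat code; the names `sweep` and
`regress_by_sweep` are made up for illustration, and the sign
convention used is one of several in circulation. Sweeping the
augmented cross-product matrix of [X | y] on the predictor columns
leaves the coefficients in the last column and the residual sum of
squares in the bottom-right corner.

```python
def sweep(a, k):
    """Return a copy of the square matrix `a` swept on pivot index k.

    Convention assumed here: the pivot becomes -1/d, the pivot row and
    column are divided by d, and every other entry gets the usual
    rank-one correction.
    """
    n = len(a)
    d = a[k][k]
    if d == 0:
        raise ZeroDivisionError("zero pivot: predictors are collinear")
    out = [row[:] for row in a]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = -1.0 / d
            elif i == k or j == k:
                out[i][j] = a[i][j] / d
            else:
                out[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    return out


def regress_by_sweep(x_rows, y):
    """Least-squares fit by sweeping the cross-product matrix of [X | y]."""
    p = len(x_rows[0])
    z = [list(row) + [yi] for row, yi in zip(x_rows, y)]
    # Augmented cross-product matrix Z'Z, with Z = [X | y].
    a = [[sum(r[i] * r[j] for r in z) for j in range(p + 1)]
         for i in range(p + 1)]
    # Sweep once per predictor column.
    for k in range(p):
        a = sweep(a, k)
    coefficients = [a[i][p] for i in range(p)]
    rss = a[p][p]
    return coefficients, rss


# Four points lying exactly on y = 1 + 2x; first column is the intercept.
x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
coefficients, rss = regress_by_sweep(x_rows, y)
```

The catch is that the sweeps operate on X'X, which squares the
condition number of X. That is the usual argument for working on X
directly with an SVD or QR factorisation instead.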
TODO.org basically is a simple way to fill in needs as I either see
them, or as others see them. If we are going to use a different
system, someone needs to own and take charge of legacy data. Since
I'm offline more often than not (my primary hacking windows are early
mornings, my tram-ride to work, and my tram-ride back, about 1.5
hours/day total on good days), I pick tools which support that (i.e.
distributed VC like git, and keeping things there).
Since we don't yet have a formalised manifesto, I hesitate to open up
a collaborative wiki (again, the legacy data issue).
Since there are only 23 folks on the mailing list, of which we've
heard from 5 on a regular basis, it's going to be an uphill climb to
create a community, but clearly feasible.
I'd like to have a basic system in place before advertising too much
-- i.e. dataframes capability, basic visuals, descriptive numerics,
some data management functionality (integration, splitting,
reshaping), probability calculations and calculus (bayesian,
resampling, etc)
"I'd like".
Not required, of course.
Remember, this mailing list has been around since 2009, and only now
that I'm making some progress (updating, cruft-removal, new features
and approaches based on the many changes since then) and can promise
that with high probability, I'll be working on this for the next year
or two, am I willing to commit to setting some goals.
And right now, as you point out, we need some. How about taking a
first stab at laying out what you think we should do and how we'll
implement it? The only thing I'd ask is that you take into
consideration our limited resources. Right now I'm committed to:
1. coding up the data import, dataframes, and getting illustrative
examples written so that I can start to decide which analytics to
implement (and how they should be)
2. providing input and feedback to queries on this mailing list.
So take a stab at what you'd like to do and how you'd like to do it.
However, the one thing I'd ask is that if you propose a tool or system
that requires maintenance, you find a means to supply that effort.
E.g., on the XEmacs project a few years ago, we had an issue tracker,
but the first one proved useless because no one claimed ownership of
it. I've claimed "ownership" of TODO.org, which should morph into
project planning (i.e. the way org-mode will let one do it), but there
are other tools available, and I'd gladly use them as a client or user
(not supporter or owner) if someone else owns and supports them.
But my suggestion is to hold off on the blog posts until we've
"written" the content for them. I'm working on the content, but you
can help as well, and that was my point with the "Examples". And in
particular, things like David's examples need to be worked into a
formal example (which I'll do, unless someone beats me to it).
What would be wonderful for me? To have by the end of 2013, data
management, auditable objects, metadata coded regarding assumptions
for statistical and data analytic procedures (almost a project in
itself), dynamic and interactive graphics (neq dynamic interactive),
and from the analytics, basic regression infrastructure and resampling
infrastructure. If we could implement accelerators and macros for
MCMC and similar posterior likelihood calculations, I'd be in heaven.
Should I write down the project plan based on that as a strawman?
best,
-tony
--
blind...@gmail.com
Muttenz, Switzerland.
"Commit early,commit often, and commit in a repository from which we
can easily roll-back your mistakes" (AJR, 4Jan05).
Drink Coffee: Do stupid things faster with more energy!