Have these changes been pushed into the master repo? I just tried following David's directions and got this:
best,
-tony
On 06/11/2012, at 4:11 PM, "A.J. Rossini" <blind...@gmail.com> wrote:

> Missing is a function in src/data/import.lisp which will check a CL array and convert strings into appropriate numbers or symbols, or just leave them as strings. Then for numbers and symbols, mark as appropriate (precision, etc. for numbers; special uses for symbols such as nil or T or NA or missingType3 or whatever). And for strings, leave as strings.
>
> Once this function is written, and it can be hinted or take a "best guess" at the class for the column, we'll be back in shape, with a much improved (and supported) data-slurping mechanism, a

I am looking forward to that, as I was expecting to write that myself.
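The conversion function described above might be sketched roughly like this (a guess at the shape of the idea, not the actual src/data/import.lisp code; the special-token table and all names are illustrative):

```lisp
;; Illustrative sketch: try to read each string as a number, map a few
;; special tokens to symbols, and otherwise leave the string alone.
(defparameter *special-tokens*
  '(("NIL" . nil) ("T" . t) ("NA" . :na))
  "Tokens that become symbols rather than strings; purely illustrative.")

(defun classify-entry (string)
  "Return STRING converted to a number, a marker symbol, or itself."
  (let ((special (assoc string *special-tokens* :test #'string-equal)))
    (if special
        (cdr special)
        (let* ((*read-eval* nil)            ; never evaluate while reading
               (value (ignore-errors (read-from-string string))))
          (if (numberp value) value string)))))

(defun convert-column (strings)
  "Convert a list of raw strings; a \"best guess\" at the column's class
could then be taken from the converted values."
  (mapcar #'classify-entry strings))
```

For example, `(convert-column '("1" "2.5" "NA" "foo"))` yields `(1 2.5 :NA "foo")`, and hinting could override the per-entry guesses.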
The data frame summarise function I am writing allows the user to specify specific summary metrics (e.g. fivenum, variance, etc.) according to the column type. If I can depend on that being set, or at least call the type assessment functions when I need them, it's a win!
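That per-column-type dispatch could be sketched as follows (hypothetical names, not the real summarise interface; the quantile rule is a crude nearest-index approximation of R's fivenum):

```lisp
(defun five-number-summary (numbers)
  "Min, rough quartiles, median, and max of a list of numbers."
  (let ((sorted (sort (copy-list numbers) #'<)))
    (flet ((q (p) (nth (round (* p (1- (length sorted)))) sorted)))
      (list (first sorted) (q 1/4) (q 1/2) (q 3/4) (car (last sorted))))))

(defun summarise-column (column)
  "Dispatch on column type: numeric columns get a five-number summary,
anything else gets level counts, as an alist of (VALUE . COUNT)."
  (if (every #'numberp column)
      (five-number-summary column)
      (let ((counts (make-hash-table :test #'equal)))
        (dolist (x column) (incf (gethash x counts 0)))
        (let (result)
          (maphash (lambda (k v) (push (cons k v) result)) counts)
          result))))
```

So `(summarise-column '(5 1 3 2 4))` gives `(1 2 3 4 5)`, while a column of strings falls through to the level-count branch.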
While we are talking import, most of the data I am interested in comes in fairly complex structured formats (either your legacy Fortran stuff or netCDF). Now in the current scheme I think I saw a function that could be called (I assume for each row?), but I am not sure where it's at in the current mini-hackathon.
I have evolved a fairly simple scheme, though at the moment it depends on yet another library for parsing and validation of fields. My plan was to provide the user a wizard to build the record description in a simple list structure. The parsing function is actually pretty simple and generic, so once I understand the import interface a bit more, it should be easy to hook this up.

Which leads to the possible inclusion of data-format-validation, the library that does all the parsing for me at the moment. It's in Quicklisp, and for me it seems relatively efficient (more an impression than anything formal; most of my datasets are 100k records or so, and I don't appear to wait too long for them to be parsed).
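The record-description-as-a-simple-list idea might look something like this (an illustration of the scheme only, not data-format-validation's actual interface; the spec format and names are invented for the example):

```lisp
(defun parse-record (spec fields)
  "SPEC is a list of (NAME . PARSER) pairs; FIELDS is a list of raw
strings for one record. Returns an alist of (NAME . PARSED-VALUE)."
  (loop for (name . parser) in spec
        for field in fields
        collect (cons name (funcall parser field))))

;; Example record description: an integer id and a string label.
(defparameter *example-spec*
  (list (cons :id (lambda (s) (parse-integer s)))
        (cons :label #'identity)))
```

For instance, `(parse-record *example-spec* '("42" "sample"))` returns `((:ID . 42) (:LABEL . "sample"))`; a wizard would just build the spec list interactively.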
Not a decision to be made right now, but something to think about.
On Tuesday, November 6, 2012 12:58:13 PM UTC+1, David Hodge wrote:

> On 06/11/2012, at 4:11 PM, "A.J. Rossini" <blind...@gmail.com> wrote:
>
>> Missing is a function in src/data/import.lisp which will check a CL array and convert strings into appropriate numbers or symbols, or just leave them as strings. Then for numbers and symbols, mark as appropriate (precision, etc. for numbers; special uses for symbols such as nil or T or NA or missingType3 or whatever). And for strings, leave as strings.
>>
>> Once this function is written, and it can be hinted or take a "best guess" at the class for the column, we'll be back in shape, with a much improved (and supported) data-slurping mechanism, a
>
> I am looking forward to that, as I was expecting to write that myself.
You still can :-).
> The data frame summarise function I am writing allows the user to specify specific summary metrics (e.g. fivenum, variance, etc.) according to the column type. If I can depend on that being set, or at least call the type assessment functions when I need them, it's a win!
Exactly. And critically, I'd like to be able to drop complicated datatypes into a slot -- e.g. a kinetic-class or time-course-class, which would be a set of time-marked values (think x-y plot, with x = time).
Eventually, I'd like to enforce the statistical assumption that rows are weakly conditionally independent within a dataset. We can do that if we can have multiple time-course classes per observation; i.e., in a clinical trial of 10 people, measured daily over a year, there should be 10 rows, with a column for each measurement. If you wanted to get a set of common times between the measurements, you would have to do an intersection on the set of times for each measure, i.e.
(intersection (times var1) (times var2) (times var3))
or something similar to that. Which means that we write the appropriate infrastructure so that if you want to summarize a column of trajectories, it does the right thing!
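That infrastructure could start from something like the following sketch (the class and accessor names are hypothetical, not anything in the repository):

```lisp
;; A time-course: a variable carrying its own measurement times.
(defclass time-course ()
  ((times        :initarg :times        :accessor times)
   (measurements :initarg :measurements :accessor measurements)))

(defun common-times (&rest courses)
  "Times at which every course was measured, i.e. the idea behind
\(intersection (times var1) (times var2) (times var3))."
  (reduce (lambda (acc course) (intersection acc (times course)))
          (rest courses)
          :initial-value (times (first courses))))
```

A summarise method specialised on columns of time-course objects could then restrict itself to these common times before computing anything. Note that `intersection` returns its result in unspecified order, so callers should sort if order matters.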
(this would be cooler than R, actually...)

> While we are talking import, most of the data I am interested in comes in fairly complex structured formats (either your legacy Fortran stuff or netCDF). Now in the current scheme I think I saw a function that could be called (I assume for each row?), but I am not sure where it's at in the current mini-hackathon.
I think I started coding things similar to R's "apply" functions, which will zoom down a margin in a multidimensional array -- but "started" doesn't mean that there is any merit or completion of the actual code.
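An R-style apply over one margin of a 2-D array might be sketched like this (only a sketch of the idea, not the partially written code mentioned above):

```lisp
(defun margin-apply (fn array margin)
  "Apply FN to each row (MARGIN 1) or each column (MARGIN 2) of a
2-D ARRAY, collecting the results into a list."
  (destructuring-bind (rows cols) (array-dimensions array)
    (ecase margin
      (1 (loop for i below rows
               collect (funcall fn (loop for j below cols
                                         collect (aref array i j)))))
      (2 (loop for j below cols
               collect (funcall fn (loop for i below rows
                                         collect (aref array i j))))))))
```

For example, `(margin-apply (lambda (xs) (reduce #'+ xs)) #2A((1 2) (3 4)) 1)` gives the row sums `(3 7)`; margin 2 gives the column sums `(4 6)`. Generalising to n-dimensional arrays is the part that takes the real work.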
> I have evolved a fairly simple scheme, though at the moment it depends on yet another library for parsing and validation of fields. My plan was to provide the user a wizard to build the record description in a simple list structure. The parsing function is actually pretty simple and generic, so once I understand the import interface a bit more, it should be easy to hook this up.
>
> Which leads to the possible inclusion of data-format-validation, the library that does all the parsing for me at the moment. It's in Quicklisp, and for me it seems relatively efficient (more an impression than anything formal; most of my datasets are 100k records or so, and I don't appear to wait too long for them to be parsed).
I like that idea -- and had it on my list of things to do in 2010...
> Not a decision to be made right now, but something to think about.
Well, it's something to experiment with right now!