dataframes and types


A.J. Rossini

May 17, 2014, 11:25:18 AM
to lisp...@googlegroups.com
So I've made some progress in understanding what I want to do with dataframes; it can be seen in the RHO package on my github page (and can be downloaded from there).  There is an example.lisp file and a unittests.lisp file there that should give an idea of what functionality I am looking for.  I'm still exploring this a bit more, and have not finalized my thoughts.  But the typing works, and next a summary generic will need to be written for each typed column, so that general summarization makes sense.

Construction of dataframes from vectors, and from combinations of data frames and vectors, is what I am thinking about right now, since I'm currently very interested in so-called meta-analytic datasets (combinations and compositions of self-coherent datasets into larger entities, in an approach which records the potential for incoherencies across the datasets).

I am not sure that the end result of RHO will be used for Common Lisp Statistics, so I've not added it to quicklisp (and please do not request such yet) as it still might make more sense to move the functionality, once I understand it a bit better, to one of the other packages that we've discussed for holding dataframes.

BUT, to clarify, I want to insist that every row of a dataset is independent (or conditionally independent) from the other rows, and that data structures (elements in a column) must have a means for summarizing down to a (possibly ordered by so-called relevance) vector of summary statistics, which could be a single quantity (self, in the case of numeric data) or multiple quantities (number of nodes, number of vectors, connectivity index in the case of a graph, or Cmax/Tmax/etc. in the case of a kinetic profile).
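
One way to read the summarization requirement is as a generic function with one method per element type. The following is only a minimal sketch under that assumption: SUMMARIZE, GRAPH and KINETIC-PROFILE are hypothetical stand-ins invented for illustration, not names taken from RHO.

```lisp
;; Hypothetical names throughout; none of this is RHO's actual API.
(defgeneric summarize (element)
  (:documentation
   "Return a vector of summary statistics for ELEMENT, most relevant first."))

;; Numeric data summarizes to itself.
(defmethod summarize ((x number))
  (vector x))

;; Stand-in graph type: a list of nodes and a list of (from . to) edges.
(defstruct graph nodes edges)

(defmethod summarize ((g graph))
  (vector (length (graph-nodes g))
          (length (graph-edges g))))

;; Stand-in kinetic profile: parallel vectors of sampling times and
;; measured concentrations.
(defstruct kinetic-profile times concentrations)

(defmethod summarize ((p kinetic-profile))
  (let* ((conc (kinetic-profile-concentrations p))
         (cmax (reduce #'max conc))
         ;; Tmax is the time at which the maximum was first observed.
         (tmax (aref (kinetic-profile-times p)
                     (position cmax conc :test #'=))))
    (vector cmax tmax)))

;; Example calls:
(summarize 3.5)
(summarize (make-graph :nodes '(a b c) :edges '((a . b) (b . c))))
(summarize (make-kinetic-profile :times #(0.0 1.0 2.0 4.0)
                                 :concentrations #(0.0 8.0 5.0 1.0)))
```

With this shape, a column of mixed element types could be summarized by mapping SUMMARIZE over it, and each new element type only needs its own method.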

From that, different numerical matrices and criteria functions (for optimization) would be created depending on the data analysis desired, la-de-dah, etc.

But basically, I'm just providing a middle-of-progress update. 

I've asked a question on comp.lang.lisp which I'd also like to ask here (no need to answer in both places :)

I'm making progress on the dataframes package (thanks Marco A for the seed code that is slowly evolving!).  It's going well, except that one optional feature would be the ability to find all variables in a particular package which currently hold data (i.e. point to a place) of a particular type.

(yes, I know I've got the technical details and names wrong -- educational corrections welcome, thanks...) 


;; I.e. I'd like to be able to 

(defparameter my-s (make-strand ....)) 

(defparameter my-df (make-data-frame ....)) 

;; 

(find-all-data-variables) ; => (my-s my-df) 


;; What I thought I could do is something like: 

(defun find-all-data-variables (&key (pkg *package*))
  (let ((lst ()))
    (do-symbols (s pkg)
      (if (typep s 'STRAND)
          (push s lst))
      (if (typep s 'DATA-FRAME)
          (push (data-frame-column-names s) lst)))
    lst))

;; but I think I am getting confused between variables and places (this is not
;; the first time, and it will not be the last time, I think...).

Is what I am doing possible, or am I just making an error in concept somewhere? 
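
For what it is worth, the idea itself is workable for global variables. In the snippet above, TYPEP is applied to the symbol, which is always of type SYMBOL; the test has to be made on the symbol's value, and only after checking that the symbol is bound. A minimal sketch under stated assumptions: the two DEFSTRUCT forms are stand-ins for the real RHO types, the TYPES keyword is an addition made for illustration, and only symbols whose home package is PKG are reported.

```lisp
;; Stand-in types so the sketch is self-contained; the real STRAND and
;; DATA-FRAME types live in RHO and will differ.
(defstruct strand elements)
(defstruct data-frame columns)

(defun find-all-data-variables (&key (pkg *package*)
                                     (types '(strand data-frame)))
  "Return the symbols whose home package is PKG and whose global value
is of one of TYPES."
  (let ((target (find-package pkg))
        (found '()))
    (do-symbols (s target)
      (when (and (eq (symbol-package s) target) ; skip inherited symbols
                 (boundp s)                     ; symbol has a global value
                 (typep (symbol-value s) `(or ,@types)))
        ;; DO-SYMBOLS may visit the same symbol more than once.
        (pushnew s found)))
    found))

(defparameter my-s (make-strand :elements #(1 2 3)))
(defparameter my-df (make-data-frame :columns '(a b)))
(defparameter not-data 42)

;; Returns a list containing MY-S and MY-DF, in no particular order.
(find-all-data-variables)
```

This only sees variables defined with DEFVAR or DEFPARAMETER (or otherwise globally bound); lexical variables introduced by LET cannot be found this way, since they are not reachable through a symbol at run time.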



best,
-tony

Mirko Vukovic

May 19, 2014, 9:19:31 PM
to lisp...@googlegroups.com



Tony,

I did not have time to check out your unit tests - will do over the weekend.

I would suggest that data-frames focus on the container design - a collection of typed vectors with search and extraction capabilities.  I would also add serialization capabilities to data-frames.

Statistics would be stored in a different type of data structure - possibly classes based on a base "statistics" class.  All statistics would have links to the columns of the data-frames from which they are derived.  And some of the statistics objects (such as fits or power spectra) would store their data in local data-frames.
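
As a rough illustration of that separation, here is a minimal CLOS sketch. The class and function names (STATISTIC, SAMPLE-MEAN, COLUMN-MEAN and the readers) are hypothetical, and a column is assumed to be a plain non-empty vector of numbers.

```lisp
;; Hypothetical names; a column is assumed to be a plain vector of numbers.
(defclass statistic ()
  ((source-columns
    :initarg :source-columns
    :reader source-columns
    :documentation "The data-frame columns this statistic was derived from."))
  (:documentation "Base class for all derived statistics."))

(defclass sample-mean (statistic)
  ((value :initarg :value :reader value)))

(defun column-mean (column)
  "Compute the mean of COLUMN and keep a link back to the column itself."
  (make-instance 'sample-mean
                 :source-columns (list column)
                 :value (/ (reduce #'+ column) (length column))))

;; Usage: the statistic holds its value and a reference to its source.
(let* ((col (vector 1 2 3 6))
       (m (column-mean col)))
  (list (value m) (eq (first (source-columns m)) col)))
```

Because the statistic stores a reference rather than a copy, it can always be traced back to the data it came from, while the data-frame itself stays a pure container.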

I have also started on a similar path as you - writing down use cases, and contemplating the software architecture.  I sketched out a design based on CLOS+MOP.  A deftable (expanding on defclass) would be used to define the table schema.  With CLOS, one can then define schema inheritance, and methods specialized on table type.

This work is very much unfinished, incomplete, and not fully coherent.  I'll put it up over the weekend.
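
To make the deftable idea concrete, here is a minimal sketch of how such a macro might expand into DEFCLASS. It is a guess at the general shape only, not the actual design: it uses plain CLOS without the MOP, and the column syntax (NAME ELEMENT-TYPE), the TABLE base class and the example schemas are all assumptions.

```lisp
;; Illustrative only: a guess at the shape of DEFTABLE, using plain CLOS.
(defclass table ()
  ()
  (:documentation "Base class for all table schemas."))

(defmacro deftable (name (&rest parents) &rest columns)
  "Define table schema NAME.  Each column is (COLUMN-NAME ELEMENT-TYPE) and
becomes a slot holding an adjustable typed vector."
  `(defclass ,name ,(or parents '(table))
     ,(loop for (col-name element-type) in columns
            collect `(,col-name
                      :accessor ,col-name
                      :initform (make-array 0
                                            :element-type ',element-type
                                            :adjustable t
                                            :fill-pointer 0)))))

;; A schema, and a second schema that inherits its columns.
(deftable measurements ()
  (subject-id fixnum)
  (dose double-float))

(deftable timed-measurements (measurements)
  (hours double-float))

;; A method specialized on table type; subclasses inherit it.
(defgeneric row-count (table))

(defmethod row-count ((tbl measurements))
  (length (subject-id tbl)))

(let ((tm (make-instance 'timed-measurements)))
  (vector-push-extend 7 (subject-id tm))
  (vector-push-extend 2.5d0 (dose tm))
  (vector-push-extend 1.0d0 (hours tm))
  (row-count tm))
```

A MOP-based version could go further, for example by introspecting the slots to list column names, but that needs implementation-specific or portability-library support.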

Mirko

Mirko Vukovic

May 27, 2014, 2:53:42 PM
to lisp...@googlegroups.com


Tony,

Can you post a link to your repo?  I am having trouble finding it.

Thanks,

Mirko

A.J. Rossini

May 27, 2014, 3:24:47 PM
to lisp...@googlegroups.com
Look for the "rho" repository on my github pages.  Will send the link tomorrow.

A.J. Rossini

May 28, 2014, 5:21:32 AM
to lisp...@googlegroups.com
--
best,
-tony

blind...@gmail.com
Muttenz, Switzerland.
"Commit early,commit often, and commit in a repository from which we can easily roll-back your mistakes" (AJR, 4Jan05).

Drink Coffee:  Do stupid things faster with more energy!