Dear all,
I am trying to slowly move my data science work towards Clojure; currently I work exclusively in R.
I would be far more confident doing this if I knew that I could still fall back on R when something is missing.
Regarding pure data munging functionality, Clojure is more or less on the same level as R.
But as there are far more R packages (for anything you can imagine), I would like to see
an easy just-in-case integration.
I am aware of the ggplot integration in Gorilla REPL, and this is indeed a good idea.
But that's only for plotting, which is only a subset of R functionality.
I would like to have a generic way to work on data frames in R and Clojure with as little friction as possible.
Ideally, I could start by creating a data frame in Clojure and, if I miss certain functionality (i.e. specific packages) in Clojure, go to R and back.
This could be done out-of-process, with a pipeline orchestrated by "make" or similar and communicating via files.
I would prefer an in-process solution based on JRI, though.
I prototyped some functions (using JRI and parts of "clj-jri"), and they basically led to the idea of having 3 bridge functions in Clojure:
(assign-dataframe-in-R var-name dataset)
(execute-r-source "transform.R")
(get-dataset-from-R var-name)
They should
- transparently convert a Clojure data structure into an R object (data.frame or data_frame),
- set the object in the current R session,
- execute a piece of R code in-process, which transforms the arbitrary input data.frame into an arbitrary output data frame,
- convert the resulting R data frame back into a Clojure data structure and return it.
It is rather straightforward to do with JRI.
This approach has the huge advantage that transform.R is a perfectly normal R file, which can be edited and debugged
in the same Emacs instance as the Clojure code. The Clojure and R files can sit side by side.
I have decided not to rely on the dataset implementations of Incanter / core.matrix, for various reasons,
but to use sequences-of-maps instead.
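
Since R data frames are column-oriented while a sequence of maps is row-oriented, the conversion step in both bridge directions boils down to a pivot. The following is only a minimal sketch of that pivot in plain Clojure; the names rows->columns and columns->rows are hypothetical helpers made up for illustration, they are not part of JRI or clj-jri, and the actual hand-over of the columns to R is left out.

```clojure
;; Hypothetical helpers, for illustration only (not part of JRI or clj-jri).

(defn rows->columns
  "Turns a sequence of maps (one map per row) into a map of column vectors.
   A key missing from a row yields nil in that column; a bridge could
   translate nil to NA on the R side (assumption)."
  [rows]
  (let [ks (distinct (mapcat keys rows))]
    (into {} (for [k ks] [k (mapv #(get % k) rows)]))))

(defn columns->rows
  "Inverse direction: turns a map of equal-length column vectors back into
   a vector of row maps."
  [columns]
  (let [ks (keys columns)]
    (if (empty? ks)
      []
      (apply mapv (fn [& vs] (zipmap ks vs)) (map columns ks)))))

(def example-rows
  [{:name "a" :x 1}
   {:name "b" :x 2}])

(rows->columns example-rows)
;; => {:name ["a" "b"], :x [1 2]}

(columns->rows (rows->columns example-rows))
;; => [{:name "a", :x 1} {:name "b", :x 2}]
```

With such a pivot in place, the assign and get functions would only need to pass each column to R, or fetch it back, as a plain vector.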
I am wondering what you think about this approach, and whether you are aware of a place where this code could live.
There is a project with Python and R implementations of a common byte representation of a data frame, but there is no Java/Clojure implementation yet.
Regards,
Carsten