Is there any desire or need for a Clojure DataFrame? By DataFrame, I mean a structure similar to R's data.frame and Python's pandas.DataFrame. Incanter's DataSet may already fulfill this purpose, and if so, I'd like to know whether and how people are using it.

From some quick research, I see that prior work has been done in this space, such as:

Rather than going off and creating a competing implementation (https://xkcd.com/927/), I'd like to know whether anyone here is actively working on, or would like to work on, a DataFrame and related utilities for Clojure (and, by extension, Java). Is it something that's sorely needed, or is everybody happy using Incanter or some other library that I'm not aware of? If there's already a de facto standard out there, would anyone care to point it out?

As background: my specific use case is NLP and ML. I often explore and prototype in Python, but I'm then left to deal with a smattering of JVM libraries (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, etc.), each with its own ad hoc implementations of algorithms, matrices, and utilities for reading data. It would be great to have a unified way to explore my data in the Clojure REPL and then serve the same code and models in production.

I would love for Clojure to have a broadly compatible ecosystem similar to Python's NumPy/pandas/scikit-*/SciPy/matplotlib/Gensim, etc. core.matrix and Incanter appear to fill a large chunk of those roles, but I don't know whether they have yet become the de facto standards in the community.

Any feedback is greatly appreciated.
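For readers less familiar with data frames, the core idea is a table of named, equal-length columns that may hold different types, supporting row-wise operations (filtering, row lookup) and column-wise ones (aggregation). Here is a minimal Java sketch of that structure, assuming nothing beyond the JDK. `MiniFrame`, `with`, `filter`, and `mean` are hypothetical, illustrative names; this is not the API of Incanter, core.matrix, or any existing library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: MiniFrame and its methods are illustrative names,
// not the API of Incanter, core.matrix, or any other existing library.
final class MiniFrame {
    // Column name -> column values; LinkedHashMap keeps columns in insertion order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int nrows = -1; // -1 until the first column fixes the row count

    // Adds (or replaces) a named column in place; all columns must have equal length.
    public MiniFrame with(String name, List<?> values) {
        if (nrows >= 0 && values.size() != nrows) {
            throw new IllegalArgumentException(
                "column " + name + " has " + values.size() + " rows, expected " + nrows);
        }
        nrows = values.size();
        columns.put(name, new ArrayList<Object>(values));
        return this;
    }

    public int nrows() { return Math.max(nrows, 0); }

    public List<String> columnNames() { return new ArrayList<>(columns.keySet()); }

    // Read-only view of a single column.
    public List<Object> column(String name) {
        List<Object> col = columns.get(name);
        if (col == null) throw new IllegalArgumentException("no column " + name);
        return Collections.unmodifiableList(col);
    }

    // Row i as a column-name -> value map.
    public Map<String, Object> row(int i) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> e : columns.entrySet()) {
            r.put(e.getKey(), e.getValue().get(i));
        }
        return r;
    }

    // New frame holding only the rows for which the predicate is true.
    public MiniFrame filter(Predicate<Map<String, Object>> keep) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < nrows(); i++) {
            if (keep.test(row(i))) kept.add(i);
        }
        MiniFrame out = new MiniFrame();
        for (Map.Entry<String, List<Object>> e : columns.entrySet()) {
            List<Object> vals = new ArrayList<>();
            for (int i : kept) vals.add(e.getValue().get(i));
            out.with(e.getKey(), vals);
        }
        return out;
    }

    // Arithmetic mean of a numeric column (NaN for an empty frame).
    public double mean(String name) {
        double sum = 0;
        for (Object v : column(name)) sum += ((Number) v).doubleValue();
        return sum / nrows();
    }

    public static void main(String[] args) {
        MiniFrame df = new MiniFrame()
            .with("token", Arrays.asList("the", "cat", "sat"))
            .with("count", Arrays.asList(10, 3, 4));
        // Keep rows whose count is below 5, then aggregate one column.
        MiniFrame rare = df.filter(r -> ((Number) r.get("count")).intValue() < 5);
        System.out.println(rare.nrows() + " rows, mean count " + rare.mean("count"));
        // prints "2 rows, mean count 3.5"
    }
}
```

Storing data by column rather than by row mirrors how R's data.frame (a list of column vectors) is organized. It keeps each column homogeneous, so a real implementation could back numeric columns with primitive or core.matrix-style arrays instead of the boxed `List<Object>` used here for brevity.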
Chris, thanks for the reply.

It's good to know that I'm not the only one who misses this functionality! My goal is definitely to be compatible with Incanter and core.matrix, as they both seem mature, and I will never have the time to implement that functionality from scratch myself. I'll be studying the pandas source over the next few days, as I want a good idea of how it implements dataframes before starting on the Clojure version. My long-term goal is for future authors to look to this set of core data-analysis tools as the basis for any packages they build.

If you'd like to publish whatever you've written (hacked-up code is fine), I'll take a look at it as a starting point, or at least as one possible design.

- Arthur
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
This is already working well for the array programming APIs (it's easy to mix and match Clojure data structures, Vectorz Java-based arrays, GPU backed arrays in computations).
While we may agree to some extent on the other parts of your post, the GPU part is *NOT* true: I would like you to point me to a single implementation anywhere (Clojure or otherwise) that mixes and matches, easily or not, arrays in RAM with arrays on the GPU backend. It simply does not work that way.
Renjin's and Spark's dataframes are not going to be easy to extract from their respective codebases, as far as my brief perusal of the source can tell. I agree that N-D DataFrames would be a good addition to the ecosystem, similar to the goals of Python's xarray (xarray.pydata.org). However, it is not a priority for me at this time. Thanks for pointing out the DataSet proposal; I'll take a look at it later.

On a slightly related note, where is the best place to ask core.matrix questions? I have some small questions about sparse matrix support in core.matrix and about which sparse formats are implemented.
Chaoya,

I haven't been working on this, and I don't intend to anytime soon; there's other work that I must attend to in the immediate time frame.

- Arthur