Hello,
I have been developing daru (https://github.com/v0dro/daru) for almost four months now, and it has evolved into a fairly robust library for day-to-day data analysis tasks.
The latest version (0.0.5) was released on 28th February and contains a large number of new features. Some of them are:
- New, improved plotting DSL
- Faster CSV loading
- Compatibility with nmatrix and statsample
- Ability to split or group data with the #group_by function
- Generating quick data statistics with #pivot_table
- A wide range of statistics functions on Daru::DataFrame
- Hierarchical indexing of data with Daru::MultiIndex
I'd like to take daru further and make it a robust data analysis tool in Ruby that data scientists can use for all sorts of data processing. I'd like to keep this as an option for my GSoC project under SciRuby's guidance, so that the library can be developed with real-world needs in mind.
I have planned some features for future releases which I think can be implemented during the GSoC period. Please tell me your thoughts about them. They are listed below:
- Complete compatibility with statsample for performing various statistical operations.
- Reading data into a DataFrame from more sources (currently only CSV is supported).
- Support for indexing data on TimeSeries.
- Arel-like querying syntax for DataFrame.
Also, if you have anything else in mind, please let me know!