GSOC idea: Better data analysis in ruby

Sameer Deshmukh

Mar 3, 2015, 12:53:59 PM
to SciRuby Mailing List
Hello,

I have been developing daru (https://github.com/v0dro/daru) for almost 4 months, and it has evolved into a pretty robust library for day-to-day data analysis tasks.

The latest version (0.0.5) was released on 28th February and contains a large number of new features. Some of them are:
  • New improved plotting DSL
  • Faster CSV loading
  • Compatibility with nmatrix and statsample
  • Ability to split or group data with the #group_by function
  • Generating quick data statistics with #pivot_table
  • A whole array of statistics functions on Daru::DataFrame
  • Hierarchical indexing of data with Daru::MultiIndex
You can see an overview of the latest daru features on my blog: http://v0dro.github.io/blog/2015/02/24/data-analysis-in-ruby-part-2/

I'd like to take daru further and make it a very robust data analysis tool in Ruby that data scientists can use for all sorts of data processing. I'd like to propose this as my GSOC project, under SciRuby's guidance, so that the library can be developed with real-world needs in mind.

I have planned some features for future releases which I think can be implemented in the GSOC period. Please tell me your thoughts about them. They are listed below:
  • Complete compatibility with statsample for performing various statistics operations.
  • Reading data into a DataFrame from more sources (currently only CSV is supported).
  • Support for indexing data on TimeSeries.
  • Arel-like querying syntax for DataFrame.

Also, if you have anything else in mind, let me know!

Regards,
Sameer Deshmukh

John Woods

Mar 3, 2015, 12:56:34 PM
to SciRuby Mailing List
Hi Sameer,

I think this is a great project idea. To approve it, however, I want you to find a mentor who is going to make active use of your code. In general, I want that to be the case for all projects — no one should be writing code unless someone is going to be using it immediately and can provide feedback as a user (not just from a distance).

Make sense?

John

Claudio Bustos

Mar 3, 2015, 1:44:20 PM
to sciruby-dev

I could test it. From July on, I will have a lot more time to help.

Carlos Agarie

Mar 4, 2015, 8:45:46 AM
to sciru...@googlegroups.com
I changed jobs recently (this week, to be exact) and my new position is as a "data scientist" (the first one in the company), so I could test it. I should spend one or two more weeks studying the current infrastructure and the big problems before building models and playing with data, so let's keep in touch. :P

By the way, having example IRuby notebooks in your README.md is really great!


-----
Carlos Agarie
Software Engineer
+55 11 97320-3878 | @carlos_agarie

Sameer Deshmukh

Mar 5, 2015, 7:02:55 AM
to sciru...@googlegroups.com
Thank you!