Should I use daru?

Shallment

Jul 20, 2016, 10:18:53 AM
to SciRuby Development
I'm writing [tushare-ruby](https://github.com/waditu/tushare-ruby), a project that fetches and combines stock data from the web. Right now I'm just returning something like `[{'code' => 'xxx', 'price' => xxx}.....]` as the result. Then I heard about daru, because the parent project, [tushare](https://github.com/waditu/tushare), uses Python and pandas. Since daru is similar to pandas, I'm considering using it. But when I do some sorting with both an Array and a DataFrame, the DataFrame is slower. Should I switch? What's the benefit of using daru?
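For concreteness, here is a minimal stdlib-only sketch of the row-oriented layout described above (one Hash per stock), sorted by price, next to a column-oriented layout (one Array per column, which is closer to how DataFrame libraries organize data) sorted through a permutation of row positions. It does not use daru; the stock codes and prices are made-up placeholders.

```ruby
# Row-oriented: one Hash per stock, as in the question.
# The codes and prices are made-up placeholder values.
rows = [
  { 'code' => '600000', 'price' => 10.5 },
  { 'code' => '000001', 'price' => 9.2 },
  { 'code' => '600036', 'price' => 25.1 }
]

# Sorting rows by price is a single sort_by call.
sorted_rows = rows.sort_by { |row| row['price'] }

# Column-oriented: one Array per column. Sorting means computing
# the order of row positions once, then applying it to every column.
columns = {
  'code'  => rows.map { |row| row['code'] },
  'price' => rows.map { |row| row['price'] }
}
order = columns['price'].each_index.sort_by { |i| columns['price'][i] }
sorted_columns = columns.transform_values { |col| col.values_at(*order) }

p sorted_rows.map { |row| row['code'] }  # => ["000001", "600000", "600036"]
p sorted_columns['code']                 # => ["000001", "600000", "600036"]
```

For a single sort on a small list the row layout is hard to beat; the columnar layout pays off when you run many per-column operations (statistics, filtering, arithmetic) over the same data.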

John Woods

Jul 20, 2016, 10:32:40 AM
to SciRuby Development
Daru uses NMatrix for storage. Array and NMatrix store data differently. I'm over-simplifying, but a Ruby Array holds references to arbitrary Ruby objects, while NMatrix stores values of a single numeric type packed into one contiguous block. So for many numerical use-cases, NMatrix will be faster, but maybe not if you're adding items to a very, very long list on the fly.
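A rough, stdlib-only illustration of the two storage styles follows. It does not use NMatrix at all; `Array#pack` is just a convenient way to show what "typed values in one contiguous block" means, and that in this sketch growing such a fixed buffer means copying it into a new one.

```ruby
prices = [10.5, 9.2, 25.1]

# A Ruby Array holds references to objects of any class,
# so one Array can mix Floats, Strings and nil.
mixed = prices + ['n/a', nil]
p mixed.map(&:class)   # => [Float, Float, Float, String, NilClass]

# Packing into little-endian 64-bit doubles ('E') produces one
# contiguous block of bytes: 8 bytes per value, all of one type.
buffer = prices.pack('E*')
p buffer.bytesize      # => 24

# Unpacking recovers the same Float values.
p buffer.unpack('E*')  # => [10.5, 9.2, 25.1]

# In this sketch, "appending" means building a new buffer that
# copies the old bytes plus the new value (String#+ returns a new String).
grown = buffer + [30.0].pack('E')
p grown.bytesize       # => 32
```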

Rodrigo Botafogo

Jul 20, 2016, 5:49:59 PM
to SciRuby Development
Maybe you could also take a look at SciCom. It runs on JRuby and uses Renjin, an R engine for the JVM. I don't know whether it will be fast, though.

Good luck.

Sameer Deshmukh

Jul 22, 2016, 3:24:08 AM
to SciRuby Development
Shallment,

Daru offers ready-made data structures for indexing data by arbitrary objects or by dates and times (see Daru::DateTimeIndex). It has many methods for statistical analysis of time series data (such as stock market data), and it also handles missing data.
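To make this concrete without depending on daru's exact API, here is a hypothetical stdlib-only sketch of the kind of work such a structure does for you: a price series keyed by Date, with nil marking a missing value, a mean that skips missing values, a simple rolling mean, and selection by date. The dates and prices are made up, and the helper names `mean_skip_missing` and `rolling_mean` are illustrative, not daru methods; in daru itself you would reach for Daru::DateTimeIndex and Daru::Vector instead.

```ruby
require 'date'

# Hypothetical daily closing prices keyed by Date; nil marks a missing value.
series = {
  Date.new(2016, 7, 18) => 10.0,
  Date.new(2016, 7, 19) => nil,
  Date.new(2016, 7, 20) => 11.0,
  Date.new(2016, 7, 21) => 12.0,
  Date.new(2016, 7, 22) => 13.0
}

# Hypothetical helper: mean that ignores missing (nil) values.
def mean_skip_missing(values)
  present = values.compact
  return nil if present.empty?
  present.sum / present.size
end

# Hypothetical helper: rolling mean over consecutive observations;
# any window that contains a missing value yields nil.
def rolling_mean(values, window)
  values.each_cons(window).map do |slice|
    slice.include?(nil) ? nil : slice.sum / window
  end
end

# Select by date range, as a datetime index would let you do directly.
from_july_20 = series.select { |date, _| date >= Date.new(2016, 7, 20) }

p mean_skip_missing(series.values)  # => 11.5
p rolling_mean(series.values, 2)    # => [nil, nil, 11.5, 12.5]
p from_july_20.values               # => [11.0, 12.0, 13.0]
```

Writing and testing these helpers by hand for every analysis is exactly the boilerplate a DataFrame library is meant to remove.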

Yes, it's slower than its Python counterpart, mainly because it's written in pure Ruby, unlike pandas, which has a large chunk of its functionality written in Cython. We are working on speed, and the current version is drastically faster than previous versions. Speed will only improve in later versions.

Daru is not able to leverage NMatrix fully, since it's important to maintain compatibility with JRuby, but this will improve once NMatrix is available for JRuby. We will probably shift all storage to NMatrix once that happens, which will improve daru further.

People who have used pandas before will also find the learning curve less steep.