GSoC'17 discussion : Make Daru more ready for integration with modern Web framework

Shekhar Prasad Rajak

Mar 17, 2017, 10:21:50 AM
to SciRuby Development
Hello,
My name is Shekhar Prasad Rajak. I would like to discuss the idea "Make Daru more ready for integration with modern Web framework" (link: https://github.com/SciRuby/sciruby/wiki/Google-Summer-of-Code-2017-Ideas#make-daru-more-ready-for-integration-with-modern-web-framework). I am going to apply for GSoC'17, so I am trying to find out what the Daru community expects.

I have some questions:

Part 1: Import
===============

> importers at least for the following sources (a matter to discuss):
> ActiveRecord
> Sequel
> JSON
> Redis

If I understood properly, there is already a method `from_activerecord` to import ActiveRecord records into a dataframe (link: https://github.com/SciRuby/daru/blob/master/lib/daru/dataframe.rb#L100), and similarly `from_sql` and `from_excel`.

So that means methods such as `from_json` and `from_redis` are still needed.
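
A minimal sketch of what a `from_json` importer would have to do internally, using only Ruby's standard `json` library. The helper name `json_to_columns` and the sample data are made up for illustration; this is not part of Daru.

```ruby
require 'json'

# Hypothetical helper (not a Daru method): turn a JSON array of objects
# into a hash of columns. Keys missing from a record are filled with nil,
# so that all columns end up with the same length.
def json_to_columns(json_text)
  rows = JSON.parse(json_text)
  keys = rows.flat_map(&:keys).uniq
  keys.to_h { |key| [key, rows.map { |row| row[key] }] }
end

json = '[{"item":"pen","amount":3},{"item":"ink","amount":5,"note":"gift"}]'
columns = json_to_columns(json)
```

The result is a hash of equal-length columns, which is a shape that `Daru::DataFrame.new` can take as input.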


Part 2: Export
==============

I have seen the methods `to_json` and `to_html` in dataframe.rb. I don't see methods like `to_sql`, `to_csv` or `to_plaintext`, so we need all of these methods.

Part 3: Presentation
====================

I have gone through https://github.com/SciRuby/daru#visualization and found that many kinds of graphs and charts can be created.

Please give me some idea of what to do in this section. Actually, I didn't understand these lines:

>daru(dataframe, **options) helper, which can be called from any view templating/layouting system and returns dataframe formatted into HTML;
>it should be more sophisticated than current DataFrame#to_html


Please share links to examples and ideas, so that I can understand and explore them better.

Thanks,

--

Shekhar Prasad Rajak

Victor Shepelev

Mar 20, 2017, 3:56:51 AM
to SciRuby Development
Hey!

Some answers inside.

2017-03-17 16:21 GMT+02:00 Shekhar Prasad Rajak <shekharr...@gmail.com>:
> I have some questions:
>
> Part 1: Import
> ===============
>
> > importers at least for the following sources (a matter to discuss):
> > ActiveRecord
> > Sequel
> > JSON
> > Redis
>
> If I understood properly, there is already a method `from_activerecord` to import ActiveRecord records into a dataframe (link: https://github.com/SciRuby/daru/blob/master/lib/daru/dataframe.rb#L100), and similarly `from_sql` and `from_excel`.
>
> So that means methods such as `from_json` and `from_redis` are still needed.

Not only that. What is necessary is some kind of "foundation" that lets new importers be defined easily. Imagine somebody (not a Daru maintainer) wants to import dataframes from Redis, and probably even publish the importer for others (as a separate gem). Right now, this person has to dig deep into Daru and understand how those `from_xxx` methods are created and how they interact. Something like this would be better (pseudocode, just to show the idea):

class RedisImporter < Daru::IO::Importer
  def initialize(connection)
    # ...
  end

  # returns just an array of hashes, base class does the rest
  def fetch(**params)
    # ...
  end
end

^ and as soon as something like this is `require`d, you can immediately call `DataFrame.from_redis(...)`, and it will work.
After that, we can easily define tons of importers.
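
One possible way to get this behaviour, as a runnable sketch in plain Ruby. Everything here is an assumption made for illustration: `Frame` stands in for `Daru::DataFrame` so the code runs without the daru gem, `Importer` stands in for the proposed base class, and a plain Hash plays the role of the Redis connection. The base class uses Ruby's `inherited` hook to add a `from_xxx` method whenever a subclass named `XxxImporter` is defined.

```ruby
# Stand-in for Daru::DataFrame, so the sketch runs without the daru gem.
class Frame
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end
end

# Hypothetical base class: defining XxxImporter adds Frame.from_xxx.
class Importer
  def self.inherited(subclass)
    super
    # "RedisImporter" -> "redis"
    source = subclass.name.sub(/Importer\z/, '').downcase
    Frame.define_singleton_method("from_#{source}") do |*init_args, **fetch_params|
      Frame.new(subclass.new(*init_args).fetch(**fetch_params))
    end
  end
end

# Demo importer; a plain Hash plays the role of the Redis connection.
class RedisImporter < Importer
  def initialize(connection)
    @connection = connection
  end

  # Returns just an array of hashes; the base class does the rest.
  def fetch(prefix:)
    @connection.select { |key, _| key.start_with?(prefix) }.values
  end
end

store = {
  'sale:1' => { amount: 10 },
  'sale:2' => { amount: 25 },
  'user:1' => { name: 'Ann' }
}
frame = Frame.from_redis(store, prefix: 'sale:')
```

The author of the importer only writes `initialize` and `fetch`; the `from_redis` entry point appears as a side effect of subclassing.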
 


> Part 2: Export
> ==============
>
> I have seen the methods `to_json` and `to_html` in dataframe.rb. I don't see methods like `to_sql`, `to_csv` or `to_plaintext`, so we need all of these methods.

What I said about importers applies here as well.
 

> Part 3: Presentation
> ====================
>
> I have gone through https://github.com/SciRuby/daru#visualization and found that many kinds of graphs and charts can be created.
>
> Please give me some idea of what to do in this section. Actually, I didn't understand these lines:
>
> > daru(dataframe, **options) helper, which can be called from any view templating/layouting system and returns dataframe formatted into HTML;
> > it should be more sophisticated than current DataFrame#to_html


Well, Daru's current visualisation abilities target scientific usage; they can be used to understand data in IRuby notebooks. But when you want to visualize some data inside an existing web application, you have different requirements.
E.g., when experimenting, you want "don't make me think or set up stuff, just show the data in the simplest and most visible way".
But when integrating with some application, you'd rather want "give me enough settings to show the tables with appropriate styles, with fancy controls and stuff".

So, we need view helpers (those methods you were confused about) for use in Rails views, for example, like this (assuming a Slim template):

html 
  head
  body
    div id="data"
      = daru(sales_dataframe, rows: 100, sortable: [:amount, :date], style: :compact)
    ....
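
A minimal sketch of what such a helper could do underneath, assuming the data arrives as an array of row hashes. The name `daru_table` and the options `limit:` and `css_class:` are hypothetical and chosen only for this illustration; only Ruby's standard `cgi` library is used.

```ruby
require 'cgi'

# Hypothetical view helper: renders an array of row hashes as an HTML table.
# Cell values are HTML-escaped; at most `limit` rows are rendered.
def daru_table(rows, limit: 100, css_class: 'daru-table')
  return '' if rows.empty?

  headers = rows.first.keys
  head = headers.map { |h| "<th>#{CGI.escapeHTML(h.to_s)}</th>" }.join
  body = rows.first(limit).map do |row|
    cells = headers.map { |h| "<td>#{CGI.escapeHTML(row[h].to_s)}</td>" }.join
    "<tr>#{cells}</tr>"
  end.join

  %(<table class="#{CGI.escapeHTML(css_class)}">) +
    "<thead><tr>#{head}</tr></thead><tbody>#{body}</tbody></table>"
end

html = daru_table(
  [{ item: 'pen & ink', amount: 3 }, { item: 'paper', amount: 7 }],
  limit: 1
)
```

Inside a Rails view the returned string would additionally have to be marked as HTML-safe, otherwise the template engine escapes the markup itself.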

Does it make sense to you?

V.

Shekhar Prasad Rajak

Mar 20, 2017, 11:50:55 PM
to SciRuby Development

Thanks for the reply. I will explore these points more.
Here is what I think:

Import/Export
=============

We should have one base file `io.rb` (the parent class) and derived classes in separate files, say `csv.rb`, `excel.rb`, `json.rb`, `html.rb` and so on. Whenever we want to add a new import/export method, we just need to define it in a new file that inherits from the class in `io.rb` (similar to the indexes, where `Index` is the base class and `MultiIndex` and `CategoricalIndex` are derived classes).

I will discuss it more after designing a proper structure.

Presentation:
==========

After some research I found some very popular gems created by Andrew Kane.

A few libraries that could be used with Daru are:

1. Google Charts

The Google Charts API allows you to submit data to Google, and their API responds with a URL to an image. You can do Venn diagrams, scatter graphs, polar charts, bar charts, world maps, etc. It's a nice way to embed some graphs in a web app.

2. Chartkick [link: https://github.com/ankane/chartkick]: can be helpful in creating charts and graphs.

3. Searchkick [link: https://github.com/ankane/searchkick]: can be useful for searching data (pretty fast).

I am exploring the projects/applications where these gems are used, to understand them better.

I see that Daru already has nice visualization techniques [link: https://github.com/SciRuby/daru#visualization] using Nyaplot, GnuplotRB and Gruff. Do we need new presentation techniques apart from these? Can't we use them in a web application?

Right now I am reading about ActiveRecord and Redis: how they work and how we can import/export data from/to them.

We mostly want Daru to be used in Rails as a data analysis library. Please share any project or example links related to this, if possible. It will help me. Right now I know of this Rails app [link: https://github.com/lokeshh/daru_rails] that uses Daru.

I hope I will come up with better ideas soon and extend the ideas above.

--
Shekhar

Victor Shepelev

Mar 22, 2017, 9:12:04 AM
to SciRuby Development
Shekhar, 

I believe that the first and most important preparation task is not to decide "how" to do this project (which libraries or file structure), but "what and why" to do.

Imagine some large existing Rails application, just a standard online shop for example.
Imagine somebody wants to add sales analytics reports to the admin part of the shop, and evaluates Daru for that.
What will this person most probably need?

1. First, fetch some data about sales:
* from database -- probably ActiveRecord, or possibly Sequel, or something fancy
* from search index -- most probably ElasticSearch
* from some API (for large apps) -- probably JSON or XML

2. Process it with Daru

3. Show the table with some data, but (unlike current `to_html`):
* styled with custom styles of admin panels;
* probably dynamically paged (with "show next 100 rows", or "load more")
* probably sorted (and even dynamically sorted, when after resorting the new data is loaded)
* probably with custom filters
* with some Download button, allowing to export Excel, or CSV, or PDF

4. Show some charts, but, unlike what current visualisations provide, again with custom styles and more "business-y" looking ones, not "scientific-y"

5. Export some long report with selected charts and tables

6. Export data in custom format (XML, JSON, Excel) to external analysis software.

Now, based on this use case, we can analyse which components are missing in Daru, and what the best road to them is.
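
As one concrete piece of step 3, here is a minimal sketch of server-side sorting and paging over an array of row hashes. The helper name `page_of`, its keyword arguments and the sample `sales` data are invented for this illustration; a real integration would more likely delegate to the database or to a pagination gem.

```ruby
# Hypothetical helper: sort rows, then return one page plus paging metadata.
def page_of(rows, page: 1, per_page: 100, sort_by: nil, descending: false)
  sorted = sort_by ? rows.sort_by { |row| row[sort_by] } : rows
  sorted = sorted.reverse if descending
  offset = (page - 1) * per_page
  {
    rows: sorted[offset, per_page] || [], # nil when the page is out of range
    page: page,
    total_pages: (rows.size / per_page.to_f).ceil
  }
end

sales = [
  { id: 1, amount: 40 }, { id: 2, amount: 15 },
  { id: 3, amount: 90 }, { id: 4, amount: 5 }, { id: 5, amount: 60 }
]
result = page_of(sales, page: 2, per_page: 2, sort_by: :amount, descending: true)
```

A "load more" control would simply request the next `page` with the same sort settings.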

Does it make sense to you?

V.
--
You received this message because you are subscribed to the Google Groups "SciRuby Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sciruby-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ananyo Maiti

Mar 23, 2017, 1:14:54 AM
to SciRuby Development, shekharr...@gmail.com
Shekhar,
Just wanted to make you aware that there are already exporters for some formats like CSV, Excel and SQL. Look at the `write_csv`, `write_excel` and `write_sql` methods in dataframe.rb, how they are written, and what improvements have to be made.

- Ananya

Shekhar Prasad Rajak

Mar 23, 2017, 11:31:39 AM
to SciRuby Development, shekharr...@gmail.com
Thanks Ananyo, I have tried some examples to understand them.

--
Shekhar

Shekhar Prasad Rajak

Mar 23, 2017, 11:34:12 AM
to SciRuby Development

Thanks Victor, those points are really helpful. I have tried creating a few Rails applications that use the daru gem.

I found that:

1. Normal import and export can be done using Daru.

2. Presenting the data in various views is difficult using Daru. (I also have a problem viewing Nyaplot graphs in the application: `@plot.to_iruby.second` doesn't display anything on my system. I am searching for a solution.)

3. We can export graphs and charts to a new HTML file using Nyaplot and other libraries. But we need to export only the chart/graph itself, so that it can be displayed in an existing HTML file.

4. To display a nice chart/graph UI we need some good JS and CSS files to import. E.g. Chartkick has its own CSS and JS files, which make its charts beautiful.

5. To display a large data table we must have a pagination feature: show a specific number of rows in the table, with next/back links and a footer. To do that we can use gems like `will_paginate`.

6. To display data from a dataframe we need to convert it into a plain table (with rows and columns) and then, after adding some good CSS, send it to the web page. There are good gems for doing that, like `table_for`, `print_table`, etc.

7. Actually, we should have one method which can add CSS/JS to our normal output table/chart/graph, so that it can be used in a web page.

8. To analyse all the tables present in a database we have to use/extend Daru features. That means we need modular and extensible methods that can handle tables/data of any size.

9. Analysis results must be downloadable; I hope little effort is needed for that.

I am trying to explore Rails applications where data analysis is done using simple Ruby code. I hope I will soon collect more features that are required in a data analysis application.


--
Shekhar

Shekhar Prasad Rajak

Mar 25, 2017, 4:38:18 PM
to SciRuby Development


* Along with these, I also see that there are many #TODO markers in the group_by spec file, which means more things have to be added. (I will come back to this once I understand it properly.)

* I have seen the issue https://github.com/SciRuby/daru/issues/152 regarding the redesign of `DataFrame#group_by`. I am interested in working on this. I think the `group_by` column(s) and the original index should be used as the MultiIndex of the resulting dataframe. The other column values should be mapped to this MultiIndex in the resulting dataframe.

* I find that Nyaplot can do many things and can convert graphs/charts to HTML code that can be exported to an HTML file.

Also there is an open issue, https://github.com/SciRuby/daru/issues/154, to add more features of Nyaplot to daru. It would be really cool if we used this graph/chart HTML code to show data in a web page for data analysis. I think more features can be added to Nyaplot for that.


* I am testing the Nyaplot output HTML code inside other web pages. Some examples I added into this repo :


I don't know why, but sometimes the web page doesn't load some of the graphs and charts.

* I found that it is very easy to create an SVG image of a graph/chart generated from data using the `rubyvis` gem (link: http://rubyvis.rubyforge.org/index.html).
It can also easily be attached to an HTML file without any extra load of CSS and JS.

* I am also discussing about dataframe merge and join here :

I will share my GSoC application docs link soon.
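
To make the `group_by` suggestion above concrete, here is a small plain-Ruby sketch of the intended result shape. It does not use Daru; the rows, the `:city` grouping column and the variable names are invented for illustration. Each row is keyed by the pair [group value, original index], which is the two-level key a MultiIndex would hold, and the remaining columns are mapped to that key.

```ruby
# Rows as an array of hashes; the array position is the original index.
rows = [
  { city: 'Pune',  amount: 10 },
  { city: 'Delhi', amount: 20 },
  { city: 'Pune',  amount: 30 }
]

# Build pairs of [[group value, original index], remaining columns].
pairs = rows.each_with_index.map do |row, index|
  [[row[:city], index], row.reject { |key, _| key == :city }]
end

# Sort by the two-level key so that members of a group sit together.
grouped = pairs.sort_by(&:first).to_h
```

Here `grouped.keys` plays the role of the MultiIndex and `grouped.values` the role of the remaining columns.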

Thanks!

--
Shekhar

Victor Shepelev

Mar 27, 2017, 4:48:54 PM
to SciRuby Development
> 1. Normal import and export can be done using Daru.

Yep, but think about "what does it cost now to add a RethinkDB (for ex.) importer to Daru" (for a user, not for maintainers)? And what should it cost? Ideally, it should be something as easy as

class DaruRethinkImporter < DaruImporter
  def initialize(connection)
    # ...
  end

  def import(params)
    # connection.... ......
    # => some data
  end
end

...and it should be enough to have

Daru::DataFrame.from_rethink(conn, params)

Makes sense?

 
> 2. Presenting the data in various views is difficult using Daru. (I also have a problem viewing Nyaplot graphs in the application: `@plot.to_iruby.second` doesn't display anything on my system. I am searching for a solution.)

Yep. It is difficult AND, even when possible, targets "scientific notebooks" only.

> 3. We can export graphs and charts to a new HTML file using Nyaplot and other libraries. But we need to export only the chart/graph itself, so that it can be displayed in an existing HTML file.

Exporting to an external HTML file has no real value for dynamic systems :(

> 4. To display a nice chart/graph UI we need some good JS and CSS files to import. E.g. Chartkick has its own CSS and JS files, which make its charts beautiful.

Yep. That's why this project probably assumes we'll create additional gems, not add a bunch of CSS/JS to daru's core.

> 5. To display a large data table we must have a pagination feature: show a specific number of rows in the table, with next/back links and a footer. To do that we can use gems like `will_paginate`.

Yep. We just need to investigate how they could/should be integrated with daru in the most natural way.

> 6. To display data from a dataframe we need to convert it into a plain table (with rows and columns) and then, after adding some good CSS, send it to the web page. There are good gems for doing that, like `table_for`, `print_table`, etc.

Probably so. Needs investigation too: what would be the most natural way? What would be the most effective way? What would be the most readable way?

> 7. Actually, we should have one method which can add CSS/JS to our normal output table/chart/graph, so that it can be used in a web page.

Probably/hopefully.

> 8. To analyse all the tables present in a database we have to use/extend Daru features. That means we need modular and extensible methods that can handle tables/data of any size.

Well, this is probably not a point for the current project (make daru more suitable for the web). I mean, I'd be happy to see, by the end of summer, something like `Daru.from_db(100k_entries_table).select ... pivot ... ` work as fast as lightning, but it is probably not a realistic expectation :)

> 9. Analysis results must be downloadable; I hope little effort is needed for that.

Yep, here again, as with importers, we just need a way to "easily" define new exporters, even fancy ones.
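
A minimal sketch of such an "easy to define" exporter mechanism, in plain Ruby with the standard `csv` and `json` libraries. The registry `EXPORTERS` and the methods `register_exporter` and `export` are hypothetical names for this illustration and do not exist in Daru; rows are assumed to be an array of hashes.

```ruby
require 'csv'
require 'json'

# Hypothetical registry: format name => callable that serialises rows.
EXPORTERS = {}

def register_exporter(format, &block)
  EXPORTERS[format] = block
end

def export(rows, format)
  EXPORTERS.fetch(format).call(rows)
end

# Each new format is a single registration, possibly in a separate gem.
register_exporter(:json) { |rows| JSON.generate(rows) }

register_exporter(:csv) do |rows|
  headers = rows.first.keys
  CSV.generate do |csv|
    csv << headers.map(&:to_s)
    rows.each { |row| csv << row.values_at(*headers) }
  end
end

rows = [{ item: 'pen', amount: 3 }, { item: 'ink', amount: 5 }]
csv_text = export(rows, :csv)
json_text = export(rows, :json)
```

A "Download" button in a web page would then only need to pick a format name and send the returned string with the matching content type.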



Shekhar Prasad Rajak

Mar 30, 2017, 6:31:27 AM
to SciRuby Development
Thanks Victor for sharing some ideas.

>Yep, but think about "what does it cost now to add a RethinkDB (for ex.) importer to Daru" (for a user, not for maintainers)? And what should it cost?

I read about RethinkDB (https://www.rethinkdb.com/api/ruby/) and NoBrainer, a Ruby ORM for RethinkDB (https://github.com/nviennot/nobrainer), and discussed a few points with the developers.

RethinkDB is good for:

1. Building scalable realtime apps.
2. Cases where a NoSQL database is needed. It stores schemaless JSON documents.
3. Use with Rails or Sinatra. When we add NoBrainer/RethinkDB to Rails [https://rethinkdb.com/docs/rails/], these lines are added automatically to the model file <model_name>.rb when we generate the model using the rails command:

```
  include NoBrainer::Document
  include NoBrainer::Document::Timestamps

```

and then it is ready, running with RethinkDB and Rails!

So I think if we want `Daru` to create a new database in a web application, then we must use the `RethinkDB` gem and process the data.

4. The Kaminari paginator can be used easily via kaminari-nobrainer https://github.com/nviennot/kaminari-nobrainer

But:

1. It can only connect to databases created by RethinkDB. We can't connect to SQL, MongoDB, PostgreSQL and other types of databases.

So we need to use some good gems to connect to different databases. Here are some ideas:

MongoDB
-------

1. We can use the mongo gem to connect to the MongoDB server.

2. I tried to use the mongo gem in a Ruby program to connect to a Mongo database and create a Daru dataframe from the data table.


I have commented the output as well.

PostgreSQL
----------

1. We need to use the `pg` gem [http://www.rubydoc.info/gems/pg/0.10.0/PGconn] to get a connection.
2. After getting the database and tables we can process the data using Daru. I will try some sample code to do that.


CUBRID (Open Source Database Highly Optimized for Web Applications) for MySQL database:
---------------------------------------------------------------------------------------

1. We can use the `cubrid` gem to connect to an SQL database. Docs: http://www.cubrid.org/wiki_apis/entry/cubrid-ruby-api-documentation

Why use CUBRID:

2. CUBRID is described as faster than other popular alternatives. It is designed and optimized for high-traffic web sites, so the rapidly growing size of the data should be less of a worry.

3. CUBRID is said to scale very well as the data and the number of users grow, with no limits on the number of databases, tables or rows.

But it seems the gem is only compatible with Ruby version "~> 1.8.0".


Google BigQuery for large datasets:
-----------------------------------



I don't have much idea about it right now.


I will discuss more about `export` and visualization in the next post soon.

--
Shekhar

Victor Shepelev

Mar 30, 2017, 6:38:57 AM
to SciRuby Development
It all makes sense, but I'd suggest not getting caught up in a "chase for the number of importers (or exporters)". A good working architecture that makes it easy to add importers/exporters, plus a pluggable visualisation solution, could make a much more solid proposal than a promise to wrap all the known databases ;)
