GSoC 2018 | Interested in making daru more usable for log data analysis

Rohit Ner

Feb 28, 2018, 6:37:50 AM
to SciRuby Development
Hello everyone.

I'm Rohit Ner, a sophomore from IIT, Kharagpur. I'm quite familiar with Web Development, Ruby, Rails, Jekyll, Git and the GitHub workflow. I came across the project idea "Ruby/Rails data analysis tools" and have been thinking about the library structure for it. The logs can be imported, processed, and visualized using the daru series of gems. To make the library developer-friendly, I was thinking of developing a dashboard interface like that of goaccess.io.
I have already fixed some issues in the daru repository.

Can @athityakumar / @zverok / @Shekharrajak please tell me if I am thinking in the right way? I would also like to know about any other prerequisites that would give me a clearer idea of the solution.

Regards
Rohit Ner
Second Year Undergraduate Student
Department of Mathematics
Indian Institute of Technology, Kharagpur, India

Shekhar Prasad Rajak

Mar 2, 2018, 10:17:57 PM
to SciRuby Development
Hello Rohit,

For the project, you must have clear knowledge of the features of daru, daru-io and daru-view. So just go through the wiki pages and blogs written by me and Athitya for daru-view and daru-io. There are 3 subsections in the project: 'Business Intelligence with daru', 'Data cleaning library' and 'Ruby/Rails data analysis tools'. Please go through all of them.

>I was thinking of developing a dashboard interface like that of goaccess.io.
Yes, we want to create something similar using daru and the daru plugin gems.

Regards,
Shekhar

Victor Shepelev

Mar 8, 2018, 2:53:08 PM
to SciRuby Development
Dear Rohit!

Sorry for my late reply; I am now fully online for GSoC. I hope you haven't given up hope while waiting :(

I believe that a dashboard is an extremely good goal and a good demonstration of what the Daru infrastructure can do. So, I encourage you to proceed in this direction. It could even be a good idea to formulate the proposal around "provide a dashboard", and include "...and develop the necessary libraries along the way".

Please let me know what you think about this, and feel free to ask any questions.

V.

Rohit Ner

Mar 10, 2018, 4:45:38 PM
to SciRuby Development
Hi Victor!

I am really happy to hear from you. 
After browsing through some Rails templates like this one, I think daru plugins can be integrated with the template app. I wanted to know whether doing so would be a good idea for such a project, as it will require updating the dependencies of the templates. The advantage is that it would save time on developing the front end.

Victor Shepelev

Mar 12, 2018, 4:44:09 AM
to SciRuby Development
Hi Rohit!

I believe that thinking about Rails plugins/templates is a bit of "starting from the wrong end".
I'd say that the logical structure of the project could be:

1. Define what you can parse from web framework (Rails/non-Rails) logs;
2. Define what you want to output from this data (metrics, tables, indicators);
3. Handle parsing in a generic way (e.g. provide infrastructure for parsers and collectors of different data, test on some example Rails logs);
4. Handle output generation in a generic way (again, provide some generic infrastructure for "rendering", while the first target could be just rendering to console!);
5. Work on "bells and whistles" -- integrations with Rails and other frameworks and other export formats and so on.

The thing is, if you have enough time and energy to develop 1-4, and do not have enough time for 5, the result would already be an important achievement. But if you spend half of the time playing with good templates, deciding on updating template dependencies and handling front-end quirks, and then do not have enough time to develop the daru-related code, it would be sad...
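
Steps 1-4 above can be sketched roughly as follows. This is only an illustration in plain Ruby under stated assumptions: `LogParsers` and `LogReport` are hypothetical names, not part of daru or any existing gem, the sample log lines are made up in the style of the default Rails request log, and plain hashes stand in for whatever daru structures the real project would use.

```ruby
# Hypothetical sketch: LogParsers and LogReport are illustrative names,
# not part of daru or any existing gem.

# Step 3: parsers are registered per log format, so more formats can be added later.
module LogParsers
  REGISTRY  = {}
  STARTED   = /\AStarted (?<verb>[A-Z]+) "(?<path>[^"]+)"/
  COMPLETED = /\ACompleted (?<status>\d{3}) .*? in (?<ms>\d+)ms/

  def self.register(format, &parser)
    REGISTRY[format] = parser
  end

  def self.parse(format, lines)
    REGISTRY.fetch(format).call(lines)
  end

  # Step 1: what we extract from a Rails log: verb, path, status, duration.
  register(:rails) do |lines|
    requests = []
    current = nil
    lines.each do |line|
      if (m = STARTED.match(line))
        current = { verb: m[:verb], path: m[:path] }
      elsif current && (m = COMPLETED.match(line))
        requests << current.merge(status: m[:status].to_i, duration_ms: m[:ms].to_i)
        current = nil
      end
    end
    requests
  end
end

module LogReport
  # Step 2: a metric is a function from parsed requests to a table (array of rows).
  def self.requests_per_path(requests)
    requests.group_by { |r| r[:path] }.map do |path, rs|
      mean = rs.sum { |r| r[:duration_ms] }.to_f / rs.size
      { path: path, count: rs.size, mean_ms: mean.round(1) }
    end
  end

  # Step 4: the first rendering target is the console (a padded text table).
  def self.render_console(rows)
    return "" if rows.empty?
    headers = rows.first.keys
    table = [headers.map(&:to_s)] + rows.map { |row| headers.map { |h| row[h].to_s } }
    widths = table.transpose.map { |column| column.map(&:length).max }
    table.map { |cells| cells.zip(widths).map { |cell, w| cell.ljust(w) }.join("  ").rstrip }.join("\n")
  end
end

# Made-up sample input in the style of the default Rails request log.
SAMPLE_LOG = <<~LOG.lines.map(&:chomp)
  Started GET "/users" for 127.0.0.1 at 2018-03-12 04:44:09 +0000
  Completed 200 OK in 30ms (Views: 20.1ms | ActiveRecord: 3.2ms)
  Started GET "/users" for 127.0.0.1 at 2018-03-12 04:44:10 +0000
  Completed 200 OK in 50ms (Views: 35.0ms | ActiveRecord: 5.1ms)
  Started GET "/missing" for 127.0.0.1 at 2018-03-12 04:44:11 +0000
  Completed 404 Not Found in 4ms
LOG

requests = LogParsers.parse(:rails, SAMPLE_LOG)
puts LogReport.render_console(LogReport.requests_per_path(requests))
```

Keeping the parser registry, the metric functions and the renderer separate means that a second log format or a second output target (step 5) can be added without touching the other parts.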

Hope that helps.

V.

Rohit Ner

Mar 14, 2018, 1:34:12 AM
to SciRuby Development
Hi Victor,

The gem request-log-analyzer has a proper structure defined for parsing Rails logs as well as exporting the results to console/HTML files. I have some ideas in mind:
1. daru stores the output of this gem and does further data analysis using its existing methods
2. a data cleaning library to make dataframes robust
3. developing ready-to-use business methods. I need your input on which methods, if provided by the daru package, would make it more useful for firms

My intention is to make full use of the parsing gem mentioned above and concentrate on the output part. As preliminary work, should I try to monkey-patch the gem to export data to daru?

Regards
Rohit

Victor Shepelev

Mar 20, 2018, 4:43:19 PM
to SciRuby Development
Dear Rohit!

I have read through your proposal. Before we discuss details and plans, I have two big things to discuss.

1. Current work on the library (though pretty slow, mostly because of me) is going towards separating the core (DataFrame/Vector/Index) from the "periphery" (IO, analysis, views and so on), which will simplify maintaining and contributing to each of the libraries. Therefore, planning to monkey-patch some methods and include new modules in the daru library itself may not be as useful as creating a separate library or libraries for BI and data cleaning. In fact, this approach allows you to focus the work and limit its scope. Some "core functionality" methods could still be committed to daru itself, but others that exist and are NOT that "core" for the DataFrame could, vice versa, be duplicated and optimized in the BI module (with future deletion from the core).

In fewer words, try to rethink your plan in a modular fashion: "the cleaning module will receive a dataframe as an argument, do this and that, and will be set up this and that way".
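
As a rough illustration of that modular shape, here is a minimal sketch of a standalone cleaning module that takes a table as an argument and returns a new one, without patching any existing class. Everything in it is an assumption made for the example: `LogCleaner` and its methods are hypothetical, and a plain Hash of column name => Array stands in for a real dataframe so that the sketch runs without extra gems.

```ruby
# Hypothetical module: the name and methods are illustrative, not part of daru.
# The "dataframe" here is a plain Hash of column name => Array.
module LogCleaner
  module_function

  # Drops every row in which any of the given columns is nil.
  # Returns a new table; the input is left untouched.
  def drop_missing(table, columns: table.keys)
    size = table.values.first&.size || 0
    keep = (0...size).reject { |i| columns.any? { |c| table.fetch(c)[i].nil? } }
    table.transform_values { |values| values.values_at(*keep) }
  end

  # Converts one column with the given block, again returning a new table.
  def cast(table, column)
    table.merge(column => table.fetch(column).map { |v| yield v })
  end
end

# Made-up input: one complete row and two rows with missing values.
raw = {
  path:        ["/users", "/users", nil],
  duration_ms: ["30", nil, "4"]
}

clean = LogCleaner.drop_missing(raw)
clean = LogCleaner.cast(clean, :duration_ms) { |v| Integer(v) }
p clean
```

Because the module only depends on the data passed in, it could live in a separate gem and be tested on its own, which is the point of the separation described above.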

2. My largest concern. For now, your proposal seems extremely ambitious. Besides coding, it will require testing, documenting, studying related libraries and integrating with them, documenting again, trying different approaches and so on... And for one person, it seems really easy to be late at any stage, which endangers all the following stages. The end result may be a lot of good and useful code (or a lot of demos and experimental code), but no particular deliverables that are easy for others to use and maintain. I can suggest considering two options:

a) simplifying the proposal (for example, concentrate on reading different log formats into daru dataframes + visualizing the standard daru summaries; or, vice versa, just data cleaning, with the simplest Rails log reader as a source of data); or
b) planning the work in a "circular" manner, like: stage 1: add a reader for standard Rails logs, try some visualizations, commit; stage 2: more log formats, some data cleaning, commit; stage 3: a bit of BI, a bit more visualization, commit; and so on. This way, instead of a monolithic "fail-or-succeed" plan, you can have more flexible "useful at each stage" steps.

The end result could be a bit less impressive than planned, yet it could still be solid, useful and demonstrable. And either approach allows extending the scope a bit if you find everything done and a lot of time left.

An interesting perspective is to think this way: what is the minimum change to the Daru ecosystem needed for Rails log analysis? (I believe just an IO module, as we already have some grouping/aggregating and some visualization.) What would be the next minimum useful step and its goal? And the next?
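
To make the "just an IO module" idea concrete, here is a minimal sketch of what such a first step might look like. It is an assumption-laden illustration, not a proposed API: `RailsLogReader` is a hypothetical name, the sample lines are made up, and the grouping is done in plain Ruby only so that the example runs without daru installed. The reader returns a Hash of column name => Array, which is the kind of input `Daru::DataFrame.new` accepts, so the grouping step would normally be left to daru.

```ruby
# Hypothetical reader: the module name and its methods are made up for illustration.
module RailsLogReader
  COMPLETED = /\ACompleted (?<status>\d{3}) .*? in (?<ms>\d+)ms/

  # Returns a Hash of column name => Array, one entry per "Completed" line.
  # Lines that do not match are skipped.
  def self.read(lines)
    columns = { status: [], duration_ms: [] }
    lines.each do |line|
      m = COMPLETED.match(line)
      next unless m
      columns[:status] << m[:status].to_i
      columns[:duration_ms] << m[:ms].to_i
    end
    columns
  end

  # Plain-Ruby stand-in for the grouping/aggregating a dataframe library provides.
  def self.mean_duration_by_status(columns)
    columns[:status].zip(columns[:duration_ms])
                    .group_by(&:first)
                    .transform_values { |pairs| pairs.sum(&:last).to_f / pairs.size }
  end
end

# Made-up sample lines in the style of the default Rails request log.
lines = [
  'Started GET "/users" for 127.0.0.1 at 2018-03-20 16:43:19 +0000',
  "Completed 200 OK in 30ms (Views: 20.1ms | ActiveRecord: 3.2ms)",
  "Completed 200 OK in 50ms (Views: 35.0ms | ActiveRecord: 5.1ms)",
  "Completed 404 Not Found in 4ms"
]

columns = RailsLogReader.read(lines)
p columns
p RailsLogReader.mean_duration_by_status(columns)
```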

Note also that a typical GSoC plan includes some "buffer" weeks (deliberately left for fixing bugs, documenting and facing unexpected problems), especially before each phase ends. Tight schedules can look shiny at the planning phase, but when something unexpected happens (and it will happen), they do not have enough flexibility and tend to break completely.

Hope that sounds reasonable!

V.

Rohit Ner

Mar 21, 2018, 4:55:57 AM
to SciRuby Development
Hi Victor!

Thanks for pointing out the importance of keeping auxiliary functionality separate from the core.

I really like the idea of adding things in a circular manner. This approach also provides time to work on IO for log formats along with metric visualization.
Once a substantial amount of results is achieved in these two areas, the focus can shift in later phases to adding BI together with the cleaning library.

I will format my proposal this way, keeping the necessary points in mind, and I look forward to discussing the details.