GSoC Project [Categorical Data Support] [Lokesh Sharma -> Sameer Deshmukh/Alexej Gossmann]


Lokesh Sharma

Apr 23, 2016, 9:09:03 AM
to SciRuby Development
Hello everyone.

It will be a pleasure working with you all this summer. Thank you, SciRuby, for giving me this opportunity.

About the project:
My project is about adding categorical data support to Daru, Statsample and Statsample-glm, making them more powerful tools for extracting insight from data. Here is a summary of my project.

About me:
I'm a second-year undergraduate student pursuing a Bachelor's in Computer Science & Engineering at NIT-Hamirpur. I've devoted most of my time to machine learning, AI, algorithms and data analysis. For some time now I've been enthralled by the vast scope of Software as a Service apps and cloud computing. This led me to learn Rails, which is how I came to know Ruby and finally fell in love with SciRuby. I hope my work in SciRuby will make this beautiful language usable in much larger areas like data science. I will make sure that I deliver my best. I am very excited and eager to learn a lot from everybody. Let's get started!

Alexej Gossmann

Apr 29, 2016, 2:35:47 AM
to sciru...@googlegroups.com, Sameer Deshmukh
Hi Lokesh and Sameer,

I made an example of logistic regression with categorical data on a real-life dataset (animal shelter data from kaggle.com) using daru and statsample-glm:


Please take a look, I have pointed out some problems in the notebook. The code is relatively long for how little it does, but I hope that the work done this GSoC will reduce it to just a few lines. We can improve this data analysis as daru gains more categorical data capabilities.
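To illustrate the kind of boilerplate involved, here is a minimal sketch in plain Ruby of dummy coding, the step that turns a categorical column into 0/1 indicator columns before it can enter a logistic regression. It does not use daru or statsample-glm; the helper name `dummy_code` and the sample values are hypothetical and are not taken from the notebook or the Kaggle dataset.

```ruby
# Hypothetical helper, not part of daru or statsample-glm.
# Dummy-codes a categorical column into 0/1 indicator columns.
# The first level (in sorted order) is used as the reference level
# and gets no column, which avoids collinearity with the intercept.
def dummy_code(values, prefix)
  levels = values.uniq.sort
  levels.drop(1).each_with_object({}) do |level, columns|
    columns["#{prefix}_#{level}"] = values.map { |v| v == level ? 1 : 0 }
  end
end

# Made-up example data for illustration only.
animal_type = %w[dog cat dog bird cat]
coded = dummy_code(animal_type, "animal_type")

coded.each { |name, column| puts "#{name}: #{column.inspect}" }
```

With three levels this yields two indicator columns; "bird" is the reference level here only because it sorts first. Doing this by hand for every categorical column is presumably part of what makes such an analysis long.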

Best,
Alexej



Sameer Deshmukh

Apr 29, 2016, 2:53:01 PM
to SciRuby Development, sameer.d...@gmail.com
Lokesh,

As part of the community bonding period, you can try optimizing the code in this notebook and use that as a starting point to design the API that you will eventually implement.

Alexej Gossmann

Apr 29, 2016, 3:08:40 PM
to sciru...@googlegroups.com

Yes, I think one of the goals of this project is to make this type of data analysis short, easy and intuitive.

Alexej

Lokesh Sharma

Apr 30, 2016, 2:44:53 PM
to SciRuby Development
Great! This would really help me confidently come up with a good API design. I've suggested some edits here. Please have a look.

Alexej Gossmann

May 1, 2016, 1:14:12 PM
to sciru...@googlegroups.com
Thanks, Lokesh! I have incorporated your suggestions and other improvements into the notebook. I'm amazed at how much daru has improved in the last few weeks (compared to the released gem)! I will improve this data analysis some more, but first I will address some problems that I have run into with statsample-glm.

Best,
Alexej
