GSoC mentor application

Victor Shepelev

Jan 29, 2016, 6:18:53 PM
to sciru...@googlegroups.com
Hello all.

I'm writing here at the kind invitation of Sameer Deshmukh, the author of DaRu.

He thinks I could be a mentor this year, and I'd be glad to, but I don't know whether the SciRuby community will accept me.

About me: 33 years old, Kharkiv, Ukraine. Programming since childhood, commercially for the last 13 years, and in Ruby for the last ~10 years. Author of several Ruby gems (lately a few of them have gained moderate popularity).

I also have some experience teaching/mentoring Ruby, both online and offline.

Why I think I could be acceptable as a mentor:
I have always been fond of open source and of the Ruby language, and have always wanted to do things previously impossible in my favorite language and make them available to others.

Then, about a year ago, I was fascinated by Stephen Wolfram's "Frontiers of Computational Thinking" post (http://blog.stephenwolfram.com/2015/03/frontiers-of-computational-thinking-a-sxsw-report/). And I thought that "doing such fascinating things in several lines of code" should be totally possible with Ruby. And here I am.

Since that moment, I've developed a kind of "personal plan" (or, rather, a personal crusade) of libraries that Ruby should have but currently does not.
These are libraries:
* for real-life knowledge extraction from open data (so everyone can, like in Wolfram, query "list of countries sorted by GDP" in exactly one statement)
* for creative visualisations
* for symbolic computations
* for data processing
* for quick visual demonstration of data

Of course, I then did my homework and discovered that SciRuby (whose existence I had been aware of beforehand) has already implemented daru for data processing and IRuby for visual demonstrations, and is working on a symengine wrapper for symbolic computations. So I've decided to moderate my ambitions and work on something useful that does not overlap with current SciRuby projects.

So, my current work is:
* reality gem (working on this under a Ruby Association Grant; early stage, there will be much more in the coming weeks): https://github.com/molybdenum-99/reality
* ...and the underlying infoboxer -- a high-level Wikipedia client, quite unique, I assume: https://github.com/molybdenum-99/infoboxer
* and some small visualisation gems: word cloud -- https://github.com/zverok/magic_cloud, and country graph -- https://github.com/zverok/worldize

So, I have a fairly "solid vision" of what could usefully be done in one small area at the intersection of SciRuby and "everyday Ruby": making work with data fun, easy and useful for "the rest of Rubyists". Both communities (which do not strictly intersect) may benefit from progress in this area: scientific projects would become popular for "everyday tasks" and would gain more casual contributors...

What kind of projects I'd like to mentor:
1. Making existing SciRuby projects friendlier for "everyday Rubyists", matching their understanding and expectations
1a. careful interface refactoring (Sameer and I have already talked about plans for DaRu) to follow best modern Ruby practices and integrate well with "standard" libraries and solutions
1b. demo projects with large popularization potential: projects of this kind, I think, are especially well suited for students: this is something you can do over the summer and forget (which is how students work in summer anyway), and it will be useful for the community
2. Some new (or adaptation/enhancement of existing) "creative visualisation" projects
3. Some data access projects (like my reality), but I'm not insisting on this :)

I don't want to sound arrogant, and I am, in fact, ready for any work I can be helpful with.

Thank you, and sorry for the long letter.

V.

Carlos Agarie

Feb 1, 2016, 11:55:14 AM
to sciru...@googlegroups.com
Victor,

Welcome to SciRuby!

I'm pretty sure there are projects of the kind you described in item
1b that would be wonderful for the community. I also think your
personal plan is good and can help bring some new ideas to SciRuby's
GSoC page.

--
Carlos Agarie
Software Engineer
+55 11 97320-3878 | @carlos_agarie

Victor Shepelev

Feb 10, 2016, 6:57:50 PM
to sciru...@googlegroups.com
After a long and very helpful discussion with Sameer, here are my two (rather HUUUUGE) proposals:

I. Demo Projects
Reasoning: as we all know, most students participate in projects like GSoC only during the project term and drop them the day after the final evaluation. So this idea comes from asking: a) what interesting and useful things a student can do in ~2 months, and b) which of those would benefit SciRuby.

So, the idea is: require students to do some kind of "demo project" that meets these requirements:
0. It, of course, uses SciRuby projects for data loading, investigation and visualisation
1. It should be "interesting": using some popular data (movies, cars, recipes, comic books, space shuttles, Doctor Who heroes...)
2. It should be understandable: easy to explain "what we studied and what we found", with some wow-factor in the resulting data/charts
3. ...Yet it should be non-trivial (nobody should think "ha, I can repeat this study in mere minutes")
4. It should be readable both inside the SciRuby community and outside it. This means: provide IRuby notebooks (with more scientific explanations) as well as blog posts/example scripts (with more popular explanations)
4a. It should also be extensible from inside and outside (it should be easy to see what MORE can be done on this project, with more insights and creative charts as a result)
5. (this is optional, but VERY useful for SciRuby) one of the outcomes should be a set of issues/discussions/pull requests in the SciRuby projects used and/or an analysis of the student's experience of applying them.

A few examples:
* take IMDB data (openly available: https://getsatisfaction.com/imdb/topics/api_bulk_data_access) and investigate actors' careers, the correlation of the length of an actor's career with the popularity of the movie, show how a typical American actor's genre preferences shift through the career, and so on
* take the centennial earthquake catalog: http://earthquake.usgs.gov/data/centennial/ and intersect it with some World Bank data indicators: http://data.worldbank.org/indicator -- and show which indicators are correlated, and plot a colourful visualisation of those indicators vs earthquakes
* repeat this study: http://evelinag.com/blog/2015/12-15-star-wars-social-network/index.html -- but for another set of movies :)
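
To give a feel for the "correlation" part of such a study, here is a minimal plain-Ruby sketch. Everything in it is an assumption made for illustration: the numbers are invented toy values (not taken from IMDB), and `pearson`, `CAREER_YEARS` and `MOVIE_RATINGS` are hypothetical names. A real demo project would presumably load the data with daru and use statsample for the statistics instead of hand-rolling them.

```ruby
# Toy data, invented for illustration only (NOT real IMDB figures).
CAREER_YEARS  = [3, 8, 12, 20, 25].freeze          # hypothetical career lengths
MOVIE_RATINGS = [5.1, 6.0, 6.4, 7.2, 7.9].freeze   # hypothetical popularity scores

# Pearson correlation coefficient of two equally sized numeric arrays.
def pearson(xs, ys)
  if xs.empty? || xs.size != ys.size
    raise ArgumentError, 'arrays must have the same non-zero size'
  end

  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  # Sum of products of deviations, and sums of squared deviations.
  cov   = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }

  if var_x.zero? || var_y.zero?
    raise ArgumentError, 'correlation is undefined for constant data'
  end

  cov / Math.sqrt(var_x * var_y)
end

puts pearson(CAREER_YEARS, MOVIE_RATINGS).round(3)
```

The interesting (and non-trivial) work in a real project is everything around this: extracting and cleaning the dataset, choosing what "popularity" means, and presenting the result.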

More examples can/should be brainstormed by the community, but some freedom should be left in the example descriptions (like "investigate movies from IMDB or books from some dataset like http://www2.informatik.uni-freiburg.de/~cziegler/BX/").
Also, considering requirement (5), maybe the examples should be brainstormed in a way that puts the accent on particular libraries (processing really large/complex datasets with DaRu, advanced statistics with statsample, creative visualisations with nyaplot/plotrb/gnuplot...).

II. Visualisation
Yet my proposal is slightly different: don't play "catch the leader" (Python/MATLAB), but rather play "catch up with the [Ruby] community".
More concretely, here are SciRuby's current visualisation options:
* nyaplot: produces HTML+D3 (most used by SciRuby community currently?)
* Ruby Gnuplot: communicates with specialized command-line software, produces whatever you want.
* rubyvis: looks dead (?), produces SVG
* plotrb: looks dead (?), produces input data for D3

As far as I can understand, none of those options can be (comfortably) used outside an IRuby notebook -- and, therefore, the Ruby community outside SciRuby is currently alienated.

Outside SciRuby, most Rubyists use Gruff, which produces ("simple-but-good-for-presentations") static images, or some client-side solution like Highcharts.

So I can think of, in fact, two different projects, both of which may be of interest to both communities (SciRuby and casual Ruby):
1. Prepare nyaplot to be used on websites, not only in notebooks/standalone
2. (Seems very important to me) Make some kind of RMagick-backed graphing solution, maybe with the same API as nyaplot (or even as an "alt. backend" for nyaplot), tightly integrated into the SciRuby ecosystem.

Task 2 especially seems "interesting" for a student (hey! pictures! colors! visual culture!), is a reasonable amount of work, AND opens a path to further visualisation-related projects. (Several Magick::Image-s can be combined and interweaved, charts and real-life graphics can be combined, several small separate gems for concrete visualisation tasks can be developed independently.)

IRuby is already capable of displaying a Magick::Image, so the student can work on a moderate set of tasks: just the calculation/output of various kinds of graphical content.

WDYT?

John Woods

Feb 11, 2016, 12:16:32 PM
to sciru...@googlegroups.com
1. I'm somewhat worried about the first project because it sounds like a documentation project:

2. I've added the matplotlib proposal because I've been using Python for the first time in years, and though it's not the most organized code, *stuff just works the way you expect it to* (more or less). You plot something and it shows up on the screen. That convenience is worth a lot.

I think there's a place in our ecosystem for the currently existing projects, like nyaplot, plotrb, ruby-gnuplot, and rubyvis. But there are also some problems with these that make them less than robust. Since they are interfaces for languages (or for plotting programs), they add a ton of overhead, which means you can't readily plot large datasets (I'd actually have to reboot my MacBook Air periodically). Also, it's really tough to communicate syntax errors in the underlying program or language back to the Ruby coder.

We could totally reinvent the wheel and write our own C library, and that may be what's necessary — but I'd prefer not. I learned firsthand how hard it was to get NMatrix working properly with all of the necessary external libraries. And GUI API is a special kind of hell.

Thoughts?

Alexej Gossmann

Feb 11, 2016, 12:47:25 PM
to SciRuby Development
Hi!

I have a remark about project 1 (demo projects).
If the student is more academically inclined, then the final result of project 1 could be a publication in, for example, the Journal of Statistical Software (there is probably a more suitable journal, but I'm not very familiar with the journal landscape).
I think that it certainly would be nice to have a publication showing off all the different SciRuby tools to the scientific community. I'm not sure, though, whether that would be okay with the GSoC rules (as John has pointed out).

Best,
Alexej

Victor Shepelev

Feb 11, 2016, 12:51:17 PM
to sciru...@googlegroups.com
1. Maybe I explained the idea badly, but... No, as I see it, it is about code. In my head, the student would do something like:
a) write scripts to investigate data
b) write scripts to visualise and present his findings
c) (possibly) extract some of his work into small libraries/datasets (e.g., code + repo for "extract and prepare IMDB data")
d) (possibly) make enhancements to existing SciRuby libraries when he needs them
e) ...and finally, write some reports/notebooks/blog posts -- as any student in GSoC does

But if you think it still doesn't suit GSoC requirements/SciRuby goals, I'm totally OK with that :)

2. Yes, I totally understand your reasoning. It's just that in the area I'm trying to gently "push" (more interaction between the SciRuby and not-so-sci-Ruby communities), there's currently no visualisation option that is handy and understandable for a "casual" Rubyist. Maybe this gap can be closed by means other than those I've proposed.