Re: [scala-language] [GSoC] Multidimensional arrays

David Hall

Feb 25, 2014, 1:21:56 PM

to scala-l...@googlegroups.com, scala-...@googlegroups.com

On Tue, Feb 25, 2014 at 12:25 AM, Hubert Plociniczak <hubert.pl...@epfl.ch> wrote:

Hi,

On 02/24/2014 11:27 PM, Christopher Medrela wrote:

Hello!

My name is Christopher Medrela and I'd like to participate in GSoC working on a Scala project. Last year I worked on Django (my mentor was Russell Keith-Magee) and I successfully finished my project (revamping the validation framework). This year I'm also applying to Django, but I'd like to try something else.

I'm fluent in Python and very interested in Scala. I think my Scala skills wouldn't be the biggest problem; my lack of knowledge of Scala frameworks and libraries would be. I have basic knowledge of the numpy and pandas libraries. My English has turned out to be good enough for real-time discussion via Skype.

Therefore, I'd like to work on a component, library or framework that is independent of the others. It can be math-heavy. It can be a mix of Python and Scala.

I think our preference is to stick to Scala.

I fished out two projects: "multidimensional arrays" and "visualization library". Over the next few days I will focus on the first one.

Do you think this is a good project? Maybe there are better ones given my skills? Or maybe this project is not worth much to the Scala community and it would be better to focus on the other one?

I will let the authors of the ideas speak about that, but...

The multidimensional array project requires pretty advanced Scala experience. Scala's a very different language from Python. Visualization would probably be a better match, depending on your experience with things like matplotlib. Let's follow up on the Breeze mailing list (+cc). Also see below.

The second issue is that I'm going to have an internship in the late summer. I will only apply for ones that won't clash with GSoC. Therefore, I'd like to start coding as early as possible. Is it possible to shift the internal dates so that I could start coding earlier (i.e. when the list of accepted students is published, which is 21 April)?

... that might be a problem. There are some strict rules/deadlines imposed on us by Google, but depending on the situation maybe something could be done. I will leave the decision to Tobias, but we have had some bad experiences with students trying to do two things at the same time, so I think we will hesitate to accept students who have another job that overlaps with GSoC. It's good that you are asking now rather than letting us know later, though (the latter did happen in the past and is definitely not cool).

I also share Hubert's concerns about your timeline. The EPFL guys have more experience with this than me.

-- David

hubert

--
You received this message because you are subscribed to the Google Groups "scala-language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-language+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Christopher Medrela

Feb 25, 2014, 4:22:16 PM

to scala-...@googlegroups.com, scala-l...@googlegroups.com, dl...@cs.berkeley.edu

Thank you for your feedback!

OK, I will have a look at this project. I forgot to mention that I also know Java, so Scala is not completely new to me.

The second issue is that I'm going to have an internship in the late summer. I will only apply for ones that won't clash with GSoC. Therefore, I'd like to start coding as early as possible. Is it possible to shift the internal dates so that I could start coding earlier (i.e. when the list of accepted students is published, which is 21 April)?

... that might be a problem. There are some strict rules/deadlines imposed on us by Google, but depending on the situation maybe something could be done. I will leave the decision to Tobias, but we have had some bad experiences with students trying to do two things at the same time, so I think we will hesitate to accept students who have another job that overlaps with GSoC. It's good that you are asking now rather than letting us know later, though (the latter did happen in the past and is definitely not cool).

I'm going to apply only for internships that won't clash with GSoC. That means I'm *not* going to do two things at the same time. I won't have any job or holiday during GSoC (except for classes, of course).

I also share Hubert's concerns about your timeline. The EPFL guys have more experience with this than me.

I asked Carol whether Google objects to an organization internally shifting the dates [1]. She said: "We don't police what you deliver to your org and when, simply that you meet the milestones of the program as laid out." Since I'd like to start earlier, not later, the program's deadlines and milestones are not a problem.

[1] https://groups.google.com/forum/#!topic/google-summer-of-code-discuss/phFlDN34KKA

David Hall

Feb 25, 2014, 4:44:58 PM

to chris....@gmail.com, scala-...@googlegroups.com

-scala-language

OK, that is good. I'm also pretty open to new project ideas. If you wanted, e.g., to bring in pandas-like functionality (basically incorporating/replacing the Saddle library), I would be on board with that too.

I want to be sure that people can do the projects they're proposing, but otherwise I'm flexible.

-- David


Christopher Medrela

Feb 27, 2014, 4:30:34 PM

to scala-...@googlegroups.com, chris....@gmail.com, dl...@cs.berkeley.edu

To be honest, I'm completely new to Breeze, and I don't know which ideas/projects are worth a lot and which don't add much value, or which projects are easy and which are hard. So on this point I have to rely on you.

I'm trying to find a starting point. Unfortunately, there is no ticket tracker, no to-do list, and the mailing list doesn't say much. The only thing I've found is this survey [1]. What do the results of the survey say? Is there any clue as to what people expect from breeze-viz?

[1] https://groups.google.com/forum/#!searchin/scala-breeze/viz/scala-breeze/7mpuCJ5zWdA/3l8V9L6JOjYJ

I've heard [2] that ktakagaki (Kenta) can help with breeze-viz and that he has some ideas. Kenta, can you comment on my post and share your ideas?

[2] https://groups.google.com/d/msg/scala-breeze/Z_2_WIwpjdI/kP5_Cqqgta0J

My plan is to mimic (more or less) the architecture and API of existing libraries like matplotlib, because IMO it's better not to reinvent the wheel and make the same mistakes again. Of course, where possible, I will try to use the power of Scala, e.g. by using operator overloading instead of ordinary method names.

I'm completely new to Breeze and I'm not as good at Scala as I am at Python, so I propose the following strategy:

At the very beginning I will work only on writing a draft of the breeze-viz documentation. Writing documentation will force me to read the code carefully, to understand how everything works, and to anticipate risks and dangers. This shouldn't take too long.

Then I will start making small improvements to breeze-viz. After this startup phase, I will refactor the code (if necessary), make bigger changes and introduce essential features like.

I'm aware this is pretty vague; I will post concrete proposals for improvements over the weekend.

David Hall

Feb 28, 2014, 4:01:34 AM

to scala-...@googlegroups.com, Christopher Medrela

That's fair. I want to find something that interests you, to be sure. Things that I would like to have happen in the near and/or long term, in no particular order:

0) A good interactive-ish visualization library (that is, can pop up a window, not just generate graphics)

1) GPUs (I'm starting to work on this already)

2) NumPy parity: Besides ndarrays, this is basically just fleshing out a few functions.

3) Something like Pandas (/ annexing Saddle)

4) pretty much anything in SciPy

5) Integrating algebraic hierarchy from Spire or Algebird

6) Symbolic math

I'm trying to find a starting point. Unfortunately, there is no ticket tracker, no to-do list, and the mailing list doesn't say much. The only thing I've found is this survey [1]. What do the results of the survey say? Is there any clue as to what people expect from breeze-viz?

[1] https://groups.google.com/forum/#!searchin/scala-breeze/viz/scala-breeze/7mpuCJ5zWdA/3l8V9L6JOjYJ

Breeze-Viz is pretty dormant, so you'd basically be helping to kickstart it back into life. The survey results weren't terribly informative, except everyone wanted more documentation for everything.

I've heard [2] that ktakagaki (Kenta) can help with breeze-viz and that he has some ideas. Kenta, can you comment on my post and share your ideas?

[2] https://groups.google.com/d/msg/scala-breeze/Z_2_WIwpjdI/kP5_Cqqgta0J

Martin Senne also has ideas.

My plan is to mimic (more or less) the architecture and API of existing libraries like matplotlib, because IMO it's better not to reinvent the wheel and make the same mistakes again. Of course, where possible, I will try to use the power of Scala, e.g. by using operator overloading instead of ordinary method names.

I think this is mostly right. My biggest pet peeve with those APIs (matplotlib, matlab) is that they maintain opaque global state with regard to the "current plot." I'd rather everything be a method on a plot object, which is how it's done right now in Breeze-Viz. Otherwise, these APIs are clearly successful and I agree it's worth modeling ours on them.
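To make the "methods on a plot object, no global current plot" point concrete, here is a minimal plain-Scala sketch with no Breeze dependency. The names `Figure`, `Plot`, `Series` and the `+=` operator are hypothetical; they only loosely echo the `p += plot(x, y)` style of the Breeze-Viz Quickstart and are not Breeze-Viz's actual implementation.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: every call goes through an explicit Figure/Plot value,
// so there is no hidden global "current plot" to keep track of.
final case class Series(xs: Seq[Double], ys: Seq[Double]) {
  require(xs.length == ys.length, "xs and ys must have the same length")
}

final class Plot {
  private val buffer = ArrayBuffer.empty[Series]
  var xlabel: String = ""
  var ylabel: String = ""

  // Operator overloading in place of a named method such as addSeries.
  def +=(s: Series): this.type = { buffer += s; this }

  def series: Seq[Series] = buffer.toList
}

final class Figure {
  private val plots = ArrayBuffer.empty[Plot]

  // Returns subplot i, creating empty subplots up to i on first access.
  def subplot(i: Int): Plot = {
    require(i >= 0, "subplot index must be non-negative")
    while (plots.length <= i) plots += new Plot
    plots(i)
  }

  def subplotCount: Int = plots.length
}
```

Because `p0 += ...` and `p0.xlabel = ...` only touch `p0`, another subplot or figure created elsewhere cannot be changed by accident, which is exactly what a matplotlib-style "current axes" cannot guarantee.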

I'm completely new to Breeze and I'm not as good at Scala as I am at Python, so I propose the following strategy:

At the very beginning I will work only on writing a draft of the breeze-viz documentation. Writing documentation will force me to read the code carefully, to understand how everything works, and to anticipate risks and dangers. This shouldn't take too long.

Then I will start making small improvements to breeze-viz. After this startup phase, I will refactor the code (if necessary), make bigger changes and introduce essential features like.

I'm aware this is pretty vague; I will post concrete proposals for improvements over the weekend.

I think this is a good plan. I don't know that it's worth writing thorough documentation of everything, but getting a good understanding of what's there is clearly a good idea. (I assume you've seen the woefully short Breeze-Viz section at https://github.com/scalanlp/breeze/wiki/Quickstart)

--
You received this message because you are subscribed to the Google Groups "Scala Breeze" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-breeze...@googlegroups.com.
To post to this group, send email to scala-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scala-breeze/eeeb5872-8396-4987-ad7d-ecb4d4ecb8c4%40googlegroups.com.

ktakagaki

Mar 1, 2014, 6:33:29 AM

to scala-...@googlegroups.com, chris....@gmail.com, dl...@cs.berkeley.edu

Hi,

I can share a few of my thoughts regarding visualization, as requested.

First, to give you a caveat about where I come from: I am a neuroscientist, with no formal training in math or computer programming beyond basic university classes. I have been playing with various languages since I was a kid on a TRS-80/4, but the only "real" language in which I have written significant code for real use is Java (and now a bit of Scala). I do have significant experience with scientific languages, my main ones being MatLab and Mathematica. Ever since I started in the early '90s, I have leaned strongly towards Mathematica, especially for graphics output (for relatively obvious reasons), so anything I write here about graphics is probably pretty biased. MatLab used to be far faster for number crunching, and I used to use the two M's side by side, but these days Mathematica is actually significantly faster for most things I do (with superb parallelization/distribution baked in), and my heavy stuff is in Java/Scala anyway, so I only touch MatLab these days when a student needs help or I need to use someone's toolbox. I also used Igor Pro to some extent at work (another language people used to use for large data/graphics/array-heavy work). I explored python/scipy/matplotlib a while back, before I decided to go forward with Scala/breeze. I also tried a Coursera course in R; the graphics were pretty good, but there were just too many quirks in the language at the outset for me to want to pursue it (and I'm not a statistician), so I dropped the course.

So anyway, here are my visualization thoughts, mainly from the bio-scientific user POV. The idea "reinvents" the wheel, as you put it, and would take longer to get off the ground. However, that tradeoff (I think) would allow for much easier expansion in the future as the package matures (see Goal 3 below), and allow us to go far beyond matplotlib/MatLab.

Goal 1 (for me)... is to make clean, publishable graphs (i.e. ideally PDF/PS output), in an interactive way.

That is where MatLab falls on its face. The default output is 80's-looking and customizability is poor, so most people I know resort to touching up vector output with Illustrator, etc., before submitting figures.

matplotlib looks much more modern, although I haven't published with it myself.

Goal 2... to make layered/tiled graphics with multiple elements

For real life use as a scientist, simple line graphs and bar charts are often not enough. See [Wilkinson](http://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html), [ggplot examples](https://www.google.de/search?q=ggplot2+examples), [Mathematica examples](http://reference.wolfram.com/mathematica/guide/GraphicsOptionsAndStyling.html)

In terms of layering and composing graphs, it is important to choose the coordinate system very carefully and to stay in control of it. The MatLab axis concept causes severe headaches after the 2nd or 3rd graph element, with a lot of hand-coding of layouts. It does seem that the matplotlib people have improved the situation, albeit slightly (http://matplotlib.org/gallery).

Goal 3... to make custom graphics and plots

This is where (if you share this goal) it may be particularly unwise to stick with JFreeChart as a backend (as breeze.viz does). (Furthermore, JFreeChart is Swing-based, and I think Java is moving towards JavaFX.)

In order to make custom plots, I think one needs a more systematic representation of primitives, [e.g.](http://reference.wolfram.com/mathematica/howto/CombineTwoOrMoreGraphics.html)

If I were forced to do this myself today (gun to my head) without further discussion, I would turn to ScalaFX/JavaFX to create our own graphics primitive/plot hierarchy from the ground up, modelled on [Mathematica graphics/plots](https://reference.wolfram.com/mathematica/tutorial/TheStructureOfGraphics.html), but with a more OOP and 21st-century flair. A big advantage of this approach is that it can be expanded a lot in the future: 3D plots and animations (JavaFX), which are becoming pretty bread-and-butter in my field; plotting functions (not data, e.g. plot( sin(_) ); I think breeze.viz already does this); and [dynamically interactive plots](https://reference.wolfram.com/mathematica/guide/InteractiveManipulation.html), which I find very useful for quickly scanning through large datasets.
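As a rough illustration of a "systematic representation of primitives", here is a sketch in which a graphic is just an immutable tree of primitives that can be inspected, for example to compute its bounding box before layout. `Pt`, `BBox`, `Point`, `Line`, `Text` and `Group` are hypothetical names, not an existing Breeze, ScalaFX or JavaFX API, and text is simplified to its anchor point.

```scala
// Hypothetical primitive hierarchy; names are illustrative only.
final case class Pt(x: Double, y: Double)

final case class BBox(minX: Double, minY: Double, maxX: Double, maxY: Double) {
  def union(o: BBox): BBox =
    BBox(minX min o.minX, minY min o.minY, maxX max o.maxX, maxY max o.maxY)
}

sealed trait Primitive
final case class Point(p: Pt) extends Primitive
final case class Line(points: Seq[Pt]) extends Primitive
final case class Text(label: String, at: Pt) extends Primitive
final case class Group(children: Seq[Primitive]) extends Primitive

object Primitive {
  // Smallest box containing all points; None for an empty sequence.
  private def ofPoints(ps: Seq[Pt]): Option[BBox] =
    ps.map(p => BBox(p.x, p.y, p.x, p.y)).reduceOption(_ union _)

  // Bounding box in data coordinates (text is treated as its anchor point).
  def bounds(prim: Primitive): Option[BBox] = prim match {
    case Point(p)        => ofPoints(Seq(p))
    case Line(ps)        => ofPoints(ps)
    case Text(_, at)     => ofPoints(Seq(at))
    case Group(children) => children.flatMap(c => bounds(c).toList).reduceOption(_ union _)
  }
}
```

Because the hierarchy is sealed, every backend (JavaFX, PDF, SVG) can be written as one exhaustive match over the same few cases.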

Goal 4... To have sane default values, but to be able to specify every option of the graph in detail.

Matlab (and matplotlib) make extensive use of "nargin"-type parameters, many of which are text string values. Since Scala is a typed language, the only ways to follow this syntax are to pass text strings exclusively and parse them at runtime, or to limit parameters to a single type, which is very often unreasonable.

Based on past discussions in the breeze group (https://groups.google.com/forum/#!topic/scala-breeze/o7A49ZYP1kg, https://github.com/scalanlp/breeze/pull/115, https://groups.google.com/forum/#!topic/scala-breeze/IcZxSOq6Fr8), I have chosen [case class/case object based options](https://github.com/scalanlp/breeze/blob/master/src/main/scala/breeze/signal/options.scala) for a similar problem in the breeze.signal package. This has the benefit that you can pass options that encapsulate different value types, for example OptWindow.Automatic, OptWindow.Hamming(a, b), etc. The objects are also compiled, so there is no parsing at runtime.

I have a feeling this would also work well for graphics options (OptColor.Black, OptColor.Hue(h, s, b, alpha), OptColor.RGB(r, g, b), OptColor.ColorMap( ColorMapHeat ), OptColor.Automatic, ....). You could also pass object options after display, to actively modify existing graphics.
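A minimal sketch of how such a color option could look as a sealed hierarchy of case objects and case classes. `OptColor`, its members and `ColorResolver.toHex` are hypothetical names following the naming in the message above, not an existing Breeze API; the alpha component and `ColorMap` are left out for brevity, and `java.awt.Color.HSBtoRGB` is used only to convert the `Hue` case.

```scala
// Hypothetical option ADT in the spirit of the breeze.signal Opt* objects.
sealed trait OptColor
object OptColor {
  case object Automatic extends OptColor
  case object Black extends OptColor
  final case class RGB(r: Int, g: Int, b: Int) extends OptColor {
    require(Seq(r, g, b).forall(c => c >= 0 && c <= 255), "RGB components must be in 0..255")
  }
  // Hue, saturation and brightness, each in [0, 1] (alpha omitted in this sketch).
  final case class Hue(h: Double, s: Double, b: Double) extends OptColor
}

object ColorResolver {
  import OptColor._

  // The match is checked for exhaustiveness at compile time; nothing is parsed at runtime.
  // The Automatic default below is an arbitrary choice for illustration.
  def toHex(opt: OptColor, automatic: String = "#1f77b4"): String = opt match {
    case Automatic    => automatic
    case Black        => "#000000"
    case RGB(r, g, b) => f"#$r%02x$g%02x$b%02x"
    case Hue(h, s, b) =>
      val rgb = java.awt.Color.HSBtoRGB(h.toFloat, s.toFloat, b.toFloat) & 0xffffff
      f"#$rgb%06x"
  }
}
```

A typo such as `OptColor.Blak` is a compile error here, whereas a misspelled string option would only fail (or be silently ignored) at runtime.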

Goal 5... slightly unrelated, but to have an IPython notebook/Mathematica notebook-style REPL interface, which records commands, output and graphics output. See (https://github.com/Bridgewater/scala-notebook)

...so these are my thoughts so far, I'm interested to hear what you think, and what other non-Mathematica-partisans think.

This is just my 1 cent(?), and I have really enjoyed the learning experience that working on breeze has given me so far.

Studying breeze.viz carefully and adding docs definitely sounds like a great start, regardless of what direction you take.

If you decide to go in the approximate direction sketched out above, a reasonable goal for the summer might be to design the basic object/options hierarchy, and to get to a line graph and a bar graph/histogram.

I'm sure you will get much further during the summer if you build more on previous stuff.

I am available to discuss graphing through the summer, if needed.

Cheers,

Kenta

ktakagaki

Mar 1, 2014, 6:40:40 AM

to scala-...@googlegroups.com, Christopher Medrela, dl...@cs.berkeley.edu

Just saw David's email... I totally agree that the "current plot/axis" concept is quite undesirable, but I guess that's already a moot point.

I wonder what David and you think about the problem (as I see it) of option specification for graphics, vis-a-vis parsing, etc.

Kenta

David Hall

Mar 2, 2014, 12:05:24 AM

to scala-...@googlegroups.com, Christopher Medrela

On Sat, Mar 1, 2014 at 3:33 AM, ktakagaki <kentaroh...@gmail.com> wrote:

So anyway, here are my visualization thoughts, mainly from the bio-scientific user POV. The idea "reinvents" the wheel, as you put it, and would take longer to get off the ground. However, that tradeoff (I think) would allow for much easier expansion in the future as the package matures (see Goal 3 below), and allow us to go far beyond matplotlib/MatLab.

Goal 1 (for me)... is to make clean, publishable graphs (i.e. ideally PDF/PS output), in an interactive way.

That is where MatLab falls on its face. The default output is 80's-looking and customizability is poor, so most people I know resort to touching up vector output with Illustrator, etc., before submitting figures.

matplotlib looks much more modern, although I haven't published with it myself.

I agree that looking nice is important. From what I've used, matplotlib looks the best of the major packages. (I've not used Mathematica.) R, despite being GoG-based, usually looks pretty bad (though, honestly, the examples in Wilkinson's book look well thought out, just not actually "pretty"). D3 obviously does a great job of being pretty, and it's GoG-based too, so...

Regardless, I think getting the right API (or at least, the right structure) is the key thing...

Goal 2... to make layered/tiled graphics with multiple elements

For real life use as a scientist, simple line graphs and bar charts are often not enough. See [Wilkinson](http://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html), [ggplot examples](https://www.google.de/search?q=ggplot2+examples), [Mathematica examples](http://reference.wolfram.com/mathematica/guide/GraphicsOptionsAndStyling.html)

In terms of layering and composing graphs, it is important to choose the coordinate system very carefully and to stay in control of it. The MatLab axis concept causes severe headaches after the 2nd or 3rd graph element, with a lot of hand-coding of layouts. It does seem that the matplotlib people have improved the situation, albeit slightly (http://matplotlib.org/gallery).

I agree GoG is clearly the best basis we have right now.

Goal 3... to make custom graphics and plots

This is where (if you share this goal) it may be particularly unwise to stick with JFreeChart as a backend (as breeze.viz does). (Furthermore, JFreeChart is Swing-based, and I think Java is moving towards JavaFX.)

In order to make custom plots, I think one needs a more systematic representation of primitives, [e.g.](http://reference.wolfram.com/mathematica/howto/CombineTwoOrMoreGraphics.html)

If I were forced to do this myself today (gun to my head) without further discussion, I would turn to ScalaFX/JavaFX to create our own graphics primitive/plot hierarchy from the ground up, modelled on [Mathematica graphics/plots](https://reference.wolfram.com/mathematica/tutorial/TheStructureOfGraphics.html), but with a more OOP and 21st-century flair. A big advantage of this approach is that it can be expanded a lot in the future: 3D plots and animations (JavaFX), which are becoming pretty bread-and-butter in my field; plotting functions (not data, e.g. plot( sin(_) ); I think breeze.viz already does this); and [dynamically interactive plots](https://reference.wolfram.com/mathematica/guide/InteractiveManipulation.html), which I find very useful for quickly scanning through large datasets.

Can't really say anything here.

Goal 4... To have sane default values, but to be able to specify every option of the graph in detail.

Matlab (and matplotlib) make extensive use of "nargin"-type parameters, many of which are text string values. Since Scala is a typed language, the only ways to follow this syntax are to pass text strings exclusively and parse them at runtime, or to limit parameters to a single type, which is very often unreasonable.

Based on past discussions in the breeze group (https://groups.google.com/forum/#!topic/scala-breeze/o7A49ZYP1kg, https://github.com/scalanlp/breeze/pull/115, https://groups.google.com/forum/#!topic/scala-breeze/IcZxSOq6Fr8), I have chosen [case class/case object based options](https://github.com/scalanlp/breeze/blob/master/src/main/scala/breeze/signal/options.scala) for a similar problem in the breeze.signal package. This has the benefit that you can pass options that encapsulate different value types, for example OptWindow.Automatic, OptWindow.Hamming(a, b), etc. The objects are also compiled, so there is no parsing at runtime.

I have a feeling this would also work well for graphics options (OptColor.Black, OptColor.Hue(h, s, b, alpha), OptColor.RGB(r, g, b), OptColor.ColorMap( ColorMapHeat ), OptColor.Automatic, ....). You could also pass object options after display, to actively modify existing graphics.

I agree on the options. Strings are for other languages. (Don't get me started on the 111 thing in matplotlib and matlab.)
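For context on "the 111 thing": in MATLAB and matplotlib a single integer such as `subplot(111)` or `subplot(211)` packs the number of rows, the number of columns and a 1-based panel index into three digits. The sketch below decodes that legacy form into a typed value; `GridPosition` and `fromCode` are hypothetical names, and the row-by-row fill order matches matplotlib's convention.

```scala
// Hypothetical typed replacement for "111"-style subplot codes.
final case class GridPosition(rows: Int, cols: Int, index: Int) {
  require(rows > 0 && cols > 0, "grid must have at least one row and one column")
  require(index >= 1 && index <= rows * cols, s"index must be in 1..${rows * cols}")

  // Zero-based (row, column) cell, filling the grid row by row.
  def cell: (Int, Int) = ((index - 1) / cols, (index - 1) % cols)
}

object GridPosition {
  // Decodes the legacy three-digit form: hundreds = rows, tens = cols, units = index.
  def fromCode(code: Int): GridPosition = {
    require(code >= 111 && code <= 999, s"expected a three-digit code, got $code")
    GridPosition(code / 100, (code / 10) % 10, code % 10)
  }
}
```

With the typed form, `GridPosition(2, 3, 3)` says what it means, and an impossible request such as code `120` (index 0) is rejected immediately instead of being silently misread.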

Goal 5... slightly unrelated, but to have an IPython notebook/Mathematica notebook-style REPL interface, which records commands, output and graphics output. See (https://github.com/Bridgewater/scala-notebook)

There's also https://github.com/mattpap/IScala . I agree it would be nice to have something that we knew worked well.


Kentaroh Takagaki

Mar 2, 2014, 2:50:53 AM

to scala-breeze, Christopher Medrela

I forgot about D3... a good case study for you, for thinking about API (re)design, might be to go through some of these graphs, or the GoG plots, and think about how to implement them using a matplotlib-style API... I think things like the following become pretty problematic:

http://www.jasondavies.com/tree-of-life/

http://bl.ocks.org/mbostock/4060606

Regarding layering and tiling plots/custom graphics, let me provide some concrete examples of where I think the object-oriented way graphics are set up in Mathematica provides some powerful paradigm hints. For instance, for a histogram with a line superimposed:

http://matplotlib.org/mpl_examples/statistics/histogram_demo_features.hires.png

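In matplotlib, you specify the line as an input to plot:

import numpy as np
import matplotlib.pyplot as plt

# example data: IQ-like scores with mean mu and standard deviation sigma
mu, sigma, num_bins = 100, 15, 50
x = mu + sigma * np.random.randn(10000)

# the histogram of the data (newer matplotlib uses density=True instead of normed=1)
n, bins, patches = plt.hist(x, num_bins, density=True, facecolor='green', alpha=0.5)
# add a 'best fit' line: the normal pdf at the bin edges (replaces the removed mlab.normpdf)
y = np.exp(-0.5 * ((bins - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')
plt.show()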

In Mathematica, you would create two Graphics objects, and simply combine them. Something like:

grHistogram = Histogram[ {data}, /plot options, axes styles, etc/];

grSmoothHistogram = SmoothHistogram[ {data}, /plot options, axes styles, etc/];

grFinal = Show[ grHistogram, grSmoothHistogram, /further plot options/]

Where the plot options of grHistogram and grSmoothHistogram diverge for common things such as the axes styles and plot title, the options from the first Graphics object (grHistogram) take effect. The final output, of course, is also a Graphics object, and more graphics can be superimposed on it with the same Show[] command, which creates yet another new graphic.

grFinal = Show[ grFinal, Graphics[Text["Hello World!", /position specifications, etc/ ] ] ]

To stack the two graphs in MatLab style, you would use the subplot command, which makes standard subplots very easy but breaks down pretty quickly for complex things.

In Mathematica, you would make a new Graphics object, using something like:

grStacked = GraphicsColumn[ {grHistogram, grSmoothHistogram}, /spacing options, etc/ ]

What GraphicsColumn actually does is transform (scale and translate) the coordinate spaces of the two Graphics objects and combine them into a new Graphics object. Because it is object-oriented in this way, it is trivial to stack or tile different graphs however you want. It also extends easily to a map object, allowing you to put histograms, pie charts or flow plots on top.

grRussiaMap = CountryData["Russia", {"Shape", "Mollweide"}]

CountryData["World", {"Shape", "Mollweide"}]

Kenta Takagaki (高垣堅太郎)

kentaroh...@gmail.com

http://www.kentarohtakagaki.org/

--
You received this message because you are subscribed to a topic in the Google Groups "Scala Breeze" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scala-breeze/GcrhfKJHUEw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to scala-breeze...@googlegroups.com.

To post to this group, send email to scala-...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/scala-breeze/CALW2ey3hCBk2JOB90vpYLs1qO3Xm1kRTtUYkhEeF_jfoAm4XcA%40mail.gmail.com.

Kentaroh Takagaki

Mar 2, 2014, 3:07:06 AM

to scala-breeze

...sorry, premature send...

I forgot about D3... a good case study for you, for thinking about API (re)design, might be to go through some of these graphs, or the GoG plots, and think about how to implement them using a matplotlib-style API... I think things like the following become pretty problematic:

http://www.jasondavies.com/tree-of-life/

http://bl.ocks.org/mbostock/4060606

Regarding layering and tiling plots/custom graphics, let me provide some concrete examples of where I think the object-oriented way graphics are set up in Mathematica provides some powerful paradigm hints. For instance, for a histogram with a line superimposed:

http://matplotlib.org/mpl_examples/statistics/histogram_demo_features.hires.png

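In matplotlib, you specify the line as an input to plot:

import numpy as np
import matplotlib.pyplot as plt

# example data: IQ-like scores with mean mu and standard deviation sigma
mu, sigma, num_bins = 100, 15, 50
x = mu + sigma * np.random.randn(10000)

# the histogram of the data (newer matplotlib uses density=True instead of normed=1)
n, bins, patches = plt.hist(x, num_bins, density=True, facecolor='green', alpha=0.5)
# add a 'best fit' line: the normal pdf at the bin edges (replaces the removed mlab.normpdf)
y = np.exp(-0.5 * ((bins - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')
plt.show()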

In Mathematica, you would create two Graphics objects, and simply combine them. Something like:

grHistogram = Histogram[ {data}, /plot options, axes styles, etc/];

grSmoothHistogram = SmoothHistogram[ {data}, /plot options, axes styles, etc/];

grFinal = Show[ grHistogram, grSmoothHistogram, /further plot options/]

Where the plot options of grHistogram and grSmoothHistogram diverge for common things such as the axes styles and plot title, the options from the first Graphics object (grHistogram) take effect. The final output, of course, is also a Graphics object, and more graphics can be superimposed on it with the same Show[] command, which creates yet another new graphic.

grFinal = Show[ grFinal, Graphics[Text["Hello World!", /position specifications, etc/ ] ] ]
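The Show[] behaviour described above (overlay all primitives, let the first graphic's options win, return another graphic that can be shown again) translates naturally into immutable Scala values. In this sketch `Graphics`, `show` and the option classes are hypothetical names; primitives are kept as opaque labels for brevity, and the rule that options passed explicitly to `show` override everything is an assumption modelled on Show[g, opts].

```scala
// Hypothetical sketch: immutable graphics values combined like Mathematica's Show[].
sealed trait GraphicsOption { def key: String }
final case class PlotLabel(text: String) extends GraphicsOption { def key = "PlotLabel" }
final case class AxesLabel(x: String, y: String) extends GraphicsOption { def key = "AxesLabel" }
final case class Frame(on: Boolean) extends GraphicsOption { def key = "Frame" }

// Primitives are opaque labels here; a real design would use a typed primitive hierarchy.
final case class Graphics(primitives: Vector[String], options: Vector[GraphicsOption])

object Graphics {
  // Keeps the first occurrence of each option key, so earlier sources take precedence.
  private def firstWins(opts: Vector[GraphicsOption]): Vector[GraphicsOption] =
    opts.foldLeft(Vector.empty[GraphicsOption]) { (acc, o) =>
      if (acc.exists(_.key == o.key)) acc else acc :+ o
    }

  // Overlays all primitives; explicit options win, then the first graphic's, then later ones.
  def show(gs: Seq[Graphics], explicit: Vector[GraphicsOption] = Vector.empty): Graphics =
    Graphics(gs.toVector.flatMap(_.primitives), firstWins(explicit ++ gs.toVector.flatMap(_.options)))
}
```

Since `show` returns a plain `Graphics`, the result can be fed back into `show` with extra primitives or a new title, mirroring `Show[ grFinal, ... ]`.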

To stack the two graphs in MatLab style, you would use the subplot command, which makes standard subplots very easy but breaks down pretty quickly for complex things.

In Mathematica, you would make a new Graphics object which encapsulates the two objects, using something like:

grStacked = GraphicsColumn[ {grHistogram, grSmoothHistogram}, /spacing options, etc/ ]

What GraphicsColumn actually does is transform (scale and translate) the coordinate spaces of the two Graphics objects and combine them into a new Graphics object. Because it is object-oriented in this way, it is trivial to stack or tile different graphs however you want. It also extends easily to a map object, allowing you to put histograms, pie charts or flow plots on top, and so on.

grRussiaMap = CountryData["Russia", {"Shape", "Mollweide"}];

grMedalCount = PieChart[ xxxx ];

Show[ grRussiaMap, Translate[ Scale[grMedalCount, xxx], CityData["Sochi", "Coordinates"] ] ]
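The GraphicsColumn idea (each child keeps its own coordinate space and is combined through a scale-and-translate transform) can be sketched in a few lines. `Box`, `Affine` and `Layout.column` are hypothetical names, and the layout rule used here (uniform scaling to a common width, fixed vertical spacing, y growing upward) is an assumption chosen for simplicity, not a description of Mathematica's internals.

```scala
// Hypothetical sketch of a GraphicsColumn-like layout.
final case class Box(x0: Double, y0: Double, x1: Double, y1: Double) {
  def width: Double = x1 - x0
  def height: Double = y1 - y0
}

// Maps (x, y) to (s * x + tx, s * y + ty); a single scale factor keeps the aspect ratio.
final case class Affine(s: Double, tx: Double, ty: Double) {
  def transform(x: Double, y: Double): (Double, Double) = (s * x + tx, s * y + ty)
  def applyTo(b: Box): Box = {
    val (ax, ay) = transform(b.x0, b.y0)
    val (bx, by) = transform(b.x1, b.y1)
    Box(ax min bx, ay min by, ax max bx, ay max by)
  }
}

object Layout {
  // Stacks children top to bottom in a column of the given width (y grows upward,
  // so the first child ends up highest). Returns one transform per child.
  def column(children: Seq[Box], width: Double, spacing: Double): Seq[Affine] = {
    val scales  = children.map(b => width / b.width)
    val heights = children.zip(scales).map { case (b, s) => b.height * s }
    val total   = heights.sum + spacing * math.max(children.size - 1, 0)
    // Top edge of each child: start at the top and step down by height + spacing.
    val tops = heights.scanLeft(total)((top, h) => top - h - spacing)
    children.zip(scales).zip(tops).map { case ((b, s), top) =>
      Affine(s, -s * b.x0, top - s * b.y1)
    }
  }
}
```

The children themselves are never modified; only the transforms are computed, so the same graphic can be placed in a column, in a grid, or on top of a map.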

What to take from all this? I think the main message is that one should clearly separate the graphics object itself from its actual display, so that graphics objects can be manipulated, combined, transformed, etc. That makes some of the more complex graphics in D3 or GoG (Grammar of Graphics) pretty easy.
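A minimal Scala sketch of the "graphics as values, display as a separate step" idea, under stated assumptions: `Graphic`, `Line`, `Label`, `Transformed`, `Overlay`, `show`, `column` and `flatten` are hypothetical names (nothing here is breeze-viz API), and `column` assumes each graphic is drawn in a unit [0, 1] x [0, 1] box. `show` plays the role of Show[], `column` the role of GraphicsColumn, and `flatten` stands in for the display step by resolving every transform into absolute coordinates.

```scala
// Hypothetical scene-graph types: graphics are plain immutable values.
sealed trait Graphic
final case class Line(points: Seq[(Double, Double)]) extends Graphic
final case class Label(text: String, at: (Double, Double)) extends Graphic
// Maps each point (x, y) of `g` to (sx * x + dx, sy * y + dy).
final case class Transformed(g: Graphic, sx: Double, sy: Double, dx: Double, dy: Double) extends Graphic
final case class Overlay(layers: Seq[Graphic]) extends Graphic

object Graphic {
  type Point = (Double, Double)

  // Like Show[]: combining graphics just yields another graphic.
  def show(gs: Graphic*): Graphic = Overlay(gs)

  def scale(g: Graphic, sx: Double, sy: Double): Graphic = Transformed(g, sx, sy, 0.0, 0.0)
  def translate(g: Graphic, dx: Double, dy: Double): Graphic = Transformed(g, 1.0, 1.0, dx, dy)

  // Like GraphicsColumn[]: squeeze each graphic (assumed to live in the unit box)
  // into a horizontal band, first graphic on top.
  def column(gs: Graphic*): Graphic = {
    val n = gs.size
    Overlay(gs.zipWithIndex.map { case (g, i) =>
      Transformed(g, 1.0, 1.0 / n, 0.0, (n - 1 - i).toDouble / n)
    })
  }

  // The separate "display" step: resolve every transform into absolute
  // coordinates, producing primitives a backend (window, SVG, ...) could draw.
  def flatten(g: Graphic, f: Point => Point = (p: Point) => p): Seq[Graphic] = g match {
    case Line(ps)     => Seq(Line(ps.map(f)))
    case Label(t, at) => Seq(Label(t, f(at)))
    case Transformed(inner, sx, sy, dx, dy) =>
      flatten(inner, p => f((sx * p._1 + dx, sy * p._2 + dy)))
    case Overlay(layers) => layers.flatMap(layer => flatten(layer, f))
  }
}
```

Because combining and transforming only build new values, a map, a pie chart and a histogram could be composed freely before anything is rendered.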

==================

Re the options case classes/objects, I just want to clarify something for you, in case you are like me and implicit conversions are not quite second nature yet. The following actual syntax:

plot(xxxx, ...., lineThickness = OptLineThickness.DoubleValue( 5d ), ... )

plot(xxxx, ...., lineDashing = OptLineStyle.Dashing( 5d, 2d ), ... )

...etc...

can be made much nicer for end users by providing implicit conversions (e.g. from Double and Seq[Double]), to allow:

plot(xxxx, ...., lineThickness = 5d, ... )

plot(xxxx, ...., lineDashing = List(5d, 2d), ... )

...etc...

This allows the appearance of dynamic typing while still resolving everything at compile time.
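A sketch of how such conversions might look, assuming hypothetical option types named after the ones above (`OptLineThickness.DoubleValue`, `OptLineStyle.Dashing`) and a hypothetical `plot` signature that simply returns the resolved options so the result is easy to check. Placing the conversions in the companion objects puts them in implicit scope, so callers need no extra import, and the compiler still selects each conversion statically.

```scala
import scala.language.implicitConversions

// Hypothetical option types; names follow the message above, not an existing Breeze API.
sealed trait OptLineThickness
object OptLineThickness {
  case object Default extends OptLineThickness
  final case class DoubleValue(width: Double) extends OptLineThickness
  // Lets callers write `lineThickness = 5d`.
  implicit def fromDouble(width: Double): OptLineThickness = DoubleValue(width)
}

sealed trait OptLineStyle
object OptLineStyle {
  case object Solid extends OptLineStyle
  final case class Dashing(pattern: Double*) extends OptLineStyle
  // Lets callers write `lineDashing = List(5d, 2d)`.
  implicit def fromSeq(pattern: Seq[Double]): OptLineStyle = Dashing(pattern: _*)
}

object Plot {
  // Hypothetical plot signature: a real one would draw something;
  // this one just returns the resolved options so they can be inspected.
  def plot(
      ys: Seq[Double],
      lineThickness: OptLineThickness = OptLineThickness.Default,
      lineDashing: OptLineStyle = OptLineStyle.Solid
  ): (OptLineThickness, OptLineStyle) = (lineThickness, lineDashing)
}
```

With this in place, `Plot.plot(data, lineThickness = 5d, lineDashing = List(5d, 2d))` type-checks, while passing, say, a String would still be a compile error.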

Kenta

Kenta Takagaki (高垣堅太郎)

kentaroh...@gmail.com

http://www.kentarohtakagaki.org/

On Sun, Mar 2, 2014 at 6:05 AM, David Hall <dl...@cs.berkeley.edu> wrote:

--
You received this message because you are subscribed to a topic in the Google Groups "Scala Breeze" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scala-breeze/GcrhfKJHUEw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to scala-breeze...@googlegroups.com.

To post to this group, send email to scala-...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/scala-breeze/CALW2ey3hCBk2JOB90vpYLs1qO3Xm1kRTtUYkhEeF_jfoAm4XcA%40mail.gmail.com.

Christopher Medrela


Mar 3, 2014, 3:44:44 PM3/3/14

to scala-...@googlegroups.com, Christopher Medrela, dl...@cs.berkeley.edu

I filtered out projects 1, 2, 3 and 5, because they are hard, require really good knowledge of Scala and its type system, or require good knowledge of some libraries (e.g. Saddle) which I lack.

So I picked out projects (0) breeze-viz, (4) anything from SciPy and (6) symbolic math. I find all three projects suitable for me, because they require deep knowledge of neither Breeze internals nor the Scala type system, and because these projects are about building a new layer on top of other layers, which is much easier than tampering with existing layers.

For me, the most interesting are the last two (SciPy and symbolic math), so I will drop the idea of breeze-viz.

I think that introducing features from SciPy would be a better project than a Computer Algebra System. There are some existing Computer Algebra Systems written in Java, e.g. Java Algebra System [1] and SymJA [2]. Writing a new one is not a productive use of time. We could use the expressiveness of Scala to create a better interface for these libraries (e.g. by using operator overloading, implicits and so on) and nothing more. Could it be integrated with Breeze? IMO, at the moment Breeze is mostly about linear algebra, and there is no connection between linear algebra and a CAS, i.e. vectors and matrices couldn't be reused in a CAS. Therefore, a new CAS may as well be a separate project, but there is no reason to reinvent the wheel.

[1] http://krum.rz.uni-mannheim.de/jas/

[2] http://code.google.com/p/symja/

On the other hand, integrating existing libraries isn't a good project for me, since it requires good knowledge of those libraries, which I lack.

Therefore, I will focus on introducing SciPy features. There are many SciPy modules: optimization, interpolation, Fourier transforms, clustering and so on. Each module usually provides a few algorithms sharing the same contract, e.g. `scipy.interpolate` provides `barycentric_interpolate` and `krogh_interpolate` (and many other univariate interpolators). My proposal is to implement only a few algorithms for each module and to write good documentation that includes a chapter on how to write your own algorithms. That way, we could implement more modules and lower the barrier to contributing to Breeze. No module would be complete, but I find this more beneficial in the long term than focusing on a small number of modules, because lowering the barrier will attract more contributors who will implement other algorithms.

At the beginning I will focus on writing documentation for existing modules without enhancing them. This could be quite beneficial since everybody wants documentation. It is also a chance for me to get into Breeze internals as well as to better understand Scala and the proper use of its features.

After this setup, I will implement the new modules. There are many modules in SciPy. Which features deserve the most attention?

1) clustering
2) integration
3) interpolation
4) signal processing: B-splines, filtering and so on
5) graph routines and data structures
6) enhancing optimization
7) enhancing statistical functions

IMO (2) integration and (3) interpolation are the most important, but I would like to know your opinion.

I will start with interpolation (if this is a desired feature) since it's the easiest topic for me (last term I took a course that covered interpolation among other numerical algorithms).

I will implement linear and spline interpolators as well as design an interface for all univariate interpolators. Tests and documentation will also be written. BTW, I will use a test-driven or documentation-driven methodology, so I will start by writing tests or documentation. Then I will publish the tests/docs so you can form an opinion and give me feedback about the API before implementation.
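One possible shape for a shared univariate-interpolator contract, sketched in plain Scala to stay self-contained. The trait `UnivariateInterpolator` and the class `LinearInterpolator` here are hypothetical (not the interface Breeze ended up with), use `IndexedSeq[Double]` rather than Breeze's `DenseVector`, and reject queries outside the data range instead of extrapolating.

```scala
// Hypothetical contract shared by all univariate interpolators:
// an interpolator is a function Double => Double built from sample points.
trait UnivariateInterpolator extends (Double => Double) {
  def xs: IndexedSeq[Double]
  def ys: IndexedSeq[Double]
}

// Piecewise-linear interpolation through (xs(i), ys(i)).
final class LinearInterpolator(val xs: IndexedSeq[Double], val ys: IndexedSeq[Double])
    extends UnivariateInterpolator {
  require(xs.length == ys.length, "xs and ys must have the same length")
  require(xs.length >= 2, "need at least two points")
  require(xs.zip(xs.tail).forall { case (a, b) => a < b }, "xs must be strictly increasing")

  def apply(x: Double): Double = {
    require(x >= xs.head && x <= xs.last, s"$x is outside [${xs.head}, ${xs.last}]")
    // Binary search for the interval [xs(lo), xs(hi)] with hi == lo + 1 containing x.
    var lo = 0
    var hi = xs.length - 1
    while (hi - lo > 1) {
      val mid = (lo + hi) / 2
      if (xs(mid) <= x) lo = mid else hi = mid
    }
    // Weight the two neighbouring samples by the relative position of x.
    val t = (x - xs(lo)) / (xs(hi) - xs(lo))
    ys(lo) + t * (ys(hi) - ys(lo))
  }
}
```

A spline or any other interpolator would implement the same trait with a different `apply`, which is the kind of extension a "how to write your own algorithm" chapter could walk through.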

After that, integration will get attention. Again, I will implement only one method of integration (say, integration of a univariate function using the trapezoidal rule) and provide rich documentation.
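The composite trapezoidal rule splits [a, b] into n equal subintervals of width h = (b - a) / n and sums h * (f(x_i) + f(x_{i+1})) / 2 over them. A minimal sketch; `Trapezoid.integrate` is a hypothetical standalone helper, not an existing Breeze function.

```scala
object Trapezoid {
  // Composite trapezoidal rule on n equal subintervals of [a, b].
  def integrate(f: Double => Double, a: Double, b: Double, n: Int): Double = {
    require(n >= 1, "need at least one subinterval")
    val h = (b - a) / n
    // Each interior point is shared by two trapezoids (weight 1);
    // the two endpoints belong to one trapezoid each (weight 1/2).
    val interior = (1 until n).map(i => f(a + i * h)).sum
    h * (0.5 * f(a) + interior + 0.5 * f(b))
  }
}
```

For smooth f the error shrinks like O(h^2); for f(x) = x^2 on [0, 1] the rule overestimates the integral by exactly 1/(6n^2).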

Before the coding period starts, I could implement one small module (e.g. linear univariate interpolation) as proof that I know Scala well enough to manage this project, and to get started with the tools I will use during GSoC (I mean sbt and Markdown).

David Hall


Mar 3, 2014, 5:26:58 PM3/3/14

to scala-...@googlegroups.com, Christopher Medrela

On Mon, Mar 3, 2014 at 12:44 PM, Christopher Medrela <chris....@gmail.com> wrote:

On Friday, February 28, 2014 10:01:34 AM UTC+1, David Hall wrote:

To be honest, I'm completely new to Breeze and I don't know which ideas/projects are worth much and which don't add much value, or which projects are easy and which are hard. So on this issue I have to rely on you.

That's fair. I want to find something that interests you, to be sure. Things that I would like to have happen in the near and/or long term, in no particular order:

0) A good interactive-ish visualization library (that is, can pop up a window, not just generate graphics)
1) GPUs (I'm starting to work on this already)
2) NumPy parity: Besides ndarrays, this is basically just fleshing out a few functions.
3) Something like Pandas (/ annexing Saddle)
4) pretty much anything in SciPy
5) Integrating algebraic hierarchy from Spire or Algebird
6) Symbolic math

I filtered out projects 1, 2, 3 and 5, because they are hard, require really good knowledge of Scala and its type system, or require good knowledge of some libraries (e.g. Saddle) which I lack.

So I picked out projects (0) breeze-viz, (4) anything from SciPy and (6) symbolic math. I find all three projects suitable for me, because they require deep knowledge of neither Breeze internals nor the Scala type system, and because these projects are about building a new layer on top of other layers, which is much easier than tampering with existing layers.

Fair enough

Perfect.

At the beginning I will focus on writing documentation for existing modules without enhancing them. This could be quite beneficial since everybody wants documentation. It is also a chance for me to get into Breeze internals as well as to better understand Scala and the proper use of its features.

After this setup, I will implement the new modules. There are many modules in SciPy. Which features deserve the most attention?

1) clustering
2) integration
3) interpolation
4) signal processing: B-splines, filtering and so on
5) graph routines and data structures
6) enhancing optimization
7) enhancing statistical functions

IMO (2) integration and (3) interpolation are the most important, but I would
like to know your opinion.

I think those two sound like good areas! Clustering should probably be downstream (in Nak or elsewhere), but it would be good to have integration and interpolation in the library.

I will start with interpolation (if this is a desired feature) since it's the easiest topic for me (last term I took a course that covered interpolation among other numerical algorithms).

Great!

I will implement linear and spline interpolators as well as design an interface for all univariate interpolators. Tests and documentation will also be written. BTW, I will use a test-driven or documentation-driven methodology, so I will start by writing tests or documentation. Then I will publish the tests/docs so you can form an opinion and give me feedback about the API before implementation.

That sounds like a much healthier approach than what I usually do. :-)

After that, integration will get attention. Again, I will implement only one method of integration (say, integration of a univariate function using the trapezoidal rule) and provide rich documentation.

Ok sounds good. It's a good starting point.

Before the coding period starts, I could implement one small module (e.g. linear univariate interpolation) as proof that I know Scala well enough to manage this project, and to get started with the tools I will use during GSoC (I mean sbt and Markdown).

I think that sounds good.

-- David


Christopher Medrela


Mar 6, 2014, 7:12:26 AM3/6/14

to scala-...@googlegroups.com, scala-l...@googlegroups.com, dl...@cs.berkeley.edu

OK, so here is a short draft of the linear interpolation documentation [1]. We need to discuss the API, so feel free to comment and criticize.

I wrote a draft of the proposal [2]. What information is it missing? Which points do you disagree with? Again, don't hesitate to share your opinion.

[1] https://gist.github.com/chrismedrela/9346729

[2] https://gist.github.com/chrismedrela/9348472

David Hall


Mar 6, 2014, 5:24:50 PM3/6/14

to scala-...@googlegroups.com, scala-l...@googlegroups.com

Left my comments on the gists.


Christopher Medrela


Mar 9, 2014, 12:11:48 PM3/9/14

to scala-...@googlegroups.com, scala-l...@googlegroups.com, dl...@cs.berkeley.edu

OK, I've commented on the proposal. I've also written a first draft of linear interpolation [1]. Please have a look at it.

Do I need to post my proposal to scala-language too? Who decides which students will be accepted? And how can I improve my chances of being accepted?

BTW I think it'd be better to include the documentation in the repo. Unfortunately, it's not easy to integrate with GitHub wiki pages -- each wiki page is a separate repo. However, documentation could be hosted on http://www.scalanlp.org/. Of course, this is only an idea, we don't have to do it. What do you think about that?

[1] https://github.com/chrismedrela/breeze/commit/bee3e7a55d76ac4d10bb75d6dd28f6613d7533be

David Hall


Mar 10, 2014, 2:58:59 AM3/10/14

to scala-l...@googlegroups.com, scala-...@googlegroups.com

On Sun, Mar 9, 2014 at 6:10 PM, Jonathan Merritt <j.s.m...@gmail.com> wrote:

Hi Chris and David,

I'm not formally involved with this GSoC project, but I've also left a quick comment / question on the first draft of linear interpolation.

To summarise my question for the benefit of the mailing list: what typeclass should interpolation operate on?

In the past, I have required linear interpolation for both vector / tensor spaces and for scalars. Mathematically, interpolation requires the same operations on both types: multiplication by a scalar and addition. However, the operation for scalar multiplication is encoded differently for field / scalar types than it is for vector / tensor spaces. It's my understanding (please correct me if I'm wrong here) that for SemiRings, scalar multiplication is typically done by promoting a scalar to the SemiRing and then performing a SemiRing multiplication operation, whereas for vector / tensor spaces, the scalar multiplication is an explicit operation (mulVS, or OpMulScalar). Is there a simple way around these differences, unifying the types to allow the operations to be done in the same way?

There are two ways I could answer:

1) You sketched this one on github yourself: Scalars can be lifted to a VectorSpace (or, a Module if/when we pull in Spire types)

2) We can probably just appropriate the UFunc infrastructure. Rather than trying to identify the right structure from abstract algebra, just explicitly declare your requirements as implicit parameters (e.g. needs an Impl for OpAdd and OpMul) and then anyone can get them. An interpolator is basically no different from cos.
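Option (2) can be sketched without committing to an algebraic hierarchy: the interpolation function just asks for "addition" and "multiplication by a scalar" as implicit parameters, and any type that provides both, scalar or vector, works. The typeclasses `Add` and `Scale` and the object `GenericInterp` below are hypothetical stand-ins for Breeze's OpAdd/OpMul implementations, not Breeze's actual UFunc types. An FIR filter (a weighted sum of past samples) could be written against the same two requirements.

```scala
// Hypothetical typeclasses naming the two operations interpolation needs.
trait Add[V] { def add(a: V, b: V): V }
trait Scale[V] { def scale(v: V, s: Double): V }

object Add {
  implicit val doubleAdd: Add[Double] = new Add[Double] {
    def add(a: Double, b: Double): Double = a + b
  }
  implicit val vectorAdd: Add[Vector[Double]] = new Add[Vector[Double]] {
    def add(a: Vector[Double], b: Vector[Double]): Vector[Double] = {
      require(a.length == b.length, "vectors must have the same length")
      a.zip(b).map { case (x, y) => x + y }
    }
  }
}

object Scale {
  implicit val doubleScale: Scale[Double] = new Scale[Double] {
    def scale(v: Double, s: Double): Double = v * s
  }
  implicit val vectorScale: Scale[Vector[Double]] = new Scale[Vector[Double]] {
    def scale(v: Vector[Double], s: Double): Vector[Double] = v.map(_ * s)
  }
}

object GenericInterp {
  // p0 * (1 - t) + p1 * t: only addition and scalar multiplication are used,
  // so one definition covers scalars and vectors alike.
  def lerp[V](p0: V, p1: V, t: Double)(implicit add: Add[V], scale: Scale[V]): V =
    add.add(scale.scale(p0, 1 - t), scale.scale(p1, t))
}
```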

I've also encountered a similar problem with FIR and IIR filtering of signals: it should be possible to write the filter identically for scalar and vector space types, but there seems to be a mismatch due to the way that scalar multiplication is represented.

If you could elaborate a little, I might be able to help, or it might make me understand the issue better, at least.

-- David

Please let me know if I'm missing something obvious. This might not actually be an issue; it could be that I am currently mis-using the typeclasses somehow. :-)

Thanks,

Jonathan Merritt.


David Hall


Mar 13, 2014, 2:01:01 PM3/13/14

to scala-...@googlegroups.com, scala-l...@googlegroups.com

Sorry for the delay. Been super busy.

On Sun, Mar 9, 2014 at 9:11 AM, Christopher Medrela <chris....@gmail.com> wrote:

OK, I've commented on the proposal. I've also written a first draft of linear interpolation [1]. Please have a look at it.

It looks pretty good! We'll tweak it, but it's a good start.

Do I need to post my proposal to scala-language too? Who decides which students will be accepted? And how can I improve my chances of being accepted?

Yeah, I think they want that.

BTW I think it'd be better to include the documentation in the repo. Unfortunately, it's not easy to integrate with GitHub wiki pages -- each wiki page is a separate repo. However, documentation could be hosted on http://www.scalanlp.org/. Of course, this is only an idea, we don't have to do it. What do you think about that?

Each wiki page is a single file, yes? g...@github.com:scalanlp/breeze.wiki.git (We can do a git submodule if you want.)

I think that's the best place for it.

-- David

[1] https://github.com/chrismedrela/breeze/commit/bee3e7a55d76ac4d10bb75d6dd28f6613d7533be


Christopher Medrela


Mar 14, 2014, 11:21:17 AM3/14/14

to scala-...@googlegroups.com, scala-l...@googlegroups.com, dl...@cs.berkeley.edu

On Thursday, March 13, 2014 7:01:01 PM UTC+1, David Hall wrote:

Sorry for the delay. Been super busy.

On Sun, Mar 9, 2014 at 9:11 AM, Christopher Medrela <chris....@gmail.com> wrote:

OK, I've commented on the proposal. I've also written a first draft of linear interpolation [1]. Please have a look at it.

It looks pretty good! We'll tweak it, but it's a good start.

I improved the code. Is there any way to avoid repeating all the arguments in the LinearInterpolator constructor?

Do I need to post my proposal to scala-language too? Who decides which students will be accepted? And how can I improve my chances of being accepted?

Yeah, I think they want that.

BTW I think it'd be better to include the documentation in the repo. Unfortunately, it's not easy to integrate with GitHub wiki pages -- each wiki page is a separate repo. However, documentation could be hosted on http://www.scalanlp.org/. Of course, this is only an idea, we don't have to do it. What do you think about that?

Each wiki page is a single file, yes? g...@github.com:scalanlp/breeze.wiki.git (We can do a git submodule if you want.)

I think that's the best place for it.

The problem with docs in a submodule is that documentation commits are not associated with code commits. If we had the documentation in the same repository, there would be no question of "which Breeze version does this documentation describe?". Today I discovered GitHub Pages. It supports Jekyll, so it can generate fancy-looking pages from Markdown docs! Maybe we could give it a try?

David Hall


Mar 15, 2014, 12:47:00 PM3/15/14

to scala-...@googlegroups.com, scala-l...@googlegroups.com

On Fri, Mar 14, 2014 at 8:21 AM, Christopher Medrela <chris....@gmail.com> wrote:

On Thursday, March 13, 2014 7:01:01 PM UTC+1, David Hall wrote:

Sorry for the delay. Been super busy.

On Sun, Mar 9, 2014 at 9:11 AM, Christopher Medrela <chris....@gmail.com> wrote:

OK, I've commented on the proposal. I've also written a first draft of linear interpolation [1]. Please have a look at it.

It looks pretty good! We'll tweak it, but it's a good start.

I improved the code. Is there any way to avoid repeating all the arguments in the LinearInterpolator constructor?

No, not really. :-/

Do I need to post my proposal to scala-language too? Who decides which students will be accepted? And how can I improve my chances of being accepted?

Yeah, I think they want that.

BTW I think it'd be better to include the documentation in the repo. Unfortunately, it's not easy to integrate with GitHub wiki pages -- each wiki page is a separate repo. However, documentation could be hosted on http://www.scalanlp.org/. Of course, this is only an idea, we don't have to do it. What do you think about that?

Each wiki page is a single file, yes? g...@github.com:scalanlp/breeze.wiki.git (We can do a git submodule if you want.)

I think that's the best place for it.

The problem with docs in a submodule is that documentation commits are not associated with code commits. If we had the documentation in the same repository, there would be no question of "which Breeze version does this documentation describe?". Today I discovered GitHub Pages. It supports Jekyll, so it can generate fancy-looking pages from Markdown docs! Maybe we could give it a try?

Aren't they? Isn't a git submodule stored as a particular commit of that repo? So "tag releases/v0.6's doc submodule points to commit 1a2b3c4d5e"

Regardless, I'm happy to switch to Jekyll if you think that will be better.

I have a fantasy that involves writing an inverse doctest where code snippets in markdown docs are treated as tests. I don't think it would take that long. Maybe I should just do it.
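The first half of such an "inverse doctest" might look like the sketch below: pull the fenced blocks tagged scala out of a Markdown page and wrap each one in a method of a generated object, so that compiling the generated file fails if a documented snippet stops compiling. Assumptions: `MarkdownSnippets`, `extract`, `toTestSource` and the `DocSnippets` object name are hypothetical, fences are assumed to be exactly three backticks plus `scala` to open and three backticks to close, and actually running the snippets (not just compiling them) is left out.

```scala
import scala.collection.mutable.ListBuffer

object MarkdownSnippets {
  // Collect the bodies of fenced scala blocks, in document order.
  def extract(markdown: String): List[String] = {
    val snippets = ListBuffer.empty[String]
    var current: Option[ListBuffer[String]] = None
    for (line <- markdown.split("\n", -1)) {
      val trimmed = line.trim
      current match {
        case None if trimmed == "```scala" => current = Some(ListBuffer.empty[String])
        case Some(buf) if trimmed == "```" =>
          snippets += buf.mkString("\n")
          current = None
        case Some(buf) => buf += line
        case None      => () // prose or a non-scala block: ignore
      }
    }
    snippets.toList
  }

  // Wrap each snippet in its own method; compiling the generated source
  // then fails if any documented snippet no longer compiles.
  def toTestSource(snippets: List[String], objectName: String = "DocSnippets"): String =
    snippets.zipWithIndex
      .map { case (code, i) => s"  def snippet$i(): Unit = {\n$code\n  }" }
      .mkString(s"object $objectName {\n", "\n", "\n}\n")
}
```

Snippets that only make sense at the top level of a file (e.g. package declarations) would need special handling.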

-- David


Christopher Medrela


Mar 16, 2014, 12:15:39 PM3/16/14

to scala-l...@googlegroups.com, scala-...@googlegroups.com, dl...@cs.berkeley.edu

On Saturday, March 15, 2014 5:47:00 PM UTC+1, David Hall wrote:

On Fri, Mar 14, 2014 at 8:21 AM, Christopher Medrela <chris....@gmail.com> wrote:
BTW I think it'd be better to include the documentation in the repo. Unfortunately, it's not easy to integrate with GitHub wiki pages -- each wiki page is a separate repo. However, documentation could be hosted on http://www.scalanlp.org/. Of course, this is only an idea, we don't have to do it. What do you think about that?

Each wiki page is a single file, yes? g...@github.com:scalanlp/breeze.wiki.git (We can do a git submodule if you want.)

I think that's the best place for it.

The problem with docs in a submodule is that documentation commits are not associated with code commits. If we had the documentation in the same repository, there would be no question of "which Breeze version does this documentation describe?". Today I discovered GitHub Pages. It supports Jekyll, so it can generate fancy-looking pages from Markdown docs! Maybe we could give it a try?

Aren't they? Isn't a git submodule stored as a particular commit of that repo? So "tag releases/v0.6's doc submodule points to commit 1a2b3c4d5e"

Well, I don't know how submodules work in detail. However, I had a really nice experience with Django, where both code and documentation live in the same repository. As lexspoon wrote, this simplifies the contribution process because there is one workflow for both code and docs. And just having everything in one repo is easier than using submodules.

I have a fantasy that involves writing an inverse doctest where code snippets in markdown docs are treated as tests. I don't think it would take that long. Maybe I should just do it.

I agree, doctests would be really nice. I've added this idea to the list of low-priority todos in my proposal.

I left a question on my gist proposal [1]. David, please have a look there.

[1] https://gist.github.com/chrismedrela/9348472

Christopher Medrela


Mar 22, 2014, 2:04:50 PM3/22/14

to scala-...@googlegroups.com, scala-l...@googlegroups.com, dl...@cs.berkeley.edu

I left comments on the linear-interpolation branch (https://github.com/chrismedrela/breeze/commits/linear-interpolation).

Christopher Medrela


Apr 5, 2014, 9:11:01 AM4/5/14

to scala-...@googlegroups.com, scala-l...@googlegroups.com, dl...@cs.berkeley.edu

I'm stuck on compilation errors and can't even compile the master branch! I've described my problem here: https://github.com/scalanlp/breeze/issues/214.
