GSoC


David Hall

Feb 11, 2014, 12:54:08 PM
to scala-...@googlegroups.com
Hey everyone,

This year I'm planning on trying to get Breeze sponsored by GSoC through Scala itself. They're happy to have us, but want some ideas. Here's what I'm thinking:

* Multidimensional arrays: NumPy supports arrays of arbitrary dimension. Breeze, being strongly typed, currently only supports 1-dimensional vectors and 2-dimensional matrices. We'd like to extend support to higher-order arrays. One possible avenue is to use Shapeless's HLists.
* Spire/Algebird interoperability: There are currently a number of different math libraries for Scala. Spire (and, to a lesser extent, Algebird) have fairly fleshed out algebraic hierarchies for Rings, Semirings, Monoids, and the like. Breeze has its own parallel hierarchy, but it is less developed. It would be good to unify these, or at least to provide an interoperability layer.
* Another, more ambitious project is to tackle a GPU-backed extension to Breeze, either using (J)CUDA or OpenCL/JavaCL. Breeze is generic enough to support multiple implementations seamlessly, but we have to actually do it. 
* Improving Breeze-Viz. Breeze-Viz is a visualization library inspired by matplotlib. However, it offers a limited range of operations. Breeze-Viz needs to be revamped to support more kinds of visualizations, and to take advantage of modern Scala design patterns. (More generally, we need a great visualization library for Scala. Currently there are none.)

We need a list by tomorrow. I need to flesh these out some more, but I wanted to see what ideas people had. 

(For reference, an idea needs about 150 words of write-up: http://www.scala-lang.org/news/2013/03/20/gsoc13.html)

Jason Baldridge

Feb 11, 2014, 1:47:10 PM
to scala-...@googlegroups.com
Spire/Algebird interop and GPU both look most interesting to me.


--
You received this message because you are subscribed to the Google Groups "Scala Breeze" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-breeze...@googlegroups.com.
To post to this group, send email to scala-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scala-breeze/CALW2ey1kxbVCHJu5PK__iPDw9x7s70XERNdkhbrWJx_0anbrPg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Jason Baldridge
Associate Professor, Dept. of Linguistics, UT Austin
Co-founder & Chief Scientist, People Pattern
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

matthew Drescher

Feb 11, 2014, 2:02:13 PM
to scala-...@googlegroups.com, dl...@cs.berkeley.edu
A GPU backend is exciting!
Algebird interop seems like a great idea as well.

Francisco José Valverde Albacete

Feb 11, 2014, 1:20:44 PM
to scala-...@googlegroups.com
Hi all,

I'm working on semiring algebra, and what would *really* save my life is Breeze and Spire/Algebird being woven together.

What I need is matrices with entries in a semiring and at least the standard matrix operations, both dense AND sparse (Spire already provides a naive matrix implementation over the min-plus semiring). I was willing to put as much work as needed into that, but just learning the basic patterns for Scala took me the whole of last summer, so I cannot pretend to understand all of the subtleties going into the basic matrix classes in Breeze... :( What I can promise is to stock up the library with as many semiring-based ML techniques as I can produce once the basic framework is there! ;)
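As a rough illustration of the request above (not code from Spire, Algebird, or Breeze), a dense matrix product can be written once against a small semiring type class. The `Semiring` trait, the `minPlus` instance, and `multiply` are hypothetical names invented for this sketch; the real libraries define their own, richer hierarchies.

```scala
object SemiringSketch {
  // Hypothetical minimal type class, for illustration only.
  trait Semiring[A] {
    def zero: A                // identity for plus
    def one: A                 // identity for times
    def plus(x: A, y: A): A
    def times(x: A, y: A): A
  }

  // Min-plus (tropical) semiring: "addition" is min, "multiplication" is +.
  val minPlus: Semiring[Double] = new Semiring[Double] {
    val zero: Double = Double.PositiveInfinity
    val one: Double = 0.0
    def plus(x: Double, y: Double): Double = math.min(x, y)
    def times(x: Double, y: Double): Double = x + y
  }

  // Dense matrix product over any semiring; each inner Vector is one row.
  def multiply[A](a: Vector[Vector[A]], b: Vector[Vector[A]])(implicit sr: Semiring[A]): Vector[Vector[A]] = {
    require(a.forall(_.length == b.length), "inner dimensions must agree")
    val cols = if (b.isEmpty) 0 else b.head.length
    Vector.tabulate(a.length, cols) { (i, j) =>
      var acc = sr.zero
      var k = 0
      while (k < b.length) {
        acc = sr.plus(acc, sr.times(a(i)(k), b(k)(j)))
        k += 1
      }
      acc
    }
  }
}
```

With `minPlus`, squaring a matrix of edge weights gives the cheapest cost between nodes using at most two edges, which is the kind of operation a semiring-aware matrix class would make available.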

Also, computational linguistics is moving very much towards using tensors, so your first proposal also looks promising... Perhaps a mix of the two? X)

The third one is not too crucial for me (being a Spark user), and I have lost enough fights to Matlab's plot to know that the fourth is badly needed in a new language (for scientists/engineers), but *rather hard to get right*.

My 1.46 cents!

Regards and thanks for asking,

Francisco Valverde

David Hall

Feb 11, 2014, 4:32:53 PM
to scala-...@googlegroups.com
Ok, great.

Any students lurking that might want to do GSoC and who want to chime in?

-- David


David Hall

Feb 13, 2014, 1:33:34 PM
to scala-...@googlegroups.com
Ok, we need someone (or someones) to agree to be backup mentor to those projects. I'm limiting this to people who have contributed non-trivial code changes. This is just for the case that I drop dead, I think. (If someone wants to be lead mentor on one of these projects, I'd happily do that, too.)

-- David

David Hall

Feb 14, 2014, 1:04:55 AM
to scala-...@googlegroups.com
Ok... anyone? Just for if I drop dead? Jason? Kenta? Martin? ... Bueller?

ktakagaki

Feb 14, 2014, 1:44:30 AM
to scala-...@googlegroups.com
I could help with a new viz. I have some ideas about how the API could look, so that it would (1) work both on screen and when printed for publication, and (2) be easily extensible for end-user custom graphs...

I was thinking graphics-object-based (more like Mathematica, instead of axis-based like MATLAB/matplotlib), with maybe a ScalaFX backend, and fine-tuned customization (colors, lines, opacities, dashing, filling...) by passing option objects, like I've been trying in breeze.signal.

Kenta

David Hall

Feb 14, 2014, 1:47:10 AM
to scala-...@googlegroups.com
Ok, sounds good. Thanks!

Actually, Matthew Drescher, could I prevail upon you to take over the GPU mentoring if I die? (And anything else... anyone?) The burden is approximately 0 if I don't get hit by a bus.





matthew Drescher

Feb 14, 2014, 2:31:43 AM
to scala-...@googlegroups.com
Absolutely! :))

Btw, if there's anything else I can do on the GPU front (while we are both actually alive), I would be thrilled.

Cheers

Jason Baldridge

Feb 14, 2014, 11:05:48 AM
to scala-...@googlegroups.com

David Hall

Feb 14, 2014, 12:30:07 PM
to scala-...@googlegroups.com

a.e.nik...@gmail.com

Feb 21, 2014, 3:07:01 AM
to scala-...@googlegroups.com, dl...@cs.berkeley.edu
Hi, my name is Alexander Nikolaichik. I came from http://scala-lang.org/gsoc/2014.html and would like to code ndarrays for Scala. Let me briefly introduce myself.

I am a fourth-year student (out of five) at Belarusian State University, Faculty of Applied Mathematics and Computer Science. My secondary place of study is the Yandex School of Data Analysis (first year out of two).

I first met Scala in the autumn of 2012 while taking the Functional Programming Principles in Scala course on Coursera. Since then I have completed the Principles of Reactive Programming course (it could be called the Scala-2 course) and written quite a few small programs related to my university, Coursera, and other studies.

So I like Scala and try to practice it whenever possible, but it looks like I need a bigger project and more responsibility for code to keep learning and improving my Scala skills.

Why Breeze and ndarrays? Taking on a GSoC project is a big responsibility, so as my first choice I picked the project with the lowest chance of failure. I understand perfectly why O(n^2) is practically never acceptable for data structure operations, why sometimes O(n) or even O(log n) is bad, and what using N additional memory means. I also have some experience writing some classic collections in C++.

I should warn you up front that I will spend 3 weeks in the army in July with no possibility of continuing development, but to make up for those weeks I could start development much earlier and write code steadily before the summer.

David Hall

Feb 21, 2014, 4:18:26 AM
to a.e.nik...@gmail.com, scala-...@googlegroups.com
Hi Alexander,

Thanks for writing! It's great that you've worked to learn so much about Scala on your own.

I want to be sure that this project would be a good match for you. I would not say this is a project with a particularly low chance of failure. On the contrary, the ndarray project is very challenging: you need to be extremely comfortable with Scala's type system. You must understand how libraries like Shapeless work at a fairly deep level.

If you think this project is a good match for your current knowledge of Scala, I'd like to see some of the code you've written in Scala. You can either point me to some github code, send me some code directly, or we could maybe come up with a short project that would test some of the same skills. The code would need to show your ability to work with generics and implicits in Scala. What do you think?

-- David


a.e.nik...@gmail.com

Feb 22, 2014, 2:46:11 AM
to scala-...@googlegroups.com, a.e.nik...@gmail.com, dl...@cs.berkeley.edu
Doing a test task is by far the preferable option. It will give me experience and give you more confidence in my competence.


David Hall

Feb 25, 2014, 2:03:12 PM
to a.e.nik...@gmail.com, scala-...@googlegroups.com
Hi Alexander,

That sounds good. I can think of a few tasks:

The best one is probably making a Transposed trait and adding support for Counter2s: https://github.com/scalanlp/breeze/issues/70

Let me know if you run into trouble. I don't expect you to be able to just jump in and solve this without any guidance, so please ask questions as you get stuck. I do want to be sure that you more or less  know enough Scala (and linear algebra!) to do things like this though.

-- David

a.e.nik...@gmail.com

Mar 1, 2014, 2:34:38 PM
to scala-...@googlegroups.com, a.e.nik...@gmail.com, dl...@cs.berkeley.edu
Hi David,

Maybe I am missing something about the goal, but after looking around I see two options:
1) In-place transpose (a view): we don't touch the data and just override the operations that manipulate it. It looks like this could be achieved by extending Counter2 and overriding the apply, contains, and update methods and the iterators.
Pros:
    Transpose in O(1) time and O(1) additional memory
Cons:
    The solution is not generic and is ugly
    contains with one parameter would be slow. Maybe something else, too.

2) We change the underlying data. If we are transforming the data, why not return a new object instead of wrapping the old one in a Transposed trait? If we transform the data and don't wrap it in a Transposed trait:
Pros:
    The solution is generic
    Operations don't become slower after transposition
Cons:
    Transpose takes O(n) time and O(n) additional memory

Am I missing something? If not, which way is preferable?
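To make the two options concrete, here is a minimal sketch on a toy two-key table. `Table2`, `MapTable2`, `TransposedView`, and `transposedCopy` are hypothetical names for this example only; this is not Breeze's Counter2 API, and the real class has more operations to cover.

```scala
object TransposeSketch {
  // Hypothetical minimal 2-D keyed table; NOT Breeze's Counter2.
  trait Table2[K1, K2, V] {
    def apply(k1: K1, k2: K2): V
    def update(k1: K1, k2: K2, v: V): Unit
    def iterator: Iterator[((K1, K2), V)]
  }

  final class MapTable2[K1, K2, V](default: V) extends Table2[K1, K2, V] {
    private val data = scala.collection.mutable.Map.empty[(K1, K2), V]
    def apply(k1: K1, k2: K2): V = data.getOrElse((k1, k2), default)
    def update(k1: K1, k2: K2, v: V): Unit = data.update((k1, k2), v)
    def iterator: Iterator[((K1, K2), V)] = data.iterator
  }

  // Option 1: O(1) view that swaps the keys on every access.
  // Reads and writes go through to the wrapped table.
  final class TransposedView[K1, K2, V](inner: Table2[K1, K2, V]) extends Table2[K2, K1, V] {
    def apply(k2: K2, k1: K1): V = inner(k1, k2)
    def update(k2: K2, k1: K1, v: V): Unit = inner.update(k1, k2, v)
    def iterator: Iterator[((K2, K1), V)] =
      inner.iterator.map { case ((k1, k2), v) => ((k2, k1), v) }
  }

  // Option 2: O(n) copy with swapped keys; independent of the original afterwards.
  def transposedCopy[K1, K2, V](t: Table2[K1, K2, V], default: V): Table2[K2, K1, V] = {
    val out = new MapTable2[K2, K1, V](default)
    t.iterator.foreach { case ((k1, k2), v) => out.update(k2, k1, v) }
    out
  }
}
```

The observable difference is aliasing: a later write to the original shows up through the view but not through the copy.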


sac...@hotmail.com

Mar 4, 2014, 4:45:17 PM
to scala-...@googlegroups.com, dl...@cs.berkeley.edu
Dear David,

I am writing to express my interest in the Scala project 'Spire/Algebird  Interoperability'. I am a second year student in Applied Maths at McGill with a good knowledge of abstract algebra and an interest in the Scala language.

Looking forward to hearing from you,

Sacha

Piotr Moczurad

Mar 10, 2014, 5:34:38 PM
to scala-...@googlegroups.com, dl...@cs.berkeley.edu
Hello David,

I would very much like to take part in the GSoC project this year with your team. I'm interested in the GPU extensions for Breeze. I'm a second-year computer science student at the AGH University of Science and Technology in Krakow, Poland. I've done both of the Coursera Scala courses (Functional Programming Principles in Scala and Principles of Reactive Programming) and I have some general experience in FP (Scala as well as Erlang). At the University we do a lot of Java, so I do have a Java background. Some time ago I picked up CUDA GPU programming, which kind of became my favorite thing. That's why I think that a combination of both Scala and GPU programming is a great idea.

I hope that I can be of some help during the project.

Looking forward to hearing from you,
Piotr

David Hall

Mar 13, 2014, 3:31:18 PM
to sac...@hotmail.com, scala-...@googlegroups.com
Hi Sacha,

Sorry for the long delay.

I've been in communication with another student who has been working with Erik (of Spire) and myself to set up a plan for this project.

Is there another project that might interest you?

David Hall

Mar 13, 2014, 3:43:51 PM
to Piotr Moczurad, scala-...@googlegroups.com
Hi Piotr,

This sounds good. Do you have ready access to a CUDA-compatible device? If so, take a look at https://github.com/dlwh/gust, get the tests running (should just be sbt test) and poke around the code a little. I have some basic things working already (matrix multiply, elementwise operations like addition, and kernels for basic numeric routines). We also need support for a number of other things (e.g. reduction/scan type operations like max, including column- and row-wise max).

Could you come up with the functionality you would like to add? (reductions, sparse matrix support) You might take a look at the functionality in the core of Breeze.
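For reference, this is roughly what the column- and row-wise max reductions mentioned above compute, written as a plain CPU sketch that a GPU kernel could be checked against. The `Dense` class, its column-major layout, and the function names are assumptions made for this example; they are not taken from gust or Breeze.

```scala
object ReductionSketch {
  // Column-major dense matrix in a flat array (layout is an assumption of this sketch).
  final case class Dense(rows: Int, cols: Int, data: Array[Float]) {
    require(data.length == rows * cols, "data length must equal rows * cols")
    def apply(r: Int, c: Int): Float = data(r + c * rows)
  }

  // One output per column: the largest entry in that column.
  def columnMax(m: Dense): Array[Float] = {
    val out = Array.fill(m.cols)(Float.NegativeInfinity)
    var c = 0
    while (c < m.cols) {
      var r = 0
      while (r < m.rows) {
        out(c) = math.max(out(c), m(r, c))
        r += 1
      }
      c += 1
    }
    out
  }

  // One output per row: the largest entry in that row.
  def rowMax(m: Dense): Array[Float] = {
    val out = Array.fill(m.rows)(Float.NegativeInfinity)
    var c = 0
    while (c < m.cols) {   // column-outer loop keeps memory access contiguous
      var r = 0
      while (r < m.rows) {
        out(r) = math.max(out(r), m(r, c))
        r += 1
      }
      c += 1
    }
    out
  }
}
```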

sac...@hotmail.com

Mar 14, 2014, 6:22:00 PM
to scala-...@googlegroups.com, sac...@hotmail.com, dl...@cs.berkeley.edu
Hi David,

Not to worry.  I was only able to complete half of Odersky's online course due to various university commitments but it was enough to fall in love with the language. This summer, I have four months free to follow my interests, really getting to grips with Scala being one of them. So I'm interested in anything that will enable me to gain some experience using the Scala language. I don't have the most technical background but I am a quick learner.

Sacha 

David Hall

Mar 17, 2014, 2:22:26 AM
to scala-...@googlegroups.com, sac...@hotmail.com
Hrm, one thing we need are some of the more common "specially shaped" matrices: diagonal, triangular, etc. That plus working on sparse matrices could fill up a summer, depending.

That said, I think I'm probably going to run out of bandwidth. I need someone else to step forward to take on any new students. Happy to have anyone step in for any of the ones brought up so far. I'll continue to help, but there's a limit to what I can advise.
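As a small illustration of what a "specially shaped" matrix buys, here is a sketch of a diagonal matrix that stores only its diagonal. The `Diagonal` class is a hypothetical name for this example, not an existing Breeze type.

```scala
object DiagonalSketch {
  // Stores only the n diagonal entries of an n x n matrix.
  final class Diagonal(val diag: Array[Double]) {
    def size: Int = diag.length

    // Off-diagonal entries are implicitly zero.
    def apply(r: Int, c: Int): Double = {
      require(r >= 0 && r < size && c >= 0 && c < size, "index out of bounds")
      if (r == c) diag(r) else 0.0
    }

    // Matrix-vector product in O(n) rather than O(n^2).
    def *(v: Array[Double]): Array[Double] = {
      require(v.length == size, "vector length must match matrix size")
      Array.tabulate(size)(i => diag(i) * v(i))
    }
  }
}
```

Triangular or banded matrices would follow the same pattern: a compact storage scheme plus operations that skip the entries known to be zero.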

 


Somename Somesurname

Mar 19, 2014, 5:02:04 PM
to scala-...@googlegroups.com, dl...@cs.berkeley.edu
Hi David, I decided to make a simple start on ndarray. Right at the beginning I hit a fundamental question: should an ndarray in Scala be a "multidimensional container of items of the same type and size", as described in the SciPy docs (http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html), or should it be able to contain elements of different types?

Depending on the answer, my plan of action is:
1) Try (or not) to make the data array generic. I suppose Shapeless could help here.
2) Implement array manipulation routines (the needed subset of those listed at http://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html)
3) Make the syntax of these manipulations plain and handy ([:] and so on, nice notations)
4) Basic array creation routines
5) Arithmetic and comparison operations
6) Other stuff (based on the description of ndarray operations at http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html)

One more question: is it preferable to use utest as the testing library? (There was an issue about migrating Breeze to it.)

Repo link: https://github.com/deathnik/scala_ndarray

David Hall

Mar 20, 2014, 6:36:07 PM
to scala-...@googlegroups.com
Thanks for trying to take this on!

I like the separation between ndshape and ndarray.

The biggest issues are 

1) I'd like to make the shape of the array (well, the arity of the shape) statically known at compile time. This is the hardest thing about making an ndarray I'll be happy with. 
2) Don't use List[T] by default. The default should always be IndexedSeq[T] or Seq[T], unless you actually need a singly linked list or structure sharing.
3) Performance: You've demonstrated competency at using higher order functions and such, but all those closures and things have terrible performance implications for operations that should be fast.

More minorly:
 def apply[T <: LinearSeq[Int]](indexes: T). There's no reason to make this a type param.
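A minimal sketch of what points 2 and 3 and the minor note could look like together: dimensions held in an IndexedSeq, strides computed once, index lookup done with a while loop instead of closures, and a plain parameter type on the lookup method. `NDShape` and `offset` are hypothetical names here and do not reflect the code in the linked repository.

```scala
object ShapeSketch {
  // Hypothetical shape holder with row-major strides computed once up front.
  final class NDShape(val dims: IndexedSeq[Int]) {
    val strides: IndexedSeq[Int] = {
      val s = new Array[Int](dims.length)
      var acc = 1
      var i = dims.length - 1
      while (i >= 0) {
        s(i) = acc
        acc *= dims(i)
        i -= 1
      }
      s.toIndexedSeq
    }

    val size: Int = dims.product

    // Plain parameter type; no type parameter is needed.
    def offset(index: IndexedSeq[Int]): Int = {
      require(index.length == dims.length, "index arity must match shape arity")
      var off = 0
      var i = 0
      while (i < index.length) {
        require(index(i) >= 0 && index(i) < dims(i), "index out of bounds")
        off += index(i) * strides(i)
        i += 1
      }
      off
    }
  }
}
```

This sketch still checks arity at run time; making it a compile-time property (point 1) needs the shape to appear in the type.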

-- David



Somename Somesurname

Mar 21, 2014, 5:51:25 AM
to scala-...@googlegroups.com, dl...@cs.berkeley.edu


On Friday, March 21, 2014 at 1:36:07 UTC+3, David Hall wrote:
Thanks for trying to take this on!

I like the separation between ndshape and ndarray.

The biggest issues are 

1) I'd like to make the shape of the array (well, the arity of the shape) statically known at compile time. This is the hardest thing about making an ndarray I'll be happy with. 
        OK, I will pay close attention to Shapeless and try to do so.
2) Don't use List[T] by default. The default should always be IndexedSeq[T] or Seq[T], unless you actually need a singly linked list or structure sharing.
        Okay.
3) Performance: You've demonstrated competency at using higher order functions and such, but all those closures and things have terrible performance implications for operations that should be fast.
        I will rewrite ndshape's apply method and think of an efficient way of iterating. I suppose the other operations could be left as is, because their time complexity depends on the number of dimensions, which can't be very high.

Somename Somesurname

Mar 23, 2014, 5:29:45 AM
to scala-...@googlegroups.com, dl...@cs.berkeley.edu

So now an array can be created from an HList or a Tuple of arbitrary size, and the number of dimensions is checked at compile time.

var nd = NDArray(arr, (2,3))

nd = NDArray(arr, 3 :: 4 :: HNil) // ok

nd = NDArray(arr, (2,3,4)) // compilation error

I also did a small refactoring according to your notes.
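One way to get this kind of compile-time check without Shapeless, shown only as a sketch: put the shape type into the array's type and require a type class instance for it. The `Shape` type class and this `NDArray` are hypothetical and hand-written for ranks 2 and 3; the repository's version, which also accepts HLists, may work quite differently.

```scala
object AritySketch {
  // Evidence that S is a supported shape type, plus access to its runtime dimensions.
  trait Shape[S] {
    def dims(s: S): IndexedSeq[Int]
  }

  object Shape {
    implicit val rank2: Shape[(Int, Int)] = new Shape[(Int, Int)] {
      def dims(s: (Int, Int)): IndexedSeq[Int] = Vector(s._1, s._2)
    }
    implicit val rank3: Shape[(Int, Int, Int)] = new Shape[(Int, Int, Int)] {
      def dims(s: (Int, Int, Int)): IndexedSeq[Int] = Vector(s._1, s._2, s._3)
    }
  }

  // The shape type S is part of the array's type, so a rank mismatch is a type error.
  final class NDArray[T, S](val data: IndexedSeq[T], val shape: S)(implicit ev: Shape[S]) {
    val dims: IndexedSeq[Int] = ev.dims(shape)
    require(data.length == dims.product, "data length must match shape")
  }

  object NDArray {
    def apply[T, S: Shape](data: IndexedSeq[T], shape: S): NDArray[T, S] =
      new NDArray(data, shape)
  }
}
```

With this, a `var nd` first assigned a rank-2 array accepts another rank-2 array of a different extent, while assigning `NDArray(data, (2, 3, 4))` to it is rejected by the compiler. Only the arity is static here; the extents themselves are still checked at run time.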


Plan for the next few days:

1) I will introduce NDFlags, a container for specific information such as whether the array owns its data or is just a view. It could be very useful in later development.

2) I will try to get familiar with a Scala testing framework (is utest preferable?) and write some tests.

3) I will implement merging a list of n-dimensional arrays into one (n+1)-dimensional array, reshaping, and axis swapping. Are there other valuable transformations that I forgot?
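A sketch of how axis swapping and the "owns data or just a view" flag from the plan above could fit together, assuming a strided layout over a flat buffer. `Strided`, `rowMajor`, `swapAxes`, and the `ownsData` field are hypothetical names for this example and do not describe the planned NDFlags design.

```scala
object ViewSketch {
  // Hypothetical strided array over a flat buffer; ownsData plays the role of a flag.
  final case class Strided(data: Array[Double], dims: Vector[Int], strides: Vector[Int], ownsData: Boolean) {
    def apply(index: Int*): Double = {
      require(index.length == dims.length, "index arity must match number of dimensions")
      var off = 0
      var i = 0
      while (i < dims.length) {
        off += index(i) * strides(i)
        i += 1
      }
      data(off)
    }

    // Swap two axes without copying: only dims and strides are permuted.
    def swapAxes(a: Int, b: Int): Strided =
      copy(
        dims = dims.updated(a, dims(b)).updated(b, dims(a)),
        strides = strides.updated(a, strides(b)).updated(b, strides(a)),
        ownsData = false
      )
  }

  // Build a row-major array that owns its buffer.
  def rowMajor(data: Array[Double], dims: Vector[Int]): Strided = {
    require(data.length == dims.product, "data length must match shape")
    val strides = dims.scanRight(1)(_ * _).tail
    Strided(data, dims, strides, ownsData = true)
  }
}
```

Reshaping a contiguous array can be handled the same way by recomputing strides, whereas merging several arrays into one of higher dimension generally needs a copy.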


How often should I write reports to you, and in what form would you prefer them?


Jianbo Ye

Apr 22, 2014, 7:34:10 PM
to scala-...@googlegroups.com, dl...@cs.berkeley.edu


On Tuesday, February 11, 2014 12:54:08 PM UTC-5, David Hall wrote:
* Multiple dimensional arrays: Numpy supports arbitrarily dimensioned. Breeze, being strongly typed, currently only supports 1-dimensional vectors and 2-dimensional matrices. We'd like to extend support to higher-order arrays. One possible avenue is to use Shapeless's HLists.

I think the most challenging thing is to implement a high-order tensor, but it is really very useful. Here are three types of operations: 


Contraction operations (fold & foldin) are really powerful; please consider them a high priority. I am not very comfortable with the Scala type system for now. 
So as far as I can see, instead of implementing a generic NDArray, a fully supported 3D tensor could be more useful.
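For readers unfamiliar with contraction, here is a minimal sketch of one common form on a 3D tensor: summing out the last index against a vector. The `Tensor3` class and `contractLast` are hypothetical names, and this is only one reading of "contraction"; the fold and foldin operations named above are not defined in the thread, so the sketch does not claim to implement them.

```scala
object ContractionSketch {
  // Dense 3-D tensor stored row-major in a flat array (layout is an assumption).
  final class Tensor3(val d0: Int, val d1: Int, val d2: Int, val data: Array[Double]) {
    require(data.length == d0 * d1 * d2, "data length must match shape")
    def apply(i: Int, j: Int, k: Int): Double = data((i * d1 + j) * d2 + k)
  }

  // Contract the last mode with a vector: out(i)(j) = sum over k of t(i, j, k) * v(k).
  def contractLast(t: Tensor3, v: Array[Double]): Array[Array[Double]] = {
    require(v.length == t.d2, "vector length must match the last dimension")
    Array.tabulate(t.d0, t.d1) { (i, j) =>
      var acc = 0.0
      var k = 0
      while (k < t.d2) {
        acc += t(i, j, k) * v(k)
        k += 1
      }
      acc
    }
  }
}
```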