GSoC: Rewriting scipy.ndimage in Cython

AMAN singh

Mar 9, 2015, 9:24:06 PM
to scikit...@googlegroups.com, ralf.g...@gmail.com
Hi developers

My name is Aman Singh and I am currently a second-year undergraduate student in the Computer Science department at the Indian Institute of Technology, Jodhpur. I want to participate in GSoC'15, and the project I am aiming for is porting scipy.ndimage to Cython. I have been following SciPy for the last few months and have also made some contributions. I came across this project on their GSoC'15 ideas page and found it interesting.
I have done some research over the last week. I am going through the Cython documentation, the SciPy lectures on GitHub, and Richard's GSoC'14 work, in which he ported the cluster package to Cython. While going through the scipy.ndimage module I also found that Thouis Jones had already ported one function, ndimage.label(), to Cython. I can use that as a reference for the rest of the project.

Please tell me whether I am on the right track. If you can suggest some resources that would help me understand the project, I would be highly obliged. Also, I would like to know how much of ndimage is to be ported under this project, since it is a big module.
Kindly provide me some suggestions and guide me through this.

Regards,

Aman Singh

Stéfan van der Walt

Mar 10, 2015, 2:22:18 AM
to scikit-image
Hi Aman

On Mon, Mar 9, 2015 at 9:52 AM, AMAN singh <ug201...@iitj.ac.in> wrote:
> Please tell me whether I am on right track or not. If you can suggest me
> some resources which will be helpful to me in understanding the project, I
> would be highly obliged. Also, I would like to know that how much part of
> ndimage is to be ported under this project since it is a big module.
> Kindly provide me some suggestions and guide me through this.

Thanks for your interest in GSoC 2015! Please have a look at the
issues for scikit-image, and try and submit a few PRs so that we can
work together and get to know you a bit better.

Thanks!
Stéfan

Ralf Gommers

Mar 15, 2015, 1:45:36 PM
to scikit...@googlegroups.com
On Tue, Mar 10, 2015 at 7:21 AM, Stéfan van der Walt <ste...@berkeley.edu> wrote:
> Hi Aman
>
> On Mon, Mar 9, 2015 at 9:52 AM, AMAN singh <ug201...@iitj.ac.in> wrote:
>> Please tell me whether I am on right track or not. If you can suggest me
>> some resources which will be helpful to me in understanding the project, I
>> would be highly obliged. Also, I would like to know that how much part of
>> ndimage is to be ported under this project since it is a big module.
>> Kindly provide me some suggestions and guide me through this.

Hi Aman, the idea is to port the whole module. I think you should make a plan for that. We are aware that it's a large job, and whether or not it's feasible to complete all of ndimage within one GSoC depends on how fast you will go. Compared to porting scipy.cluster last year I'd guess that ndimage is >2x more work. However, Richard last year implemented new features in addition to completing the port, so for a fast student I expect it to be possible to complete the whole module. I would expect the main challenge to be to make the Cython version (close to) as fast as the current C code.
 
> Thanks for your interest in GSoC 2015!  Please have a look at the
> issues for scikit-image, and try and submit a few PRs so that we can
> work together and get to know you a bit better.

@all: it's maybe good to know that Aman has already submitted 5 PRs to Scipy (4 small ones merged, 1 larger one for which the bottleneck is on our side): https://github.com/scipy/scipy/pulls?q=is%3Apr+author%3Abewithaman+is%3Aclosed

@Aman: the majority of expertise and mentoring power will likely come from the scikit-image devs, so it would be good to submit a few scikit-image PRs as Stefan says. Feel free to ping me - I read the scikit-image mailing list but not Github activity.

Cheers,
Ralf


Stéfan van der Walt

Mar 23, 2015, 4:32:07 AM
to scikit-image
Hi folks,

On Sun, Mar 15, 2015 at 10:45 AM, Ralf Gommers <ralf.g...@gmail.com> wrote:
> @all: it's maybe good to know that Aman has already submitted 5 PRs to Scipy
> (4 small ones merged, 1 larger one for which the bottleneck is on our side):
> https://github.com/scipy/scipy/pulls?q=is%3Apr+author%3Abewithaman+is%3Aclosed

I wasn't aware--thanks for the heads-up!

Stéfan

AMAN singh

Mar 24, 2015, 8:35:54 PM
to scikit...@googlegroups.com, ralf.g...@gmail.com, AMAN singh
Hi Everyone

I have made a basic draft of my proposal here.
Please review it and suggest modifications.

@Ralf and @stefanv thanks for the suggestions.

Regards,
Aman

Stéfan van der Walt

Mar 24, 2015, 9:00:59 PM
to scikit-image
Hi Aman

On Tue, Mar 24, 2015 at 5:34 PM, AMAN singh <ug201...@iitj.ac.in> wrote:
> I have made a basic draft of my proposal here.
> Please review it and suggest modifications.

I would suggest that, instead of filling out the leaves of the tree,
we start by fully porting one piece of functionality.

It would be good if you could construct at least a list of top level
functions to be ported. The timeline is currently a bit vague.

Regards
Stéfan

Jaime Fernández del Río

Mar 24, 2015, 11:26:56 PM
to scikit...@googlegroups.com
On Tue, Mar 24, 2015 at 5:34 PM, AMAN singh <ug201...@iitj.ac.in> wrote:
> Hi Everyone
>
> I have made a basic draft of my proposal here.
> Please review it and suggest modifications.

Hi Aman,

This may not be 100% true for all the functionality, but I believe that the gist of the ndimage module is in the 4-5 object-like constructs in ni_support, namely:

  • NI_Iterator in its three flavors: point, subspace and line iterator,
  • NI_LineBuffer and
  • NI_FilterIterator.
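To make the idea of a point iterator concrete, here is a minimal pure-Python sketch of odometer-style iteration over an N-dimensional array stored in a flat buffer. It is not the ni_support code: the function name `point_iterator` and the choice to count strides in elements rather than bytes are assumptions made for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def point_iterator(shape: Sequence[int],
                   strides: Sequence[int]) -> Iterator[Tuple[Tuple[int, ...], int]]:
    """Yield (coordinates, flat_offset) for every element, last axis fastest."""
    if any(n == 0 for n in shape):
        return  # an empty array has no points to visit
    coords: List[int] = [0] * len(shape)
    offset = 0
    while True:
        yield tuple(coords), offset
        # Advance like an odometer, starting from the last axis.
        axis = len(shape) - 1
        while axis >= 0:
            if coords[axis] < shape[axis] - 1:
                coords[axis] += 1
                offset += strides[axis]
                break
            # This axis overflowed: rewind it and carry into the previous axis.
            offset -= coords[axis] * strides[axis]
            coords[axis] = 0
            axis -= 1
        if axis < 0:
            return  # every axis overflowed, so all points were visited


# A 2x3 array stored row-major in a flat list (strides counted in elements).
data = [10, 11, 12, 20, 21, 22]
visited = [(c, data[off]) for c, off in point_iterator((2, 3), (3, 1))]
```

Because only offsets are updated, the same loop walks a transposed view by swapping the shape and strides, which is the property that makes such an iterator reusable across functions.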
Closely linked to this is the choice of a method to deal with multiple dtypes, a question for which I don't think there is an obvious answer. Since performance is critical, you may want to take a look at bottleneck's use of templates that are pre-processed before cythonizing and compiling.
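As a rough illustration of the template approach, the sketch below expands one source template into a specialised kernel per dtype before anything is compiled. It does not reproduce bottleneck's actual machinery: the template text, the `DTYPES` table and `render_kernels` are hypothetical, and the generated text is only Cython-like.

```python
from string import Template

# Hypothetical template for one typed kernel; $name and $ctype are placeholders.
KERNEL_TEMPLATE = Template(
    "cdef $ctype sum_line_$name($ctype *line, Py_ssize_t n):\n"
    "    cdef $ctype total = 0\n"
    "    cdef Py_ssize_t i\n"
    "    for i in range(n):\n"
    "        total += line[i]\n"
    "    return total\n"
)

# Illustrative dtype table; a real port would cover every dtype ndimage accepts.
DTYPES = {"float32": "float", "float64": "double", "uint8": "unsigned char"}


def render_kernels(dtypes):
    """Return one specialised source string per dtype name."""
    return {name: KERNEL_TEMPLATE.substitute(name=name, ctype=ctype)
            for name, ctype in dtypes.items()}


sources = render_kernels(DTYPES)
print(sources["float64"])
```

The trade-off compared with fused types is that the specialisation happens in a build step you control, at the cost of maintaining that step yourself.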

If you get these right, then rather than the leaves of the tree, you will have built a solid foundation, more like the trunk: porting all the other modules is then going to mostly be little more than an exercise in translation. So I would suggest that you devote more time to getting these fundamental questions right, as some trial and error is going to be inevitable.

Jaime

--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.

Ralf Gommers

Mar 25, 2015, 3:13:57 AM
to scikit...@googlegroups.com
This sounds like great advice.

Comments on the timeline:
- week 1 "reading through the code": I think you should cover, and will have covered this, in the community bonding period. At the start of week 1 you should be at the point where you start tackling the problem. Probably by doing what Jaime says above.

- unit tests, docs and benchmarks: these cannot be separated from writing code. Each PR should have decent unit test coverage and a decent docstring. Plus since performance is critical you have to benchmark your code as you go. The only thing you could reserve time for at the end is writing some longer documentation (maybe a tutorial), benchmarks in ASV format (see https://github.com/scipy/scipy/tree/master/benchmarks) and some minor cleanups.
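For reference, a benchmark in the ASV layout could look like the sketch below. ASV times methods whose names start with `time_` and calls `setup` first, once per entry in `params`. The class name, the array sizes and the pure-Python `uniform_filter_1d` workload are placeholders standing in for whichever ndimage function is being ported.

```python
import random


def uniform_filter_1d(values, size):
    """Placeholder workload: naive moving average over full windows only."""
    return [sum(values[i:i + size]) / size
            for i in range(len(values) - size + 1)]


class TimeUniformFilter1D:
    # asv runs each time_* method once per value in `params`.
    params = [1_000, 10_000]
    param_names = ["n"]

    def setup(self, n):
        rng = random.Random(0)  # fixed seed keeps runs comparable
        self.values = [rng.random() for _ in range(n)]

    def time_size_3(self, n):
        uniform_filter_1d(self.values, 3)
```

Keeping the input construction in `setup` means only the filter call itself is timed.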

Regarding "I will also use better algorithms when possible to improve the time complexity of the functions": it is important to not mix porting code from C to Cython with changing the algorithm, because when output of a function doesn't match the current Scipy output you don't know whether porting or algorithm changes are the cause.
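One way to keep a port honest is to compare it against the existing implementation on many seeded random inputs before touching the algorithm. The sketch below assumes two hypothetical stand-ins, `center_of_mass_reference` for the current code and `center_of_mass_port` for the translation; neither is the real ndimage function.

```python
import random


def center_of_mass_reference(values):
    """Stand-in for the existing implementation (1-D only)."""
    total = sum(values)
    return sum(i * v for i, v in enumerate(values)) / total


def center_of_mass_port(values):
    """Stand-in for the port: same algorithm, written as an explicit loop."""
    total = 0.0
    weighted = 0.0
    for i, v in enumerate(values):
        total += v
        weighted += i * v
    return weighted / total


def check_port(reference, port, trials=100, seed=0, tol=1e-12):
    """Compare both implementations on seeded random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.uniform(0.1, 1.0) for _ in range(rng.randint(1, 50))]
        expected, actual = reference(values), port(values)
        if abs(expected - actual) > tol * max(1.0, abs(expected)):
            return False
    return True


print(check_port(center_of_mass_reference, center_of_mass_port))
```

If the algorithm had also changed, a failure here could not be attributed to either cause, which is the point being made above.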

Cheers,
Ralf



AMAN singh

Mar 26, 2015, 3:40:57 PM
to scikit...@googlegroups.com, ralf.g...@gmail.com, tho...@gmail.com
Thank you everyone for your insightful comments.
I have tried to incorporate your suggestions in the proposal.  Kindly have a look at the new proposal here and suggest improvements.

Thanks once again.
Regards,

Aman Singh




Ralf Gommers

Mar 27, 2015, 5:27:23 AM
to AMAN singh, scikit...@googlegroups.com, Thouis Jones
On Thu, Mar 26, 2015 at 8:40 PM, AMAN singh <ug201...@iitj.ac.in> wrote:
> Thank you everyone for your insightful comments.
> I have tried to incorporate your suggestion in the proposal.  Kindly have a look at the new proposal here and suggest the improvements.

Hi Aman, this looks quite good to me. For the timeline I think it will take longer to get the iterators right and shorter to port the last functions at the end - once you get the hang of it you'll be able to do the last ones quickly I expect.

Cheers,
Ralf


Jaime Fernández del Río

Mar 27, 2015, 10:04:12 AM
to scikit...@googlegroups.com
On Fri, Mar 27, 2015 at 2:27 AM, Ralf Gommers <ralf.g...@gmail.com> wrote:


> On Thu, Mar 26, 2015 at 8:40 PM, AMAN singh <ug201...@iitj.ac.in> wrote:
>> Thank you everyone for your insightful comments.
>> I have tried to incorporate your suggestion in the proposal.  Kindly have a look at the new proposal here and suggest the improvements.
>
> Hi Aman, this looks quite good to me. For the timeline I think it will take longer to get the iterators right and shorter to port the last functions at the end - once you get the hang of it you'll be able to do the last ones quickly I expect.

That sounds about right. I think that breaking down the schedule to what function will be ported what week is little more than wishful thinking, and that keeping things at the file level would make more sense. But I think you are getting your proposal there.

One idea that just crossed my mind: checking your implementation of the iterators and other stuff in support.c for correctness and performance is going to be an important part of the project. Perhaps it is a good idea to identify, either now or very early on the project, a few current ndimage top level functions that use each of those objects, if possible without interaction with the others, and build a sequence that could look something like (I am making this up in a hurry, so don't take the actual function names proposed too seriously, although they may actually make sense):

Port NI_PointIterator -> Port NI_CenterOfMass, benchmark and test
Port NI_LineBuffer -> Port NI_UniformFilter1D, benchmark and test
...
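As an illustration of the second pairing, the sketch below shows a line buffer feeding a 1-D uniform filter in pure Python. It is not the ndimage implementation: the names are hypothetical, only odd window sizes are handled, and edge replication is picked as the boundary rule purely to keep the example short.

```python
def fill_line_buffer(line, pad):
    """Copy a line into a buffer extended by `pad` edge-replicated samples."""
    return [line[0]] * pad + list(line) + [line[-1]] * pad


def uniform_filter_1d(line, size):
    """Moving average with an odd window, computed from the padded buffer."""
    if size % 2 == 0 or size < 1:
        raise ValueError("this sketch only handles odd, positive window sizes")
    pad = size // 2
    buf = fill_line_buffer(line, pad)
    return [sum(buf[i:i + size]) / size for i in range(len(line))]


def filter_rows(image, size):
    """Apply the 1-D filter along the last axis of a list-of-rows image."""
    return [uniform_filter_1d(row, size) for row in image]


image = [[0.0, 3.0, 6.0],
         [9.0, 9.0, 9.0]]
smoothed = filter_rows(image, 3)
```

A small pairing like this gives both a correctness check for the buffer and a natural place to hang a benchmark.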

This would very likely extend the time you will need to implement all the items in support.c. But by the time you were finished with that we would both have high confidence that things were going well, plus a "Rosetta Stone" that should make it a breeze to finish the job, both for you and anyone else. We would also have an intermediate milestone (everything in support ported plus a working example of each being used, with correctness and performance verified), that would be a worthy deliverable on its own: if we are terribly miscalculating task duration, and everything slips and is delayed, getting there could still be considered a success, since it would make finishing the job for others much, much simpler.

One little concern of mine, and the questions don't really go to Aman, but to the scipy devs: the Cython docs on fused types have a big fat warning at the top on support still being experimental. Also, this is going to bump the version requirements for Cython to a very recent one. Are we OK with this?

Similarly, you suggest using Cython's prange to parallelize computations. I haven't seen OpenMP used anywhere in NumPy or SciPy, and have the feeling that parallel implementations are left out on purpose. Am I right, or would parallelizing where possible be OK?

Jaime


Jérôme Kieffer

Mar 30, 2015, 1:30:31 AM
to scikit...@googlegroups.com
On Fri, 27 Mar 2015 07:04:10 -0700
Jaime Fernández del Río <jaime...@gmail.com> wrote:

> Similarly, you suggest using Cython's prange to parallelize computations. I
> haven't seen OpenMP used anywhere in NumPy or SciPy, and have the feeling
> that parallel implementations are left out on purpose. Am I right, or would
> parallelizing where possible be OK?

OpenMP is tricky under Mac OS X: 10.7-10.9 had no support at all (they
use clang <3.6). Since 10.10 the support is incomplete; at least, much of
the code I tested fails with OpenMP (though it runs under Linux and Windows).
I noticed wrong results, not only failures to compile!

Of course one can install gcc or icc, but this is not in Python's philosophy.

--
Jérôme Kieffer <goo...@terre-adelie.org>

Ralf Gommers

Apr 3, 2015, 5:08:46 PM
to scikit...@googlegroups.com
On Fri, Mar 27, 2015 at 3:04 PM, Jaime Fernández del Río <jaime...@gmail.com> wrote:
> On Fri, Mar 27, 2015 at 2:27 AM, Ralf Gommers <ralf.g...@gmail.com> wrote:
>
>> On Thu, Mar 26, 2015 at 8:40 PM, AMAN singh <ug201...@iitj.ac.in> wrote:
>>> Thank you everyone for your insightful comments.
>>> I have tried to incorporate your suggestion in the proposal.  Kindly have a look at the new proposal here and suggest the improvements.
>>
>> Hi Aman, this looks quite good to me. For the timeline I think it will take longer to get the iterators right and shorter to port the last functions at the end - once you get the hang of it you'll be able to do the last ones quickly I expect.
>
> That sounds about right. I think that breaking down the schedule to what function will be ported what week is little more than wishful thinking, and that keeping things at the file level would make more sense. But I think you are getting your proposal there.
>
> One idea that just crossed my mind: checking your implementation of the iterators and other stuff in support.c for correctness and performance is going to be an important part of the project. Perhaps it is a good idea to identify, either now or very early on the project, a few current ndimage top level functions that use each of those objects, if possible without interaction with the others, and build a sequence that could look something like (I am making this up in a hurry, so don't take the actual function names proposed too seriously, although they may actually make sense):
>
> Port NI_PointIterator -> Port NI_CenterOfMass, benchmark and test
> Port NI_LineBuffer -> Port NI_UniformFilter1D, benchmark and test
> ...
>
> This would very likely extend the time you will need to implement all the items in support.c. But by the time you were finished with that we would both have high confidence that things were going well, plus a "Rosetta Stone" that should make it a breeze to finish the job, both for you and anyone else. We would also have an intermediate milestone (everything in support ported plus a working example of each being used, with correctness and performance verified), that would be a worthy deliverable on its own: if we are terribly miscalculating task duration, and everything slips and is delayed, getting there could still be considered a success, since it would make finishing the job for others much, much simpler.

That sounds like an excellent idea to me.
 
> One little concern of mine, and the questions don't really go to Aman, but to the scipy devs: the Cython docs on fused types have a big fat warning at the top on support still being experimental. Also, this is going to bump the version requirements for Cython to a very recent one. Are we OK with this?

We're using fused types in more places in Scipy now. They've been around for a while, and apart from that you have to be careful with using multiple usages of a fused type in a single function (which explodes the generated code and binary size) I don't remember many problems with it. Maybe worth asking the Cython devs why they haven't removed that warning yet?
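A back-of-the-envelope way to see the code-size concern is to count the type combinations a function would need. The sketch assumes that arguments sharing one fused type are specialised together while each additional distinct fused type multiplies the number of variants; the type lists and the `specializations` helper are made up for illustration and do not call Cython.

```python
from itertools import product

FLOATING = ("float", "double")
INTEGRAL = ("short", "int", "long")


def specializations(*fused_types):
    """List the type combinations for a function using these distinct fused types."""
    return list(product(*fused_types))


one_type = specializations(FLOATING)             # every argument shares one fused type
two_types = specializations(FLOATING, INTEGRAL)  # two independent fused types
print(len(one_type), len(two_types))
```

With realistic dtype lists of ten or more entries per fused type, the product grows quickly, which is why the number of distinct fused types per function is worth keeping small.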
 
> Similarly, you suggest using Cython's prange to parallelize computations. I haven't seen OpenMP used anywhere in NumPy or SciPy, and have the feeling that parallel implementations are left out on purpose. Am I right, or would parallelizing where possible be OK?

Yep, that has been on purpose so far. That could change of course, but it would need significant discussion and an overall strategy first. OpenMP proposals for individual functions have always been rejected before. So would be better to remove it from this GSoC proposal.

Ralf

AMAN singh

Apr 4, 2015, 2:06:28 PM
to scikit...@googlegroups.com
Hi everyone

@Jaime Thanks for the suggestion. This is really a great idea, and I will follow this strategy while rewriting the module.

@Stefanv I was not able to add Jaime's suggestions since my proposal was locked. Can you please allow me to revise my proposal? I want to include Jaime's suggestions in it.


Regards, 
Aman Singh