nipype scalability

basile

Oct 4, 2013, 9:41:11 AM

to nipy...@googlegroups.com

Hello nipype world,

I have some general questions about the scalability of nipype. I have been running a pipeline of increasing size and am running into memory issues:
- How large a nipype pipeline (how many nodes) have you run? (I am running a pipeline over an iterable of 600+ items, with something like 50 nodes applied to each.)
- Do you also run out of memory?
I have been trying to track down memory leaks in the nodes I am running, but all I found is that a large amount of space is taken up by tuples and dicts belonging to nipype nodes, interfaces, runtimes, etc.
Should I keep digging for leaks, or is this memory usage simply nipype's overhead multiplied by the graph expansion?
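
One way to tell a leak from plain per-object overhead is to compare heap snapshots taken before and after a step and see which allocation sites grew. The thread used guppy for this; the sketch below swaps in the standard library's tracemalloc instead, and `build_fake_graph` is a hypothetical stand-in for graph expansion, not nipype code.

```python
import tracemalloc


def top_allocations(func, limit=5):
    """Run func() and return (result, stats) for the largest allocation changes."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func()  # keep a reference so the allocations are still alive
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Differences are grouped by source line, largest change first.
    stats = after.compare_to(before, "lineno")[:limit]
    return result, stats


def build_fake_graph(n=10000):
    # Hypothetical stand-in: many small dicts and tuples, one pair per "node".
    return [({"id": i}, (i, i + 1)) for i in range(n)]


graph, stats = top_allocations(build_fake_graph)
for stat in stats:
    print(stat)
```

If the same lines keep growing across repeated runs of one node, that points to a leak; if the growth is a fixed cost per node, it is more likely overhead that scales with the size of the expanded graph.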

If this is only overhead, I might also consider moving my pipelines to caching.memory, but I wonder whether I can avoid re-running all the nodes.

Bests.

basile

basile

Oct 4, 2013, 10:58:20 AM

to nipy...@googlegroups.com

Also, it seems that much of the overhead is in the provenance data, which is loaded or produced even when the computed results are loaded (guppy reports Total size = 473352 bytes, so multiplied by a huge number of nodes this might be the problem?). Does a node that has been run (and all its overhead) stay in memory for the whole execution, and even after the workflow has finished?
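
A back-of-the-envelope estimate makes the scale concrete. It assumes, purely for illustration, that every expanded node retains the full size guppy reported and that the graph expands to items times nodes; neither is confirmed in the thread.

```python
ITEMS = 600              # size of the iterable from the first post ("600+")
NODES_PER_ITEM = 50      # "something like 50 nodes"
BYTES_PER_NODE = 473352  # guppy "Total size" quoted above (assumed per node)

expanded_nodes = ITEMS * NODES_PER_ITEM
total_bytes = expanded_nodes * BYTES_PER_NODE

print(f"{expanded_nodes} nodes -> {total_bytes / 1e9:.1f} GB")
```

Under those assumptions the retained data would come to roughly 14 GB, which would be enough to explain a memory shortage without any real leak.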

basile

Chris Filo Gorgolewski

Oct 4, 2013, 11:06:50 AM

to nipy...@googlegroups.com

Lots of questions! I have successfully run workflows with thousands of nodes without memory issues, but I was using the Condor DAGMan plugin. That way each node is executed in a separate instance of a virtual machine, so there are no memory leaks (which are really poor garbage collection rather than true leaks, since this is Python). It would be great if someone could look into decreasing the memory footprint of nipype (this could be you!).
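
The isolation effect described here can be sketched with the standard library alone. This is not how the DAGMan plugin is implemented; it assumes that "a separate instance of a virtual machine" amounts to a fresh interpreter process per node, and `run_node_isolated` and `NODE` are hypothetical names used only for illustration.

```python
import os
import subprocess
import sys


def run_node_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and return its stdout."""
    done = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.strip()


# Hypothetical "node": allocate a large list, then report its process id.
NODE = "import os; data = list(range(10**6)); print(os.getpid())"

child_pids = [int(run_node_isolated(NODE)) for _ in range(3)]
print("parent:", os.getpid(), "children:", child_pids)
```

Each child releases all of its memory when it exits, so nothing a node allocates can accumulate in the parent, whatever the garbage collector does.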

Best,

Chris

Satrajit Ghosh

Oct 4, 2013, 11:11:20 AM

to nipy-user

hi basile,

indeed, the provenance data as it is currently recorded is extensive. i'm submitting a PR soon to change this so that it doesn't actually occupy memory except in short bursts (i'm a little surprised that it sticks around - i thought we were already loading results on the fly).

the provenance will be controlled via an execution flag and will be off by default for the next release. the workflow provenance cannot be generated with the graph submission engines easily and that issue needs to be addressed.

but reducing memory footprint is a good thing.

cheers,

satra