I have written a function that runs functions in separate processes. I hope you can help me improve it, and I would like to submit it to the Python cookbook if its quality is good enough.
I was writing a numerical program (using numpy) which uses huge amounts of memory, the memory increasing with time. The program structure was essentially:
for radius in radii:
    result = do_work(params)
where do_work actually uses a large number of temporary arrays. The variable params is large as well and is the result of computations before the loop.
After playing with gc for some time, trying to convince it to release the memory, I gave up. I will be happy, by the way, if somebody points me to a web page/reference that says how to call a function and then reclaim all of its memory in Python.
Meanwhile, the best that I could do is fork a process, compute the results, and return them to the parent process. This I implemented in the following function, which is kinda working for me now, but I am sure it can be much improved. There should be a better way to return the result than a temporary file, for example. I actually thought of posting this after noticing that the pypy project had what I thought was a similar thing in their testing, but they probably dealt with it differently in the autotest driver [1]; I am not sure.
Here is the function:
from __future__ import with_statement  # needed for "with" on Python 2.5

def run_in_separate_process(f, *args, **kwds):
    from os import tmpnam, fork, waitpid, remove
    from sys import exit
    from pickle import load, dump
    from contextlib import closing
    fname = tmpnam()
    pid = fork()
    if pid > 0: #parent
        waitpid(pid, 0) # should have checked for correct finishing
        with closing(file(fname)) as f:
            result = load(f)
        remove(fname)
        return result
    else: #child
        result = f(*args, **kwds)
        with closing(file(fname,'w')) as f:
            dump(result, f)
        exit(0)
To be used as:
for radius in radii:
    result = run_in_separate_process(do_work, params)
> somebody points me to a web page/reference that says how to call a
> function then reclaim the whole memory back in python.
>
> Meanwhile, the best that I could do is fork a process, compute the
> results, and return them back to the parent process. This I
That's my favorite way to ensure that all resources get reclaimed: let the operating system do the job.
> implemented in the following function, which is kinda working for me
> now, but I am sure it can be much improved. There should be a better
> way to return the result that a temporary file, for example. I
You can use a pipe. I.e. (untested code):
from __future__ import with_statement  # needed for "with" on Python 2.5

def run_in_separate_process(f, *a, **k):
    import os, sys, cPickle
    pread, pwrite = os.pipe()
    pid = os.fork()
    if pid>0:
        os.close(pwrite)
        with os.fdopen(pread, 'rb') as f:
            return cPickle.load(f)
    else:
        os.close(pread)
        result = f(*a, **k)
        with os.fdopen(pwrite, 'wb') as f:
            cPickle.dump(result, f, -1)  # dump(obj, file, protocol)
        sys.exit()
Using cPickle instead of pickle, and a negative protocol (on the files pedantically specified as binary:-), meaning the latest and greatest available pickling protocol, rather than the default 0, should improve performance.
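For readers on a current interpreter, here is a minimal sketch of the same pipe idea in Python 3, where pickle replaces cPickle. It is not the code from this thread: the name run_in_child is made up for illustration, it assumes a POSIX system with os.fork, and it assumes the result is picklable. The parent reads before it waits, so a large result cannot fill the pipe and block the child.

```python
import os
import pickle


def run_in_child(func, *args, **kwargs):
    """Run func in a forked child and return its result (POSIX only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid > 0:
        # Parent: close the unused write end, read the pickled result,
        # and always reap the child, even if unpickling fails.
        os.close(write_fd)
        try:
            with os.fdopen(read_fd, "rb") as reader:
                return pickle.load(reader)
        finally:
            os.waitpid(pid, 0)
    # Child: compute, send the result through the pipe, then leave
    # without running the parent's cleanup handlers.
    exit_code = 1
    try:
        os.close(read_fd)
        result = func(*args, **kwargs)
        with os.fdopen(write_fd, "wb") as writer:
            pickle.dump(result, writer, pickle.HIGHEST_PROTOCOL)
        exit_code = 0
    finally:
        os._exit(exit_code)
```

If the child fails before writing anything, the parent sees an EOFError from pickle.load, which matches the behaviour discussed in this thread. All memory the child allocated is returned to the operating system when it exits.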
Thanks Mike for your answer. I will use the occasion to add some comments on the links and on my approach.
I am programming in Python 2.5, mainly to avoid the bug that memory arenas were never freed before. The program is working on both Mac OS X (intel) and Linux, so I prefer portable approaches.
On Apr 11, 3:34 pm, kyoso...@gmail.com wrote: [...]
> I found a post on a similar topic that looks like it may give you some
> ideas:
I see the comment about using mmap as valuable. I tried that with numpy.memmap, but I wasn't successful. I don't remember why at the moment. The other tricks are problem-dependent, and my case is not like them (I believe).
I probably got the idea from a previous thread by him or somebody else. It must have been much earlier than March, though, as my program has been working since last year.
So, let's say the function I have written is an implementation of Alex's architectural pattern. Probably makes it easier to get in the cookbook:)
On Apr 11, 3:58 pm, a...@mac.com (Alex Martelli) wrote: [...]
> That's my favorite way to ensure that all resources get reclaimed: let
> the operating system do the job.
Thanks a lot, Alex, for confirming the basic idea. I will be playing with your function later today, and will give more feedback. I think I avoided the pipe in the mistaken belief that pipes cannot carry binary data. I know, I should've tested. And I avoided pickle at the time because I had a structure that was unpicklable (grown by me using a mixture of python, C, ctypes and pyrex at the time). The structure is improved now, and I will go for the more standard approach.
On Apr 11, 4:36 pm, malkaro...@gmail.com wrote: [...]
> .. And I avoided pickle at the time
> because I had a structure that was unpicklable (grown by me using a
> mixture of python, C, ctypes and pyrex at the time). The structure is
> improved now, and I will go for the more standard approach..
Sorry, I was speaking about an older version of my code. The code is already using pickle, and yes, cPickle is better.
Still trying the code. So far, after modifying the line:
After playing with Alex's implementation, and adding some support for exceptions, this is what I came up with. I hope I am not getting too clever for my needs:
from __future__ import with_statement  # needed for "with" on Python 2.5
import os, cPickle

def run_in_separate_process_2(f, *args, **kwds):
    pread, pwrite = os.pipe()
    pid = os.fork()
    if pid > 0:
        # parent: read (status, result) from the pipe, then reap the child
        os.close(pwrite)
        with os.fdopen(pread, 'rb') as f:
            status, result = cPickle.load(f)
        os.waitpid(pid, 0)
        if status == 0:
            return result
        else:
            raise result
    else:
        # child: run the function and send back (status, result)
        os.close(pread)
        try:
            result = f(*args, **kwds)
            status = 0
        except Exception, exc:
            result = exc
            status = 1
        with os.fdopen(pwrite, 'wb') as f:
            try:
                cPickle.dump((status,result), f, cPickle.HIGHEST_PROTOCOL)
            except cPickle.PicklingError, exc:
                cPickle.dump((2,exc), f, cPickle.HIGHEST_PROTOCOL)
            f.close()
        os._exit(0)
Basically, the function is called in the child process, and a status code is returned in addition to the result. The status is 0 if the function returns normally, 1 if it raises an exception, and 2 if the result is unpicklable. Some cases are deliberately not handled: a SystemExit or a KeyboardInterrupt in the child shows up as an EOFError when unpickling in the parent. Some cases are inadvertently not handled; these are called bugs. And the original exception trace is lost. Any comments?
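One possible way to keep the trace, sketched here in Python 3 and not taken from this thread: the child sends the formatted traceback as a string instead of the exception object, and it pickles the message into memory before writing, so a pickling failure cannot leave half a pickle in the pipe. The names run_with_status and ChildError are hypothetical, and the status codes simply mirror the 0/1/2 convention above. It assumes a POSIX system with os.fork.

```python
import os
import pickle
import traceback

OK, RAISED, UNPICKLABLE = 0, 1, 2  # same meaning as the status codes above


class ChildError(Exception):
    """Hypothetical wrapper: carries the child's status and traceback text."""

    def __init__(self, status, trace):
        super().__init__(trace)
        self.status = status
        self.trace = trace


def _encode(status, payload):
    # Pickle to bytes first, so nothing reaches the pipe if pickling fails.
    return pickle.dumps((status, payload), pickle.HIGHEST_PROTOCOL)


def run_with_status(func, *args, **kwargs):
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid > 0:
        # Parent: read one (status, payload) message, then reap the child.
        os.close(write_fd)
        try:
            with os.fdopen(read_fd, "rb") as reader:
                status, payload = pickle.load(reader)
        finally:
            os.waitpid(pid, 0)
        if status == OK:
            return payload
        raise ChildError(status, payload)
    # Child: build exactly one message describing what happened.
    try:
        os.close(read_fd)
        try:
            result = func(*args, **kwargs)
        except Exception:
            message = _encode(RAISED, traceback.format_exc())
        else:
            try:
                message = _encode(OK, result)
            except Exception:
                message = _encode(UNPICKLABLE, traceback.format_exc())
        with os.fdopen(write_fd, "wb") as writer:
            writer.write(message)
    finally:
        os._exit(0)
```

The trade-off is that the parent gets a ChildError with the traceback text, not the original exception type, so callers that need to catch specific exceptions would have to inspect the text or extend the message format.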
After playing a little with Alex's function, I got to:
from __future__ import with_statement  # needed for "with" on Python 2.5
import os, cPickle

def run_in_separate_process_2(f, *args, **kwds):
    pread, pwrite = os.pipe()
    pid = os.fork()
    if pid > 0:
        # parent: read (status, result) from the pipe, then reap the child
        os.close(pwrite)
        with os.fdopen(pread, 'rb') as f:
            status, result = cPickle.load(f)
        os.waitpid(pid, 0)
        if status == 0:
            return result
        else:
            raise result
    else:
        # child: run the function and send back (status, result)
        os.close(pread)
        try:
            result = f(*args, **kwds)
            status = 0
        except Exception, exc:
            result = exc
            status = 1
        with os.fdopen(pwrite, 'wb') as f:
            try:
                cPickle.dump((status,result), f, cPickle.HIGHEST_PROTOCOL)
            except cPickle.PicklingError, exc:
                cPickle.dump((2,exc), f, cPickle.HIGHEST_PROTOCOL)
            f.close()
        os._exit(0)
It handles exceptions as well, partially. Basically the child process returns a status code as well as a result. If the status is 0, then the function returned successfully and its result is returned. If the status is 1, then the function raised an exception, which is re-raised in the parent. If the status is 2, then the function returned successfully but its result is not picklable, so the pickling error is raised in the parent instead. Exceptions such as SystemExit and KeyboardInterrupt in the child are not caught and will result in an EOFError in the parent.