I started to use PyXG more seriously and now face a problem that I'm
unable to solve. The cluster works fine: I run Leopard with MacPorts
Python 2.6.4 and I can retrieve the computed results. However, after
submitting a few hundred jobs, I get the following error:
============
........
Job submitted with id: 9760
Job submitted with id: 9761
Job submitted with id: 9762
Job submitted with id: 9763
Job submitted with id: 9764
== retrieving results ... ==
Python(8243,0xa00fb720) malloc: *** mmap(size=274432) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "./GA.py", line 588, in <module>
    ga.run()
  File "./GA.py", line 572, in run
    self.p.evolve()
  File "./GA.py", line 405, in evolve
    jm.job(j).results(silent=True)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/xg.py", line 434, in job
    self._updateJobs()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/xg.py", line 389, in _updateJobs
    result = xgridParse(cmd)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/xg.py", line 233, in xgridParse
    return NSString.stringWithString_(result[1]).xGridPropertyList()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/xg.py", line 197, in xGridPropertyList
    lines = str.splitlines()
MemoryError
============
and very similarly:
============
........
Job submitted with id: 11960
Job submitted with id: 11961
Job submitted with id: 11962
Job submitted with id: 11963
Job submitted with id: 11964
== retrieving results ... ==
Python(33329,0xa00fb720) malloc: *** mmap(size=413696) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "./GA.py", line 591, in <module>
    ga.run()
  File "./GA.py", line 575, in run
    self.p.evolve()
  File "./GA.py", line 408, in evolve
    jm.job(j).results(silent=True)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/xg.py", line 434, in job
    self._updateJobs()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/xg.py", line 389, in _updateJobs
    result = xgridParse(cmd)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/xg.py", line 233, in xgridParse
    return NSString.stringWithString_(result[1]).xGridPropertyList()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/xg.py", line 204, in xGridPropertyList
    str = sep.join(lines)
MemoryError
============
It has nothing to do with the cluster: if I simply rerun my script, it
again runs smoothly until the memory fills up once more.
What I have looked into so far:
- Removing the NSString extension altogether, but my xgrid CLI
apparently still returns old style plists. (BTW: is this solved in
Snow Leopard?)
- PyObjC: I suspected some missing "release" in the extended NSString
class. My (so far limited) understanding of Objective-C is that using
NSString.stringWithString() should autorelease the memory after usage.
And this is how it's done in PyXG. The PyObjC bridge should then take
care of it.
- JobSpecification dictionary: I suspected the JobSpecification
dictionary to be getting too big. However, the 3 files that are sent
to the agents (a C++ exe and two config files) are not more than 1 MB
in total. Manual inspection of jobs didn't reveal anything unusual.
To rule out a problem in my own code, I extracted the main bits
that call PyXG. The dummy jobs consist of a script "printnumber.py"
that prints 42 to stdout and sleeps for 15 seconds. I hacked
together the following to submit these to the cluster:
========
#!/usr/bin/env python
import os
from xg import *

conn = Connection(hostname='xxx',password='xxx')
cont = Controller(conn)
g = cont.grid(0)
jm = JobManager(0,conn)
jobs = []
gencurr = 0
genmax = 1000
n = 30
while gencurr < genmax:
    # submit n dummy jobs for this generation
    for i in range(0, n):
        j = g.submit(cmd='./printnumber.py',indir='./work/')
        jobs.append( int(j.jobID) )
    # retrieve the results of every job submitted so far
    for k in jobs:
        jm.job(k).results()
    gencurr += 1
========
When running this script I get a similar error (again, after a few
hundred submissions/retrievals):
========
....
Job stdout saved in file: xgridjob-17482.out
Job stdout saved in file: xgridjob-17483.out
Job stdout saved in file: xgridjob-17484.out
Python(18601,0xa00fb720) malloc: *** mmap(size=507904) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Bus error
========
Does anyone have an idea what could fill up the memory or how I could
proceed to resolve the problem?
Thanks
Beat
Some thoughts:
* There is definitely a memory leak, or something is holding onto a
reference for too long. You simply run out of memory...
* I have had some luck using http://code.google.com/p/pympler/ to
track down memory issues. It can be a bit subtle to use,
but the devs are quite helpful on their list.
* If the memory leak is at the Obj-C level, running it with gdb could
help track down the leak.
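Before reaching for pympler or gdb, it may be worth confirming that the process really grows steadily from one iteration to the next. The following is only a rough sketch using the standard-library resource module (not pympler, and not anything from PyXG); watch_memory and step are hypothetical names, with step standing in for one submit/retrieve round. It reports the peak resident size of the whole process, so it should also reflect allocations made on the Objective-C side, but it can only show that memory grows, not where.

```python
import resource


def peak_rss():
    """Peak resident set size of this process so far.

    The unit is platform dependent (bytes on Mac OS X, kilobytes on Linux).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def watch_memory(step, iterations, report_every=10):
    """Call step(i) repeatedly and sample peak memory every few iterations.

    `step` is a hypothetical callable standing in for one
    submit/retrieve round of the real script.
    """
    samples = []
    for i in range(iterations):
        step(i)
        if (i + 1) % report_every == 0:
            samples.append((i + 1, peak_rss()))
    return samples


if __name__ == "__main__":
    # Dummy workload that deliberately keeps growing, for illustration only.
    hoard = []
    for done, peak in watch_memory(lambda i: hoard.append("x" * 100000), 50):
        print("after %d iterations: peak RSS %d" % (done, peak))
```

If the sampled peak keeps climbing at a roughly constant rate per iteration, that supports the leak theory and gives a baseline to compare against after any fix.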
> It has nothing to do with the cluster, if I simply rerun my script it
> will again run smoothly until the memory is filled up again.
>
> What I have looked into so far:
> - Removing the NSString extension altogether, but my xgrid CLI
> apparently still returns old style plists. (BTW: is this solved in
> Snow Leopard?)
> - PyObjC: I suspected some missing "release" in the extended NSString
> class. My (so far limited) understanding of Objective-C is that using
> NSString.stringWithString() should autorelease the memory after usage.
> And this is how it's done in PyXG. The PyObjC bridge should then take
> care of it.
> - JobSpecification dictionary: I suspected the JobSpecification
> dictionary to be getting too big. However, the 3 files that are sent
> to the agents (a C++ exe and two config files) are not more than 1 MB
> in total. Manual inspection of jobs didn't reveal anything unusual.
I am not too familiar with the subtleties of the PyObjC memory model;
the Python Mac mailing list (where the PyObjC devs hang out) might
have some ideas. But I do think gdb would help track the problem down.
Well, it definitely helps that you have this simplified version that
shows the problem.
I would probably check out pympler and gdb.
Cheers,
Brian
On Jan 13, 2010, at 3:44 AM, brupp wrote:
> - PyObjC: I suspected some missing "release" in the extended NSString
> class. My (so far limited) understanding of Objective-C is that using
> NSString.stringWithString() should autorelease the memory after usage.
> And this is how it's done in PyXG. The PyObjC bridge should then take
> care of it.
Did you create an autorelease pool with
NSAutoreleasePool.alloc().init() and clear it with each loop? PyObjC
will manage reference counts for you, but you still need to make the
autorelease pool available.
See http://svn.red-bean.com/pyobjc/tags/pyobjc-1.3/Doc/api-notes-macosx.html#class-nsautoreleasepool
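A minimal sketch of what a pool per loop iteration could look like on the client side. This is not PyXG code and has not been run against an Xgrid controller: autorelease_pool and run_generations are hypothetical helpers, step stands in for one submit/retrieve round, and the pool_class parameter exists only so the sketch can be exercised without PyObjC. It assumes that deleting the last Python reference to the pool is enough to release it, which is the pattern the API notes linked above describe.

```python
from contextlib import contextmanager

try:
    # PyObjC's Foundation bindings (Mac OS X with PyObjC installed).
    from Foundation import NSAutoreleasePool
except ImportError:
    # Placeholder so the sketch can at least be imported elsewhere.
    NSAutoreleasePool = None


@contextmanager
def autorelease_pool(pool_class=None):
    """Hypothetical helper: keep one autorelease pool alive for a block."""
    if pool_class is None:
        pool_class = NSAutoreleasePool
    if pool_class is None:
        raise RuntimeError("PyObjC (Foundation) is not available")
    pool = pool_class.alloc().init()
    try:
        yield
    finally:
        # Dropping the last Python reference lets the bridge release the
        # pool, and with it the objects autoreleased inside the block.
        del pool


def run_generations(step, generations, pool_class=None):
    """Run step(gen) once per generation, each inside its own pool."""
    for gen in range(generations):
        with autorelease_pool(pool_class):
            step(gen)
```

In the test script from the first message, the body of the while loop would become the step callable, so everything autoreleased during one generation can be freed before the next one starts.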
Cheers
FR
---------------------------------------------
Francis Reyes M.Sc.
215 UCB
University of Colorado at Boulder
gpg --keyserver pgp.mit.edu --recv-keys 67BA8D5D
8AE2 F2F4 90F7 9640 28BC 686F 78FD 6669 67BA 8D5D
> Did you create an autorelease pool with
> NSAutoreleasePool.alloc().init() and clear it with each loop? PyObjc
> will manage reference counts for you, but you still need to make the
> autoreleasepool available.
>
> See http://svn.red-bean.com/pyobjc/tags/pyobjc-1.3/Doc/api-notes-macosx.h...
Thanks for this idea, that may do the trick. If so, we should put the
pool in pyxg itself so users don't have to worry about this.
Cheers,
Brian
Are you retrieving the results for all 0:i jobs on each iteration of
the while loop? If so, moving the 'for k in jobs:' loop outside the
while loop may help.
+1 Anytime we create (directly or indirectly) an ObjC object instance,
we need to have an NSAutoreleasePool in place. PyObjC will create one,
but it won't get drained very often in a console app like this. Better
to create our own around any memory intensive calls.
In this case, I'm not sure pyxg can really solve the problem unless we
put an autorelease pool around every method in the module (I've used a
decorator for this before):

from functools import wraps
from Foundation import NSAutoreleasePool

def autorelease(func):
    # email-code; not tested
    @wraps(func)
    def wrapped(*args, **kw):
        pool = NSAutoreleasePool.alloc().init()
        try:
            # pass the wrapped function's result back to the caller
            return func(*args, **kw)
        finally:
            pool.drain()
            del pool
    return wrapped

and then @autorelease all the methods in pyxg. The downside to this is
that autorelease pools aren't free; there is a setup cost. A bit of
profiling with and without the pool would tell us whether it's an
issue or not.
In this case, it may be better done on the client side (in Beat's
code) with an autorelease pool for each iteration of the while loop.
>
> +1 Anytime we create (directly or indirectly) an ObjC object instance,
> we need to have an NSAutoreleasePool in place. PyObjC will create one,
> but it won't get drained very often in a console app like this. Better
> to create our own around any memory intensive calls.
>
> In this case, I'm not sure pyxg can really solve the problem unless we
> put an autorelease pool around every method in the module (I've used a
> decorator for this before):
Very nice, I can see that this might be a great approach to this
issue. Do you mind if we include this code in pyxg if it turns out
to be useful?
> from functools import wraps
> from Foundation import NSAutoreleasePool
>
> def autorelease(func):
>     # email-code; not tested
>     @wraps(func)
>     def wrapped(*args, **kw):
>         pool = NSAutoreleasePool.alloc().init()
>         try:
>             return func(*args, **kw)
>         finally:
>             pool.drain()
>             del pool
>
>     return wrapped
>
> and then @autorelease all the methods in pyxg. The downside to this is
> that autorelease pools aren't free; there is a setup cost. A bit of
> profiling with and without the pool would tell us whether it's an
> issue or not.
Well, I am not sure we use NSObjects in all methods, so we might be able to
get away with using this on only a small number of methods.
If performance is an issue we could do the following:
* Set up a global NSAutoreleasePool.
* Use a global variable to track how many times pyxg methods that use
the pool have been called.
* Decorate all our methods and have the decorator drain the global pool
every N method calls.
That would get rid of the overhead of creating a pool on each method
call and of calling drain too often. Do you think this would work?
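A rough sketch of the every-N-calls idea, to make the proposal concrete. None of this is in pyxg: PooledCalls, make_pool and every are hypothetical names, and the sketch replaces the shared pool with a fresh one rather than calling drain on it. It has not been tested against PyObjC; with PyObjC, make_pool could be something like lambda: NSAutoreleasePool.alloc().init().

```python
from functools import wraps


class PooledCalls(object):
    """Hypothetical decorator factory sharing one pool across many calls.

    `make_pool` is any zero-argument callable returning a fresh pool.
    """

    def __init__(self, make_pool, every=100):
        self.make_pool = make_pool
        self.every = every
        self.calls = 0
        self.pool = make_pool()

    def __call__(self, func):
        @wraps(func)
        def wrapped(*args, **kw):
            try:
                return func(*args, **kw)
            finally:
                self.calls += 1
                if self.calls % self.every == 0:
                    # Drop the old pool first so it is released before its
                    # replacement is created.
                    self.pool = None
                    self.pool = self.make_pool()
        return wrapped
```

One instance would be created at module level and used to decorate each method that touches Objective-C objects. Since autorelease pools are per thread, this presumably only makes sense as long as pyxg is used from a single thread.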
> In this case, it may be better done on the client side (in Beat's
> code) with an autorelease pool for each iteration of the while loop.
Yes, that is also an option, but I think people will often forget to do this...
Brian
--
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgra...@calpoly.edu
elli...@gmail.com
Of course you/we can use it.
Yes, I think it would work. Seems like an overly complex solution
until we have evidence that the pool-per-method strategy is too slow.
I may have misrepresented my feelings earlier. I believe the
per-method strategy is fine. I was just putting down, for posterity,
the idea that if there are performance issues, we should revisit that
strategy.
>
>> In this case, it may be better done on the client side (in Beat's
>> code) with an autorelease pool for each iteration of the while loop.
>
> Yes, that is also an option, but I think people will often forget to do this...
True, true.
I thought so, but for better or worse, I have become more license
conscious over time :)
> Yes, I think it would work. Seems like an overly complex solution
> until we have evidence that the pool-per-method strategy is too slow.
> I may have mis-represented my feelings earlier. I believe the
> per-method strategy is fine. I was just putting down, for posterity,
> the idea that if there are performance issues, we should revisit that
> strategy.
Great, this sounds like a good plan. Hopefully that is the root of
the problem...
Cheers,
Brian
Thanks for all replies and suggestions so far. Actually, yes, I have
to retrieve the results with this extra loop. This is the reason I
included it in this simplified code as well.
In the meantime I found that I had accidentally used PyObjC version 2.0.3
instead of 2.2, but a quick test with the new version showed the same
problem.
I'll look at the other suggestions tomorrow, the NSAutoreleasePool
idea looks promising!
Best,
Beat
FYI, I decorated my function with the code that Barry provided and it
solves the issue. The process is happily running and I have a constant
memory usage now (not quantified, just looking at the numbers). Thank
you very much!
As for how to integrate it with pyxg (if at all), I'll let you decide. At
least the issue is documented now :)
Cheers,
Beat
Great! Thanks to Barry for the decorator!
> As for how to integrate it with pyxg (if, at all) I let you decide. At
> least the issue is documented now :)
Barry said we can include the decorator in pyxg, so let's put it in
the pyxg source (near the top, where the other utilities are).
Cheers,
Brian