Memory leaks with simple NDB application

Nikolaj

Jan 13, 2013, 2:05:16 PM1/13/13
to appengine-ndb-discuss
Hi all,

I have been experimenting with task handlers that page through a
number of entities, and I have run into some severe memory problems.

I've included my test application below:
- Setup handler to insert a number of entities with ~25k of data each
- Basic task handler that initiates a request with a longer deadline
to total the length of data in all of the entities
- Page task handler that accomplishes the same using cursors and paged
queries.

There is a constant called BATCH_SIZE which can be modified to test
various scenarios. In order to stay within the instance memory limits
I am regularly clearing the NDB context cache.

However, the results are not great:

BasicTaskHandler:
- BATCH_SIZE = 25: success, but instance memory usage grows to 100MB and stays there after the request.
- BATCH_SIZE = 50: success, but memory usage grows to 143MB and the instance is killed after the request due to high memory.
- BATCH_SIZE = 100: failure due to high memory usage.

PageTaskHandler:
- Failure due to high memory usage at all batch sizes.

It seems crazy that the application can only count 25 entities at a time without failing (I'm aware there is a great deal of overhead, but in theory 25 * 25k is only ~625KB of text). There also seems to be a problem with memory leaking between requests.

I would appreciate any insight from the list.

Thanks,

Nikolaj


from google.appengine.api import taskqueue
from google.appengine.ext import webapp, ndb
from google.appengine.ext.webapp import util

import logging

BATCH_SIZE = 25

class Entity(ndb.Model):
    text = ndb.TextProperty()

class SetupHandler(webapp.RequestHandler):
    def get(self):
        # Insert 100 batches of 100 entities, each holding ~25k of text.
        for n in xrange(100):
            entities = [Entity(text='a' * 25000) for n in xrange(100)]
            ndb.put_multi(entities)

            ndb.get_context().clear_cache()

class BasicTaskHandler(webapp.RequestHandler):
    def get(self):
        self.queue()

    def post(self):
        if self.request.headers.get('X-AppEngine-TaskRetryCount', '0') != '0':
            logging.warning("Preventing retry.")
            return

        total = 0

        entity_query = Entity.query()

        # Iterate over every entity, clearing the context cache as we go.
        for entity in entity_query.iter(batch_size=BATCH_SIZE):
            total += len(entity.text)

            ndb.get_context().clear_cache()

        logging.info("Total: %d" % total)

    @classmethod
    def queue(cls):
        task = taskqueue.Task(url='/task/basic')
        taskqueue.Queue().add(task)

class PageTaskHandler(webapp.RequestHandler):
    def get(self):
        self.queue()

    def post(self):
        if self.request.headers.get('X-AppEngine-TaskRetryCount', '0') != '0':
            logging.warning("Preventing retry.")
            return

        total = 0

        entity_query = Entity.query()
        cursor = ndb.Cursor()

        # Page through the entities with explicit cursors.
        while True:
            entities, cursor, more = entity_query.fetch_page(
                BATCH_SIZE, start_cursor=cursor)

            for entity in entities:
                total += len(entity.text)

            ndb.get_context().clear_cache()

            if not more:
                break

        logging.info("Total: %d" % total)

    @classmethod
    def queue(cls):
        task = taskqueue.Task(url='/task/page')
        taskqueue.Queue().add(task)

app = webapp.WSGIApplication([
    ('/setup', SetupHandler),
    ('/task/basic', BasicTaskHandler),
    ('/task/page', PageTaskHandler)
], debug=True)

def main():
    util.run_wsgi_app(app)

if __name__ == '__main__':
    main()

Prateek Malhotra

Jan 14, 2013, 7:01:55 PM1/14/13
to appengine-ndb-discuss
Have you tried using use_cache=False? This stops NDB from caching entities in its in-process (context) cache, which should help prevent entities from building up in memory. You can also disable memcache by setting use_memcache=False. Use these arguments whenever you put, get, or fetch.
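
Something like this, for example (just a sketch; Entity and BATCH_SIZE are the ones from your original post):

        from google.appengine.ext import ndb

        # Per-call: skip both the context cache and memcache for this query only.
        entities = Entity.query().fetch(BATCH_SIZE, use_cache=False, use_memcache=False)

        # Or context-wide: turn the caching policies off for the whole request.
        ctx = ndb.get_context()
        ctx.set_cache_policy(False)
        ctx.set_memcache_policy(False)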

Hope this helps!

-Prateek

Nikolaj

Jan 16, 2013, 5:40:56 AM1/16/13
to appengine-...@googlegroups.com
Calling ndb.get_context().clear_cache() should free the same memory that disabling the NDB context cache would prevent from building up in the first place. Disabling memcache should not make such a significant difference to instance memory either.

My application will ultimately make use of the other beneficial behavior that the context cache provides, so I think this works well as a basic test case.

I am trying to understand whether there is a bug or whether I am doing something wrong.

Nikolaj

Nikolaj

Feb 3, 2013, 8:33:57 AM2/3/13
to appengine-...@googlegroups.com
I would really appreciate some insight from Guido if possible.

I feel this is a serious issue that makes it very difficult to write programs that iterate through a meaningful number of entities. Clearly something is going wrong memory-wise, and I've provided what I think is a pretty simple test case.

Thanks in advance!

Nikolaj

Jason Collins

Feb 4, 2013, 10:16:00 AM2/4/13
to appengine-...@googlegroups.com
I had worked on some memory tuning in the past and had a related discussion with Guido. Some snippets:

Me: You had previously asked why I was using so much memory. I still don't know. I measured my PB using the technique outlined in the Wade Holst post, and my average PB size was 3172 (bytes, I assume) and I am dealing with sets of 500 of them. Doesn't seem unreasonable to me. Thoughts?

Guido: Large RAM: after decoding the serialized pb into a pb object and then converting that into a Model instance, your memory use multiplies rapidly, possibly by 10x - 100x.

Me: If PB deserialization uses 10-100x extra RAM, why not keep the PB around on the model instance to use when memcaching? Of course, you'd need to throw it away if/when the model was updated....

Guido: Assuming you are suggesting to keep the *serialized* PB around, the reason is that our RPC machinery makes that impossible -- the PB has already been deserialized into an object before we get it. In addition, the serialized form I write to memcache is somewhat optimized (it doesn't contain the key).

Me: My model instances measured out at around 10kB (versus 3kB for PB). So I'm still at a loss on the large memory usage in my example.

Guido: Hm, yeah, that's weird. 500 times that would amount to 5 MB. I suppose you measured an ndb.Model subclass instance? Could you measure a pb instance retrieved from ent._to_pb()? IIRC the pbs may end up much bigger than the entities themselves, due to the way they are structured. I am assuming you are using something like q.fetch(500) -- this will set the batch size to 500, and although that gets truncated to 300, I think we'd still have at least this in memory simultaneously: 300 entities, 300 pbs, 300 encoded pbs. You could reduce this by calling e.g. q.fetch(500, batch_size=50) -- this increases the number of batches, but decreases the number of pbs in memory simultaneously. If the pbs were 10x the size of the entities that might make a difference.

Me: In-memory representation of the PB (entity._to_pb()) is around 95kB. That would definitely chew up the missing RAM. Mystery solved, it seems.

Guido: Yes, I expected as much, and there's not much we can do about this -- you just have to reduce the complexity of your model... :-( Hopefully my suggestion of reducing batch size will help. (You may have to set prefetch_size as well, their interaction is a bit mysterious to me.)

Me: Fair enough. Something in there seems inherently inefficient though, no? Your next 20% project?

Guido: There are too many low-level abstraction layers involved to get traction on this. The problem is that the conversion between serialized and pb object happens at a very low level which is hard to even get access to.

PS, is your dev_appserver Python binary running in 32-bit or 64-bit mode? Production is 64-bit mode and that could increase the size even more... (all pointers and "long int" fields double in size -- meaning that a short string object essentially doubles in size...)

Me: Based on my unscientific numbers, the bigger win might be avoiding the double-serialization for the memcache interaction by keeping the encoded PB around (and thus costing some memory and the "no-encoded-key-because-its-already-in-the-memcache-key" optimization).

Guido: Alas, I don't have access to the serialized PB. Even the low-level datastore_rpc.py and datastore_query.py don't have access to it. They only ever see the PB *object* -- which is the thing that is ~100KB in size in your case, so you definitely don't want to hang on to that. The serialized PB is never even made available to Python code -- the low-level RPC machinery receives the bytes in a C++ buffer and calls the C++ code to deserialize it into a PB object (representing a PutResponse object, which is just an array of EntityProto objects).
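
For anyone who wants to repeat that measurement, something roughly like this should do it (a sketch only: _to_pb() is an internal NDB helper, Encode() comes from the legacy ProtocolBuffer API, and sys.getsizeof() only reports the shallow size of the instance):

        import logging
        import sys

        ent = Entity.query().get()   # Entity as defined in Nikolaj's original post
        pb = ent._to_pb()            # the in-memory EntityProto object discussed above
        encoded = pb.Encode()        # the wire-format bytes

        logging.info("encoded PB: %d bytes", len(encoded))
        logging.info("model instance (shallow): %d bytes", sys.getsizeof(ent))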


Nikolaj, what I find odd about your case is that your model is very simple (one attribute); the experiments I referred to above used a more complicated model, in particular one with a repeated=True property.

j

Nikolaj

Feb 5, 2013, 5:55:25 AM2/5/13
to appengine-...@googlegroups.com
Hi Jason,

Thanks so much for your detailed response.

It seems like there is more to this story. There are no problems if the handler simply fetches a few hundred entities and returns, so the problem cannot be that the batch size is keeping too many entities around.

It suggests either that entities are not being cleared from memory or that more entities are fetched than the batch size specifies. It is also curious that the test case using ndb.Query.fetch_page fails at any batch size.

Can you suggest a way for me to gather more relevant or precise measurements? I appreciate that some of my analysis is quite unscientific.

Nikolaj

Jason Collins

Feb 6, 2013, 10:40:44 PM2/6/13
to appengine-...@googlegroups.com
Well Nikolaj, it's not much help, but I did repro your results. I tried map(), iter() and fetch_page(), all using BATCH_SIZE=100 and prefetch_size=0. All of them blew up a 128MB F1 front-end.

map() approach
===========
        total = [0]
        def sumchars(entity):
            total[0] += len(entity.text)
        entity_query = Entity.query()
        entity_query.map(sumchars, batch_size=BATCH_SIZE, prefetch_size=0)
        logging.info("Total: %d" % total[0])

iter() approach
==========
        total = 0
        entity_query = Entity.query()
        for entity in entity_query.iter(batch_size=BATCH_SIZE, prefetch_size=0):
            total += len(entity.text)
        logging.info("Total: %d" % total)

fetch_page() approach
================
        total = 0
        entity_query = Entity.query()
        cursor = ndb.Cursor()
        while True:
            entities, cursor, more = entity_query.fetch_page(BATCH_SIZE, prefetch_size=0, start_cursor=cursor)
            for entity in entities:
                total += len(entity.text)
            if not more:
                break
        logging.info("Total: %d" % total)

All three of these approaches exceed the 128MB limit.

I decided to give it a whirl using the legacy ext.db approach with run(). This succeeded; it was able to compute the total number of characters.

ext.db's run() approach
=================
        total = 0
        entity_query = Entity.all()
        for entity in entity_query.run(batch_size=BATCH_SIZE):
            total += len(entity.text)
        logging.info("Total: %d" % total)

Again, this latter approach succeeds. So it would appear that something in the ext.ndb code itself is using the extra RAM.

BTW, the instance cache should not be a factor here because, as I understand it, it does not get involved with query results at all.

I might have some time to poke around the NDB code some more...

j

Nikolaj

Feb 7, 2013, 4:50:05 AM2/7/13
to appengine-...@googlegroups.com
Jason, this is a huge help! I think your legacy result removes some suspicion from the question of the serialized PB's size.

Interestingly, the NDB handlers (except for fetch_page) can actually succeed with a smaller BATCH_SIZE.

Finally, I found that query results do enter the context cache. I originally saw Guido recommend clearing the cache for longer-running query iterations in another post to the group.
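
One quick and dirty way to see this is to poke at the context's internal _cache dict (an implementation detail rather than a public API, so treat this as a sketch only):

        ctx = ndb.get_context()
        ctx.clear_cache()
        Entity.query().fetch(10)
        logging.info("context cache entries after query: %d", len(ctx._cache))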

Nikolaj

Nikolaj

Mar 6, 2013, 10:02:07 AM3/6/13
to appengine-...@googlegroups.com
Hey Jason,

I was wondering if you have any suggestions for locating the source of the problem.

I am happy to spend the hours debugging and tracing the memory leak, but I need a strategy and some idea of how to measure the usage. I am also wondering whether the problem can be reproduced locally on the SDK (even if high memory doesn't cause the request to abort there).
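
One idea I had (untested; I believe the runtime API below reports the instance's memory in production, but I'm not sure how accurate it is on the SDK) is to log memory between batches:

        import logging
        from google.appengine.api.runtime import runtime

        def log_memory(label):
            # memory_usage() returns the instance's memory stats in MB; logging
            # the whole object avoids relying on the exact accessor names.
            logging.info("%s memory: %s", label, runtime.memory_usage())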

Many thanks,


Nikolaj


Jason Collins

Mar 6, 2013, 2:17:05 PM3/6/13
to appengine-...@googlegroups.com
Hi Nikolaj,

I took a look through the NDB code to see where the issue might lie, but nothing jumped out. Sorry, I don't have any immediate suggestions for tracking it down.

j

Robert Kluin

Mar 27, 2013, 2:02:56 PM3/27/13
to appengine-...@googlegroups.com
Are you guys able to reproduce this locally or only on appspot?


Robert

Nikolaj

Mar 28, 2013, 3:45:09 PM3/28/13
to appengine-...@googlegroups.com
Hi Robert,

I was thinking of experimenting to see whether it can be reproduced locally, but I haven't tried this yet. You would have to monitor the usage carefully against a control (e.g. ext.db), because the SDK doesn't abort the request due to high memory.

The lack of interest in such a critical issue is really surprising to me. It makes it difficult to write applications when you can't page through medium-sized sets of data without crashing the instance.

Nikolaj

Guido van Rossum

Mar 28, 2013, 3:49:27 PM3/28/13
to appengine-...@googlegroups.com
FYI. I'm not sure that this list is monitored by anyone on the Google App Engine team.


--
--Guido van Rossum (python.org/~guido)

Jason Collins

Mar 28, 2013, 4:28:57 PM3/28/13
to appengine-...@googlegroups.com
Robert - it was a long time ago that I did my tests, but I think the problem only manifested on appspot.
j

Robert Kluin

Apr 1, 2013, 4:24:59 PM4/1/13
to appengine-...@googlegroups.com
I have seen this frequently as well. I usually work through it manually by aggressively deleting variables and making explicit calls to gc.collect(). That seems to let me process larger datasets, perhaps up to a million entities, depending on your models.
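
Roughly this pattern, just as a sketch (using the Entity model from earlier in the thread):

        import gc

        total = 0
        cursor = None
        while True:
            entities, cursor, more = Entity.query().fetch_page(
                BATCH_SIZE, start_cursor=cursor)
            for entity in entities:
                total += len(entity.text)
            # Drop references and force a collection between pages.
            del entities
            ndb.get_context().clear_cache()
            gc.collect()
            if not more:
                break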

In the past, I have not seen it locally, but I probably don't have large enough datasets.  If it happens locally, maybe we could try using memory profilers, but that is really, really tedious.



Robert




Joachim Krebs

Jul 2, 2013, 6:34:16 AM7/2/13
to appengine-...@googlegroups.com
I have reproduced the problem (and found some interesting additional errors).

I have also posted the issue to the tracker again. Please star the issue so we can get a resolution:

Joachim