Oh, bummer. Yeah, there's an RPC size limit, and as an entity keeps growing it eventually gets too big to update anymore.
There are a few more cases when migrations will appear to be stuck or run forever, but they tend to be symptoms of the same underlying issue:
- imbalanced namespaces. The tool shards by namespace, so if you use 5000 namespaces but 99% of your entities are in 1 namespace, most of the entities will be serviced by a single worker instance. If you write new entities to this namespace faster than that single worker copies them to the new application, your migration will run forever.
- writing entities faster than the migration tool can map over keys. This typically happens when you are serving hundreds or thousands of queries per second (QPS) and new entities are being written at that rate, so the mapper cannot keep up. The mapper is responsible for sharding the entities into buckets so they can be copied in parallel.
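
To make the namespace imbalance concrete, here is a rough sketch that computes what share of all entities lands in the single busiest namespace. The function name and the counts are made up for illustration, and it assumes one worker per namespace shard; it is not part of the migration tool.

```python
def busiest_namespace_share(entities_per_namespace):
    """Return (namespace, share) for the namespace holding the most entities.

    entities_per_namespace: dict mapping namespace name -> entity count
    (hypothetical input; the migration tool does not expose this directly).
    """
    total = sum(entities_per_namespace.values())
    if total == 0:
        return None, 0.0
    name = max(entities_per_namespace, key=entities_per_namespace.get)
    return name, entities_per_namespace[name] / total


# Illustrative numbers only: 5000 namespaces, one of them holding 99%.
counts = {"ns-%d" % i: 2 for i in range(4999)}
counts["big"] = 989802

name, share = busiest_namespace_share(counts)
print(name, round(share, 4))
```

If the share comes out close to 1.0, adding more workers won't help much, because nearly all of the copying ends up on the one worker that owns that namespace.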
In most cases, a migration that runs while your app is still read-write is still going; it's just taking a really long time, and the ETA doesn't accurately reflect that because it does not take into account the incoming stream of new entities. If you're seeing this effect when you do the initial copy, the best solution is to figure out a way to slow the writes down so the migration tool far outpaces them on the map step. We've recently pushed an update that makes migrations run even faster, so hopefully you should be seeing these issues less frequently.
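
Here's a back-of-the-envelope model of why an ETA that ignores incoming writes is too optimistic. It is a simplified sketch, not how the migration tool actually computes its ETA; the function names and rates are hypothetical, and it assumes constant copy and write rates.

```python
import math


def naive_eta_seconds(backlog, copy_rate):
    """ETA that ignores new writes: backlog divided by copy rate."""
    return backlog / copy_rate


def eta_with_writes_seconds(backlog, copy_rate, write_rate):
    """ETA when new entities arrive at write_rate while copying.

    The backlog only shrinks at (copy_rate - write_rate) entities/sec.
    If writes match or outpace the copy, the migration never finishes.
    """
    net_rate = copy_rate - write_rate
    if net_rate <= 0:
        return math.inf
    return backlog / net_rate


# Illustrative numbers only: 1M entities left, copying 500/sec.
print(naive_eta_seconds(1000000, 500))             # 2000.0
print(eta_with_writes_seconds(1000000, 500, 400))  # 10000.0
print(eta_with_writes_seconds(1000000, 500, 500))  # inf
```

In this model, slowing writes from 400/sec to 100/sec cuts the real time from 10000 seconds to 2500 seconds, which is why throttling writes during the initial copy helps so much.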
--
Ikai Lan
Developer Programs Engineer, Google App Engine