Loading data into automatic scaled instance memory

John Smith

Nov 5, 2015, 7:42:49 AM
to Google App Engine
Each instance of my application module (default) needs to load data from the datastore into local memory (for performance/cost reasons), where it will be read-only.
A separate module updates the data in the datastore (updater), and the data needs to be refreshed (entirely) in each default instance every several minutes (preferably when the data is updated by the updater module).
Loading the data into instance memory takes more time than is reasonable for a single user request.

I would have used a thread (goroutine) with a background context to reload the data in each (default) instance, but background context is not supported in automatically scaled modules.

How should I update each (default) instance's memory when the data is updated by the updater module?

I am using the go runtime environment.

Nick (Cloud Platform Support)

Nov 9, 2015, 3:05:01 PM
to Google App Engine
Typically, when operating a memory cache in this way, you will need some way for your cache layers to signal information to each other. Each caching solution out there, including one you might roll yourself with, say, some hash tables in memory, will have different semantics and methods.

One solution in this situation would be lazy loading: each request for a resource checks an in-memory flag to see whether the cache is still valid and, if so, serves from memory. Requests that notify the instance of cache invalidation events are quick to service, so they can be interleaved with user requests. An invalidation event can be as simple as a list of keys to invalidate, or it can carry nested data structures or other data that your in-memory cache knows how to interpret.

The problem with this approach, especially under automatic scaling where instances are anonymous and not addressable, is that each instance needs its own way of getting notified. Memcache itself can be used to coordinate memory caching regardless of instance, since it is a datacenter service common to all instances.
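
One way to picture that coordination is a version stamp kept in the shared service: the updater bumps it, and each instance compares it with the version it last loaded. The sketch below is only an assumption about how this could look. `VersionStore`, `Instance`, and `fakeStore` are hypothetical names, and the fake in-memory store stands in for memcache so the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"sync"
)

// VersionStore is a hypothetical stand-in for a shared service such as
// memcache, where the updater would publish a data version number.
type VersionStore interface {
	Version() (int64, error)
}

// Instance models one app instance with its own in-memory copy.
type Instance struct {
	mu      sync.Mutex
	loaded  bool
	version int64
	data    []string
	store   VersionStore
	load    func() []string // assumed loader, e.g. a datastore query
}

// Refresh reloads the local copy only when the shared version differs
// from the one this instance last loaded. It reports whether it reloaded.
func (in *Instance) Refresh() (bool, error) {
	v, err := in.store.Version()
	if err != nil {
		return false, err
	}
	in.mu.Lock()
	defer in.mu.Unlock()
	if in.loaded && v == in.version {
		return false, nil
	}
	in.data = in.load()
	in.version = v
	in.loaded = true
	return true, nil
}

// fakeStore is an in-memory VersionStore used only for this example.
type fakeStore struct{ v int64 }

func (f *fakeStore) Version() (int64, error) { return f.v, nil }

func main() {
	store := &fakeStore{v: 1}
	in := &Instance{store: store, load: func() []string { return []string{"row"} }}

	reloaded, _ := in.Refresh()
	fmt.Println("first check reloaded:", reloaded)
	reloaded, _ = in.Refresh()
	fmt.Println("second check reloaded:", reloaded)

	store.v = 2 // the updater publishes a new version
	reloaded, _ = in.Refresh()
	fmt.Println("after update reloaded:", reloaded)
}
```

Because every instance pulls the version itself, nothing has to address instances individually, which is the property automatic scaling lacks.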

You could also set up a memcached box on Compute Engine.

Another method, as you identified, would be to run a goroutine that polls for cache invalidation events and updates the in-memory cache accordingly. The issue, as you've noticed, is that in vanilla App Engine, regular background processing outside the lifetime of a request is somewhat limited. You can check out func RunInBackground() for manual scaling modules.

To schedule regular "events" that prompt the instances holding memory caches to revalidate themselves, you could use the cron service. Again, though, cron works through HTTP requests, each handled by a single instance, so under automatic scaling it cannot deliver a cache invalidation notification to every instance, only to the one that caught the request. You could use basic scaling to overcome that limitation: it provides a scalable yet finite and addressable pool of instances that can receive cache invalidation notifications according to your custom specification. For example, whichever instance catches the cron request could notify its siblings in turn, waiting for responses and handling errors as needed.

A final option is to use Managed VMs, which allow full access to threading, process control, the filesystem, and the network interface, and will enable pretty much any pattern you can think of implementing.

I hope this broad discussion on background processing and cache invalidation in the context of the Cloud Platform is helpful as an introduction, although be aware that there are possibly other patterns you could put together which have desirable properties along some axes while suffering limitations along some others.

Christian F. Howes

Nov 10, 2015, 5:34:34 PM
to Google App Engine
Can you slice up your data into 1 MB chunks that you can load into the App Engine memcache? We do that a lot with our data, and we use expiration times to evict keys and trigger reloads.

cfh