Best practices for concurrent datastore calls


John Beckett

Jul 7, 2016, 4:04:12 AM
to google-appengine-go
I realise that goroutines don't actually run concurrently on App Engine, as they all run in a single process. However, they can still make use of the time my code spends waiting for the datastore to respond to a different request.

My goal here is to get about 100 entities from the datastore (in as concurrent a fashion as possible), and then do some simple processing on them.

Are there any best practices or suggestions for doing this? Some points I'm thinking about include which timeout value to use and how many goroutines to spin up at a time.

Derek Perkins

Jul 7, 2016, 11:28:56 AM
to google-appengine-go
Just use the GetAll call and don't worry about coding goroutines separately. Not to be too pedantic about it, but things do run concurrently on App Engine, just not in parallel: https://blog.golang.org/concurrency-is-not-parallelism. Because of that, it's unlikely that using goroutines for your processing will increase performance, especially given the low CPU of the instances.

Ronoaldo José de Lana Pereira

Jul 7, 2016, 7:48:20 PM
to Derek Perkins, google-appengine-go
In the case you described, I second what Derek says: fetching 100 moderately sized entities in a single operation is reasonable on GAE, as long as you don't hit any limits: the response size from the datastore, the number of entities fetched, or the datastore RPC timeout.

But when the limits don't allow a single operation, I usually try to visualize the problem as a set of steps: read, process, update. Usually, I run a query to fetch N entries in batches of size M, and for each entity fetched, launch a goroutine to process it. Then, as they get processed, I send them to a channel to "buffer" them and do a batch put when the buffer reaches size M. The pattern is something like this:

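// MyEntity.ID is used to build the Key; it is set when fetching and used when putting.
in := make(chan MyEntity)
out := make(chan MyEntity)
var wg sync.WaitGroup // requires import "sync"

wg.Add(3)
go fetch(in, &wg)        // GetAll in batches; send each entity on in; close(in) when done
go process(in, out, &wg) // range over in; process each entity; send it on out; close(out) when done
go put(out, &wg)         // range over out; PutMulti every M entities, then flush the rest
wg.Wait()                // returns once all three stages have called wg.Done()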

- fetch does GetAll in batches (so we don't hit deadlines or limits), sending the entities to process via in; fetch closes in when the query is done.
- process is a for range over in that mutates/processes each entity and then sends it to out. You can process items here serially, then close out when done. You need more synchronization if you want to use one goroutine per entity.
- put, in turn, is a for loop reading from out that appends to a slice and calls PutMulti whenever the slice reaches the batch size (100 here), until the channel is closed; it then flushes whatever is left.

The WaitGroup is used to synchronize the work, i.e., we wait until all entities have been read, processed and saved; just add defer wg.Done() at the beginning of each function. Handling errors is a bit more complicated: I usually log them, but you could also add an error channel that all stages use to aggregate their errors, and handle them after wg.Wait().

The benefit of the pattern above is that your program does not block on reads from or writes to the datastore: fetching, processing and saving overlap. In theory, this pattern lets you process any number of items (N) in batches of size M, concurrently. M is (roughly) your upper bound for what is held in memory.

This works well for small data sets (N between 1 and 10000), but it depends on how much time you need to process each entry. I usually split a larger data set into ranges (basically, changing the query in the fetch() func) and then run multiple parallel operations as multiple task queue tasks (fan-in). Fan-out is more complicated and use-case dependent. I can usually get by with a single datastore entity that tracks progress across multiple tasks by updating a "done" field or something similar.
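A rough sketch of splitting a large data set into ranges and tracking completion, with goroutines standing in for separate task queue tasks and a mutex-protected counter standing in for the control entity's "done" field. keyRange, splitRanges and progress are hypothetical names; in a real deployment each range would be enqueued as its own task, and the shared counter would presumably need a transactional update, which this sketch does not model.

```go
package main

import (
	"fmt"
	"sync"
)

// keyRange is a hypothetical half-open ID range [Start, End); each range
// would drive its own fetch() query in the real setup.
type keyRange struct{ Start, End int64 }

// splitRanges divides [0, n) into at most parts contiguous ranges (parts > 0).
func splitRanges(n int64, parts int) []keyRange {
	var out []keyRange
	size := (n + int64(parts) - 1) / int64(parts) // ceiling division
	for start := int64(0); start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		out = append(out, keyRange{start, end})
	}
	return out
}

// progress stands in for the single control entity with a "done" field.
type progress struct {
	mu    sync.Mutex
	done  int
	total int
}

// markDone records one finished range and returns the updated counts.
func (p *progress) markDone() (done, total int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.done++
	return p.done, p.total
}

func main() {
	ranges := splitRanges(10000, 4)
	p := &progress{total: len(ranges)}
	var wg sync.WaitGroup
	for _, r := range ranges {
		wg.Add(1)
		go func(r keyRange) { // stand-in for one task queue task
			defer wg.Done()
			// ... run the fetch/process/put pipeline for IDs in [r.Start, r.End) ...
			if done, total := p.markDone(); done == total {
				fmt.Println("all ranges processed") // printed by exactly one task
			}
		}(r)
	}
	wg.Wait()
	fmt.Println(ranges) // [{0 2500} {2500 5000} {5000 7500} {7500 10000}]
}
```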

Hope this helps!

--
You received this message because you are subscribed to the Google Groups "google-appengine-go" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengin...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Ronoaldo Pereira

John Beckett

Jul 7, 2016, 8:58:04 PM
to Derek Perkins, google-appengine-go
While that is how I would do it were I simply getting based on keys, in this case, I have to get based on a query.  As far as I'm aware there is no way to batch query the datastore.

--
You received this message because you are subscribed to a topic in the Google Groups "google-appengine-go" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/google-appengine-go/nAZvStUPkYI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to google-appengin...@googlegroups.com.

John Beckett

Jul 7, 2016, 9:03:44 PM
to Ronoaldo José de Lana Pereira, Derek Perkins, google-appengine-go
Thanks Ronoaldo, that is helpful.

Do you have any example code that you are able to share where you've implemented this process?


Matthew Zimmerman

Jul 7, 2016, 10:45:43 PM
to John Beckett, Ronoaldo José de Lana Pereira, Derek Perkins, google-appengine-go
See also, an oldie but a goodie, https://talks.golang.org/2013/highperf.slide#22

Derek Perkins

Jul 8, 2016, 9:50:38 PM
to google-appengine-go, de...@derekperkins.com
> While that is how I would do it were I simply getting based on keys, in this case, I have to get based on a query.  As far as I'm aware there is no way to batch query the datastore.

Are you expecting to get all 100 from a single query? The datastore package buffers / batches behind the scenes for you, so it's not the equivalent of running a 'Get' per record. You could still use the same style that Ronoaldo mentioned. 

I would personally avoid that until you have profiled your application and determined that this specific query is a hotspot throttling performance. It's much more complex and easy to get wrong, while not necessarily speeding up the process.
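Derek's point that query results are batched behind the scenes can be illustrated with a toy model. This is not the package's implementation (the real entry points in google.golang.org/appengine/datastore are Query.GetAll, and Query.Run with Iterator.Next, which returns datastore.Done at the end); backend, iterator and the batch size of 20 are hypothetical. The caller consumes one entity per Next call, but the backend only sees one round trip per batch.

```go
package main

import "fmt"

// backend is a hypothetical stand-in for the datastore service; it counts
// how many round trips (RPCs) the client makes.
type backend struct {
	rows  []int
	calls int
}

func (b *backend) fetchBatch(offset, limit int) []int {
	b.calls++
	if offset >= len(b.rows) {
		return nil
	}
	end := offset + limit
	if end > len(b.rows) {
		end = len(b.rows)
	}
	return b.rows[offset:end]
}

// iterator hands out one result per Next call but refills its buffer a
// whole batch at a time, so the caller never issues a per-record Get.
type iterator struct {
	b      *backend
	batch  int
	offset int
	buf    []int
	done   bool
}

// Next returns the next row and true, or 0 and false once the query is exhausted.
func (it *iterator) Next() (int, bool) {
	if len(it.buf) == 0 && !it.done {
		it.buf = it.b.fetchBatch(it.offset, it.batch)
		it.offset += len(it.buf)
		if len(it.buf) < it.batch {
			it.done = true // a short (or empty) batch means nothing is left
		}
	}
	if len(it.buf) == 0 {
		return 0, false
	}
	v := it.buf[0]
	it.buf = it.buf[1:]
	return v, true
}

func main() {
	b := &backend{}
	for i := 0; i < 100; i++ {
		b.rows = append(b.rows, i)
	}
	it := &iterator{b: b, batch: 20}
	n := 0
	for {
		if _, ok := it.Next(); !ok {
			break
		}
		n++
	}
	fmt.Println(n, "entities in", b.calls, "round trips") // 100 entities in 6 round trips
}
```

The six round trips are five full batches plus one final call that finds nothing, compared with 100 calls if each record were fetched with its own Get.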

Ronoaldo Pereira

Aug 11, 2016, 10:30:41 AM
to google-appengine-go, rono...@gmail.com, de...@derekperkins.com
Hi John, sorry for the late reply. I can't share the code because it is bound to a private product at my company. I need to build a POC of this to serve as a sample for a class I'm writing, and when I can, I'll share it in this thread.