goon: an NDB-like autocaching library


Matt Jibson

Jan 20, 2013, 12:18:43 AM
to google-ap...@googlegroups.com
I've been working on an autocaching, NDB-like library for Go. It's interesting enough to be shown to others now. There are a number of other similar libraries out there (gaelic, gaego/ds, cachestore), but none of them do all of the most useful things that NDB does:

* in-memory, request-scoped caching by key
* query result caching
* transactions in new contexts that memory-cache results
* intelligent multi get/put where only needed keys are fetched from memcache then datastore
* simple, NDB-like API

goon does all of these things and greatly simplifies using the datastore.

code: https://github.com/mjibson/goon

Vladimir Mihailenco

Jan 20, 2013, 8:58:46 AM
to Matt Jibson, google-ap...@googlegroups.com
I define *every* model as follows:

type MyModel struct {
    Key *ds.Key `datastore:"-"`
}

This approach looks much simpler to me than goon.Entity, although it requires reflect. Just a note.
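To illustrate the reflect-based approach mentioned above, here is a minimal sketch of a helper that fills in a struct's `Key` field after an entity has been loaded. The `Key` type and the `setKey` helper are hypothetical stand-ins for illustration only; they are not the API of gaego/ds, goon, or the datastore package.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Key is a hypothetical stand-in for a datastore key type.
type Key struct {
	Kind string
	ID   int64
}

// MyModel mirrors the model above: the Key field is excluded from
// datastore serialization by the `datastore:"-"` tag.
type MyModel struct {
	Key  *Key `datastore:"-"`
	Name string
}

// setKey uses reflection to store k in the "Key" field of the struct
// that dst points to. It is a hypothetical helper for illustration.
func setKey(dst interface{}, k *Key) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return errors.New("dst must be a non-nil pointer to a struct")
	}
	f := v.Elem().FieldByName("Key")
	if !f.IsValid() || !f.CanSet() || f.Type() != reflect.TypeOf(k) {
		return errors.New("struct has no settable Key field of type *Key")
	}
	f.Set(reflect.ValueOf(k))
	return nil
}

func main() {
	m := &MyModel{Name: "example"}
	if err := setKey(m, &Key{Kind: "MyModel", ID: 42}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Key.Kind, m.Key.ID)
}
```

The cost of this design is the reflection lookup on every load; the benefit is that models need no interface methods, only a conventionally named field.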

Matt Jibson

Jan 26, 2013, 1:23:25 AM
to ben...@gmail.com, google-ap...@googlegroups.com
No: making it a drop-in replacement is specifically not a goal. One of the features of goon is that the API is simplified a bit compared to the datastore API. I've listed some examples:



On Fri, Jan 25, 2013 at 7:56 PM, <ben...@gmail.com> wrote:
Thanks for making this! Any plans to make it drop-in compatible with the datastore API (like cachestore), so we can do, for example, goonstore.Put()? That would make it a lot easier to give it a try.




Jeff Huter

Jan 31, 2013, 12:06:43 PM
to google-ap...@googlegroups.com, ben...@gmail.com
On Saturday, January 26, 2013 1:23:25 AM UTC-5, Matt Jibson wrote:
No: making it a drop-in replacement is specifically not a goal. One of the features of goon is that the API is simplified a bit compared to the datastore API. I've listed some examples:


I notice goon supports both in-memory and memcache caching.  Have you performed any benchmarks comparing performance between the two caching schemes?  gaelic currently only supports memcache caching, but it would be fairly easy to add in-memory caching as well.  I'm just not certain that the gain in performance is worth the trouble since both are "memory" caches and I'd expect (possibly wrongly) that performance would be similar.

Jeff

Matt Jibson

Jan 31, 2013, 5:29:54 PM
to Jeff Huter, google-ap...@googlegroups.com, Hocine B
goon tries the per-request in-memory cache, then memcache, then datastore. It's not a choice between in-memory or memcache: it always uses both. This is the same (default) behavior as NDB. The benefit of the in-memory cache is that the same key requested twice in one request is fetched from memcache/datastore only once. Not huge, granted, but useful because you can stop caring about duplicating work in a request.
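The lookup order described above can be sketched as follows. This is a minimal illustration, not goon's implementation: the `store` and `requestCache` types are hypothetical, both backends are plain maps, and the cache fill on a miss deliberately ignores the memcache/datastore consistency question raised later in this thread.

```go
package main

import "fmt"

// store is a hypothetical stand-in for a slower backend (memcache or
// the datastore). It counts fetches so the effect of caching is visible.
type store struct {
	data    map[string]string
	fetches int
}

func (s *store) get(key string) (string, bool) {
	s.fetches++
	v, ok := s.data[key]
	return v, ok
}

// requestCache models the lookup order for a single request:
// local map, then memcache, then datastore.
type requestCache struct {
	local     map[string]string
	memcache  *store
	datastore *store
}

func (r *requestCache) get(key string) (string, bool) {
	if v, ok := r.local[key]; ok {
		return v, true // served from the per-request cache
	}
	if v, ok := r.memcache.get(key); ok {
		r.local[key] = v
		return v, true
	}
	v, ok := r.datastore.get(key)
	if !ok {
		return "", false
	}
	// Populate the faster tiers on the way back.
	r.memcache.data[key] = v
	r.local[key] = v
	return v, true
}

func main() {
	mc := &store{data: map[string]string{}}
	ds := &store{data: map[string]string{"k1": "v1"}}
	rc := &requestCache{local: map[string]string{}, memcache: mc, datastore: ds}

	rc.get("k1")
	rc.get("k1") // second call in the same request hits the local map
	fmt.Println(mc.fetches, ds.fetches)
}
```

In this sketch the second `get` never reaches either backend, which is the "stop caring about duplicating work in a request" benefit.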




--
You received this message because you are subscribed to the Google Groups "google-appengine-go" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengin...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

jaim...@gmail.com

Apr 17, 2014, 11:58:00 AM
to google-ap...@googlegroups.com, Jeff Huter, Hocine B
Unless I am missing something... all the libraries mentioned in this thread seem to suffer from a potentially bad race condition.

I was researching an NDB equivalent for Go on App Engine, but none of these solutions seem to properly implement the thing that NDB actually does for you: correctly solving the consistency problem between memcache and the datastore. There is a race between your cache invalidations/updates on Put() and cache populations on Get() that can result in stale values being written to your memcache. None of these implementations properly use the CAS functionality that memcache exposes to guard against this.

For a simple description of the problem see the pseudocode under "Handling memcache failures gracefully":

For an explanation of how NDB solves this problem, see:

I poked around in the source code for Goon, cachestore, and gaego/ds. All seem to do the incorrect thing.
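To make the race concrete, here is a minimal single-goroutine sketch of the interleaving and of how a compare-and-swap check rejects the stale write. It is loosely modeled on the lock-then-CAS idea discussed here and is not the code of NDB, goon, or any other library named in this thread. The `fakeCache` type, its methods, and the `locked` sentinel are all hypothetical, and the "datastore" is just a map.

```go
package main

import "fmt"

// item is a cached value plus a version that changes on every write,
// which is what a compare-and-swap check relies on.
type item struct {
	value   string
	version int
}

// fakeCache is a hypothetical in-memory stand-in for memcache.
type fakeCache struct {
	items map[string]item
	next  int
}

func newFakeCache() *fakeCache { return &fakeCache{items: map[string]item{}} }

func (c *fakeCache) write(key, value string) {
	c.next++
	c.items[key] = item{value: value, version: c.next}
}

// add stores value only if key is absent.
func (c *fakeCache) add(key, value string) bool {
	if _, ok := c.items[key]; ok {
		return false
	}
	c.write(key, value)
	return true
}

// set stores value unconditionally.
func (c *fakeCache) set(key, value string) { c.write(key, value) }

// get returns the item together with its version for a later cas.
func (c *fakeCache) get(key string) (item, bool) {
	it, ok := c.items[key]
	return it, ok
}

// cas stores value only if the item is unchanged since it was read.
func (c *fakeCache) cas(key string, seen item, value string) bool {
	cur, ok := c.items[key]
	if !ok || cur.version != seen.version {
		return false
	}
	c.write(key, value)
	return true
}

const locked = "\x00locked" // sentinel marking an entry as in flux

func main() {
	cache := newFakeCache()
	datastore := map[string]string{"k": "old"}

	// Reader: cache miss, so mark the entry and remember its version.
	cache.add("k", locked)
	seen, _ := cache.get("k")
	stale := datastore["k"] // reader fetches "old"

	// Writer races in: updates the datastore and invalidates the cache.
	datastore["k"] = "new"
	cache.set("k", locked)

	// Reader tries to populate the cache; the version changed, so the
	// stale value is rejected instead of overwriting the invalidation.
	fmt.Println(cache.cas("k", seen, stale))
}
```

With a plain `set` in place of the final `cas`, the reader would leave "old" in the cache after the writer had already stored "new" in the datastore, which is the stale-value outcome described above.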

If I am mistaken, please disregard my objections and do let me know. Otherwise, I am going to try and roll a separate solution that follows NDB a bit more closely.

Thanks,
-Jaime

Jeff Huter

Apr 17, 2014, 12:07:46 PM
to google-ap...@googlegroups.com, Jeff Huter, Hocine B, jaim...@gmail.com
I need to check out those articles. But gaelic "may" address this to some extent. My usage model is to use the 'Lock' option when code is going to update a value and to EnforceLock when the update occurs. EnforceLock essentially uses CAS to ensure the memcache value wasn't changed between read and update.

But, I'd be somewhat surprised if this truly addresses the race condition you mention.  Lock/EnforceLock were not designed to address the issue you mention, but to address the issue of concurrent updates clobbering each other.

Matthew Zimmerman

Apr 17, 2014, 12:14:20 PM
to jaim...@gmail.com, google-appengine-go, Jeff Huter, Hocine B
I agree goon doesn't handle this. I looked into it a little and for
the benefits it gave me, I didn't find it worth the complexity and
speed hindrance with the time I had available.

That being said, please don't roll your own but submit patches. There
are already far too many solutions in this space in my opinion.

jaim...@gmail.com

Apr 17, 2014, 12:20:01 PM
to google-ap...@googlegroups.com, Jeff Huter, Hocine B, jaim...@gmail.com
I didn't poke around in gaelic's source. So I can't comment on it. But the details depend on the semantics of the cache. If you are doing a "cache on read, invalidate on update" cache, then you would need one scheme for locking. If you are using a "cache on read, update cache on update", then yes, part of the race is actually what you describe (concurrent puts).

You still have an issue, however, because of racing Get() and Put(), which is why most schemes like NDB cache on read only: doing so avoids the racing Puts entirely.

-Jaime

jaim...@gmail.com

Apr 17, 2014, 12:23:11 PM
to google-ap...@googlegroups.com, jaim...@gmail.com, Jeff Huter, Hocine B
I wish I knew about Goon earlier. That being said, we have already written our own persistence layer and have a ton of data already in appengine in production that we probably won't be able to migrate.

Goon's custom struct fields would rule that out unfortunately since it would require migrating all of our entities :(.

I am happy to post back with some Go code once I solve this problem, which you should be able to adopt into your projects. FWIW, my solution will more than likely not be open sourced any time soon (if ever). So it won't be competing in this space :).

-Jaime

Jeff Huter

Apr 17, 2014, 12:36:43 PM
to google-ap...@googlegroups.com, Jeff Huter, Hocine B, jaim...@gmail.com
It's a cache on read, invalidate on update scheme. So, if I understand you, it "may" be addressing this issue correctly. With that said, I do not recommend gaelic for general use. It's one of the first Go packages I wrote. The API is not stable and needs a lot of work. Still, I'm using it on my low-traffic hobby site and it seems to be working well there. Redoing the API and releasing may be in the works at some point. But I keep hoping to find a replacement that I like so there's one less bit of code to maintain.

Jeff Huter

Apr 17, 2014, 2:36:10 PM
to google-ap...@googlegroups.com, Jeff Huter, Hocine B, jaim...@gmail.com
I took a quick look at NDB.  Perhaps I'm not understanding something, but NDB does not appear to completely address the problem.  The NDB source code actually lists many TODO items regarding puts.  It locks the cache entry on gets.  But, makes no checks on put.  Thus, there appears to be nothing in NDB to prevent concurrent puts from overwriting each other with the last to 'put' winning the race.

This is backwards from my use case. Providing clients with stale data (views) was not a problem. But I needed to ensure only one client was able to successfully update the entity. I use the memcache to provide an 'Undo' feature. The client with the 'lock' may make several updates to the cached entity without committing the changes to datastore. Once the client is done making the multi-step changes to the entity in memcache, it can commit the changes to the datastore, which also removes the entity from the memcache. If the client decides to 'Undo', the entity is simply removed from memcache and pulled from the datastore.


Jaime Yap

Apr 17, 2014, 2:47:09 PM
to Jeff Huter, Hocine B, google-ap...@googlegroups.com, Jeff Huter

Forgot to reply to the wider list. So forgive the double send :).
...

On my phone so I can't type up a more in depth reply. But racing puts are part and parcel of the appengine datastore.

This is what transactions are for.

The caching layer does nothing but invalidate (not mutate, but invalidate) on put. As such, it is not the job of NDB to deal with racing puts. NDB ensures the read side cache gets populated when reading from the datastore.

Stale reads are bad because... well... if you only invalidate on put and you are read heavy without writes, then you will always kick back stale data. Memcache's whole purpose is to make reads faster. A caching layer that returns stale data is IMHO broken.

NDB deals with making reads fast while keeping them correct.

Jeff Huter

Apr 17, 2014, 3:11:18 PM
to google-ap...@googlegroups.com, Jeff Huter, Hocine B, Jeff Huter, jaim...@gmail.com
Transactions don't fix the racing puts problem though.

Client A & B may both use transactions to concurrently put the same entity. Both may succeed. The datastore now has whichever happened to finish last.

gaelic's Lock/EnforceLock logic should cause one of those to fail.  The failing client may then pull the entity from the datastore and determine whether it still wants to update the entity.

Jaime Yap

Apr 17, 2014, 3:38:36 PM
to Jeff Huter, google-ap...@googlegroups.com, Hocine B, Jeff Huter
Transactions exist specifically to address racing writers. That's the whole point of them :).

Transactions in appengine allow you to atomically fail on write if the entity was modified since you last read it. You can specify a retry count to have it auto-retry up to some number of times. If it exhausts the retries, the transaction, and all operations done inside it, will fail atomically.

@Jeff if client A and client B both do something of the form (forgive my pseudo code):

start transaction:
  read entity. 
  modify entity. 
  put entity.

If A or B "race on put" it means that one of them will be putting an entity that is stale with respect to what they read last from the datastore. The whole point of a transaction is to make one of them fail and retry, which necessarily involves re-reading the entity (to un-stale it) before modifying and putting. 
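The fail-and-retry behavior described above can be sketched with an in-memory model. This is a simulation under stated assumptions, not App Engine code: `fakeStore`, its version counter, and the local `runInTransaction` helper are hypothetical and only imitate the retry-on-conflict idea; the real entry point is `datastore.RunInTransaction`.

```go
package main

import (
	"errors"
	"fmt"
)

var errConcurrent = errors.New("concurrent modification")

// entity carries a version that the store bumps on every commit.
type entity struct {
	count   int
	version int
}

// fakeStore is a hypothetical stand-in for the datastore.
type fakeStore struct{ e entity }

// commit succeeds only if the entity is unchanged since it was read.
func (s *fakeStore) commit(read entity, updated int) error {
	if s.e.version != read.version {
		return errConcurrent
	}
	s.e = entity{count: updated, version: read.version + 1}
	return nil
}

// runInTransaction is a hypothetical helper: read, modify, put, and
// start over from the read if another writer committed in between.
func runInTransaction(s *fakeStore, retries int, modify func(int) int) error {
	for attempt := 0; attempt <= retries; attempt++ {
		// read entity
		read := s.e
		// modify entity
		updated := modify(read.count)
		// put entity
		if err := s.commit(read, updated); err == nil {
			return nil
		}
	}
	return errConcurrent
}

func main() {
	s := &fakeStore{}
	raced := false
	err := runInTransaction(s, 3, func(n int) int {
		if !raced {
			// Simulate client B committing while client A is mid-transaction.
			raced = true
			s.commit(s.e, s.e.count+1)
		}
		return n + 1
	})
	fmt.Println(err, s.e.count)
}
```

In this run client A's first attempt fails because B committed first; A then re-reads B's value and increments it, so neither update is lost.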

"Last one wins" is still always the case, however transactions allow you to assert that the world still looks like what you think it should when you go to update it. This addresses racing puts.

The python docs do a better job of explaining this via some terse code samples:

-Jaime

Matt Jibson

Apr 17, 2014, 11:32:02 PM
to jaim...@gmail.com, google-appengine-go, Jeff Huter, Hocine B
Can you give me an example of why goon's custom struct fields rule that out? They allow you to specify the kind, id, and parent. I understand needing to add the id and parent fields, but those aren't serialized to the datastore, so I'm not sure why you would need to migrate entities.


Jaime Yap

Apr 18, 2014, 10:13:12 AM
to Matt Jibson, Hocine B, google-ap...@googlegroups.com, Jeff Huter

I really don't have any familiarity with Goon other than poking through the source to check for the consistency enforcement mechanisms, and briefly looking at the usage via the doc. I had assumed incorrectly that it would be persisted with the struct in the datastore. It wasn't obvious from the brief glance I took :).

That being said, our persistence layer does a lot of stuff for us outside of just acting like a memory cache. Porting to Goon is probably not in the cards any time soon since the actual semantics of the caching layer intersects with our hook points for ensuring safe schema migrations. Which is another thing our persistence layer handles for us.

Jon

Apr 29, 2014, 3:39:27 PM
to google-ap...@googlegroups.com, Matt Jibson, Hocine B, Jeff Huter, jaim...@gmail.com
The only unfixed consistency issue I can find with the NDB caching strategy is here https://code.google.com/p/appengine-ndb-experiment/issues/detail?id=84 and it is a minor one that seems highly unlikely to happen. There is also a proposed fix attached to the issue which I think will work. I see a few more closed issues on cache consistency that occurred before memcache CAS was available. Are there any other documented cases that anyone knows about?

I updated my datastore API (https://github.com/qedus/nds) to use the identical strategy that Python's NDB uses - including the above mentioned minor caching issue... The API doesn't do anything clever (other than the memory and memcaching) and has identical function signatures to appengine/datastore.

The basic unit tests work but there need to be several more until I am confident it is robust. I also want to fix the above mentioned NDB cache consistency issue with this library. Any input would be greatly appreciated.
