Is Memcache really needed?


Anders

Nov 17, 2008, 12:29:54 AM
to Google App Engine
No doubt Memcache in GAE can improve performance, but isn't that
something that could be hidden from application developers? GAE could
automatically cache things like datastore query results in a way that
is transparent to application developers. I, as an application
developer, would then only have to write code for making, say,
datastore queries, and not have to bother with caching and other
system-level performance issues, which App Engine would take care of
automatically behind the scenes.

Not only would the application code be cleaner, but the caching and
other performance optimizations could be done 'optimally' by the
Google App Engine team, who work directly with the server-side
system-level code.

Anders

Nov 17, 2008, 1:05:33 AM
to Google App Engine
And another thing: in a distributed system, must there really be any
functional difference between RAM and disk storage? Of course, disk
access is orders of magnitude slower than accessing RAM, but as I see
it, at least theoretically, in a distributed system RAM could be made
to function as permanent storage.

Think of a single huge distributed virtual memory where data is stored
in a duplicated and fail-safe way. Then the disk space would only be
used for swapping memory pages to and from faster memory such as RAM
(and in the future maybe even other forms of fast memory). Only at the
very lowest level (Distributed OS 'kernel' level) would developers
need to know if some chunk of memory is physically stored on disk or
on some other form of memory.

Anders

Nov 17, 2008, 1:56:47 AM
to Google App Engine
And how well does the Memcache scale? Is a sharded Memcache needed to
prevent a performance bottleneck when the load increases? Is it
similar to a sharded counter for datastore records: when the load is
low, a single counter entity works fine, but when the load increases,
the single counter becomes a bottleneck as it is hit by many
simultaneous requests.

Or is the Memcache already sharded behind the scenes, or in some other
way already made to scale well?
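The sharded-counter pattern referred to above can be sketched as
follows. This is a minimal illustration, not the real GAE recipe: a
plain dict stands in for the datastore, where each shard would
actually be a separate entity updated in its own transaction.

```python
import random

NUM_SHARDS = 20
# Stand-in for NUM_SHARDS counter entities in the datastore.
shards = {i: 0 for i in range(NUM_SHARDS)}

def increment():
    """Pick a random shard so that concurrent writes are spread
    across entities instead of contending on a single one."""
    index = random.randrange(NUM_SHARDS)
    shards[index] += 1

def get_count():
    """The total is the sum over all shards (one query over the
    shard entities in the real datastore version)."""
    return sum(shards.values())
```

The point of the pattern is that each write touches only one of many
rows, so the write rate scales with the number of shards, at the cost
of a slightly more expensive read.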

Roberto Saccon

Nov 17, 2008, 2:13:32 AM
to Google App Engine
Interesting questions. I don't know anything (besides the API) about
it. But the amount of available memcache is limited. I want to have as
much of it as possible for myself, and I am willing to put in some
coding effort to get my slice. And I hope others are lazy and put in
less effort, or none at all, leaving a bigger slice of memcache for
me. So I hope memcache won't ever be done automatically.

Anything wrong with this way of thinking (besides the moral aspect)?

regards
Roberto

Anders

Nov 17, 2008, 2:35:28 AM
to Google App Engine
The documentation says: "...Memcache...is accessible by multiple
instances of your application"

I don't know if this means that each server instance has its own
Memcache or that all the instances access a single Memcache. In any
case, when the Memcache needs to be invalidated for a particular key,
then the Memcache needs to be invalidated for all instances of the
application. So the Memcache cannot be entirely separate for each
instance of the application.

I have seen many developers implement all kinds of clever caches,
which has been like reinventing the wheel many times over. Caching
often leads to nasty bugs, and I suspect that in many cases the
performance is, in practice, not significantly increased, or not
increased at all. A lot of time, effort and resources spent creating
messy caching code that was never needed in the first place and that
only makes the application much more difficult, costly and time-
consuming to develop, maintain and modify.

Greg

Nov 17, 2008, 3:13:26 AM
to Google App Engine
> I don't know if this means that each server instance has its own
> Memcache or that all the instances access a single Memcache.

All instances access the "same" memcache - although this may be
distributed behind the scenes, I don't know about that.

I agree with you that it would be elegant to have automatic caching,
but that would impose some limitations - currently you can memcache
anything (including complex objects), but you can only store limited
data types in the datastore; and you can do (limited) queries on the
datastore but not on memcache.

Ideally we'd have a transparently cached queryable object store to
replace the datastore and memcache, but I guess this would be a
significant amount of development. Maybe Google should hire the Zope
guys to build it.

Cheers!
Greg.

Anders

Nov 17, 2008, 3:26:44 AM
to Google App Engine
Yes, removing Memcache from the GAE API is probably not a good idea.
There must have been some reason why it was added.

What I'm skeptical about is whether it will be a good idea for me to
use it, or whether I will only be doing something unnecessary or, even
worse, shooting myself in the foot if the Memcache does not scale well
and my site starts to get massive traffic (my application still has
very low traffic, but I want to be prepared for the eventuality of an
exponential traffic increase :-).

Ben Nevile

Nov 17, 2008, 9:15:20 AM
to Google App Engine
Hi Anders,

In my experience with standard RDBMSs, memcache is more useful for
caching arbitrary data and less useful for caching model objects. If
you have a really tuned database layer a lot of your queries will
already be resident in the database's cache, and memcache provides
only a modest efficiency gain. With GAE that seems to be less true -
in one of my apps, caching my User model reduced response time by
about 150ms.

However, where I have found memcache most valuable is in caching more
arbitrary data constructs. Caching fragments of your rendered HTML,
for instance, can be really effective. Or perhaps there's a big chunk
of JSON that is often requested, but really only needs to be updated
once a minute. Removing the memcache API would be really really
unfortunate.
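The "big chunk of JSON refreshed once a minute" case might look
roughly like this. The stub class, the key name and the render
function are all made up for illustration; the stub only mimics the
get/set-with-expiry shape of the real memcache client.

```python
import time as _time

class FakeMemcache:
    """Tiny stand-in for a memcache client: set() takes an expiry
    in seconds, get() returns None on a miss or after expiry."""
    def __init__(self):
        self._data = {}

    def set(self, key, value, time=0):
        expires = _time.time() + time if time else None
        self._data[key] = (value, expires)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and _time.time() > expires:
            del self._data[key]
            return None
        return value

memcache = FakeMemcache()

def render_json():
    # Hypothetical expensive computation (queries, serialization...).
    return '{"items": [1, 2, 3]}'

def get_json():
    """Cache-aside with a TTL: serve the cached chunk if present,
    otherwise regenerate it and cache it for 60 seconds."""
    cached = memcache.get('big_json_chunk')
    if cached is not None:
        return cached
    fresh = render_json()
    memcache.set('big_json_chunk', fresh, time=60)
    return fresh
```

Only the first request per minute pays the rendering cost; everything
else is a single cache hit.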

Re: whether or not you should implement it, I would say no. Until you
have a whole whack of users, concentrate on more important things.

Ben

Joel Odom

Nov 17, 2008, 9:31:59 AM
to google-a...@googlegroups.com
One cool application of memcache:

I have an RPC application that hits remote functions maybe once every few seconds per user.  Part of my security scheme involves matching user ids to session keys.  Hitting the data store for this on every call eats resources, but memcache brings a lot more efficiency.
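That session-key lookup is a classic cache-aside read. A sketch of the
idea, with dicts standing in for the datastore and memcache and the
keys invented for illustration:

```python
# Stand-ins for the real services; 'sess-abc123' and 'user-42'
# are made-up example values.
datastore = {'sess-abc123': 'user-42'}   # session key -> user id
cache = {}

def user_for_session(session_key):
    """Cache-aside lookup: only the first request for a session pays
    the datastore cost; later RPC calls are answered from the cache."""
    user_id = cache.get(session_key)
    if user_id is None:
        user_id = datastore.get(session_key)  # the expensive lookup
        if user_id is not None:
            cache[session_key] = user_id
    return user_id
```

With calls arriving every few seconds per user, nearly every lookup
after the first becomes a cache hit.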


Wooble

Nov 17, 2008, 9:41:41 AM
to Google App Engine


On Nov 17, 3:26 am, Anders <i...@blabline.com> wrote:
> What I'm skeptic about is if it will be a good idea for me to use it
> or if I will only be doing something unnecessary or even worse,
> shooting myself in the foot if the Memcache does not scale well and if
> my site will start to get massive traffic

Memcache was designed to run LiveJournal. I don't know what your
application is, but it seems a bit optimistic to think you'll have
scaling issues that they don't.

Anders

Nov 17, 2008, 10:29:41 AM
to Google App Engine
Hi Ben,

Yes, I agree: whole chunks of data that are compute-intensive to
render and that change only now and then would certainly be worth
caching. And that has to be done at the application level.

Anders

Nov 17, 2008, 10:37:50 AM
to Google App Engine
OK, Memcache is probably built to scale, but it could still become a
performance bottleneck if it is hit by many simultaneous requests.
Implementing a sharded Memcache seems a bit over the top, though.
Otherwise Google would have recommended it, just as they have
recommended using sharded counters, I guess.

Jon McAlister

Nov 17, 2008, 12:30:48 PM
to Google App Engine
On Nov 17, 12:13 am, Greg <g.fawc...@gmail.com> wrote:
> > I don't know if this means that each server instance has its own
> > Memcache or that all the instances access a single Memcache.
>
> All instances access the "same" memcache - although this may be
> distributed behind the scenes, I don't know about that.

For any particular "key", all instances will talk to the same memcache
backend. Note that we can easily have different keys hosted on
different backends, though, thanks to the simplicity of the memcache
API (i.e. lack of transactions). This is how we can shard one app's
memcache data on to multiple machines.
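The "each key has a fixed backend" scheme Jon describes can be
illustrated with a simple hash-based routing function. This is a
guess at the general technique, not Google's actual implementation;
the backend names are invented.

```python
import hashlib

# Hypothetical memcache backend identifiers.
BACKENDS = ['backend-0', 'backend-1', 'backend-2']

def backend_for(key):
    """Every client hashes the key the same way, so a given key
    always lands on the same backend -- no coordination between
    clients and no transactions needed."""
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]
```

Because the mapping is a pure function of the key, different keys can
live on different machines while all instances still agree on where
any particular key lives.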

Jon McAlister

Nov 17, 2008, 12:37:14 PM
to Google App Engine
The limitations and design concerns you're talking about here are not
unique to App Engine. The tradeoffs between disk and memory, and the
tradeoffs between different kinds of cache (e.g. disk buffer cache, L1/
L2, memcache), are core computer science. What's the latency? The
throughput? The cache-hit rate? Does the kernel or does the app have a
better understanding of the data access patterns? If the operating
system was not able to solve these issues and remove them from the
mind-set of the developer, then it's unlikely that App Engine will :-P

In the case of automatic caching of the datastore, the application
really does have a better opportunity to cache than we could from
behind the scenes. The app knows the acceptable tradeoff between data
staleness and end-user latency. The app knows how to keep the cached
data consistent with the true data. If we tried to do these things
within the datastore - since it is an API that is perfectly
consistent and guarantees that new writes immediately appear in query
results - we wouldn't be able to achieve a good cache-hit rate, as
every datastore write would force us to invalidate every single
query cache for the app. The app can do much better than we can for
datastore caching.
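The app-level alternative Jon is pointing at is essentially a
write-through cache: the app invalidates or refreshes exactly the
keys it knows a write affected. A sketch, with dicts standing in for
the datastore and memcache and the entity names invented:

```python
# Stand-ins for the real services.
datastore = {}
cache = {}

def get_user(user_id):
    """Cache-aside read: only the app knows how much staleness is
    acceptable for this particular kind of data."""
    if user_id in cache:
        return cache[user_id]
    value = datastore.get(user_id)
    if value is not None:
        cache[user_id] = value
    return value

def put_user(user_id, value):
    """Write-through: the app refreshes exactly the key it knows
    changed, instead of flushing every cached query the way a
    perfectly consistent datastore cache would have to."""
    datastore[user_id] = value
    cache[user_id] = value
```

This is why the app can sustain a high cache-hit rate where a
behind-the-scenes query cache could not: each write touches one cache
key, not every cached query result.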

Jon

Anders

Nov 17, 2008, 12:52:13 PM
to Google App Engine
On Nov 17, 6:30 pm, Jon McAlister <jon...@google.com> wrote:
>
> For any particular "key", all instances will talk to the same memcache
> backend. Note that we can easily have different keys hosted on
> different backends, though, thanks to the simplicity of the memcache
> API (i.e. lack of transactions). This is how we can shard one app's
> memcache data on to multiple machines.

Ah! That explains it. This also means that an idea I had about
sharding the Memcache will work. Say we have a key named 'indexpage'.
We can then shard the Memcache by appending an index, say 0..99, to
the key: instead of accessing the single key 'indexpage', we randomly
access keys with the index appended, such as 'indexpage_32',
'indexpage_7', 'indexpage_85', etc.
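A minimal sketch of that key-sharding idea (a dict stands in for
memcache; the shard count and key name are just examples). Note the
cost it implies: a refresh must write every shard.

```python
import random

NUM_SHARDS = 100
cache = {}  # stand-in for memcache

def set_sharded(key, value):
    # Every shard must be written, so refreshing the value costs
    # NUM_SHARDS sets (the drawback discussed later in the thread).
    for i in range(NUM_SHARDS):
        cache['%s_%d' % (key, i)] = value

def get_sharded(key):
    # Reads pick a random shard; since keys hash to backends, the
    # read load spreads across up to NUM_SHARDS machines.
    return cache.get('%s_%d' % (key, random.randrange(NUM_SHARDS)))
```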

Barry Hunter

Nov 17, 2008, 2:06:49 PM
to google-a...@googlegroups.com
Before you go and implement that, do you have any evidence that
memcache could be a bottleneck?

Otherwise it sounds like a case of possible premature optimization.
Notwithstanding the fact that, as I understand it, memcache 'shards'
by hashing the key - which gives no guarantee that your keys will end
up on separate instances.

And it also means more work, as you now have to generate your page
100 times (which, if you're using the cache right, is probably
expensive).

From my experience of Memcache (not on App Engine) - it's very quick
at dealing out the same result multiple times. And if memcache is
truly distributed on App Engine - and it doesn't do it already -
there is always the possibility of edge-caching really hot items (say
on the machine itself), which your sharding would instantly make less
effective. (Memcache can itself be cached - which, as Jon points out,
the datastore can't.)

I guess what I'm trying to say is that, if at all possible, you
should leave the 'scaling' to the platform; it's only where that is
not possible (like counters) that you should consider it yourself
(like you say in your opening post!)






--
Barry

- www.nearby.org.uk - www.geograph.org.uk -

Anders

Nov 17, 2008, 2:32:06 PM
to Google App Engine
On Nov 17, 8:06 pm, "Barry Hunter" <barrybhun...@googlemail.com>
wrote:
No, I'm not even planning to use Memcache at all yet. I think caching
should only be done when actually needed; otherwise it may, as you
say, very well be premature optimization.

The sharded Memcache idea was only for the hypothetical case of truly
massive traffic hitting the same key - like millions of users
bombarding a single key at the same time. By sharding the key itself,
each shard can be served by a different machine, whereas with only
one key all requests are served by the same Memcache backend, which
hypothetically (maybe not in practice) could become a performance
bottleneck. But for any site smaller than, say, YouTube :-) sharding
the Memcache will probably not increase performance, and for ordinary
and even big loads it will actually reduce performance because, as
you pointed out, the page has to be generated 100 times (or whatever
the number of shards is) every time the Memcache needs to be
refreshed.

Anders

Nov 17, 2008, 4:36:36 PM
to Google App Engine
Hmm... I looked at the dashboard for my application. It shows that a
datastore query on a frequently accessed page takes a lot of CPU
time. It's a simple query, but it must be the query that accounts for
most of the CPU time, because the rest of the code for that page is
basically only a few print statements. Maybe it would be a good idea
to use Memcache for that page.

Jon McAlister

Nov 24, 2008, 7:48:46 PM
to Google App Engine
Correct, independent of the volume of one's app, it's nearly always
fruitful to use memcache in order to reduce end-user latency.

Jon