Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Algorithm for automatic cache invalidation
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  18 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Jakub Łopuszański  
View profile  
 More options Apr 27 2012, 6:14 pm
From: Jakub Łopuszański <qbo...@gmail.com>
Date: Fri, 27 Apr 2012 15:14:49 -0700 (PDT)
Local: Fri, Apr 27 2012 6:14 pm
Subject: Algorithm for automatic cache invalidation

*There are only two hard things in Computer Science: cache invalidation and
naming things.
-- Phil Karlton

Hi, I’d like to share with you an algorithm for cache invalidation that
I’ve came up with, and successfully implemented in a real world
application. I am a software architect at nk.pl, a Polish social network
successfully competing with FB, and a founder and backend architect of
Vanisoft, a startup in which I had more freedom to experiment with a
different architecture.

Perhaps my idea is not original, but I could not find a description of
similar concept anywhere, so even if this is reinventing the wheel, maybe
it will help propagate the knowledge and save somebody else’s time. When I
was designing the backend for Vanisoft’s project ZnaszMnie (currently over
400k players) all I could find in the internet about ways to cache database
queries results was either wrong (such as no cache invalidation at all), or
ineffective (trying to keep of a list of all cached results and
invalidating them one by one).

I assume you have more than one front-end machines, some databases and
cache which are reachable from frontends. Perhaps you have local cache on
each front-end machine, which is really fast, but inaccessible from other
machines (an thus subject to cache invalidation problems). If so, then my
approach can correctly provide cached answer to a query by using one
round-trip to global cache, and one to local (the cost of later is usually
neglible), and never serve stale data (and the last part is the most
important for me).

I assume, that even if your database is not SQL, you could think about your
queries as expressible in SQL, using SELECT, UPDATE, or DELETE. This is
needed only to understand the presentation of the algorithm, which could be
easily adapted to many other cases.

To keep things simple let us consider a binary relation (a table with two
columns, or a set of points in 2-dimensional space, if you prefer). The
algorithm works for any number of dimensions, but it is easier to grasp for
2-D case.

So let’s think about about a table `songs` with columns “song_id” and
“author_id”, where obviously some songs may have several authors and
vice-versa. Again, for simplicity assume, that these are integers, while
the algorithm works for any (serializable) data type.

For each query, I’d like to introduce a concept of “subspace” on which it
operates. If you choose the interpretation of binary relation as a set of
points in 2-dimensional space, then subspace, is what you get by fixing
some of dimensions.
Some examples -- a list of queries with their subspaces:
INSERT INTO songs (song_id, author_id) VALUES (1,2)  ::  [song_id=1,
author_id=2]
SELECT COUNT(*) FROM songs WHERE author_id = 7 ::  [author=7]
DELETE FROM songs WHERE song_id = 13 :: [song_id=13]
DELETE FROM songs :: []

Updates are a bit more complicated, and I’d prefer to think about them as a
DELETE followed by INSERT, for example:
UPDATE songs SET author_id=13 WHERE author_id=7 AND song_id=3
can be thought of as :
DELETE FROM songs WHERE author_id=7 AND song_id=3 :: [song_id=3,
author_id=7]
INSERT INTO songs (song_id, author_id) VALUES (3,13)   :: [song_id=3,
author_id=13]
I know this is not equivalent, and may cause some race conditions, but I
believe it does not really matter in what follows.

As you can see in our example, a subspace can be, 0, 1 or 2-dimensional,
which depends on the scope on which the query operates. If we are unsure
about the actual scope, we need to upperbound it for my algorithm to work.
For example :
SELECT COUNT(*) FROM songs WHERE song_id < 13 :: []
here we could not say much about rows which are important for this query,
so we upperbound it with the whole space. Please note how subspace has
nothing to do with the columns returned by select, but rather by the rows
scanned during it.

In general, the subspace is determined by all equality constraints found in
the query. This is such a trivial correspondence, that you can (and should)
let your backend library deduce it automatically from the query string. Or
if you use some object oriented query builder, you should get it for free.

For 2-dimensional space, there are 4 kinds of subspaces :
- the whole space, given by no constraints at all : []
- a point, given by two constraints, for example [song_id=3, author_id=7]
- a vertical, or horizontal line, given by one constraint, for example
[song_id=3], or [author_id=7]

If we have two queries A and B, such that A is a write (DELETE or INSERT),
and B is a read (SELECT), then a cached result for B should be invalidated
after query A, if subspace of A intersects subspace of B. (Actually this is
a sufficient condition, not necessary).
For example a query
DELETE FROM songs WHERE author_id=7 AND song_id=3 :: [song_id=3,
author_id=7]
affects results of
SELECT COUNT(*) FROM songs WHERE author_id = 7 ::  [author=7]
and you can tell it just by comparing [author=7] with [song_id=3,
author_id=7], without looking at the queries. Of course this is some safe
approximation : sometimes cache invalidation is not necessary, but
performing it will ensure the correctness of algorithm.

Now, notice, that there may be infinitely many queries, which may need to
be invalidated, as there are many subspaces that intersect with a given
one. For example a query with subspace [author=7] must invalidate all
queries with subspace [song_id=1], as well as [song_id=2], etc... Moreover
there are infinitely many possible queries with the same subspace.
Frameworks that try to keep track of all intersecting queries are doomed to
either store everything in a large, permanent storage (which kind of
defeats the purpose of in-memory cache to me), or eventually forget about a
query which needs to be invalidated (which is an error to me).

The solution is much simpler though, and does not involve operating on
infinitely many cache keys... only exponentially many. But, don’t worry, it
is exponential in the number of dimensions, not the size of dataset. For
example for a binary relation we need to invalidate 2^2 = 4 keys. If you
have a table with many columns, please don’t be alarmed neither -- you just
need to choose a few columns on which you’d like to focus, and crop the
subspace to those few columns -- you will invalidate thinks more often than
necessary, but will not get hit by exponential blow. I’ve never experienced
a case with more than 3 relevant columns, and 2^3 = 8 is not a big number.

We need some shorter notation for subspaces. Let us write them as
2-dimensional vectors (similar to rows in the table). Instead of
[song_id=3,author_id=7], let us write (3,7), and for [song_id=3], we will
use (3,*). Whole subspace is (*,*), and [author_id=4] is just (*,4).

Now the tricky part. Let us introduce a new wildcard symbol “?”, which will
mean something along lines of “all subspaces which have a number here”. For
example (?,3) is a shorthand for the infinite set of subspaces
{(0,3),(1,3),(2,3),...}. There is no particular query that has subspace
(?,3). This artificial notation corresponds to all queries which have fixed
second dimension.

I assume that your global cache (such as memcache) supports atomic
increment operation, which is very convenient for the task of maintaining a
“revision number”, which we will associate with each subspace (artificial
or not). When I say “with each”, I don’t mean buying infinitely large
computer -- we will just allocate memory for subspaces actually used in
production, and those unused (or rarely used, if you have LRU garbage
collector) will simply take no memory at all.
A short note about evictions : if get or increment results in a miss,
simply set the revision to current timestamp.

I also assume that you know the idea of generational caching, that is,
using a key name with appended revision to store cached data (such as
“get_songs_count.123”). This way a single operation of incrementing a
revision, can invalidate all data stored at key names which depended on it.
This seems like a waste of memory, but LRU garbage collector can take care
about leftovers. The nice thing about it, is that while a single integer
(the revision number) must be stored in a global cache, the potentially
large query result can be stored in local cache, from where it can be
quickly retrieved without any bottlenecks of global cache. This is actually
the only way (except for using silly small TTLs) for invalidating data
stored in local cache that I know and use.

Now the algorithm. Or actually two algorithms, one for read-only queries,
and one for writes.

When you perform a read query A, first determine its subspace. I assume you
have d columns, so the subspace can be expressed as a d-arry vector of
numbers and stars, as explained above. For example a query “SELECT song_id
FROM songs WHERE author_id = 3” corresponds to a vector (*,3).
Then compute at most all possible variations of this d-arry vector, by
using a single substitution rule : you can put “?” in place of a number.
For example, if your original vector was (*,3), then it results in two
vectors: (*,?) and the original. If on the other hand, your query had
subspace (7,3), then you would end up with four variations:
(7,3),(?,3),(7,?),(?,?).
Then, using a bulk (multi)get, fetch from the global cache revisions for
those (at most 2^d) subspaces.
Concatenate revisions into one big revision (for example 123.3.114.14).
Then fetch from the local cache the cached result of the query, which
should reside at the key which is a concatenation of the query string (or
hash of it if you prefer) and the big concatenated revision.
If there is a miss in the local cache, try in ...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Artur Ejsmont  
View profile  
 More options Apr 27 2012, 7:44 pm
From: Artur Ejsmont <ejsmont.ar...@gmail.com>
Date: Sat, 28 Apr 2012 09:44:25 +1000
Local: Fri, Apr 27 2012 7:44 pm
Subject: Re: Algorithm for automatic cache invalidation

This post was just AWESOME!

Many thanks for time spent and so much detail.

Gratulacje, na prawde doskonala robota.

Thanks

Art
On Apr 28, 2012 8:14 AM, "Jakub Łopuszański" <qbo...@gmail.com> wrote:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options Apr 27 2012, 11:33 pm
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Fri, 27 Apr 2012 20:33:15 -0700 (PDT)
Local: Fri, Apr 27 2012 11:33 pm
Subject: Re: Algorithm for automatic cache invalidation

Please correct me if I am wrong, but your description WRT to subspaces (and
an implementation thereof) seems to be the equivalent to the following:

1. Whenever you cache a row with column values (c1, c2, c3, ...) from table
X, add a references to indexes for each of the columns of the data you are
caching (there is an index for X.c1, X.c2, etc., as well as table Y with
columns c10, c11... as Y.c10, Y.c11, ...)
2. When you need to invalidate your cache for a given class of rows (you
use song_id=3, author_id=7), you can look up those cached entries in your
indexes, find the intersecting subset, and discard those cache entries.

Is it novel? I don't know, I'm not terribly well versed in the realm of
cache invalidation algorithms. Though elements of your method (in
particular his use of '*' and '?') are reminiscent of the Linda language's
tuple spaces, and how they are thought of as just vectors to match patterns
against and execute upon.

Hilariously enough, by thinking of it in terms of tuple spaces (and having
one available), you can get a distributed cache manager for free!

Regards,
 - Josiah

P.S. I'm not a regular contributor to memcached, but your post was just
linked from the Redis list, and I'd thought I'd come over to say hello.

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jakub Łopuszański  
View profile  
 More options May 6 2012, 3:58 pm
From: Jakub Łopuszański <qbo...@gmail.com>
Date: Sun, 6 May 2012 12:58:23 -0700 (PDT)
Local: Sun, May 6 2012 3:58 pm
Subject: Re: Algorithm for automatic cache invalidation

W dniu sobota, 28 kwietnia 2012 05:33:15 UTC+2 użytkownik Josiah Carlson
napisał:

> Please correct me if I am wrong, but your description WRT to subspaces
> (and an implementation thereof) seems to be the equivalent to the following:

> 1. Whenever you cache a row with column values (c1, c2, c3, ...) from
> table X, add a references to indexes for each of the columns of the data
> you are caching (there is an index for X.c1, X.c2, etc., as well as table Y
> with columns c10, c11... as Y.c10, Y.c11, ...)
> 2. When you need to invalidate your cache for a given class of rows (you
> use song_id=3, author_id=7), you can look up those cached entries in your
> indexes, find the intersecting subset, and discard those cache entries.

I'm not sure if I understand your proposition, but it sounds like you are
only caching individual rows, and not arbitrary queries results. That makes
sense if in your application you always SELECT single rows by specifying
equality constraints for all columns. But, even then, what my algorithm
would do, is quite different from your description.
You mention "indexes", but I am not sure if I understand correctly, what
kind of indexes they are. From your description it seems that you expect
them to be very fast for both updates and selects (perhaps something quite
trivial to achieve with Redis, but not so obvious how to achieve this with
MySQL, or memcached). Anyway, if you just mean something that can answer a
query "find all rows which match song_id=3 and author_id=7", then it might
end up being either:
a) slow, but reliable and persistent, sql-like index (which doesn't seem to
improve performance of anything)
b) fast, but unreliable, ram-only list
Perhaps I am missing something, but having something which is both reliable
and fast, you would actually solve the original problem of providing fast
answers to any query (as a two stage process: find matching rows ids, fetch
their content). If you can do that quickly, then you're done.

I think that my solution is trying to address a different problem: how to
cache more complicated queries then single row retrieval in the presence of
a queries modifying multiple rows at once.

> Is it novel? I don't know, I'm not terribly well versed in the realm of
> cache invalidation algorithms. Though elements of your method (in
> particular his use of '*' and '?') are reminiscent of the Linda language's
> tuple spaces, and how they are thought of as just vectors to match patterns
> against and execute upon.

Thanks for hint about Linda language.

> Hilariously enough, by thinking of it in terms of tuple spaces (and having
> one available), you can get a distributed cache manager for free!

I'm not sure if I understand this. Could you elaborate?

> Regards,
>  - Josiah

> P.S. I'm not a regular contributor to memcached, but your post was just
> linked from the Redis list, and I'd thought I'd come over to say hello.

neither am I.

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options May 8 2012, 2:02 pm
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Tue, 8 May 2012 11:02:21 -0700 (PDT)
Local: Tues, May 8 2012 2:02 pm
Subject: Re: Algorithm for automatic cache invalidation

Different yes, but potentially functionally equivalent. The method that I
described can easily be applied to arbitrary select queries as long as row
ids are returned with the other row data. I didn't go into it because I
believed that step was obvious, as that kind of thing is already done by
any decent RDBMS. But to connect the dots:

Assume that you select some subset of rows X using query Y. You need to
have some way of fetching those cached rows via the query itself, so
presumably you would store a reference to X via some derivation of Y,
something like (using a simple Python hash syntax, though obviously storing
it in memcached or some other server would be different):

cache[sha256(Y).digest()] = X

After storing that reference in your cache, you would also store references
to all of the ids in X using the indexes that I described.

> You mention "indexes", but I am not sure if I understand correctly, what
> kind of indexes they are. From your description it seems that you expect
> them to be very fast for both updates and selects (perhaps something quite
> trivial to achieve with Redis, but not so obvious how to achieve this with
> MySQL, or memcached).

I mean something that is the rough equivalent of a 2-level hash with sets.
In Python syntax for the structure itself:

INDEX = {
    'table_name': {
        14: set(['ref_25', 'ref_87', 'ref13'])
    }

}

The only two operations that are necessary are:
1. Add a reference to row A in table B to cache value C
2. Invalidate the cache entries that reference row ids V from table W

Anyway, if you just mean something that can answer a query "find all rows

> which match song_id=3 and author_id=7", then it might end up being either:
> a) slow, but reliable and persistent, sql-like index (which doesn't seem
> to improve performance of anything)
> b) fast, but unreliable, ram-only list
> Perhaps I am missing something, but having something which is both
> reliable and fast, you would actually solve the original problem of
> providing fast answers to any query (as a two stage process: find matching
> rows ids, fetch their content). If you can do that quickly, then you're
> done.

That's not the intent, but there is also a very fast in-memory SQL-speaking
database with persistence called AlchemyDB (which is built on top of
Redis). I know, it's a bit blasphemous to mix SQL and noSQL, but the
author's execution is quite impressive.

I think that my solution is trying to address a different problem: how to

> cache more complicated queries then single row retrieval in the presence of
> a queries modifying multiple rows at once.

The complicated part of your scheme is how to invalidate the data. Keeping
references to ids in the tables (as I describe) lets you invalidate data by
chasing down references in the simple index for invalidation. It can also
be trivially expanded to offer a variety of other referential options.

Is it novel? I don't know, I'm not terribly well versed in the realm of

The idea of tuple spaces is to have a region of memory that stores tuples.
This memory can be on a single machine, or distributed across a disparate
cluster of machines. The only limitation is the semantics of the space
itself.

The basic semantic API for the space is:
1. Insert a tuple
2. Find me a tuple that matches the pattern Z (which removes the tuple)

The latter of the two can be asynchronous, that is, it can wait for a given
time period for that matching tuple. Some implementations require that you
pass both the pattern to match the tuple as well as the code that will
execute on that tuple.

There are some minor mental gymnastics required to fit a caching scheme
into a tuple space. And for a cache, obviously the first thing you would do
upon matching a tuple for a read is to reinsert that tuple in the space.
Actual implementations of tuple spaces end up being roughly equivalent to a
distributed RPC server, or even a distributed task queue server (which is
why I stopped working on the one I was writing in 2004), but they are
useful abstractions for thinking of other ideas, and the algorithms for
matching tuples efficiently are good exercises.

Regards,
 - Josiah

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jakub Łopuszański  
View profile  
 More options May 11 2012, 1:12 am
From: Jakub Łopuszański <qbo...@gmail.com>
Date: Thu, 10 May 2012 22:12:43 -0700 (PDT)
Local: Fri, May 11 2012 1:12 am
Subject: Re: Algorithm for automatic cache invalidation

So you suggest :
1. to build a biderectional graph, where the first direction (which you
called "cache") maps a query to the rows, and the second one (which you
called "INDEX['table_name']") maps a row to queries which depend on it.
2. keep the first part in a fast cache
3. keep the second part in a reliable database

What bugs me is: since this is a symetric graph, one should expect that the
size of "index" will be at least as large as size of "cache", which means
that you will need a lot (potentially infinitely many) of storage to keep
track of all queries that the system generated at any point in time. You
would need some routine that would check what queries where evicted from
cache and thus can be also removed from index, to keep the size of index
bounded.

Moreover I do not see how exactly would that work in case of queries which
modify many rows.
It seems to me that you would have to lock, select these rows, find all
(potentially infinitely many) queries depednent on these rows in index,
invalidate them, delete rows, release the lock.

Sounds pretty ineffective and as I mentioned in the begining of my post --
I saw such systems and wanted something better.

W dniu wtorek, 8 maja 2012 20:02:21 UTC+2 użytkownik Josiah Carlson napisał:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options May 11 2012, 4:02 am
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Fri, 11 May 2012 01:02:52 -0700 (PDT)
Local: Fri, May 11 2012 4:02 am
Subject: Re: Algorithm for automatic cache invalidation

I believe we are talking past each other.

When you are performing a write query against rows matching (author_id=7,
song_id=3), presumably you would get a group of subspaces
(7,3),(*,3),(7,*),(*,*). The latter subspace doesn't really help you at
all, as it merely references all of your cached data involving
author_id,song_id pairs. But here's the thing: the indexes that I described
will give you the equivalent of your (*,3) and (7,*) subspaces, the
intersection of which is exactly (7,3).

So, as long as you index your columns independently, you can invalidate all
of the necessary cache entries. It doesn't require invalidating infinitely
many rows (unless you've got a magic machine that has infinite memory, and
is caching infinitely many rows from an infinite number of queries), and it
also doesn't require a huge explosion of space. Worst-case, 2x.

But maybe I'm misreading what you are claiming to have built, as discussion
about subspaces, bipartite matching, claimed infitities, really just
results in excluding 95%+ of engineers/programmers from the conversation (I
only got into it because I saw parallels with other technology I've seen).
That said, I doubt Facebook or Twitter implemented something that they
described like you are describing, and I'd bet a beer that Google didn't.
Not because they couldn't, maybe because they didn't think of it, but
probably because they looked at the problem in a different way*.

If you truly believe that you have created something that is unique, I
would urge you to release it. If it really is a game changer, people will
pay you to use it.

Regards,
 - Josiah

* Relational database queries were made fast by never performing joins,
almost always using covering indexes, heavy sharding, and many read slaves
(think X read slaves for each of the Y shards, where X and Y were
surprisingly large). Combine all of this with huge id -> row caches for
entire databases, as much as possible pre-calculated via mapreduce, and the
use of bigtable whenever possible (no joins, queries only via composite
primary key index, data distributed to 3+ machines so you got results from
the fastest of them at any time), and you get high performance without the
apparent level of complexity that you seem to be dealing with.

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jakub Łopuszański  
View profile  
 More options May 11 2012, 9:00 am
From: Jakub Łopuszański <qbo...@gmail.com>
Date: Fri, 11 May 2012 06:00:45 -0700 (PDT)
Local: Fri, May 11 2012 9:00 am
Subject: Re: Algorithm for automatic cache invalidation

So let me present you an example so you could present your idea on a real
world examples.
Suppose you have a web page, where you can view all songs of a given
author, sorted by song_id, with pagination size set to 10, 20, or 50
results per page or any other value (say 42 results) if they wish.
Therefore a database queries needed are something like:
SELECT song_id FROM songs WHERE author_id = @A ORDER BY song_id ASC LIMIT
@L OFFSET @O;
Since a user can use many values (10,20,50, etc.) for @O, and many possible
offsets @L (actually any natural number) you will get a huge number of
different queries all referencing the same row with author_id=7, song_id=3.
Let say there are like 100 different queries which relate to this row.
Here are some 2 examples of such queries:
SELECT song_id FROM songs WHERE author_id = 7 ORDER BY song_id ASC LIMIT 2
OFFSET 8;
SELECT song_id FROM songs WHERE author_id = 7 ORDER BY song_id ASC LIMIT 2
OFFSET 9;
You suggested to compute a digest for each such query actually perfromed
and store them in a cache along with a value.
Then for each hash, and each row in the result set you want (if I
understand) to save the reference from that row to that hash in a separate
persistent data structure.
At this point you have a persistent data structure with 100 more entries
than the original database.

Now, if you remove the row author_id=7 and song_id=3 from the database (or
add it, or modify), you have to fetch these 100 hashes from the index, and
invalidate them.
This is a correct solution, but not as effective as mine, as it must
perform 100 operations, and requires 100 times more storage than necessary.
But that's not the end of the problems.
Suppose you have a query like this:
DELETE FROM songs WHERE author_id = 7.
Suppose that there are 1000 songs by this author.
In order to perform a proper invalidation you need to first look up all
rows affected by this query, and then, for each such row you need to find
all hashes related to these rows. Of course given a reasonable index
structure you do not need a JOIN for that, but rather something like
"SELECT DISTINCT(hash) FROM index WHERE author_id = 7", however this may
still require scaning 100*1000 entries in the index, as each of 1000
deleted songs could be referenced by 100 queries.

Did I understand you correctly?

And, by the way, in nk.pl we have around 500 database servers, and I know
what sharding is. Therefore a solution which does not require a central
"index" larger than the sharded database itself, such us mine, makes more
sense to me.

W dniu piątek, 11 maja 2012 10:02:52 UTC+2 użytkownik Josiah Carlson
napisał:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options May 11 2012, 12:13 pm
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Fri, 11 May 2012 09:13:41 -0700 (PDT)
Local: Fri, May 11 2012 12:13 pm
Subject: Re: Algorithm for automatic cache invalidation

Okay! Now this is perfect. A driving use-case with an example. Now I can
see where you are coming from, and I can also see how this would be an
issue for you. Before I get into the point-by-point, first a little story.

A man goes to the doctor and says, "It hurts when I push here." The doctor
says, "So don't push there."

End of story. This will become relevant momentarily.

On Friday, May 11, 2012 6:00:45 AM UTC-7, Jakub Łopuszański wrote:

> So let me present you an example so you could present your idea on a real
> world examples.
> Suppose you have a web page, where you can view all songs of a given
> author, sorted by song_id, with pagination size set to 10, 20, or 50
> results per page or any other value (say 42 results) if they wish.

Rule # 3 in building scalable systems: there is no customized query. If you
need to paginate on variable-sized pages, you create larger pages, then
sub-paginate.

Case in point: When Google produces search results for a query, they aren't
making a query for the 10-entry page 1 for the first load, followed by
another query for the 10-entry page 2 when you click on that link, etc. No.
They make a query for the first X documents up front (historically I
believe they made X somewhere in the range of 100-1000). They then cache
that, and serve sub-results out of the cache for some modest duration.

This happens at YouTube, Facebook, Twitter, and any other site that scales.

Therefore a database queries needed are something like:

If this was my problem, I wouldn't allow arbitrary offsets. I'd do as rule
#3 says and pre-calculate blocks of whatever would cover 80/95/99% of
queries (depending on how desperate I was).

I'd also be building a covering index, putting enough memory in my DB
servers, and letting the DB cache that index. You can't get much more
optimal for a B-Tree index than that query. You have an index on
(author_id, song_id). In the index, you match the first column exactly, and
order by the second. Assuming that index locking is rare (or at least
optimized), your DB should be able to handle 5-10k requests out of that
index every second.

Alternatively, this is also a use-case where Redis shines. This data would
fit perfectly into what is known as a ZSET (a sorted set). You can think of
it like a hash table with keys and values (we call them members and scores,
respectively), where you can reference objects by keys, and you can
reference objects by their sorted order which is defined over the tuple
(value, key). I would set value = author_id, and key = packed
representation of song_id concatenated with the rowid. Then to make queries
against that index, I would simply ZRANGEBYSCORE <indexkey> <author_id>
<author_id> WITHSCORES LIMIT <offset> <count> . On my little 2.4 ghz dev
box, I can move almost 25k requests/second against that query sequentially
(one returns before the other is performed), and in the range of
50-60k/second from multiple clients. If I had a more modern processor (and
more than 2 cores), I could move 75k-100k/second, and I can expect to get
2/3-3/4 that performance if I ran replicas of Redis on the same machine
tied to cores other than core 0.

To make that Redis cache be viable, I would do as I've done and recommended
for over 2 years now: In whatever language runtime you are using, implement
a post-commit hook that induces a caching task over any rows with
inserts/updates/deletes. With Postgres allowing for a variety of languages
to be built-in, you could even implement your post-commit hook inside of
your database, letting it update your remote caches in real time (you also
get row-level locking and other nice things). No need for cache
invalidation, as your cache is your index, and your index is kept fresh.

So, long story short: I wouldn't use the system that you propose (or the
system that I proposed as a variant) to cache that query. I'd use Redis.
And if I was limited to *not* use Redis, I'd use covering indexes + block
caches, keep statistics on what queries were common, and pre-cache those
queries with rolling cache invalidation. Alternatively, I'd at least give
Riak and their secondary indexes a shot, which is supposed to let you do
the same kinds of queries across your cluster of machines, while also
coming close to scaling linearly with every new box.

Now, if you remove the row author_id=7 and song_id=3 from the database (or

Yes, but I was operating under the assumption that the types of queries you
were caching were pre-sanitized from the perspective of wanting to scale.
If you want to scale that system to be 10x or 100x bigger (as the world is
somewhat larger than Poland), you can't even think about caching queries
that make requests for 2 rows at arbitrary offsets from an index. You cache
bigger blocks, or you distribute your index.

And, by the way, in nk.pl we have around 500 database servers, and I know

> what sharding is. Therefore a solution which does not require a central
> "index" larger than the sharded database itself, such us mine, makes more
> sense to me.

That makes sense everywhere. But this particular driving use-case isn't
generally applicable for the Facebooks, Twitters, Googles, YouTubes, Ebays,
Amazons, etc., of the world. Why? Because they follow the rulebook on
scaling (which explicitly says *not* to cache those tiny queries).

Regards,
 - Josiah

W dniu piątek, 11 maja 2012 10:02:52 UTC+2 użytkownik Josiah Carlson

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jakub Łopuszański  
View profile  
 More options May 11 2012, 12:52 pm
From: Jakub Łopuszański <qbo...@gmail.com>
Date: Fri, 11 May 2012 09:52:21 -0700 (PDT)
Local: Fri, May 11 2012 12:52 pm
Subject: Re: Algorithm for automatic cache invalidation

Hm. I was expecting you to explain how your algorithm would work for this
particular example. The example (if you couldn't tell from stupid column
names and use case) was fictional for the ease of presentation.
Instead of offsets, and limits, you could think on many other different
parameters of a query such as:
- constraints on other columns  (for example AND  type =HEAVY_METAL)
- different sorting (for example SORT song_id DESC)
- different columns in result (for example SELECT song_id, type)
- different aggregation in result (for example SELECT COUNT(*) )

If you claim that each row of database can be returned only by O(1)
different queries in your application, then I agree that your solution
makes some sense for your application.

Unfortunately, this might not be the case for other people, so if you plan
to build a framework, that would be a stupid assumption to make.
Sadly, this is an assumption found in several ORMs that I've seen, and now
(after reading your posts) I understand why this frameworks are still in
use...

But even if there are only O(1) queries for each row which return it, you
still have to deal somehow with the problem of perpendicular queries.
Allow me another simplistic example with three columns: x,y,z.
Now, suppose the queries to the system are always of the following form:
SELECT * FROM space WHERE x = @X AND y=@Y
Clearly, each row of this space can be returned by at most one query. So in
your terms "index" for each row will contain reference to only one query
(the one with @X and @Y equal to his).
Now, imagine a query:
DELETE FROM space WHERE z = 17.
Now, you need to invalidate all queries from this plane. There might be
millions of it. Sure, you can argue, that this invalidation can be
amortized by the cost of caching earlier. But will that explanation really
bother the end user, which will have to wait for several seconds?

P.S. it hurts me when I ask for explanation on a simplistic example, and
what I get is a rant on how stupid the example is...

P.S.2. Poland is small (>40 mln people), but nk.pl has like 12 mln active
users monthly (similar to facebook), and 4 billion pageviews daily. As one
of architects, I think I know how to keep this great social network working
fast, and wanted to share an idea on how to remove the limits imposed by
traditional way of thinking about how queries should look like. I really
believe that end user experience should not be limited by some _artificial_
constraints from the backend architecture. I believe that the assumption
that there should be a fixed number of possible queries is artificial.

W dniu piątek, 11 maja 2012 18:13:41 UTC+2 użytkownik Josiah Carlson
napisał:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Perrin Harkins  
View profile  
 More options May 11 2012, 1:57 pm
From: Perrin Harkins <per...@elem.com>
Date: Fri, 11 May 2012 13:57:58 -0400
Local: Fri, May 11 2012 1:57 pm
Subject: Re: Algorithm for automatic cache invalidation

On Fri, Apr 27, 2012 at 6:14 PM, Jakub Łopuszański <qbo...@gmail.com> wrote:
> Hi, I’d like to share with you an algorithm for cache invalidation that I’ve
> came up with, and successfully implemented in a real world application.

This may be a silly question, but have you benchmarked your cached
application against just going straight to the database?  I've always
had the impression that keeping a perfect cache of a database that
beats it in performance was not possible because the overhead of cache
invalidation (both in the cache and in the application) would ruin the
performance gains on reads.  Caches usually beat database performance
by sacrificing accuracy (e.g. allowing race conditions and non-ACID
behavior) and freshness of data.

To put it another way, if it was possible to have an up-to-date cache
that outperforms an RDBMS with the same data, wouldn't the makers of
that RDBMS simply build that cache into their product?

- Perrin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options May 11 2012, 2:19 pm
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Fri, 11 May 2012 11:19:20 -0700
Local: Fri, May 11 2012 2:19 pm
Subject: Re: Algorithm for automatic cache invalidation

On Fri, May 11, 2012 at 9:52 AM, Jakub Łopuszański <qbo...@gmail.com> wrote:
> Hm. I was expecting you to explain how your algorithm would work for this
> particular example. The example (if you couldn't tell from stupid column
> names and use case) was fictional for the ease of presentation.
> Instead of offsets, and limits, you could think on many other different
> parameters of a query such as:
> - constraints on other columns  (for example AND  type =HEAVY_METAL)
> - different sorting (for example SORT song_id DESC)
> - different columns in result (for example SELECT song_id, type)
> - different aggregation in result (for example SELECT COUNT(*) )

None of these exclude any of the alternate methods of caching that I
describe. In fact, every one of them can still be served out of a
proper covering index, including queries like "SELECT song_id,
count(*) ... GROUP BY song_id" or "SELECT sum(listen_count)". The
complexity and sophistication of modern database indexes cannot be
overstated. When used properly, they are borderline miraculous when
combined with a good query parser and processor.

> If you claim that each row of database can be returned only by O(1)
> different queries in your application, then I agree that your solution makes
> some sense for your application.

I make no such claim. Further, I could even construct an adversary for
my system such that every cached query result references a particular
row, and updating that row could invalidate the entirety of my cache
(incidentally, I could do the same for the system that you described).

Thankfully, reality isn't an adversary, and we have the ability to
choose the queries, what we cache, how we cache, etc. (the "how" is
what we are discussing here).

> Unfortunately, this might not be the case for other people, so if you plan
> to build a framework, that would be a stupid assumption to make.
> Sadly, this is an assumption found in several ORMs that I've seen, and now
> (after reading your posts) I understand why this frameworks are still in
> use...

So are you saying that I'm making stupid assumptions?

I claim that my assumptions about database indexes, queries, and
caching behavior, are quite sane. When building a system initially,
you make different decisions than the system you make at scale. You've
been going through growing pains in scaling your system, and your
choices and thoughts about what your system needed to support drove a
sequence of decisions that lead you to your subspace caching.

If you were to sit down today with zero code, but had half an hour to
dig through the features and functionality of your site, could get
factor-of-2 approximation of any number you needed, and were tasked
with 95% feature parity; I can just about guarantee that you would
build a different system. You would build more generic solutions in
advance, as you would see common patterns now that you didn't see when
you were building it the first time. Those other caching methods I
described in my last email? Those were designed, built, and made
standard by people who looked at their systems and asked, "If I were
building from scratch today, what would I do differently?"

It could be that your caching system offers features and functionality
that allow it to do XYZ that other caching systems are unable to do.
But for a systems architect, the better question is always, "Do I need
it to do XYZ, or can it do ABC instead?" As engineers, architects,
programmers, developers, etc., we do not work in a vacuum. Every
solution we choose is based on what we have available, and what we
think we can build in the time we have. Until/unless you release your
system, it is no more useful than a flying car: great in theory, but
not available, so not a solution.

I know you are aware of everything that I've described WRT indexes,
caching, etc. I mentioned those things not for your benefit (except to
remind you that the examples you've chosen don't require your system),
but for the benefit of the dozens/hundreds/thousands of future
engineers/architects that are looking for solutions to their
cache/scaling issues, but don't ask questions (a small vocal minority
of people with questions actually ask them, the rest search for
answers to questions that others have asked). They are going to wander
into this thread, see your description, and say "wow, I should build
that." But that is *exactly* the wrong answer! They shouldn't build
it! They should be using one of the standard practices for caching
that I described, or if you've released your software, they should see
if it offers them what they are looking for.

> But even if there are only O(1) queries for each row which return it, you
> still have to deal somehow with the problem of perpendicular queries.
> Allow me another simplistic example with three columns: x,y,z.
> Now, suppose the queries to the system are always of the following form:
> SELECT * FROM space WHERE x = @X AND y=@Y
> Clearly, each row of this space can be returned by at most one query. So in
> your terms "index" for each row will contain reference to only one query
> (the one with @X and @Y equal to his).
> Now, imagine a query:
> DELETE FROM space WHERE z = 17.
> Now, you need to invalidate all queries from this plane. There might be
> millions of it. Sure, you can argue, that this invalidation can be amortized
> by the cost of caching earlier. But will that explanation really bother the
> end user, which will have to wait for several seconds?

It wouldn't be viable in that situation if you were doing immediate
invalidation, but the same goes for your system. If you are caching
millions of queries, and those millions of queries must be
invalidated, there is no free lunch. Either you invalidate now (paying
the price), or you use one of the variety of ways to lazily invalidate
your cache.

Incidentally, I also wouldn't be caching millions of queries like
this. This is again an example where a covering index solves the
problem correctly and gracefully, and finding a way of keeping a
distributed and consistent covering index for this table will solve
more problems than your caching system.

> P.S. it hurts me when I ask for explanation on a simplistic example, and
> what I get is a rant on how stupid the example is...

I never said that the example was stupid. I just said that the example
provided had alternate solutions that were at least as good, if not
better, than your proposed solution. These alternate solutions are
known to scale, are not surprising, and are supported by dozens of
existing software packages.

You proposed that your solution to the problem of database query
caching was unique (in that you hadn't seen anything else like it),
and further that it was better. In order for it to be legitimately
better for database query caching (or database caching in general), it
will be competing against the current crop of standard practices
(which are standard because they work). If you are not okay with
people like me saying "why are you doing X with that system, Y would
work better at scale" for the examples you that you bring up, then
bring up a real example.

> P.S.2. Poland is small (>40 mln people), but nk.pl has like 12 mln active
> users monthly (similar to facebook), and 4 billion pageviews daily. As one
> of architects, I think I know how to keep this great social network working
> fast, and wanted to share an idea on how to remove the limits imposed by
> traditional way of thinking about how queries should look like. I really
> believe that end user experience should not be limited by some _artificial_
> constraints from the backend architecture. I believe that the assumption
> that there should be a fixed number of possible queries is artificial.

You brought up your idea. Why? Presumably to get feedback, opinions,
ideas to make it work better, maybe even to get responses like "This
post was just AWESOME!" But here's the thing: ideas are easy,
execution is hard. You say that you have built this system, and it is
amazing. I believe that you have built it. But right now, over email?
You can't actually prove anything about the usefulness of your system,
especially not with toy driving use-cases and examples.

On the other hand, if you release it, people who are even smarter than
me will actually take a look at it, and will examine it based on its
features, merits, and drawbacks in certain use-cases. And if it is
better, people will use it. But right now, it's material to you, but
an idea to everyone else. And ideas are cheap.

Regards,
 - Josiah

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "REMOVE ME" by Erik Seaberg
Erik Seaberg  
View profile  
 More options May 11 2012, 4:58 pm
From: Erik Seaberg <eseab...@adbrite.com>
Date: Fri, 11 May 2012 13:58:32 -0700
Local: Fri, May 11 2012 4:58 pm
Subject: REMOVE ME

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Erik Seaberg  
View profile  
 More options May 11 2012, 5:03 pm
From: Erik Seaberg <eseab...@adbrite.com>
Date: Fri, 11 May 2012 14:03:02 -0700
Local: Fri, May 11 2012 5:03 pm
Subject: Re: REMOVE ME
Sorry everyone, I was following the instructions in
http://support.google.com/groups/bin/answer.py?hl=en&answer=46608 and didn't
expect "REMOVE ME" to be broadcast to the group like an ordinary message.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "Algorithm for automatic cache invalidation" by Jakub Łopuszański
Jakub Łopuszański  
View profile  
 More options May 22 2012, 6:08 pm
From: Jakub Łopuszański <qbo...@gmail.com>
Date: Tue, 22 May 2012 15:08:07 -0700 (PDT)
Local: Tues, May 22 2012 6:08 pm
Subject: Re: Algorithm for automatic cache invalidation

Well this is already deployed to a production system and works just great.
The reason this is faster is that in some situations you can not easily
shard a database as some queries would cross shards boundaries. Of course
this means that a regular DB will not scale unless you change your queries.
On the other hand a cache is easier to distribute/shard/scale.
It is also easier (and in fact I do that on production) to have a memcache
runing on each frontend machine (while this would be a strange idea to put
shards of database on frontends which are usually diskless).
Of course one could argue, that you should just avoid complicated queries
and shard everything, but even then, the cost of communicating over network
between a frontend and mysql backend can be larger than a cost of a single
multiget to a memcached (even if the network latency is similar, I believe
that network stack of memcahed is one of the fastest).

In my particular example I have 6 frontend diskless machines, each running
apache2 and memached, and single database and single global memcached. Most
queries are resolved without any network communication at all, as they
result in local cache hit.

I think that developers of mysql have no way (or incentive) to build as
powerfull cache into a database, as no single database has not as much RAM,
not as many network cards and not as many CPUs as a cloud of 55 memcaches
used in nk.pl

W dniu piątek, 11 maja 2012 19:57:58 UTC+2 użytkownik Perrin Harkins
napisał:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Roberto Spadim  
View profile  
 More options May 22 2012, 10:52 pm
From: Roberto Spadim <robe...@spadim.com.br>
Date: Tue, 22 May 2012 23:52:56 -0300
Local: Tues, May 22 2012 10:52 pm
Subject: Re: Algorithm for automatic cache invalidation
i readed some nosql features at mysql and mariadb (a mysql trunk) they
are starting some nosql/cache solutions, could be nice check it
i know that oracle company is a 'problem'/'solution' to mysql open
source version, check others trunks, mariadb is a nice trunk and is
more 'opensource' with some interesting features
the memcache + db solution is nice, but i think in futures a more
unified solution will exists just wait or develop it =)

2012/5/22 Jakub Łopuszański <qbo...@gmail.com>:

--
Roberto Spadim
Spadim Technology / SPAEmpresarial

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jakub Łopuszański  
View profile  
 More options Feb 6, 6:32 am
From: Jakub Łopuszański <qbo...@gmail.com>
Date: Wed, 6 Feb 2013 03:32:37 -0800 (PST)
Local: Wed, Feb 6 2013 6:32 am
Subject: Re: Algorithm for automatic cache invalidation

As I really want this ideas to be well understood and wide spread I
prepared:
- a paper <http://vanisoft.pl/~lopuszanski/public/cache_invalidation.pdf>with nice ilustrations and proofs of correctness,
- a presentation<https://docs.google.com/presentation/d/15--B_I64Or54mXcakEpT8KkpAGfYl...>with more context and examples from everyday life of a programmer
- the source code<https://github.com/qbolec/PHP-Framework/tree/master/php/framework/rel...>of a working implementation used in "Znawcy
Futbolu" <http://nk.pl/aplikacje/znawcyfutbolu> and "Znasz Mnie?"<http://nk.pl/aplikacje/znaszmnie>, the
applications by Vanisoft

Hate it or ignore it, but at least do not say it is impractical, nor that I
did not try to explain it:)

W dniu sobota, 28 kwietnia 2012 00:14:49 UTC+2 użytkownik Jakub Łopuszański
napisał:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Artur Ejsmont  
View profile  
 More options Feb 6, 7:56 am
From: Artur Ejsmont <ejsmont.ar...@gmail.com>
Date: Wed, 6 Feb 2013 23:56:04 +1100
Local: Wed, Feb 6 2013 7:56 am
Subject: Re: Algorithm for automatic cache invalidation

Awesome !

Thanks a million , will read the paper right after my holiday.

Thanks a lot for sharing.

Art
On Feb 6, 2013 6:32 PM, "Jakub Łopuszański" <qbo...@gmail.com> wrote:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »