Is memcache add() atomic on a multithreaded memcached?


elSchrom

Oct 13, 2010, 12:30:44 PM
to memcached
Hi everyone,

we have the following situation: due to massive simultaneous inserts
in MySQL on possibly identical primary keys, we use the atomic
memcache add() as a semaphore. In a few cases we observed the
behaviour that two simultaneous add() calls using the same key from
different clients both returned true (due to consistent hashing, the
key has to be on the same machine).

Is it possible that the multithreaded memcached returns true for two
concurrent add() calls on the same key, if the requests are handled by
two different threads on the same machine?

Any information on this would be appreciated.

Kind regards,

Jerome

dormando

Oct 13, 2010, 1:47:04 PM
to memcached
> Hi everyone,
>
> we have the following situation: due to massive simultaneous inserts
> in MySQL on possibly identical primary keys, we use the atomic
> memcache add() as a semaphore. In a few cases we observed the
> behaviour that two simultaneous add() calls using the same key from
> different clients both returned true (due to consistent hashing, the
> key has to be on the same machine).
>
> Is it possible that the multithreaded memcached returns true for two
> concurrent add() calls on the same key, if the requests are handled by
> two different threads on the same machine?

It should not be possible, no. Be sure you've disabled the client
"failover" code.

Adam Lee

Oct 13, 2010, 2:11:23 PM
to memc...@googlegroups.com
Yeah, we also have used this as a sort of crude locking mechanism on a site under fairly heavy load and have never seen any sort of inconsistency-- as dormando said, I'd make sure your configuration is correct.  Debug and make sure that they're both indeed setting it on the same server.  Or, if that's not possible, whip up a small script that iterates through all of your servers and see if the key exists on multiple servers.
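That check could look something like the sketch below (helper and server names are hypothetical; the dict-backed FakeClient stands in for real per-server connections, which would expose the same get()):

```python
def servers_holding_key(clients, key):
    """Return the names of servers on which `key` currently exists.
    `clients` maps server name -> a client object exposing get(key).
    With a correctly shared continuum, a key should live on exactly
    one server."""
    return [name for name, client in clients.items()
            if client.get(key) is not None]

# Demonstration with dict-backed stand-ins for per-server clients:
class FakeClient:
    def __init__(self, data):
        self._data = data
    def get(self, key):
        return self._data.get(key)

clients = {
    "cache1:11211": FakeClient({"lock:123": "1"}),
    "cache2:11211": FakeClient({}),
    "cache3:11211": FakeClient({"lock:123": "1"}),  # duplicate: misconfigured
}
print(servers_holding_key(clients, "lock:123"))  # ['cache1:11211', 'cache3:11211']
```

A result with more than one server name would point at a client/continuum configuration problem rather than at the server's add() atomicity.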
--
awl

moses wejuli

Oct 13, 2010, 6:18:29 PM
to memc...@googlegroups.com
... or you could use a concatenation of your server ID/timestamp/query/unique client variable(s)/session, etc. (all hashed) as part of your (hashed) key... there are countless ways to make your key unique, even in your situation!

elSchrom

Oct 14, 2010, 3:52:53 AM
to memcached
Thanks for your replies so far. Failover is deactivated in our
configuration, so that cannot be the reason. Let me explain the
circumstances in a bit more detail:

our 50+ node consistent hashing cluster is very reliable in normal
operation; incr/decr, get, set, multiget, etc. are not a problem. If
keys were landing on the wrong servers in the continuum, we would
expect many more problems, which we currently do not see.
The cluster is always under relatively high load (the number of
connections, for example, is very high due to the 160+ webservers in
front of it). We are now observing, in a very few cases, that this
locking mechanism does not work: two different clients try to lock
with the same key (if you want to prevent multiple inserts into the
database on the same primary key, you have to explicitly use one key
shared by all clients, not a key with unique hashes in it). It works
millions of times as expected (we are generating a large number of
user-triggered database inserts, ~60/sec., with this construct), but
a handful of locks do not work and show the behaviour described. So
my question again: is it conceivable (even if very implausible) that
a multithreaded memd does not provide a 100% atomic add()?

Kind regards,

Jerome

dormando

Oct 14, 2010, 4:00:47 AM
to memcached
> our 50+ node consistent hashing cluster is very reliable in normal
> operation; incr/decr, get, set, multiget, etc. are not a problem. If
> keys were landing on the wrong servers in the continuum, we would
> expect many more problems, which we currently do not see.
> The cluster is always under relatively high load (the number of
> connections, for example, is very high due to the 160+ webservers in
> front of it). We are now observing, in a very few cases, that this
> locking mechanism does not work: two different clients try to lock
> with the same key (if you want to prevent multiple inserts into the
> database on the same primary key, you have to explicitly use one key
> shared by all clients, not a key with unique hashes in it). It works
> millions of times as expected (we are generating a large number of
> user-triggered database inserts, ~60/sec., with this construct), but
> a handful of locks do not work and show the behaviour described. So
> my question again: is it conceivable (even if very implausible) that
> a multithreaded memd does not provide a 100% atomic add()?

restart memcached with -t 1 and see if it stops happening. I already said
it's not possible.
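For reference, a single-threaded instance would be started something like this (flag values other than -t are illustrative; -t sets the worker thread count, which defaults to 4):

```shell
# Run one instance with a single worker thread to rule out
# cross-thread races:
#   -t 1      one worker thread
#   -m 64     64 MB of cache memory (illustrative)
#   -p 11211  listen port (the default)
memcached -t 1 -m 64 -p 11211
```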

elSchrom

Oct 14, 2010, 4:26:01 AM
to memcached
Yeah, right. :-) Restarting all memd instances is not an option. Can
you explain, why it is not possible?

dormando

Oct 14, 2010, 4:31:49 AM
to memcached
>
> Yeah, right. :-) Restarting all memd instances is not an option. Can
> you explain, why it is not possible?

Because we've programmed the commands with the full intent to be atomic.
If it's not, there's a bug... there's an issue with incr/decr that's been
fixed upstream but we've never had a reported issue with add.

I'm not sure what you want to hear. "They're supposed to be atomic, yes."
- that much is in the wiki too.

Dieter Schmidt

Oct 14, 2010, 5:39:12 AM
to memc...@googlegroups.com
To me it sounds like a configuration problem on the webservers or an availability/accessibility issue.
If, for example, all machines are accessible, the locking key resides on machine x. If one of the webservers differs in configuration, it can happen that the key is added a second time as new somewhere else in the continuum. As a result you will have a second insert into your DB.

What do you think? Possible?
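This scenario is easy to reproduce in miniature. The placement function below is deliberately simplified (real clients use a ketama-style continuum), but the principle is the same: placement is a pure function of the key and the server list, so one webserver with a stale list sends some keys, locks included, to a different node:

```python
import hashlib

def pick_server(key, servers):
    # Simplified placement (real clients use a ketama-style continuum);
    # the point is only that placement depends on the server list.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return sorted(servers)[digest % len(servers)]

shared_list = ["cache1:11211", "cache2:11211", "cache3:11211"]
stale_list  = ["cache1:11211", "cache2:11211"]  # one webserver missing a node

# Count the keys that land on different servers under the two lists:
diverging = [k for k in (f"lock:{i}" for i in range(100))
             if pick_server(k, shared_list) != pick_server(k, stale_list)]
print(len(diverging))  # a large fraction of keys diverge
```

With a locking key among the diverging ones, two clients could both get a successful add() on two different servers, which would look exactly like a broken atomic add().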


elSchrom <jerom...@googlemail.com> wrote:

elSchrom

Oct 14, 2010, 5:46:17 AM
to memcached
I did assume that you designed memd to behave exactly the same with
1 or many threads, and it's good to hear that there is no pending bug
concerning the atomicity of add() across multiple threads. The reason
someone posts such a thing on the mailing list is to hear the opinion
of a dev who has all the insight. :-)
So please understand my obstinate behaviour.
We are planning to run some tests concerning this behaviour; maybe I
can provide more detail in the future. But it will be hard to find
proof of a bug in this scenario. For that we would have to build a
test scenario with multiple instances trying to do an add() on the
same key at the exact same time on a consistent hashing cluster.

elSchrom

Oct 14, 2010, 7:15:06 AM
to memcached
Hi Diez,

On 14 Oct., 11:39, Dieter Schmidt <flatl...@stresstiming.de> wrote:
> To me it sounds like a configuration problem on the webservers or an availability/accessibility issue.
> If, for example, all machines are accessible, the locking key resides on machine x. If one of the webservers differs in configuration, it can happen that the key is added a second time as new somewhere else in the continuum. As a result you will have a second insert into your DB.
>
> What do you think? Possible?


Possible for sure, but this should produce more problems, like massive
redundant cached items, because some clients would have a different
view of the continuum. That is most likely not happening. The current
failure rate is smaller than 0.0001%, and the failures appear on
different frontend servers. It feels like something very unlikely is
happening here: a massive number of add() calls, with a very rare
number of failures.

>
> elSchrom <jerome.p...@googlemail.com> wrote:

Dieter Schmidt

Oct 14, 2010, 8:01:02 AM
to memc...@googlegroups.com
What happens if the add command fails because of an unlikely network error?

elSchrom <jerom...@googlemail.com> wrote:

elSchrom

Oct 14, 2010, 8:12:52 AM
to memcached


On 14 Oct., 14:01, Dieter Schmidt <flatl...@stresstiming.de> wrote:
> What happens if the add command fails because of an unlikely network error?

The situation is: two different clients are doing an add() with the
same key at the same time, and both are getting true. Assuming that
this key has to be on the same machine, it would have to be a
threading problem or a bug in add(). This breaks the atomic behaviour
we are expecting. But we cannot prove that the key was on the same
server in that moment, because the situation is highly volatile. It is
just speculation, because if keys were not stored correctly due to
consistent hashing problems, we would expect more problems.

>
> elSchrom <jerome.p...@googlemail.com> wrote:

dormando

Oct 14, 2010, 11:45:25 PM
to memcached

Can you give more info about exactly what the app is doing? What version
you're on as well? I can squint at it again and see if there's some minute
case.

Need to know exactly what you're doing though. How long the key tends to
live, how many processes are hammering the same key, what you're setting
the timeout to, etc.

Your behavior's only obstinate because you keep asking if we're sure
it's atomic. Yes, it's supposed to be atomic; if you think you've found
a bug, let's talk about bug hunting :P

Tobias

Oct 15, 2010, 5:45:28 AM
to memcached
> Can you give more info about exactly what the app is doing?

Something like this:

value = memcache.get("record" + x)

if (false == value && memcache.add("lock" + x, "1", 60)) {

    record = compute (expensive) record
    insert record with primary key x into DB
    memcache.set("record" + x, record);
    memcache.delete("lock" + x);

} else {
    // someone else is doing the expensive stuff
}

In a very few cases (<20 out of 3 million) we observed a "Duplicate
entry" MySQL error.



Adam Lee

Oct 15, 2010, 1:56:03 PM
to memc...@googlegroups.com
Is it ever possible that your compute takes longer than your timeout?
--
awl

Tobias

Oct 17, 2010, 7:07:31 AM
to memcached
> Is it ever possible that your compute takes longer than your timeout?

no, the return value of memcache.delete("lock" + x) is true.

Les Mikesell

Oct 17, 2010, 12:07:19 PM
to memc...@googlegroups.com
On 10/17/10 6:07 AM, Tobias wrote:
>> Is it ever possible that your compute takes longer than your timeout?
>
> no, the return value of memcache.delete("lock" + x) is true.

But wouldn't that also be true if another process found the expired lock and set
a new one?
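This is the classic expired-lock race: delete() returning true only proves that *a* key was removed, not that it was still your lock. One mitigation, sketched below with a dict-backed stand-in client and illustrative names, is to store a unique token and check ownership before releasing (a real fix would use gets/cas to make that check-and-delete atomic):

```python
import uuid

class FakeMemcache:
    """Dict-backed stand-in for a memcached client (expiry not simulated)."""
    def __init__(self):
        self._data = {}
    def add(self, key, value, ttl=0):
        if key in self._data:
            return False
        self._data[key] = value
        return True
    def get(self, key):
        return self._data.get(key)
    def delete(self, key):
        return self._data.pop(key, None) is not None

mc = FakeMemcache()
token = uuid.uuid4().hex  # unique per lock holder

if mc.add("lock:x", token, 60):
    # ... expensive work here ...
    # Release only if the lock is still ours: a bare delete() returns
    # True even when it removes a lock someone else re-added after
    # ours expired, so a True from delete() proves nothing by itself.
    if mc.get("lock:x") == token:
        mc.delete("lock:x")  # note: get-then-delete still leaves a
                             # small race window; gets/cas closes it
```

If the ownership check ever fails in production, that would confirm the expired-lock theory rather than a broken add().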

--
Les Mikesell
lesmi...@gmail.com
