Multiple instances question


Bharathiraja P

Dec 25, 2015, 11:00:33 AM
to memcached
Hi,

We have 18 memcached servers and we use the Cache::Memcached client to store and retrieve values. The problem we see is that the first server in the list gets more connections than the others. Any idea how to fix this?

--
Raja

Nicolas Motte

Dec 25, 2015, 11:30:48 AM
to memc...@googlegroups.com
In some client libraries you can choose the hashing function, which determines the distribution of your keys across the farm.
For instance, the C client offers a dozen hashing functions: http://docs.libmemcached.org/hashkit_functions.html
There is no silver bullet; you need to study which one suits your data set best.
I couldn't find the equivalent in Cache::Memcached, but I guess there is one...
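To make the hashing idea concrete, here is a minimal Python sketch of the simple modulo scheme many clients default to: CRC32 of the key, modulo the server count. The host names are hypothetical, and this is an illustration of key-to-server mapping rather than the exact Cache::Memcached algorithm:

```python
import zlib
from collections import Counter

# Hypothetical endpoints; any 18 servers behave the same way here.
SERVERS = [f"cache{i:02d}:11211" for i in range(18)]

def pick_server(key, servers=SERVERS):
    # Modulo hashing: hash the key, then take it modulo the server count.
    return servers[zlib.crc32(key.encode()) % len(servers)]

# See how 100k synthetic keys spread over the farm.
counts = Counter(pick_server(f"user:{n}") for n in range(100_000))
print(min(counts.values()), max(counts.values()))
```

With uniformly named keys the buckets come out roughly even; a skewed key set (or a hash that clusters on your key format) is what produces an uneven farm.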

Even though this will change the distribution of your keys, it should not change the number of connections.
Each of your clients should keep an open connection to each node of the farm. Do you keep your connections open, or do you re-create them for each call?
Could you also check whether your keys are correctly balanced across the farm?

Cheers
Nico

--

---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bharathiraja P

Dec 26, 2015, 10:17:22 AM
to memc...@googlegroups.com
Thanks Nicolas for your reply.

We don't create a new connection every time. I checked with tcpdump, and I see more TCP resets on the first host than on the others.


Nicolas Motte

Dec 26, 2015, 11:37:24 AM
to memc...@googlegroups.com
Hi Bharathiraja.

TCP resets can have many causes, and they are not easy to troubleshoot remotely.
An easy one is a connection timeout (look at the state diagram here: http://www.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf).
If that is the case, enabling TCP keepalive should solve the problem.
But if it were a connection timeout, I guess you would have this problem on every node...
You might have to investigate a bit further to find the root cause (different clients having the same IP, maybe...).
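For reference, enabling keepalive on a client socket looks roughly like this in Python. The fine-grained knobs (`TCP_KEEPIDLE` etc.) are Linux-specific, so they are guarded, and the timing values below are only examples:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Enable TCP keepalive so an idle connection sends probes instead of
# being silently dropped by an intermediate firewall or NAT.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux-specific tuning (not portable, hence the guard; values are examples):
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # failed probes before giving up
```

Whether your Perl client exposes these options depends on how it creates its sockets; the socket-level flags are the same underneath.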

But I'm not sure I understand:
do you have more connections on your first server all the time (i.e. you simply monitor the number of open connections every minute)?
or
do you have more re-connections on your first server (i.e. you monitor the number of new connections on each server)?

And how many connections and servers are we talking about, exactly?

Bharathiraja P

Dec 27, 2015, 11:20:46 PM
to memcached
Yes. The first server always has around 2k connections; the other hosts have fewer than 100.

Connection resets also happen on the first server.

All 18 servers have the same configuration and TCP/IP settings.

Nicolas Motte

Dec 28, 2015, 3:26:15 AM
to memc...@googlegroups.com
Ok, thanks for the figures.

The only thing I can see that would explain such a big difference is that most of your keys are hosted on the first server (~54%).
If I'm correct, the client opens a connection to a server only when it needs to retrieve a key from it (the connection then stays open).
But if this is your problem, then the connections on the other servers should increase slowly...

Can you tell me how many keys you have on each of your servers?
(You can have a look here: http://lzone.de/cheat-sheet/memcached if you don't already monitor your cluster. The command is "stats" and the metric is "curr_items".)
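The `stats` reply is plain text: one `STAT <name> <value>` line per metric, terminated by `END`. A small parser sketch; the sample numbers below are invented, not from Bharathiraja's cluster:

```python
def parse_stats(raw):
    """Parse the text response of memcached's `stats` command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats

# Shape of a real reply; the values here are made up for illustration.
sample = "STAT curr_connections 2048\r\nSTAT curr_items 531441\r\nEND\r\n"
print(parse_stats(sample)["curr_items"])
```

Run the same extraction against each of the 18 hosts and compare `curr_items`: if one server holds a majority of the keys, the distribution (or the key design) is the problem.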

Cheers
Nico

Edward Goldberg

Dec 28, 2015, 3:52:28 AM
to memcached
Raja,

This issue is called the "hot key problem". What happens is that you have one or more keys that are very popular and used by all (or most) of the code.

For example: we have a key for each singer, and one of the keys is "Lady Gaga"; the rest are not very interesting (this is a real example!).

So all of the requests for Gaga keep the server that is "selected" for the Gaga key busy.

The hash only selects one of the servers for each key; it does not:

1) Distribute network packets fairly,
2) Use the same number of connections for all servers,
3) Use the same amount of memory.

So an even spread of the keys over the servers depends on well-chosen keys.

If you have a "hot key", then you see these imbalances in memory, CPU, network, and connections.

The fix is to use better keys that tend to spread nicely over the servers.

If that is NOT possible, then use more than one "pool" of memcached servers as needed to keep these hot keys from impacting other keys.

Feel free to ask for more help. I have seen this issue, and solved it, on many projects.
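Another common mitigation, not mentioned above but worth noting: when the key itself can't change, replicate the hot key under several suffixed names so the copies (usually) hash to different servers; reads pick one copy at random, writes update all of them. A rough sketch, with made-up names and replica count:

```python
import random

REPLICAS = 8  # hypothetical number of copies of the hot key

def hot_key_read_name(base_key):
    # Spread reads by picking a random replica suffix; each suffixed
    # name hashes independently, so reads fan out across the farm.
    return f"{base_key}#{random.randrange(REPLICAS)}"

def hot_key_write_names(base_key):
    # Writes must update every replica to keep the copies consistent.
    return [f"{base_key}#{i}" for i in range(REPLICAS)]

print(hot_key_write_names("lady_gaga"))
```

The trade-off is write amplification and a window of inconsistency between replicas, so it only pays off for keys that are read far more often than they are written.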


Nicolas Motte

Dec 29, 2015, 3:30:13 AM
to memc...@googlegroups.com
Hi Edward :)

Maybe I misunderstood Bharathiraja's question.

If it were a hot-key problem, the server hosting the hot key(s) would receive more traffic (as you explained with the Lady Gaga example); that's clear.
But how can you be sure it would have more client connections if the clients are using TCP keepalive?
I can understand that a client is more likely to connect to the server with the hot key, but over time the clients should open connections to the other servers too (it depends on the nature of the traffic, of course, but since there are 100 connections on the other servers, I assume some keys are used there too).
So even with a hot key, the connection count should slowly increase on the other servers. That's why I wanted to know whether the 100 connections on each of the other servers were increasing or constant.

What I want to say is that I understand this *could* be a hot-key problem, but we can't be 100% sure without more information.
Moreover, a hot key does not explain the TCP resets that were observed (unless the traffic on the first server is so much bigger that queries are timing out; Bharathiraja would have to check the CPU on that server).

Cheers
Nico

Bharathiraja P

Dec 31, 2015, 1:31:15 AM
to memcached
Thanks Nicolas and Edward for your help.

There was a problem in our code where it tried to set a value without a key, which caused the problems.

The TCP resets occurred when the server wanted to reuse the socket for something else, or to free it up so it could be used later.