It is not balanced. It is pseudo-random.
> We run ten Memcached instances across ten physical machines. They are
> all Windows boxes (we are eventually converting to Linux) and at the
> default settings (-c 1024 -m 1024, etc). There are no load balancers
> between the Memcached servers.
I am not really sure what you would do with load balancers and memcached.
> I have looked at many days of data and it seems that one server is
> always processing the most req/sec and another server is always
> processing the least. In fact, the req/sec that the cluster processes
> seems to cascade across each server, and the order is always the same.
> I have provided a link to this graph (real world data) and would like
> someone's input as to why this might be happening. Could the load
> balancing at the webserver tier be the culprit?
Do you perhaps have one very popular piece of cache that is requested
much more than others? We have this at dealnews. Our front page cache
is requested much more than any other single piece of cache. In our
case we have enough other stuff that you can't really see it on the
graphs, but that one cache is always coming from the same server.
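
To illustrate the hot-key effect described above, here is a minimal sketch. It assumes the client picks a server with CRC32 modulo the server count; the server names, key names, and request counts are made up for illustration and are not taken from either site.

```python
import zlib
from collections import Counter

# Hypothetical server names; ten servers as in the original question.
SERVERS = ["server%d" % i for i in range(1, 11)]


def pick_server(key):
    """Assumed client-side mapping: CRC32 of the key modulo server count."""
    return SERVERS[zlib.crc32(key.encode("utf-8")) % len(SERVERS)]


def simulate(requests):
    """Count how many requests each server would receive."""
    return Counter(pick_server(key) for key in requests)


# 10,000 requests spread over 1,000 ordinary keys, plus 5,000 requests
# for one hot key (standing in for a front page cache).
ordinary = ["item:%d" % (i % 1000) for i in range(10000)]
hot = ["frontpage"] * 5000
counts = simulate(ordinary + hot)

for server in SERVERS:
    marker = "  <- holds the hot key" if server == pick_server("frontpage") else ""
    print("%-9s %6d%s" % (server, counts[server], marker))
```

Because the hot key always hashes to the same server, that server carries the extra 5,000 requests on every run, while the ordinary keys spread roughly evenly.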
> I have also noticed some odd behavior when one server goes "down" and
> is no longer available to the cluster.
>
> In this case it seems that one server takes the brunt of the load
> instead of it being distributed across the other servers evenly. Does
> anyone know more about the failover algorithm as to how the load to a
> "down" server is redistributed across the cluster? I have provided
> another graph (real world data).
>
> http://meppum.com/random/spike.png
What client are you using? I have never seen such a thing. And in that
graph, which server went down? All the nodes keep a consistent line
there. I don't see a node dropping out of the graph.
Brian.
Do you mean you get more requests on one of the servers? More used
memory? More keys?
Of all of these only the last indicates a problem with memcached. Are
there any hot keys on your application? Are all the entries of similar
size? What client library are you using?
--
Jose Celestino | http://japc.uncovering.org/files/japc-pgpkey.asc
----------------------------------------------------------------
"One man’s theology is another man’s belly laugh." -- Robert A. Heinlein
Windows is not a language. What _client_ library are you using?
Brian.
The mapping from key to server happens in the client, so as long as all
the servers are up this is to be expected. The mapping doesn't change
when a server is restarted, so the order stays the same.
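
As a rough sketch of why the order stays the same: if the client computes the server from the key alone, the result cannot depend on whether a server was restarted. The CRC32-modulo scheme and the names below are assumptions for illustration; the actual client may hash differently.

```python
import zlib

# Hypothetical server list; the client keeps it in a fixed order.
SERVERS = ["server%d" % i for i in range(1, 11)]


def pick_server(key, servers=SERVERS):
    """Assumed mapping: a stable hash of the key modulo the server count."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


keys = ["user:%d" % i for i in range(1000)]

before_restart = [pick_server(k) for k in keys]
# Restarting a server empties its cache but does not change the client's
# server list, so the same keys still map to the same servers.
after_restart = [pick_server(k) for k in keys]

print(before_restart == after_restart)
```

The same keys always go to the same servers, so a server that holds more popular keys stays the busiest one.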
When a server is down, the load from this server will be redistributed
to the other servers, so the order on the remaining servers may change.
But when the server is up again, the original order should be restored.
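
One way a client might implement this failover is sketched below. It is an assumption, not the documented behaviour of any particular client: the key is rehashed with an attempt counter until a live server is found, so only keys whose primary server is down get moved.

```python
import zlib

# Hypothetical server list and retry limit.
SERVERS = ["server%d" % i for i in range(1, 11)]
MAX_ATTEMPTS = 20


def pick_server(key, down=frozenset()):
    """Return the first live server found by rehashing with an attempt counter."""
    for attempt in range(MAX_ATTEMPTS):
        salted = ("%d:%s" % (attempt, key)).encode("utf-8")
        candidate = SERVERS[zlib.crc32(salted) % len(SERVERS)]
        if candidate not in down:
            return candidate
    return None  # no live server found within the retry limit


keys = ["user:%d" % i for i in range(1000)]
down = frozenset(["server3"])

normal = {k: pick_server(k) for k in keys}
failover = {k: pick_server(k, down) for k in keys}
restored = {k: pick_server(k) for k in keys}

moved = [k for k in keys if normal[k] != failover[k]]
print("keys moved during the outage:", len(moved))
print("mapping restored afterwards:", normal == restored)
```

With this scheme the keys from the down server are spread over the remaining servers. A client that instead sends them all to a single fallback server would produce a spike on that one server.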
> Spike.png is the one where the server went down. It is actually at
> zero the entire time for that dataset. I'm using the latest Windows
> client 1.2.1. I'm thinking that the spike might be due to a piece of
> common data being accessed. The one thing that makes me second-guess
> that idea is that the same server always experiences significantly
> more load than the others.
In http://meppum.com/random/balance.png the server with the highest load
is server5, not server1. I was wondering about that, but now I see that
you probably removed server1 from that graph so that the order of the
other servers is visible - both graphs show the same time period, right?
I also notice that all the servers except server1 have a very smooth
access pattern. But server1 has a jaggy access pattern - there are lots
of little spikes and troughs. Is it possible that these patterns come
from a single hot item? Or maybe some of your clients use only server1,
while others use all servers?
hp
--
_ | Peter J. Holzer | Openmoko has already embedded
|_|_) | Sysadmin WSR | voting system.
| | | h...@hjp.at | Named "If you want it -- write it"
__/ | http://www.hjp.at/ | -- Ilja O. on comm...@lists.openmoko.org
This makes sense, and might be the reason for the single server spike
in the spike.png graph. What surprised me is that the balance.png
graph shows the req/sec for each server, and the order the servers are
in that graph is always the same, even if the servers are restarted. I
would expect that this order would change and appear more random. Any
insight as to why this might be happening?
That is correct. We also store sessions in memcached. But, as for the
single most requested key, it is the proxied cache of our front page.
Brian.