Well, I see that you are not getting many replies, so I will step in.
What I do is run a small cron job that performs a few Memcached
actions, like:
Set key
Get key
Get stats
Then I watch the output for issues on each server. I use plain
commands like:
echo "stats" | nc localhost 11211
This way I see all of the interaction with no client software in the
way.
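
A minimal sketch of such a check, using only printf and nc against the
Memcached text protocol. The host, port, key name "healthcheck", the
2-second timeout and the function names are all assumptions made for
illustration, and nc options vary between variants, so treat this as a
starting point rather than the exact script I run:

```shell
#!/bin/sh
# Hypothetical cron health check. Host, port and key name are assumptions.
MC_HOST="${MC_HOST:-localhost}"
MC_PORT="${MC_PORT:-11211}"

# Send one raw text-protocol command and print the reply without CRs.
# The trailing "quit" makes the server close the connection so nc exits;
# -w 2 is meant as a 2-second timeout (its exact meaning differs by nc variant).
mc_send() {
    printf '%b\r\nquit\r\n' "$1" | nc -w 2 "$MC_HOST" "$MC_PORT" | tr -d '\r'
}

# check_reply LABEL EXPECTED REPLY
# Prints one OK/FAIL line and returns non-zero on failure.
check_reply() {
    case "$3" in
        "")             echo "FAIL $1: no reply (no connect or timeout)"; return 1 ;;
        *SERVER_ERROR*) echo "FAIL $1: $3"; return 1 ;;
        *"$2"*)         echo "OK $1"; return 0 ;;
        *)              echo "FAIL $1: unexpected reply: $3"; return 1 ;;
    esac
}

# Set a key, read it back, then fetch stats; non-zero if any step failed.
run_checks() {
    status=0
    check_reply set   "STORED"            "$(mc_send 'set healthcheck 0 60 2\r\nok')" || status=1
    check_reply get   "VALUE healthcheck" "$(mc_send 'get healthcheck')"              || status=1
    check_reply stats "STAT "             "$(mc_send 'stats')"                        || status=1
    return $status
}
```

To use it, call run_checks at the end of the script and run the script
from cron; any FAIL line or non-zero exit is what you hand to your
alerting.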
When an error shows up in these simple commands, I send an alert to
the RightScale monitoring system I use.
You may have Munin or some other monitoring system.
Now I see the exact moment when Memcached goes sour. This
provides a big hint as to why it is happening.
Issues I see are:
Out Of Memory errors.
Time out on connect.
No Connect.
After a few weeks of looking at the issues, I can get all of them
fixed and the code works great.
It does take a few hits at first to find issues like DNS failures or
setup values that are set too low.
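
One way to spot setup values that are too low is to look at a few
counters in the stats output. The sketch below is an illustration, not
my actual script: the function name and the 90% threshold are made up,
and the stat names (bytes, limit_maxbytes, evictions,
listen_disabled_num) are the ones I would expect from a stock
Memcached, so check them against your own server's output:

```shell
#!/bin/sh
# Read "stats" output on stdin and print a warning line for each counter
# that hints at a limit being set too low. Thresholds are illustrative only.
stats_warnings() {
    tr -d '\r' | awk '
        # Keep every "STAT name value" line as a number keyed by name.
        $1 == "STAT" { s[$2] = $3 + 0 }
        END {
            # Memory in use is close to the configured limit.
            if (s["limit_maxbytes"] > 0 && s["bytes"] / s["limit_maxbytes"] > 0.9)
                print "WARN memory above 90% of limit_maxbytes"
            # Items are being pushed out to make room for new ones.
            if (s["evictions"] > 0)
                print "WARN evictions=" s["evictions"]
            # The server stopped accepting connections at least once.
            if (s["listen_disabled_num"] > 0)
                print "WARN connection limit hit, listen_disabled_num=" s["listen_disabled_num"]
        }'
}
```

For example: echo "stats" | nc localhost 11211 | stats_warnings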
Please ask for help. Memcached is a great cache.
Edward M. Goldberg
http://myCloudWatcher.com/