Write failures + server temporarily disabled

Jozsef Rekedt-Nagy

Feb 11, 2014, 11:06:25 AM
to memc...@googlegroups.com
Hi,

We are seeing an issue pretty similar to https://groups.google.com/forum/#!topic/memcached/nOHiaR77KDs

Every couple of seconds (sometimes minutes, likely depending on load) we are getting result code 5, Memcached::RES_WRITE_FAILURE.
We are also seeing result code 47, MEMCACHED_SERVER_TEMPORARILY_DISABLED, at roughly the same frequency.

Any idea what might cause this, or how to debug it?



Stats from running instance:

stats
STAT pid 95260
STAT uptime 129519
STAT time 1392134446
STAT version 1.4.15
STAT libevent 1.4.14b-stable
STAT pointer_size 64
STAT rusage_user 873.509253
STAT rusage_system 4118.565899
STAT curr_connections 10
STAT total_connections 1610493
STAT connection_structures 507
STAT reserved_fds 20
STAT cmd_get 104449936
STAT cmd_set 3609304
STAT cmd_flush 1
STAT cmd_touch 0
STAT get_hits 99597865
STAT get_misses 4852071
STAT delete_misses 9
STAT delete_hits 189
STAT incr_misses 6417
STAT incr_hits 137834
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 21383
STAT cas_badval 18
STAT touch_hits 0
STAT touch_misses 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 19232674690
STAT bytes_written 260137182629
STAT limit_maxbytes 1048576000
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT threads 4
STAT conn_yields 0
STAT hash_power_level 19
STAT hash_bytes 4194304
STAT hash_is_expanding 0
STAT bytes 937602395
STAT curr_items 400717
STAT total_items 3747110
STAT expired_unfetched 68105
STAT evicted_unfetched 2078947
STAT evictions 2964506
STAT reclaimed 72360
END
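
A few ratios derived from the counters above make the dump easier to read. This is a minimal sketch: `parse_stats` and `summarize` are hypothetical helper names, and `SAMPLE` copies only the relevant lines from the output above.

```python
def parse_stats(text):
    """Turn 'STAT name value' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


# Subset of the stats output posted above.
SAMPLE = """\
STAT uptime 129519
STAT cmd_get 104449936
STAT get_hits 99597865
STAT get_misses 4852071
STAT limit_maxbytes 1048576000
STAT bytes 937602395
STAT total_items 3747110
STAT evictions 2964506
STAT evicted_unfetched 2078947
END
"""


def summarize(stats):
    """Compute a few derived ratios from raw memcached counters."""
    evictions = int(stats["evictions"])
    return {
        # Fraction of gets that found a value.
        "hit_ratio": int(stats["get_hits"]) / int(stats["cmd_get"]),
        # Fraction of the configured memory limit currently in use.
        "memory_used": int(stats["bytes"]) / int(stats["limit_maxbytes"]),
        # Average evictions per second since the server started.
        "evictions_per_second": evictions / int(stats["uptime"]),
        # Share of evicted items that were never read before eviction.
        "evicted_unfetched_share": int(stats["evicted_unfetched"]) / evictions,
    }


if __name__ == "__main__":
    for name, value in summarize(parse_stats(SAMPLE)).items():
        print(f"{name}: {value:.3f}")
```

For this instance the numbers come out at roughly a 95% hit ratio, about 89% of the memory limit in use, and about 23 evictions per second averaged over the uptime, with around 70% of evicted items never fetched.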

Thanks

Ryan McElroy

Feb 11, 2014, 6:36:59 PM
to memc...@googlegroups.com
Hi, I'm not familiar with these client errors, but from a search I'm guessing that you're using the PHP memcached library? E.g. http://www.php.net/manual/en/memcached.constants.php...

Anyway, to tell what the server is actually complaining about (as opposed to how the client is reporting back), I like to use a network dump -- ngrep is my favorite. Can you see what the actual network traffic on the port is like during one of these failures? That might give some clues.

Based on what you've said so far, here's a shot in the dark: while the server is filling up (i.e., not yet evicting) everything is fine, but once it needs to evict as well, things get a little slower, and so either the client is timing out or the server is having allocation trouble (I don't know if that actually happens, which is why I want to see what's coming over the wire). Once the client sees a few of these write failures (or maybe just one), it puts connecting to the server on hold for a short time to make sure it's not contributing to the problem, and for that period you see the temporarily disabled error (and the client isn't actually sending traffic). The server is probably still up at this point; the client is just failing fast.
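
The fail-fast behavior guessed at above can be modeled with a toy client. This is only a sketch of the theory, not the actual logic of the PHP extension or its underlying library: the class name, the `failure_limit` and `retry_timeout` parameters, and the `send` callback are all hypothetical. The two error codes are the ones reported in the thread.

```python
import time


class FailFastClient:
    """Toy model: after enough write failures, stop contacting the server
    for a while and fail immediately instead."""

    SUCCESS = 0
    WRITE_FAILURE = 5            # code reported in the thread
    TEMPORARILY_DISABLED = 47    # code reported in the thread

    def __init__(self, send, failure_limit=1, retry_timeout=2.0,
                 clock=time.monotonic):
        self.send = send                    # callable that raises OSError on failure
        self.failure_limit = failure_limit  # hypothetical tuning knob
        self.retry_timeout = retry_timeout  # hypothetical tuning knob, in seconds
        self.clock = clock
        self.failures = 0
        self.disabled_until = 0.0

    def set(self, key, value):
        now = self.clock()
        if now < self.disabled_until:
            # Server is marked dead: no traffic is sent at all.
            return self.TEMPORARILY_DISABLED
        try:
            self.send(key, value)
        except OSError:
            self.failures += 1
            if self.failures >= self.failure_limit:
                # Back off so the client does not add to the server's load.
                self.disabled_until = now + self.retry_timeout
                self.failures = 0
            return self.WRITE_FAILURE
        self.failures = 0
        return self.SUCCESS
```

In this model a single slow write yields one code 5 followed by a burst of code 47 until the timeout expires, which would explain the two errors appearing at about the same frequency.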

This theory can be confirmed or refuted with ngrep or tcpdump.

Hope this helps!

~Ryan


On Tue, Feb 11, 2014 at 8:26 AM, Joe7 <joe7...@gmail.com> wrote:
Just to confirm: it's only happening once the cache is full.

Joe7

Feb 11, 2014, 6:47:58 PM
to memc...@googlegroups.com
Hi Ryan!

Yes, the client is PHP/PECL memcached.

It is actually happening before the cache gets full too; I managed to replicate it at only ~20% full today.
It doesn't happen for a couple of minutes after a memcached restart, but after that it happens every minute or even every couple of seconds.

There is usually quite a bit of traffic on the port, but I will definitely try to filter for this with ngrep and report back.

cheers

dormando

Feb 12, 2014, 2:35:28 AM
to memc...@googlegroups.com
Can you get 'stats' output? What does listen_disabled_num say? Can you
start memcached with -o maxconns_fast, and does that change the errors you
get?
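
One way to pull a single counter such as listen_disabled_num is to send the text-protocol `stats` command over a plain TCP socket. This is a minimal sketch: `fetch_stats` and `stat_value` are hypothetical helper names, and the host and port defaults (127.0.0.1 and memcached's default port 11211) are assumptions about the setup.

```python
import socket


def fetch_stats(host="127.0.0.1", port=11211, timeout=2.0):
    """Send 'stats' to a memcached server and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        # The reply is a series of STAT lines terminated by END.
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection early
            buf += chunk
    return buf.decode("ascii", "replace")


def stat_value(raw, name):
    """Return the value of one STAT line as a string, or None if absent."""
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "STAT" and parts[1] == name:
            return parts[2]
    return None


if __name__ == "__main__":
    raw = fetch_stats()
    print("listen_disabled_num =", stat_value(raw, "listen_disabled_num"))
```

In the stats posted at the top of the thread this counter is 0.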

dormando

Feb 12, 2014, 2:36:43 AM
to memc...@googlegroups.com
Ah, I saw this late.

Have you been through: http://memcached.org/timeouts yet?

Joe7

Feb 12, 2014, 3:29:29 PM
to memc...@googlegroups.com
Ok so this is seemingly fixed with compression ON.

I don't think we're hitting any bandwidth limits: it's a gigabit network between the memcached server and the client, using up to 40Mb/sec without compression and around 15Mb/sec with compression ON.
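
As a rough check of the bandwidth figures, the arithmetic can be sketched as below. It is not clear from the post whether "Mb" means megabits or megabytes, so the sketch computes both readings; `utilization` is a hypothetical helper, and the link is assumed to carry 1000 Mbit/s.

```python
LINK_MBIT_PER_SEC = 1000  # assumed capacity of a gigabit link


def utilization(rate, unit):
    """Fraction of the link used; unit is 'Mb' (megabits) or 'MB' (megabytes)."""
    mbit = rate * 8 if unit == "MB" else rate
    return mbit / LINK_MBIT_PER_SEC


if __name__ == "__main__":
    for unit in ("Mb", "MB"):
        print(unit,
              "uncompressed:", utilization(40, unit),
              "compressed:", utilization(15, unit))
    print("traffic with compression relative to without:", 15 / 40)
```

Under either reading the average rate stays well below the link capacity (4% or 32% without compression), and compression cuts the traffic to 37.5% of its uncompressed volume. Averages like these do not rule out short bursts.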