Hello all!
I have a problem.
We have a cluster of 3 memcached nodes (memcached version 1.4.5).
The average load on each node is about 300-350 req/s.
We talk to memcached through the Java API (net.spy.memcached).
But sometimes (very rarely) we get error lines like the following in our application log:
01:00:59 ERROR Error: Out of memory
01:30:01 ERROR Error: Out of memory
03:30:25 ERROR Error: Out of memory
11:11:16 ERROR Error: Out of memory
I can't find any correlation between these errors and the software environment.
One more thing:
I found non-zero "outofmemory" counters on some slabs (from the "stats slabs" and "stats items" requests), as follows:
server 1:
stats settings
STAT maxconns 1024
STAT tcpport 11211
STAT udpport 11211
STAT inter NULL
STAT verbosity 0
STAT oldest 26962889
STAT evictions on
STAT domain_socket NULL
STAT umask 700
STAT growth_factor 1.25
STAT chunk_size 48
STAT num_threads 30
STAT stat_key_prefix :
STAT detail_enabled no
STAT reqs_per_event 20
STAT cas_enabled yes
STAT tcp_backlog 1024
STAT binding_protocol auto-negotiate
STAT auth_enabled_sasl no
STAT item_size_max 1048576
stats items
........
STAT items:39:number 1
STAT items:39:age 27457197
STAT items:39:evicted 3720
STAT items:39:evicted_nonzero 3
STAT items:39:evicted_time 10
STAT items:39:outofmemory 41
STAT items:39:tailrepairs 0
STAT items:39:reclaimed 0
..........
stats slabs
...
STAT 39:chunk_size 493552
STAT 39:chunks_per_page 2
STAT 39:total_pages 1
STAT 39:total_chunks 2
STAT 39:used_chunks 1
STAT 39:free_chunks 1
STAT 39:free_chunks_end 0
STAT 39:mem_requested 415638
STAT 39:get_hits 562215
STAT 39:cmd_set 568308
STAT 39:delete_hits 0
STAT 39:incr_hits 0
STAT 39:decr_hits 0
STAT 39:cas_hits 0
STAT 39:cas_badval 0
....
STAT active_slabs 42
STAT total_malloced 2151880632
On server 2 we have the same situation, but with outofmemory: 9.
On server 3 everything is OK.
So, according to the "stats" requests to the memcached servers, we have 50 "outofmemory" events in total.
But in the application logs we found about 200 messages like the ones above.
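A minimal sketch of that tally, assuming the "stats items" output of each server is available as a list of lines. The class and method names are hypothetical, the slab number for server 2 is assumed (only the value 9 is known), and server 3 is assumed to report no outofmemory events:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OutOfMemoryCounter {
    // Sums every "STAT items:<slab>:outofmemory <n>" line from one server's "stats items" output.
    static long sumOutOfMemory(List<String> statLines) {
        long total = 0;
        for (String line : statLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 3
                    && parts[0].equals("STAT")
                    && parts[1].startsWith("items:")
                    && parts[1].endsWith(":outofmemory")) {
                total += Long.parseLong(parts[2]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Server 1: lines copied from the output above.
        List<String> server1 = Arrays.asList(
                "STAT items:39:evicted 3720",
                "STAT items:39:outofmemory 41");
        // Server 2: slab number assumed, only the counter value (9) is known.
        List<String> server2 = Arrays.asList("STAT items:39:outofmemory 9");
        // Server 3: "all OK", assumed to mean no outofmemory events.
        List<String> server3 = Collections.emptyList();

        long total = sumOutOfMemory(server1) + sumOutOfMemory(server2) + sumOutOfMemory(server3);
        System.out.println(total); // prints 50
    }
}
```

The result (50) is the number to compare against the roughly 200 "Out of memory" lines in the application log.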
My questions:
1. Why do the "outofmemory" count and the number of error messages in our logs differ?
2. Why does this problem occur?
3. Can we get rid of this problem (maybe a bug fix already exists)?
Thanks,
PS.
I looked at the memcached source code and, as I understand it, this error message is produced when PROTOCOL_BINARY_RESPONSE_ENOMEM occurs.
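For reference, a minimal sketch of how that status can be recognised in a raw binary-protocol response. It assumes the standard 24-byte response header (magic byte 0x81, status in bytes 6-7, big-endian) and the out-of-memory status value 0x0082. The class and method names are hypothetical and this is not how net.spy.memcached decodes responses internally:

```java
class BinaryStatus {
    static final int HEADER_LENGTH = 24;     // fixed size of a binary-protocol header
    static final int MAGIC_RESPONSE = 0x81;  // magic byte of a response packet
    static final int STATUS_ENOMEM = 0x0082; // PROTOCOL_BINARY_RESPONSE_ENOMEM

    // Extracts the 16-bit big-endian status field from a response header.
    static int statusOf(byte[] header) {
        if (header.length < HEADER_LENGTH || (header[0] & 0xFF) != MAGIC_RESPONSE) {
            throw new IllegalArgumentException("not a binary-protocol response header");
        }
        return ((header[6] & 0xFF) << 8) | (header[7] & 0xFF);
    }

    static boolean isOutOfMemory(byte[] header) {
        return statusOf(header) == STATUS_ENOMEM;
    }

    public static void main(String[] args) {
        // Hand-built header for illustration only: magic 0x81, status 0x0082, all other fields zero.
        byte[] header = new byte[HEADER_LENGTH];
        header[0] = (byte) MAGIC_RESPONSE;
        header[6] = 0x00;
        header[7] = (byte) 0x82;
        System.out.println(String.format("status=0x%04x outOfMemory=%b",
                statusOf(header), isOutOfMemory(header)));
    }
}
```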