Membase started working very slowly

Andrey Nikishaev

Feb 25, 2012, 1:51:35 AM
to mem...@googlegroups.com
A few days ago Membase started working very slowly. We are using a local moxi proxy on each server (we have 4 servers).
We also have a Membase cluster of 2 servers (10 GB x 2) with replication. We now have about 15M records in the cluster (30M with replicas). Load on the cluster is about 2000-6000 ops, and the data is 100% memory resident.

Membase: 1.7.1
System: Linux CentOS 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:31:24 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Here are the timings from the moxi proxy:
stats proxy timings
STAT 11311:default:connect 100+100=23 37.70% ******************
STAT 11311:default:connect 200+100=30 86.89% ************************
STAT 11311:default:connect 300+100=7  98.36% *****
STAT 11311:default:connect 400+100=1 100.00% 
STAT 11311:default:connect 500+100=0 100.00% 
STAT 11311:default:reserved     100+100    =6848    1.64% *
STAT 11311:default:reserved     200+100    =41143  11.49% *******
STAT 11311:default:reserved     300+100    =102670 36.08% *******************
STAT 11311:default:reserved     400+100    =127986 66.73% ************************
STAT 11311:default:reserved     500+100    =53582  79.56% **********
STAT 11311:default:reserved     600+100    =31544  87.11% *****
STAT 11311:default:reserved     700+100    =13188  90.27% **
STAT 11311:default:reserved     800+100    =3630   91.14% 
STAT 11311:default:reserved     900+100    =1629   91.53% 
STAT 11311:default:reserved    1000+100    =1253   91.83% 
STAT 11311:default:reserved    1100+100    =1135   92.10% 
STAT 11311:default:reserved    1200+100    =967    92.33% 
STAT 11311:default:reserved    1300+100    =553    92.46% 
STAT 11311:default:reserved    1400+100    =354    92.55% 
STAT 11311:default:reserved    1500+100    =265    92.61% 
STAT 11311:default:reserved    1600+100    =226    92.67% 
STAT 11311:default:reserved    1700+100    =252    92.73% 
STAT 11311:default:reserved    1800+100    =197    92.77% 
STAT 11311:default:reserved    1900+100    =132    92.81% 
STAT 11311:default:reserved    2000+100    =73     92.82% 
STAT 11311:default:reserved    2100+200    =242    92.88% 
STAT 11311:default:reserved    2300+400    =307    92.96% 
STAT 11311:default:reserved    2700+800    =256    93.02% 
STAT 11311:default:reserved    3500+1600   =244    93.08% 
STAT 11311:default:reserved    5100+3200   =241    93.13% 
STAT 11311:default:reserved    8300+6400   =90     93.15% 
STAT 11311:default:reserved   14700+12800  =13     93.16% 
STAT 11311:default:reserved   27500+25600  =256    93.22% 
STAT 11311:default:reserved   53100+51200  =5      93.22% 
STAT 11311:default:reserved  104300+102400 =17107  97.32% ***
STAT 11311:default:reserved  206700+204800 =8553   99.36% *
STAT 11311:default:reserved  411500+409600 =2358   99.93% 
STAT 11311:default:reserved  821100+819200 =255    99.99% 
STAT 11311:default:reserved 1640300+1638400=37    100.00% 
STAT 11311:default:reserved 3278700+3276800=3     100.00% 
STAT 11311:default:reserved 6555500+6553600=0     100.00% 
END
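
For readers who want to put a number on the slow tail of a histogram like the one above, here is a minimal Python sketch. It assumes each histogram line has the form `STAT <name> <start>+<width>=<count> ...`, as in the output above. The helper names `parse_timings` and `share_at_or_above` are hypothetical and not part of moxi, and the unit of the bucket boundaries is not stated in this thread.

```python
import re

# Matches e.g. "STAT 11311:default:reserved  104300+102400 =17107  97.32% ***"
LINE_RE = re.compile(r"^STAT\s+(\S+)\s+(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)")

def parse_timings(text):
    """Return {stat_name: [(bucket_start, bucket_width, count), ...]}."""
    histograms = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips "END", blank lines, and non-histogram stats
        name = m.group(1)
        start, width, count = int(m.group(2)), int(m.group(3)), int(m.group(4))
        histograms.setdefault(name, []).append((start, width, count))
    return histograms

def share_at_or_above(buckets, threshold):
    """Fraction of samples that fall in buckets starting at or above threshold."""
    total = sum(count for _, _, count in buckets)
    if total == 0:
        return 0.0
    slow = sum(count for start, _, count in buckets if start >= threshold)
    return slow / total

# Three lines copied from the output above, only to exercise the parser.
sample = """\
STAT 11311:default:reserved     400+100    =127986 66.73% ************************
STAT 11311:default:reserved  104300+102400 =17107  97.32% ***
STAT 11311:default:reserved  206700+204800 =8553   99.36% *
END
"""
hist = parse_timings(sample)
print(share_at_or_above(hist["11311:default:reserved"], 104300))
```

Feeding the complete `stats proxy timings` output to `parse_timings` instead of the three-line sample gives the share for the whole histogram.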

Aliaksey Kandratsenka

Feb 25, 2012, 2:58:00 PM
to mem...@googlegroups.com
On Fri, Feb 24, 2012 at 22:51, Andrey Nikishaev <cre...@gmail.com> wrote:
A few days ago Membase started working very slowly. We are using a local moxi proxy on each server (we have 4 servers).
We also have a Membase cluster of 2 servers (10 GB x 2) with replication. We now have about 15M records in the cluster (30M with replicas). Load on the cluster is about 2000-6000 ops, and the data is 100% memory resident.

This is interesting.

Does 'slow' mean that your ops take longer (i.e. latency increased), or that your persistence rate has dropped below your update rate?

Andrey Nikishaev

Feb 26, 2012, 4:00:21 AM
to mem...@googlegroups.com
I mean that requests take longer. For example, if I need to get 100 keys from Membase, I get a timeout from the server. The problem is that two weeks ago Membase worked fine, but within one day the request time increased greatly, and we started getting problems one after another.

We are using Membase 1.7.1 and Moxi 1.7.0

Andrey Nikishaev

Feb 27, 2012, 6:04:31 AM
to mem...@googlegroups.com
Also, I just found that the disk queue drain rate is almost always 0 and there are about 15k-30k items in the disk write queue. Here is a screenshot: http://dl.dropbox.com/u/6308768/mb.jpg

Aliaksey Kandratsenka

Feb 27, 2012, 12:34:08 PM
to mem...@googlegroups.com


On Mon, Feb 27, 2012 at 03:04, Andrey Nikishaev <cre...@gmail.com> wrote:
Also, I just found that the disk queue drain rate is almost always 0 and there are about 15k-30k items in the disk write queue. Here is a screenshot: http://dl.dropbox.com/u/6308768/mb.jpg

Double-check your swap.

Andrey Nikishaev

Feb 28, 2012, 6:24:51 AM
to mem...@googlegroups.com
The Membase servers have 24 GB of memory, and only 10 GB is used for Membase, so swap is not used at all.
Do you think this could be caused by hard-drive failures?

Andrey Nikishaev

Feb 28, 2012, 6:29:38 AM
to mem...@googlegroups.com
But I don't think the hard drives on 2 servers would fail at the same time.

Aliaksey Kandratsenka

Feb 28, 2012, 10:01:05 AM
to mem...@googlegroups.com


On Tue, Feb 28, 2012 at 03:29, Andrey Nikishaev <cre...@gmail.com> wrote:
But I don't think the hard drives on 2 servers would fail at the same time.

Data ops do not depend on disk access if the data is in RAM, so disk access latency and write throughput are not relevant here.

Membase has a known issue where the mem_used stat is sometimes quite a bit off. Double-check with your OS that it is actually not in swap.
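
One way to check swap from the OS side on Linux is to compare `SwapTotal` and `SwapFree` in `/proc/meminfo`. The Python sketch below is only an illustration: `swap_used_kb` is a hypothetical helper name and the sample numbers are made up, not taken from the servers in this thread.

```python
def swap_used_kb(meminfo_text):
    """Return swap in use (kB) from /proc/meminfo-style text: SwapTotal - SwapFree."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values["SwapTotal"] - values["SwapFree"]

# Made-up sample in the /proc/meminfo format (assumption, for illustration only).
sample = """\
MemTotal:       24675796 kB
MemFree:         1203432 kB
SwapTotal:       4194296 kB
SwapFree:        4194296 kB
"""
print(swap_used_kb(sample))  # 0 for this made-up sample
```

On a real host the call would be `swap_used_kb(open("/proc/meminfo").read())`. A nonzero result only says that something is in swap; the `si` and `so` columns of `vmstat 1` show whether pages are being swapped in and out right now.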

Andrey Nikishaev

Feb 29, 2012, 9:40:02 AM
to mem...@googlegroups.com
I checked both servers; swap is not used at all.

Aliaksey Kandratsenka

Feb 29, 2012, 9:43:27 AM
to mem...@googlegroups.com
On Wed, Feb 29, 2012 at 06:40, Andrey Nikishaev <cre...@gmail.com> wrote:
I checked both servers; swap is not used at all.

OK. So the problem is either in moxi or in ep-engine/memcached.

Try killing moxi. It will be respawned automatically. Moxi doesn't keep any vital state, so this is a safe thing to do.

Then observe whether request latency improves. We'll see whether moxi caused it or whether it's something inside the core of Membase.
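
To compare latency before and after a moxi restart with numbers, a small timing loop is enough. The Python sketch below is an assumption-laden illustration: `measure_latencies`, `percentile` and `text_get` are hypothetical helpers, `text_get` speaks the plain memcached text protocol over a fresh connection each time, and it simply reads until the reply ends with `END`, so it is not a robust client.

```python
import socket
import time

def measure_latencies(op, count):
    """Call op() `count` times; return per-call durations in seconds, sorted ascending."""
    durations = []
    for _ in range(count):
        started = time.perf_counter()
        op()
        durations.append(time.perf_counter() - started)
    durations.sort()
    return durations

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, -(-len(sorted_values) * pct // 100))  # ceiling division
    return sorted_values[int(rank) - 1]

def text_get(host, port, key, timeout=5.0):
    """One memcached text-protocol GET (hypothetical helper, simplified).

    Reads until the reply ends with END; an error reply from the server
    would instead end in a socket timeout.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"get " + key.encode("ascii") + b"\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
        return data
```

A possible use, assuming moxi listens locally on port 11311 as in the stats above and `some_key` stands in for a real key: `lat = measure_latencies(lambda: text_get("127.0.0.1", 11311, "some_key"), 1000)`, then compare `percentile(lat, 50)` and `percentile(lat, 99)` from runs before and after the restart.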

Andrey Nikishaev

Feb 29, 2012, 2:18:55 PM
to mem...@googlegroups.com
I have already restarted moxi a few times and also tried different configurations. The problem is the same.
But one thing I found: if I block all connections to Membase, wait some time, and then open the connections again, Membase works normally for a few minutes (the load is not minimal, it is the same) and then starts slowing down. Maybe some queue in Membase blocks the response loop or something?

Aliaksey Kandratsenka

Feb 29, 2012, 2:36:37 PM
to mem...@googlegroups.com
On Wed, Feb 29, 2012 at 11:18, Andrey Nikishaev <cre...@gmail.com> wrote:
I have already restarted moxi a few times and also tried different configurations. The problem is the same.
But one thing I found: if I block all connections to Membase, wait some time, and then open the connections again, Membase works normally for a few minutes (the load is not minimal, it is the same) and then starts slowing down. Maybe some queue in Membase blocks the response loop or something?

Just to double-check: I'm not aware of any blocking that can happen in Membase without swapping.

So your GET requests get slowed down and time out? Are you using client-side moxi?

Андрей Никишаев

Feb 29, 2012, 2:51:46 PM
to mem...@googlegroups.com
Yes, we are using client-side moxi. Yes, there are timeouts under heavy load. Also, there are many keys in these time ranges (about 12%):
STAT 11311:default:reserved  104300+102400 
STAT 11311:default:reserved  206700+204800 
STAT 11311:default:reserved  411500+409600 

Aliaksey Kandratsenka

Feb 29, 2012, 3:01:40 PM
to mem...@googlegroups.com
I've asked the moxi folks. I think it's likely a moxi tuning issue.

Andrey Nikishaev

Feb 29, 2012, 3:21:32 PM
to mem...@googlegroups.com
ok. Will wait for your response.

Thanks.

Aliaksey Kandratsenka

Mar 1, 2012, 6:03:33 PM
to mem...@googlegroups.com


On Wed, Feb 29, 2012 at 12:21, Andrey Nikishaev <cre...@gmail.com> wrote:
ok. Will wait for your response.

Thanks.

I believe you may be hitting the downstream_max condition. Moxi has a limit on the number of concurrently used downstream connections (i.e. connections to the actual memcached process). It's normally reasonably large, but let's double-check.

Here's what I want you to try (with sample output from my box):

xxx@beta:~/src/altoros/moxi/ns_server# ../repo18/install/bin/cbstats lh:12001 raw 'proxy buckets' | grep downstream
 12001:default:pstd_stats:err_downstream_write_prep:              0
 12001:default:pstd_stats:max_downstream_reserved_time:           0
 12001:default:pstd_stats:num_downstream_conn:                    2
 12001:default:pstd_stats:tot_assign_downstream:                  2
 12001:default:pstd_stats:tot_downstream_auth:                    2
 12001:default:pstd_stats:tot_downstream_auth_failed:             0
 12001:default:pstd_stats:tot_downstream_bucket:                  2
 12001:default:pstd_stats:tot_downstream_bucket_failed:           0
 12001:default:pstd_stats:tot_downstream_close_on_upstream_close: 0
 12001:default:pstd_stats:tot_downstream_conn:                    2
 12001:default:pstd_stats:tot_downstream_conn_acquired:           2
 12001:default:pstd_stats:tot_downstream_conn_queue_add:          0
 12001:default:pstd_stats:tot_downstream_conn_queue_remove:       0
 12001:default:pstd_stats:tot_downstream_conn_queue_timeout:      0
 12001:default:pstd_stats:tot_downstream_conn_released:           2
 12001:default:pstd_stats:tot_downstream_connect:                 2
 12001:default:pstd_stats:tot_downstream_connect_failed:          0
 12001:default:pstd_stats:tot_downstream_connect_interval:        0
 12001:default:pstd_stats:tot_downstream_connect_max_reached:     0
 12001:default:pstd_stats:tot_downstream_connect_started:         2
 12001:default:pstd_stats:tot_downstream_connect_timeout:         0
 12001:default:pstd_stats:tot_downstream_connect_wait:            2
 12001:default:pstd_stats:tot_downstream_create_failed:           0
 12001:default:pstd_stats:tot_downstream_freed:                   0
 12001:default:pstd_stats:tot_downstream_max_reached:             0
 12001:default:pstd_stats:tot_downstream_propagate_failed:        0
 12001:default:pstd_stats:tot_downstream_quit_server:             0
 12001:default:pstd_stats:tot_downstream_released:                4
 12001:default:pstd_stats:tot_downstream_reserved:                2
 12001:default:pstd_stats:tot_downstream_reserved_time:           0
 12001:default:pstd_stats:tot_downstream_timeout:                 0
 12001:default:pstd_stats:tot_downstream_waiting_errors:          0


We're most interested in the values of the *_max_reached stats.

The cbstats program is called mbstats in Membase 1.7.x, and you need to point it at moxi.
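
If the stats output is long, the `*_max_reached` counters can be pulled out with a few lines of Python. This is a sketch under stated assumptions: `max_reached_stats` is a hypothetical helper, and it only handles the two line shapes seen in this thread, `name: value` from cbstats and `STAT name value` from the memcached text protocol.

```python
def max_reached_stats(stats_text):
    """Return {stat_name: int_value} for every stat whose name ends in '_max_reached'."""
    found = {}
    for line in stats_text.splitlines():
        line = line.strip()
        if line.startswith("STAT "):
            line = line[len("STAT "):]  # drop the text-protocol prefix
        fields = line.split()
        if len(fields) != 2:
            continue  # not a simple name/value pair
        name, value = fields[0].rstrip(":"), fields[1]
        if name.endswith("_max_reached") and value.isdigit():
            found[name] = int(value)
    return found

# Lines in both shapes, copied from output shown in this thread.
sample = """\
 12001:default:pstd_stats:tot_downstream_connect_max_reached:     0
 12001:default:pstd_stats:tot_downstream_max_reached:             0
STAT 11311:default:pstd_stats:tot_downstream_connect_max_reached 34
STAT 11311:default:pstd_stats:tot_downstream_timeout 10390
"""
for name, value in sorted(max_reached_stats(sample).items()):
    print(name, value)
```

Any counter that keeps growing between two runs points at the corresponding limit being hit.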

Андрей Никишаев

Mar 2, 2012, 5:16:36 AM
to mem...@googlegroups.com
tot_downstream_connect_max_reached = 34, while max downstream connections is set to 40.

Kindly yours,
Andrey Nikishaev



Андрей Никишаев

Mar 2, 2012, 5:20:48 AM
to mem...@googlegroups.com
Here are all the proxy stats:

STAT basic:version 1.7.0_4_g6a26e91
STAT basic:nthreads 5
STAT basic:hostname CentOS-56-64-minimal
STAT memcached:settings:maxbytes 67108864
STAT memcached:settings:maxconns 1024
STAT memcached:settings:tcpport 0
STAT memcached:settings:udpport -2
STAT memcached:settings:inter NULL
STAT memcached:settings:verbosity 0
STAT memcached:settings:oldest 0
STAT memcached:settings:evictions on
STAT memcached:settings:domain_socket NULL
STAT memcached:settings:umask 700
STAT memcached:settings:growth_factor 1.25
STAT memcached:settings:chunk_size 48
STAT memcached:settings:num_threads 5
STAT memcached:settings:stat_key_prefix :
STAT memcached:settings:detail_enabled no
STAT memcached:settings:reqs_per_event 20
STAT memcached:settings:cas_enabled yes
STAT memcached:settings:tcp_backlog 1024
STAT memcached:settings:binding_protocol auto-negotiate
STAT memcached:stats:pid 24984
STAT memcached:stats:uptime 524197
STAT memcached:stats:time 1330683251
STAT memcached:stats:version 1.7.0_4_g6a26e91
STAT memcached:stats:pointer_size 64
STAT memcached:stats:rusage_user 12540.118612
STAT memcached:stats:rusage_system 15518.054897
STAT memcached:stats:curr_connections 163
STAT memcached:stats:total_connections 36504289
STAT memcached:stats:connection_structures 433
STAT memcached:stats:cmd_get 0
STAT memcached:stats:cmd_set 0
STAT memcached:stats:cmd_flush 0
STAT memcached:stats:get_hits 0
STAT memcached:stats:get_misses 0
STAT memcached:stats:delete_misses 0
STAT memcached:stats:delete_hits 0
STAT memcached:stats:incr_misses 0
STAT memcached:stats:incr_hits 0
STAT memcached:stats:decr_misses 0
STAT memcached:stats:decr_hits 0
STAT memcached:stats:cas_misses 0
STAT memcached:stats:cas_hits 0
STAT memcached:stats:cas_badval 0
STAT memcached:stats:bytes_read 428270314378
STAT memcached:stats:bytes_written 218585591337
STAT memcached:stats:limit_maxbytes 67108864
STAT memcached:stats:accepting_conns 1
STAT memcached:stats:listen_disabled_num 0
STAT memcached:stats:threads 5
STAT memcached:stats:conn_yields 0
STAT proxy_main:conf_type dynamic
STAT proxy_main:behavior:cycle 200
STAT proxy_main:behavior:downstream_max 2048
STAT proxy_main:behavior:downstream_conn_max 40
STAT proxy_main:behavior:downstream_weight 0
STAT proxy_main:behavior:downstream_retry 1
STAT proxy_main:behavior:downstream_protocol 8
STAT proxy_main:behavior:downstream_timeout 5000
STAT proxy_main:behavior:downstream_conn_queue_timeout 200
STAT proxy_main:behavior:connect_timeout 400
STAT proxy_main:behavior:auth_timeout 100
STAT proxy_main:behavior:wait_queue_timeout 5000
STAT proxy_main:behavior:time_stats 1
STAT proxy_main:behavior:connect_max_errors 5
STAT proxy_main:behavior:connect_retry_interval 30000
STAT proxy_main:behavior:front_cache_max 200
STAT proxy_main:behavior:front_cache_lifespan 0
STAT proxy_main:behavior:front_cache_spec
STAT proxy_main:behavior:front_cache_unspec
STAT proxy_main:behavior:key_stats_max 4000
STAT proxy_main:behavior:key_stats_lifespan 0
STAT proxy_main:behavior:key_stats_spec
STAT proxy_main:behavior:key_stats_unspec
STAT proxy_main:behavior:optimize_set
STAT proxy_main:behavior:host
STAT proxy_main:behavior:port 0
STAT proxy_main:behavior:bucket
STAT proxy_main:behavior:port_listen 11311
STAT proxy_main:behavior:default_bucket_name default
STAT proxy_main:stats:stat_configs 1292
STAT proxy_main:stats:stat_config_fails 0
STAT proxy_main:stats:stat_proxy_starts 2
STAT proxy_main:stats:stat_proxy_start_fails 0
STAT proxy_main:stats:stat_proxy_existings 1291
STAT proxy_main:stats:stat_proxy_shutdowns 0
STAT 11311:default:info:config_ver 1292
STAT 11311:default:info:behaviors_num 2
STAT 11311:default:behavior:downstream_max 2048
STAT 11311:default:behavior:downstream_conn_max 40
STAT 11311:default:behavior:downstream_weight 0
STAT 11311:default:behavior:downstream_retry 1
STAT 11311:default:behavior:downstream_protocol 8
STAT 11311:default:behavior:downstream_timeout 5000
STAT 11311:default:behavior:downstream_conn_queue_timeout 200
STAT 11311:default:behavior:connect_timeout 400
STAT 11311:default:behavior:auth_timeout 100
STAT 11311:default:behavior:wait_queue_timeout 5000
STAT 11311:default:behavior:time_stats 1
STAT 11311:default:behavior:connect_max_errors 5
STAT 11311:default:behavior:connect_retry_interval 30000
STAT 11311:default:behavior:front_cache_max 200
STAT 11311:default:behavior:front_cache_lifespan 0
STAT 11311:default:behavior:front_cache_spec
STAT 11311:default:behavior:front_cache_unspec
STAT 11311:default:behavior:key_stats_max 4000
STAT 11311:default:behavior:key_stats_lifespan 0
STAT 11311:default:behavior:key_stats_spec
STAT 11311:default:behavior:key_stats_unspec
STAT 11311:default:behavior:optimize_set
STAT 11311:default:behavior:usr default
STAT 11311:default:behavior:host
STAT 11311:default:behavior:port 0
STAT 11311:default:behavior:bucket
STAT 11311:default:behavior:port_listen 11311
STAT 11311:default:behavior:default_bucket_name default
STAT 11311:default:behavior-0:downstream_weight 0
STAT 11311:default:behavior-0:downstream_retry 1
STAT 11311:default:behavior-0:downstream_protocol 8
STAT 11311:default:behavior-0:downstream_timeout 5000
STAT 11311:default:behavior-0:downstream_conn_queue_timeout 200
STAT 11311:default:behavior-0:connect_timeout 400
STAT 11311:default:behavior-0:auth_timeout 100
STAT 11311:default:behavior-0:bucket
STAT 11311:default:behavior-1:downstream_weight 0
STAT 11311:default:behavior-1:downstream_retry 1
STAT 11311:default:behavior-1:downstream_protocol 8
STAT 11311:default:behavior-1:downstream_timeout 5000
STAT 11311:default:behavior-1:downstream_conn_queue_timeout 200
STAT 11311:default:behavior-1:connect_timeout 400
STAT 11311:default:behavior-1:auth_timeout 100
STAT 11311:default:behavior-1:bucket
STAT 11311:default:stats:listening 2
STAT 11311:default:stats:listening_failed 0
STAT 11311:default:frontcache:max 0
STAT 11311:default:frontcache:oldest_live 0
STAT 11311:default:frontcache:tot_get_hits 0
STAT 11311:default:frontcache:tot_get_expires 0
STAT 11311:default:frontcache:tot_get_misses 0
STAT 11311:default:frontcache:tot_get_bytes 0
STAT 11311:default:frontcache:tot_adds 0
STAT 11311:default:frontcache:tot_add_skips 0
STAT 11311:default:frontcache:tot_add_fails 0
STAT 11311:default:frontcache:tot_add_bytes 0
STAT 11311:default:frontcache:tot_deletes 0
STAT 11311:default:frontcache:tot_evictions 0
STAT 11311:default:pstd_stats:num_upstream 38
STAT 11311:default:pstd_stats:tot_upstream 36484727
STAT 11311:default:pstd_stats:num_downstream_conn 122
STAT 11311:default:pstd_stats:tot_downstream_conn 19558
STAT 11311:default:pstd_stats:tot_downstream_conn_acquired 422207177
STAT 11311:default:pstd_stats:tot_downstream_conn_released 388655379
STAT 11311:default:pstd_stats:tot_downstream_released 422206722
STAT 11311:default:pstd_stats:tot_downstream_reserved 422206491
STAT 11311:default:pstd_stats:tot_downstream_reserved_time 13605664570243
STAT 11311:default:pstd_stats:max_downstream_reserved_time 5093646
STAT 11311:default:pstd_stats:tot_downstream_freed 0
STAT 11311:default:pstd_stats:tot_downstream_quit_server 19436
STAT 11311:default:pstd_stats:tot_downstream_max_reached 0
STAT 11311:default:pstd_stats:tot_downstream_create_failed 0
STAT 11311:default:pstd_stats:tot_downstream_connect_started 19558
STAT 11311:default:pstd_stats:tot_downstream_connect_wait 19558
STAT 11311:default:pstd_stats:tot_downstream_connect 10545
STAT 11311:default:pstd_stats:tot_downstream_connect_failed 9013
STAT 11311:default:pstd_stats:tot_downstream_connect_timeout 5065
STAT 11311:default:pstd_stats:tot_downstream_connect_interval 33532303
STAT 11311:default:pstd_stats:tot_downstream_connect_max_reached 34
STAT 11311:default:pstd_stats:tot_downstream_waiting_errors 0
STAT 11311:default:pstd_stats:tot_downstream_auth 10545
STAT 11311:default:pstd_stats:tot_downstream_auth_failed 3948
STAT 11311:default:pstd_stats:tot_downstream_bucket 10545
STAT 11311:default:pstd_stats:tot_downstream_bucket_failed 0
STAT 11311:default:pstd_stats:tot_downstream_propagate_failed 33541349
STAT 11311:default:pstd_stats:tot_downstream_close_on_upstream_close 0
STAT 11311:default:pstd_stats:tot_downstream_conn_queue_timeout 0
STAT 11311:default:pstd_stats:tot_downstream_conn_queue_add 34
STAT 11311:default:pstd_stats:tot_downstream_conn_queue_remove 34
STAT 11311:default:pstd_stats:tot_downstream_timeout 10390
STAT 11311:default:pstd_stats:tot_wait_queue_timeout 0
STAT 11311:default:pstd_stats:tot_auth_timeout 3948
STAT 11311:default:pstd_stats:tot_assign_downstream 422206491
STAT 11311:default:pstd_stats:tot_assign_upstream 422206491
STAT 11311:default:pstd_stats:tot_assign_recursion 33
STAT 11311:default:pstd_stats:tot_reset_upstream_avail 0
STAT 11311:default:pstd_stats:tot_multiget_keys 1510
STAT 11311:default:pstd_stats:tot_multiget_keys_dedupe 0
STAT 11311:default:pstd_stats:tot_multiget_bytes_dedupe 0
STAT 11311:default:pstd_stats:tot_optimize_sets 0
STAT 11311:default:pstd_stats:tot_retry 0
STAT 11311:default:pstd_stats:tot_retry_time 0
STAT 11311:default:pstd_stats:max_retry_time 0
STAT 11311:default:pstd_stats:tot_retry_vbucket 0
STAT 11311:default:pstd_stats:tot_upstream_paused 422206491
STAT 11311:default:pstd_stats:tot_upstream_unpaused 422206466
STAT 11311:default:pstd_stats:err_oom 33
STAT 11311:default:pstd_stats:err_upstream_write_prep 0
STAT 11311:default:pstd_stats:err_downstream_write_prep 0
STAT 11311:default:pstd_stats:tot_cmd_time 204913978034393
STAT 11311:default:pstd_stats:tot_cmd_count 422206466
STAT 11311:default:pstd_stats:tot_local_cmd_time 96890409792696
STAT 11311:default:pstd_stats:tot_local_cmd_count 193075981
STAT 11311:default:pstd_stats_cmd:regular_get:seen 340921477
STAT 11311:default:pstd_stats_cmd:regular_get:hits 0
STAT 11311:default:pstd_stats_cmd:regular_get:misses 0
STAT 11311:default:pstd_stats_cmd:regular_get:read_bytes 41483
STAT 11311:default:pstd_stats_cmd:regular_get:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_get:cas 304422149
STAT 11311:default:pstd_stats_cmd:regular_get_key:seen 1510
STAT 11311:default:pstd_stats_cmd:regular_get_key:hits 215093618
STAT 11311:default:pstd_stats_cmd:regular_get_key:misses 1510
STAT 11311:default:pstd_stats_cmd:regular_get_key:read_bytes 37369
STAT 11311:default:pstd_stats_cmd:regular_get_key:write_bytes 118651867109
STAT 11311:default:pstd_stats_cmd:regular_get_key:cas 0
STAT 11311:default:pstd_stats_cmd:regular_set:seen 76062713
STAT 11311:default:pstd_stats_cmd:regular_set:hits 0
STAT 11311:default:pstd_stats_cmd:regular_set:misses 0
STAT 11311:default:pstd_stats_cmd:regular_set:read_bytes 57240043585
STAT 11311:default:pstd_stats_cmd:regular_set:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_set:cas 0
STAT 11311:default:pstd_stats_cmd:regular_add:seen 453111
STAT 11311:default:pstd_stats_cmd:regular_add:hits 0
STAT 11311:default:pstd_stats_cmd:regular_add:misses 0
STAT 11311:default:pstd_stats_cmd:regular_add:read_bytes 61370693
STAT 11311:default:pstd_stats_cmd:regular_add:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_add:cas 0
STAT 11311:default:pstd_stats_cmd:regular_replace:seen 0
STAT 11311:default:pstd_stats_cmd:regular_replace:hits 0
STAT 11311:default:pstd_stats_cmd:regular_replace:misses 0
STAT 11311:default:pstd_stats_cmd:regular_replace:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_replace:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_replace:cas 0
STAT 11311:default:pstd_stats_cmd:regular_delete:seen 14083
STAT 11311:default:pstd_stats_cmd:regular_delete:hits 0
STAT 11311:default:pstd_stats_cmd:regular_delete:misses 0
STAT 11311:default:pstd_stats_cmd:regular_delete:read_bytes 605540
STAT 11311:default:pstd_stats_cmd:regular_delete:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_delete:cas 0
STAT 11311:default:pstd_stats_cmd:regular_append:seen 0
STAT 11311:default:pstd_stats_cmd:regular_append:hits 0
STAT 11311:default:pstd_stats_cmd:regular_append:misses 0
STAT 11311:default:pstd_stats_cmd:regular_append:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_append:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_append:cas 0
STAT 11311:default:pstd_stats_cmd:regular_prepend:seen 0
STAT 11311:default:pstd_stats_cmd:regular_prepend:hits 0
STAT 11311:default:pstd_stats_cmd:regular_prepend:misses 0
STAT 11311:default:pstd_stats_cmd:regular_prepend:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_prepend:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_prepend:cas 0
STAT 11311:default:pstd_stats_cmd:regular_incr:seen 1311374
STAT 11311:default:pstd_stats_cmd:regular_incr:hits 0
STAT 11311:default:pstd_stats_cmd:regular_incr:misses 0
STAT 11311:default:pstd_stats_cmd:regular_incr:read_bytes 41611215
STAT 11311:default:pstd_stats_cmd:regular_incr:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_incr:cas 0
STAT 11311:default:pstd_stats_cmd:regular_decr:seen 0
STAT 11311:default:pstd_stats_cmd:regular_decr:hits 0
STAT 11311:default:pstd_stats_cmd:regular_decr:misses 0
STAT 11311:default:pstd_stats_cmd:regular_decr:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_decr:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_decr:cas 0
STAT 11311:default:pstd_stats_cmd:regular_flush_all:seen 0
STAT 11311:default:pstd_stats_cmd:regular_flush_all:hits 0
STAT 11311:default:pstd_stats_cmd:regular_flush_all:misses 0
STAT 11311:default:pstd_stats_cmd:regular_flush_all:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_flush_all:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_flush_all:cas 0
STAT 11311:default:pstd_stats_cmd:regular_cas:seen 3443732
STAT 11311:default:pstd_stats_cmd:regular_cas:hits 0
STAT 11311:default:pstd_stats_cmd:regular_cas:misses 0
STAT 11311:default:pstd_stats_cmd:regular_cas:read_bytes 18783596006
STAT 11311:default:pstd_stats_cmd:regular_cas:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_cas:cas 3443732
STAT 11311:default:pstd_stats_cmd:regular_stats:seen 12
STAT 11311:default:pstd_stats_cmd:regular_stats:hits 0
STAT 11311:default:pstd_stats_cmd:regular_stats:misses 0
STAT 11311:default:pstd_stats_cmd:regular_stats:read_bytes 214
STAT 11311:default:pstd_stats_cmd:regular_stats:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_stats:cas 0
STAT 11311:default:pstd_stats_cmd:regular_stats_reset:seen 0
STAT 11311:default:pstd_stats_cmd:regular_stats_reset:hits 0
STAT 11311:default:pstd_stats_cmd:regular_stats_reset:misses 0
STAT 11311:default:pstd_stats_cmd:regular_stats_reset:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_stats_reset:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_stats_reset:cas 0
STAT 11311:default:pstd_stats_cmd:regular_version:seen 0
STAT 11311:default:pstd_stats_cmd:regular_version:hits 0
STAT 11311:default:pstd_stats_cmd:regular_version:misses 0
STAT 11311:default:pstd_stats_cmd:regular_version:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_version:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_version:cas 0
STAT 11311:default:pstd_stats_cmd:regular_verbosity:seen 0
STAT 11311:default:pstd_stats_cmd:regular_verbosity:hits 0
STAT 11311:default:pstd_stats_cmd:regular_verbosity:misses 0
STAT 11311:default:pstd_stats_cmd:regular_verbosity:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_verbosity:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_verbosity:cas 0
STAT 11311:default:pstd_stats_cmd:regular_quit:seen 36430888
STAT 11311:default:pstd_stats_cmd:regular_quit:hits 0
STAT 11311:default:pstd_stats_cmd:regular_quit:misses 0
STAT 11311:default:pstd_stats_cmd:regular_quit:read_bytes 145723552
STAT 11311:default:pstd_stats_cmd:regular_quit:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_quit:cas 0
STAT 11311:default:pstd_stats_cmd:regular_getl:seen 0
STAT 11311:default:pstd_stats_cmd:regular_getl:hits 0
STAT 11311:default:pstd_stats_cmd:regular_getl:misses 0
STAT 11311:default:pstd_stats_cmd:regular_getl:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_getl:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_getl:cas 0
STAT 11311:default:pstd_stats_cmd:regular_unl:seen 0
STAT 11311:default:pstd_stats_cmd:regular_unl:hits 0
STAT 11311:default:pstd_stats_cmd:regular_unl:misses 0
STAT 11311:default:pstd_stats_cmd:regular_unl:read_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_unl:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_unl:cas 0
STAT 11311:default:pstd_stats_cmd:regular_ERROR:seen 7
STAT 11311:default:pstd_stats_cmd:regular_ERROR:hits 0
STAT 11311:default:pstd_stats_cmd:regular_ERROR:misses 0
STAT 11311:default:pstd_stats_cmd:regular_ERROR:read_bytes 374
STAT 11311:default:pstd_stats_cmd:regular_ERROR:write_bytes 0
STAT 11311:default:pstd_stats_cmd:regular_ERROR:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_get:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_get:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_get:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_get:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_get:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_get:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_get_key:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_get_key:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_get_key:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_get_key:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_get_key:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_get_key:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_set:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_set:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_set:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_set:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_set:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_set:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_add:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_add:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_add:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_add:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_add:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_add:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_replace:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_replace:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_replace:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_replace:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_replace:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_replace:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_delete:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_delete:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_delete:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_delete:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_delete:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_delete:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_append:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_append:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_append:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_append:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_append:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_append:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_prepend:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_prepend:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_prepend:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_prepend:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_prepend:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_prepend:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_incr:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_incr:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_incr:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_incr:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_incr:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_incr:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_decr:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_decr:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_decr:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_decr:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_decr:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_decr:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_flush_all:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_flush_all:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_flush_all:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_flush_all:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_flush_all:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_flush_all:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_cas:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_cas:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_cas:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_cas:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_cas:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_cas:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_stats:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_stats:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_stats:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_stats:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_stats:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_stats:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_stats_reset:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_stats_reset:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_stats_reset:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_stats_reset:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_stats_reset:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_stats_reset:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_version:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_version:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_version:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_version:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_version:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_version:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_verbosity:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_verbosity:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_verbosity:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_verbosity:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_verbosity:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_verbosity:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_quit:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_quit:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_quit:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_quit:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_quit:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_quit:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_getl:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_getl:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_getl:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_getl:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_getl:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_getl:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_unl:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_unl:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_unl:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_unl:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_unl:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_unl:cas 0
STAT 11311:default:pstd_stats_cmd:quiet_ERROR:seen 0
STAT 11311:default:pstd_stats_cmd:quiet_ERROR:hits 0
STAT 11311:default:pstd_stats_cmd:quiet_ERROR:misses 0
STAT 11311:default:pstd_stats_cmd:quiet_ERROR:read_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_ERROR:write_bytes 0
STAT 11311:default:pstd_stats_cmd:quiet_ERROR:cas 0

Andrey Nikishaev

Mar 6, 2012, 4:59:56 AM
to mem...@googlegroups.com
I also found that when we started facing this problem, Metadata in RAM rose rapidly to 200MB (from 100MB before).

Aliaksey Kandratsenka

Mar 21, 2012, 5:56:42 PM
to mem...@googlegroups.com


On Tue, Mar 6, 2012 at 01:59, Andrey Nikishaev <cre...@gmail.com> wrote:
I also found that when we started facing this problem, Metadata in RAM rose rapidly to 200MB (from 100MB before).


Sorry for the slow reply. Steve Yen (moxi author) was on vacation.

The moxi stats seem to indicate that connection attempts to memcached sometimes take too long, so there could be something wrong on the memcached or ep-engine side (double-check that it has nothing to do with major page faults). BTW, you're running really old versions of everything. Consider upgrading to 1.8.0.

Andrey Nikishaev

Mar 21, 2012, 6:13:32 PM
to mem...@googlegroups.com
Yesterday we upgraded our cluster to Couchbase 1.8.0. After 10 minutes of normal work it started to slow down, and since then it has been working just like before the update; nothing changed ((
I don't think the problem is in moxi, because I tried working (with just one admin script) with Couchbase directly through the server-side proxy (which has no load at all), and requests were still very slow.
Maybe there is some special system configuration (sysctl) for Membase/Couchbase?
If you need more information about the system or anything else, just say.

Thanks for the help)

Aliaksey Kandratsenka

Mar 21, 2012, 6:21:55 PM
to mem...@googlegroups.com
On Wed, Mar 21, 2012 at 15:13, Andrey Nikishaev <cre...@gmail.com> wrote:
Yesterday we upgraded our cluster to Couchbase 1.8.0. After 10 minutes of normal work it started to slow down, and since then it has been working just like before the update; nothing changed ((
I don't think the problem is in moxi, because I tried working (with just one admin script) with Couchbase directly through the server-side proxy (which has no load at all), and requests were still very slow.
Maybe there is some special system configuration (sysctl) for Membase/Couchbase?
If you need more information about the system or anything else, just say.

Thanks for the help)

Is it possible that you somehow exceeded memcached's connection limit (10000 by default)?

I recall you had two servers. Is that still true? How many client moxis do you have? Also look at the connections stat.

Андрей Никишаев

Mar 21, 2012, 6:24:45 PM
to mem...@googlegroups.com
Yes, we still have 2 servers. Each Membase server has about 400 open connections.
We have 4 frontends, each with its own moxi proxy.

Is the 10k connection limit on the frontend or on the Membase server?

Aliaksey Kandratsenka

Mar 21, 2012, 6:35:05 PM
to mem...@googlegroups.com
Each Membase server has a default limit of 10k memcached connections.

400 connections is well below the limit, so you're hitting something else.

Let's double-check that it's the same problem. Can you give us the same moxi stats you grabbed previously?

Also, please provide the major page fault count for memcached. You can see it via either top or ps.

Also, the cbstats program lets you grab all kinds of stats from memcached/ep-engine. The "timings" and "all" stats are particularly interesting.
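
For the major page fault count, here is a minimal Python sketch, assuming a Linux host where /proc/<pid>/stat is readable. Field 12 of that file is the cumulative major fault counter (majflt), as documented in proc(5). The helper names parse_majflt and read_majflt are hypothetical, not part of any Membase tooling.

```python
def parse_majflt(stat_line: str) -> int:
    """Return the major page fault count (field 12) from a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may contain spaces,
    # so split after the last ')' before tokenising the rest.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 12 (majflt) sits at index 9.
    return int(after_comm[9])


def read_majflt(pid: int) -> int:
    """Read the current major page fault count for a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_majflt(stat_file.read())
```

Calling read_majflt with the memcached PID twice, a few seconds apart, and comparing the two numbers shows whether the process is still faulting pages in from disk. A value that keeps growing while requests are slow would point toward memory pressure rather than moxi.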

Андрей Никишаев

Mar 21, 2012, 6:39:44 PM
to mem...@googlegroups.com
OK, I will resend you all the stats tomorrow.

Also, is it possible to increase the memcached limit?

Aliaksey Kandratsenka

Mar 21, 2012, 6:42:09 PM
to mem...@googlegroups.com


On Wed, Mar 21, 2012 at 15:39, Андрей Никишаев <cre...@gmail.com> wrote:
OK, I will resend you all the stats tomorrow.

Also, is it possible to increase the memcached limit?

Yes.

It looks like the easiest way is to wrap memcached with a shell script that passes -c<much-larger-limit> after the original parameters.

Don't forget that you'll need to raise the file descriptor limit accordingly. The membase/couchbase initscripts set it, AFAIK, to 10k as well.
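
The wrapper idea could look roughly like the sketch below. It is written in Python rather than shell, and everything specific in it is an assumption: the path to the renamed real binary, the limit of 20000, and the headroom of 1024 descriptors are placeholders, and it relies (as the suggestion above does) on memcached honouring the last -c it is given. Raising the hard descriptor limit normally requires root.

```python
import os
import resource
import sys

# Hypothetical values: adjust to the actual install path and the limit you need.
REAL_MEMCACHED = "/opt/membase/bin/memcached.real"
MAX_CONNS = 20000


def build_argv(real_binary, original_args, max_conns):
    """Append -c <max_conns> after the original parameters."""
    return [real_binary, *original_args, "-c", str(max_conns)]


def required_nofile(max_conns, headroom=1024):
    """File descriptors needed: one per connection plus headroom for files and listeners."""
    return max_conns + headroom


def raise_nofile(target):
    """Raise RLIMIT_NOFILE to at least `target` (raising the hard limit needs root)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return  # already high enough
    if hard != resource.RLIM_INFINITY and hard < target:
        hard = target
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))


def main():
    raise_nofile(required_nofile(MAX_CONNS))
    # Replace this process with the real memcached, keeping the original arguments.
    os.execv(REAL_MEMCACHED, build_argv(REAL_MEMCACHED, sys.argv[1:], MAX_CONNS))
```

Installed in place of the original memcached executable (with the original renamed to the placeholder path), main() would be the entry point. Keeping the argument handling in build_argv makes it easy to check what would be passed before swapping anything on a live node.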

Андрей Никишаев

Mar 21, 2012, 6:44:46 PM
to mem...@googlegroups.com
OK, so I will recheck everything tomorrow and give you fresh stats from moxi and the server.

Kindly yours,
Andrey Nikishaev

Michaël de Groot (Contraband)

Mar 22, 2012, 5:25:56 AM
to mem...@googlegroups.com

Hi Andrey,

I experienced a similar thing with membase, after changing this:

$this->memcache = new Memcache();
$this->memcache->addServer($host, $this->bucket['port']);

to this:

$this->memcache->addServer($host, $this->bucket['port'], true, 1, 1, 15, false, array($this, 'failureCallback'));

The ‘Memcache’ object comes from native PHP5.3.3-7+squeeze3.

I saw the open connections going up and up, to 50k, and then the ip conntrack kernel module started to block things.

I still have to investigate it further but haven't found the time yet.

Michaël

Andrey Nikishaev

Mar 22, 2012, 6:21:31 AM
to mem...@googlegroups.com
Couchbase timings ##############################################################
arith_cmd (974447 total)
    1us - 2us     : (  0.00%)      1 
    2us - 4us     : (  0.01%)     53 
    4us - 8us     : (  0.03%)    201 
    8us - 16us    : (  1.03%)   9773 #
    16us - 32us   : ( 92.06%) 887081 #####################################################################################################################################################################################
    32us - 64us   : ( 97.53%)  53314 ##########
    64us - 128us  : ( 99.82%)  22276 ####
    128us - 256us : (100.00%)   1723 
    256us - 512us : (100.00%)     11 
    512us - 1ms   : (100.00%)      9 
    1ms - 2ms     : (100.00%)      5 
 data_age (20636834 total)
    0 - 1s        : (  0.87%)  179419 #
    1s - 2s       : (  6.83%) 1229526 ###########
    2s - 4s       : ( 16.33%) 1962033 ##################
    4s - 7s       : ( 26.81%) 2161109 ####################
    7s - 10s      : ( 33.92%) 1468459 ##############
    10s - 16s     : ( 43.93%) 2064852 ###################
    16s - 23s     : ( 52.64%) 1798684 #################
    23s - 34s     : ( 63.39%) 2218079 #####################
    34s - 49s     : ( 73.93%) 2174966 ####################
    49s - 1m      : ( 84.78%) 2237841 #####################
    1m - 1m       : ( 93.53%) 1807563 #################
    1m - 2m       : ( 97.64%)  848159 ########
    2m - 3m       : ( 98.96%)  271698 ##
    3m - 4m       : ( 99.67%)  145530 #
    4m - 6m       : ( 99.91%)   50132 
    6m - 9m       : ( 99.95%)    8945 
    9m - 12m      : (100.00%)    8831 
    12m - 17m     : (100.00%)    1008 
 disk_commit (29439 total)
    0 - 1s        : ( 27.51%) 8098 #######################################################
    1s - 2s       : ( 28.23%)  214 #
    2s - 4s       : ( 43.74%) 4565 ###############################
    4s - 7s       : ( 62.51%) 5524 #####################################
    7s - 10s      : ( 83.00%) 6032 #########################################
    10s - 16s     : ( 96.00%) 3827 ##########################
    16s - 23s     : ( 99.78%) 1115 #######
    23s - 34s     : (100.00%)   63 
    34s - 49s     : (100.00%)    1 
 disk_del (3864702 total)
    1us - 2us     : (  0.00%)       2 
    2us - 4us     : ( 23.32%)  901064 ##############################################
    4us - 8us     : ( 68.06%) 1729127 ########################################################################################
    8us - 16us    : ( 95.18%) 1048267 #####################################################
    16us - 32us   : ( 98.97%)  146376 #######
    32us - 64us   : ( 99.34%)   14471 
    64us - 128us  : ( 99.98%)   24694 #
    128us - 256us : (100.00%)     615 
    256us - 512us : (100.00%)      34 
    512us - 1ms   : (100.00%)      27 
    1ms - 2ms     : (100.00%)      24 
    2ms - 4ms     : (100.00%)       1 
 disk_insert (3720767 total)
    2us - 4us     : ( 16.68%)  620801 #################################
    4us - 8us     : ( 46.24%) 1099803 ##########################################################
    8us - 16us    : ( 94.96%) 1812778 ################################################################################################
    16us - 32us   : ( 98.76%)  141193 #######
    32us - 64us   : ( 99.19%)   15989 
    64us - 128us  : ( 99.98%)   29502 #
    128us - 256us : (100.00%)     619 
    256us - 512us : (100.00%)      34 
    512us - 1ms   : (100.00%)      31 
    1ms - 2ms     : (100.00%)      16 
    2ms - 4ms     : (100.00%)       1 
 disk_invalid_item_del (1 total)
    1us - 2us     : (100.00%) 1 ############################################################################################################################################################################################################
 disk_invalid_vbtable_del (294 total)
    512us - 1ms   : ( 81.29%) 239 ####################################################################################################################################################################
    1ms - 2ms     : (100.00%)  55 #####################################
 disk_update (16891367 total)
    2us - 4us     : (  0.00%)     617 
    4us - 8us     : ( 35.44%) 5985479 ######################################################################
    8us - 16us    : ( 88.98%) 9043092 ##########################################################################################################
    16us - 32us   : ( 95.49%) 1099896 ############
    32us - 64us   : ( 97.40%)  323792 ###
    64us - 128us  : ( 99.49%)  351597 ####
    128us - 256us : ( 99.96%)   79771 
    256us - 512us : (100.00%)    6809 
    512us - 1ms   : (100.00%)     205 
    1ms - 2ms     : (100.00%)     108 
    2ms - 4ms     : (100.00%)       1 
 get_cmd (271990315 total)
    0 - 1us       : (  0.00%)      3098 
    1us - 2us     : (  0.42%)   1134954 
    2us - 4us     : ( 26.85%)  71886528 ###################################################
    4us - 8us     : ( 96.55%) 189592751 ########################################################################################################################################
    8us - 16us    : ( 99.21%)   7231257 #####
    16us - 32us   : ( 99.54%)    885000 
    32us - 64us   : ( 99.56%)     67815 
    64us - 128us  : (100.00%)   1177254 
    128us - 256us : (100.00%)      8664 
    256us - 512us : (100.00%)      1087 
    512us - 1ms   : (100.00%)      1486 
    1ms - 2ms     : (100.00%)       402 
    2ms - 4ms     : (100.00%)        15 
    4ms - 8ms     : (100.00%)         2 
    8ms - 16ms    : (100.00%)         2 
 set_vb_cmd (7680 total)
    0 - 1us       : (  0.08%)    6 
    1us - 2us     : ( 11.60%)  885 #######################
    2us - 4us     : ( 90.59%) 6066 ##############################################################################################################################################################
    4us - 8us     : ( 98.80%)  631 ################
    8us - 16us    : ( 99.67%)   67 #
    16us - 32us   : ( 99.82%)   11 
    64us - 128us  : ( 99.96%)   11 
    128us - 256us : ( 99.99%)    2 
    512us - 1ms   : (100.00%)    1 
 storage_age (20636834 total)
    0 - 1s        : (  0.39%)   79479 
    1s - 2s       : (  1.73%)  278232 ##
    2s - 4s       : (  2.91%)  242634 ##
    4s - 7s       : (  4.18%)  263057 ##
    7s - 10s      : (  5.25%)  219060 ##
    10s - 16s     : (  7.25%)  412722 ###
    16s - 23s     : (  9.97%)  562678 #####
    23s - 34s     : ( 15.01%) 1039406 #########
    34s - 49s     : ( 22.71%) 1589602 ###############
    49s - 1m      : ( 32.58%) 2036916 ###################
    1m - 1m       : ( 43.13%) 2176751 ####################
    1m - 2m       : ( 51.79%) 1786380 #################
    2m - 3m       : ( 59.80%) 1653388 ###############
    3m - 4m       : ( 69.64%) 2031245 ###################
    4m - 6m       : ( 81.27%) 2399569 #######################
    6m - 9m       : ( 94.47%) 2724318 ##########################
    9m - 12m      : ( 99.52%) 1042119 #########
    12m - 17m     : ( 99.73%)   44332 
    17m - 24m     : ( 99.90%)   34353 
    24m - 34m     : (100.00%)   20552 
    34m - 48m     : (100.00%)      41 
 store_cmd (32950985 total)
    1us - 2us     : (  0.00%)      296 
    2us - 4us     : (  0.25%)    83421 
    4us - 8us     : (  2.63%)   782380 ####
    8us - 16us    : ( 91.01%) 29122181 ##############################################################################################################################################################################
    16us - 32us   : ( 98.48%)  2461695 ##############
    32us - 64us   : ( 98.77%)    95381 
    64us - 128us  : ( 99.96%)   394021 ##
    128us - 256us : (100.00%)    10725 
    256us - 512us : (100.00%)      538 
    512us - 1ms   : (100.00%)      260 
    1ms - 2ms     : (100.00%)       77 
    2ms - 4ms     : (100.00%)        9 
    8ms - 16ms    : (100.00%)        1 
 tap_mutation (15722391 total)
    1us - 2us     : (  0.00%)       4 
    2us - 4us     : (  0.18%)   28274 
    4us - 8us     : ( 34.16%) 5342711 ###################################################################
    8us - 16us    : ( 92.32%) 9144157 ###################################################################################################################
    16us - 32us   : ( 98.76%) 1012068 ############
    32us - 64us   : ( 99.04%)   44704 
    64us - 128us  : ( 99.98%)  147357 #
    128us - 256us : (100.00%)    2927 
    256us - 512us : (100.00%)      93 
    512us - 1ms   : (100.00%)      73 
    1ms - 2ms     : (100.00%)      20 
    2ms - 4ms     : (100.00%)       3 
    
################################################################################
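
To read a histogram like the ones above without eyeballing the bars, here is a small Python sketch that parses this output format and reports what share of samples fall in buckets starting at or above a given duration. The function names are hypothetical, and the parser assumes the exact layout shown in this post (a "name (N total)" header followed by "low - high : ( pct%) count" lines).

```python
import re

UNITS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0}
SECTION_RE = re.compile(r"^(\w+)\s+\((\d+) total\)")
BUCKET_RE = re.compile(r"^(\S+)\s+-\s+(\S+)\s*:\s*\(\s*[\d.]+%\)\s+(\d+)")


def to_seconds(token):
    """Convert a bucket bound such as '0', '512us', '2ms', '7s' or '1m' to seconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s|m)?", token)
    if match is None:
        raise ValueError(f"unrecognised bucket bound: {token!r}")
    value, unit = match.groups()
    return int(value) * (UNITS[unit] if unit else 1.0)


def parse_timings(text):
    """Return {histogram_name: [(lower_bound_seconds, count), ...]}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        section = SECTION_RE.match(line)
        if section:
            current = sections.setdefault(section.group(1), [])
            continue
        bucket = BUCKET_RE.match(line)
        if bucket and current is not None:
            low, _high, count = bucket.groups()
            current.append((to_seconds(low), int(count)))
    return sections


def share_at_or_above(buckets, seconds):
    """Fraction of samples in buckets whose lower bound is >= `seconds`."""
    total = sum(count for _, count in buckets)
    slow = sum(count for low, count in buckets if low >= seconds)
    return slow / total if total else 0.0


# The disk_commit histogram copied from the dump above.
SAMPLE = """\
 disk_commit (29439 total)
    0 - 1s        : ( 27.51%) 8098 #######
    1s - 2s       : ( 28.23%)  214 #
    2s - 4s       : ( 43.74%) 4565 ####
    4s - 7s       : ( 62.51%) 5524 #####
    7s - 10s      : ( 83.00%) 6032 ######
    10s - 16s     : ( 96.00%) 3827 ###
    16s - 23s     : ( 99.78%) 1115 #
    23s - 34s     : (100.00%)   63
    34s - 49s     : (100.00%)    1
"""
```

For disk_commit, share_at_or_above(parse_timings(SAMPLE)["disk_commit"], 1.0) comes out at about 0.72, which matches the 27.51% cumulative figure on the first bucket: roughly 72% of commits took a second or longer, while the get_cmd and store_cmd histograms stay in the microsecond range.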

Andrey Nikishaev

Mar 22, 2012, 6:23:56 AM
to mem...@googlegroups.com
Couchbase all stats ############################################################
 accepting_conns:                1
 auth_cmds:                      9912
 auth_errors:                    0
 bucket_active_conns:            1
 bucket_conns:                   346
 bytes_read:                     84798174616
 bytes_written:                  102956906721
 cas_badval:                     79483
 cas_hits:                       2502589
 cas_misses:                     0
 cmd_flush:                      0
 cmd_get:                        271526408
 cmd_set:                        29459897
 conn_yields:                    82692
 connection_structures:          501
 curr_connections:               355
 curr_items:                     7553723
 curr_items_tot:                 15114581
 daemon_connections:             10
 decr_hits:                      0
 decr_misses:                    0
 delete_hits:                    12384
 delete_misses:                  0
 ep_bg_fetched:                  0
 ep_commit_num:                  29486
 ep_commit_time:                 9
 ep_commit_time_total:           175176
 ep_data_age:                    54
 ep_data_age_highwat:            881
 ep_db_cleaner_status:           complete
 ep_db_strategy:                 multiMTVBDB
 ep_dbinit:                      0
 ep_dbname:                      /data/membase/data/default-data/default
 ep_dbshards:                    4
 ep_diskqueue_drain:             26489610
 ep_diskqueue_fill:              26496446
 ep_diskqueue_items:             6836
 ep_diskqueue_memory:            546880
 ep_diskqueue_pending:           12254936
 ep_exp_pager_stime:             3600
 ep_expired:                     3918411
 ep_flush_all:                   false
 ep_flush_duration:              42
 ep_flush_duration_highwat:      620
 ep_flush_duration_total:        175494
 ep_flush_preempts:              0
 ep_flusher_state:               running
 ep_flusher_todo:                779
 ep_inconsistent_slave_chk:      0
 ep_io_num_read:                 15250556
 ep_io_num_write:                20655306
 ep_io_read_bytes:               3066698584
 ep_io_write_bytes:              29025651372
 ep_item_begin_failed:           0
 ep_item_commit_failed:          0
 ep_item_flush_expired:          3840382
 ep_item_flush_failed:           0
 ep_items_rm_from_checkpoints:   13621402
 ep_keep_closed_checkpoints:     0
 ep_kv_size:                     4406762285
 ep_latency_arith_cmd:           974448
 ep_latency_get_cmd:             272585357
 ep_latency_store_cmd:           33016151
 ep_max_data_size:               10485760000
 ep_max_txn_size:                1000
 ep_mem_high_wat:                7864320000
 ep_mem_low_wat:                 6291456000
 ep_min_data_age:                0
 ep_num_active_non_resident:     0
 ep_num_checkpoint_remover_runs: 37143
 ep_num_eject_failures:          0
 ep_num_eject_replicas:          0
 ep_num_expiry_pager_runs:       51
 ep_num_non_resident:            0
 ep_num_not_my_vbuckets:         296615
 ep_num_pager_runs:              0
 ep_num_value_ejects:            0
 ep_onlineupdate:                false
 ep_onlineupdate_revert_add:     0
 ep_onlineupdate_revert_delete:  0
 ep_onlineupdate_revert_update:  0
 ep_oom_errors:                  0
 ep_overhead:                    107115442
 ep_pending_ops:                 0
 ep_pending_ops_max:             0
 ep_pending_ops_max_duration:    0
 ep_pending_ops_total:           0
 ep_queue_age_cap:               900
 ep_queue_size:                  6057
 ep_storage_age:                 53
 ep_storage_age_highwat:         2139
 ep_storage_type:                featured
 ep_store_max_concurrency:       1
 ep_store_max_readers:           0
 ep_store_max_readwrite:         1
 ep_tap_bg_fetch_requeued:       0
 ep_tap_bg_fetched:              0
 ep_tap_keepalive:               300
 ep_tmp_oom_errors:              0
 ep_too_old:                     72960
 ep_too_young:                   0
 ep_total_cache_size:            4838079667
 ep_total_del_items:             3865124
 ep_total_enqueued:              26496446
 ep_total_new_items:             3729013
 ep_total_persisted:             24520430
 ep_uncommitted_items:           995
 ep_value_size:                  2787662267
 ep_vb_total:                    1024
 ep_vbucket_del:                 0
 ep_vbucket_del_fail:            0
 ep_version:                     1.8.0r_78_g3539559
 ep_warmed_up:                   15249532
 ep_warmup:                      true
 ep_warmup_dups:                 0
 ep_warmup_oom:                  0
 ep_warmup_thread:               complete
 ep_warmup_time:                 37281275
 get_hits:                       151642973
 get_misses:                     119883435
 incr_hits:                      974074
 incr_misses:                    266
 libevent:                       2.0.11-stable
 limit_maxbytes:                 67108864
 listen_disabled_num:            0
 mem_used:                       4513877727
 pid:                            25177
 pointer_size:                   64
 rejected_conns:                 0
 rusage_system:                  5201.894191
 rusage_user:                    7224.026781
 tap_checkpoint_end_received:    157342
 tap_checkpoint_end_sent:        155147
 tap_checkpoint_start_received:  157836
 tap_checkpoint_start_sent:      155658
 tap_connect_received:           41
 tap_delete_received:            1968547
 tap_delete_sent:                2150705
 tap_mutation_received:          15756975
 tap_mutation_sent:              15860009
 tap_opaque_received:            39
 tap_opaque_sent:                82
 threads:                        4
 time:                           1332411754
 total_connections:              9926
 uptime:                         185791
 vb_active_curr_items:           7553723
 vb_active_eject:                0
 vb_active_ht_memory:            50561024
 vb_active_itm_memory:           2097544965
 vb_active_num:                  512
 vb_active_num_non_resident:     0
 vb_active_ops_create:           1869183
 vb_active_ops_delete:           1872106
 vb_active_ops_reject:           0
 vb_active_ops_update:           9785953
 vb_active_perc_mem_resident:    100
 vb_active_queue_age:            730038000
 vb_active_queue_drain:          14125681
 vb_active_queue_fill:           14128760
 vb_active_queue_memory:         246320
 vb_active_queue_pending:        5733223
 vb_active_queue_size:           3079
 vb_dead_num:                    0
 vb_pending_curr_items:          0
 vb_pending_eject:               0
 vb_pending_ht_memory:           0
 vb_pending_itm_memory:          0
 vb_pending_num:                 0
 vb_pending_num_non_resident:    0
 vb_pending_ops_create:          0
 vb_pending_ops_delete:          0
 vb_pending_ops_reject:          0
 vb_pending_ops_update:          0
 vb_pending_perc_mem_resident:   0
 vb_pending_queue_age:           0
 vb_pending_queue_drain:         0
 vb_pending_queue_fill:          0
 vb_pending_queue_memory:        0
 vb_pending_queue_pending:       0
 vb_pending_queue_size:          0
 vb_replica_curr_items:          7560858
 vb_replica_eject:               0
 vb_replica_ht_memory:           50561024
 vb_replica_itm_memory:          2100961030
 vb_replica_num:                 512
 vb_replica_num_non_resident:    0
 vb_replica_ops_create:          1859830
 vb_replica_ops_delete:          1993018
 vb_replica_ops_reject:          0
 vb_replica_ops_update:          7140340
 vb_replica_perc_mem_resident:   100
 vb_replica_queue_age:           827500000
 vb_replica_queue_drain:         12363929
 vb_replica_queue_fill:          12367686
 vb_replica_queue_memory:        300560
 vb_replica_queue_pending:       6521713
 vb_replica_queue_size:          3757
 version:                        UNKNOWN
 
################################################################################
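
A few derived numbers make a dump like this easier to read. The Python sketch below parses "key: value" lines and computes three of them. The stat names come from the dump itself; the function names are hypothetical, and treating ep_queue_size + ep_flusher_todo as the disk write backlog is an assumption (in this dump the sum does equal ep_diskqueue_items).

```python
def parse_stats(text):
    """Parse 'key: value' lines into a dict of strings, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            stats[key.strip()] = value.strip()
    return stats


def summarize(stats):
    """Derive a few ratios that are easier to read than the raw counters."""
    hits = int(stats["get_hits"])
    misses = int(stats["get_misses"])
    return {
        # Share of get requests that found a key.
        "get_hit_ratio": hits / (hits + misses),
        # How close memory use is to the high water mark.
        "mem_used_vs_high_wat": int(stats["mem_used"]) / int(stats["ep_mem_high_wat"]),
        # Assumed disk write backlog: items queued plus items the flusher still has to do.
        "disk_write_backlog": int(stats["ep_queue_size"]) + int(stats["ep_flusher_todo"]),
    }


# A handful of lines copied from the dump above.
SAMPLE = """\
 ep_flusher_todo:                779
 ep_mem_high_wat:                7864320000
 ep_queue_size:                  6057
 get_hits:                       151642973
 get_misses:                     119883435
 mem_used:                       4513877727
"""
```

With the values above, summarize(parse_stats(SAMPLE)) gives a get hit ratio of about 0.56, memory use at about 57% of the high water mark, and a backlog of 6836 items.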

Andrey Nikishaev

Mar 22, 2012, 8:15:59 AM
to mem...@googlegroups.com
MOXI timings ################################################################

STAT 11311:default:connect  100+100 =218 58.13% ************************
STAT 11311:default:connect  200+100 =106 86.40% ***********
STAT 11311:default:connect  300+100 =28  93.87% ***
STAT 11311:default:connect  400+100 =9   96.27% 
STAT 11311:default:connect  500+100 =1   96.53% 
STAT 11311:default:connect  600+100 =1   96.80% 
STAT 11311:default:connect  700+100 =1   97.07% 
STAT 11311:default:connect  800+100 =1   97.33% 
STAT 11311:default:connect  900+100 =1   97.60% 
STAT 11311:default:connect 1000+100 =1   97.87% 
STAT 11311:default:connect 1100+100 =0   97.87% 
STAT 11311:default:connect 1200+100 =1   98.13% 
STAT 11311:default:connect 1300+100 =0   98.13% 
STAT 11311:default:connect 1400+100 =0   98.13% 
STAT 11311:default:connect 1500+100 =0   98.13% 
STAT 11311:default:connect 1600+100 =0   98.13% 
STAT 11311:default:connect 1700+100 =0   98.13% 
STAT 11311:default:connect 1800+100 =0   98.13% 
STAT 11311:default:connect 1900+100 =0   98.13% 
STAT 11311:default:connect 2000+100 =0   98.13% 
STAT 11311:default:connect 2100+200 =1   98.40% 
STAT 11311:default:connect 2300+400 =2   98.93% 
STAT 11311:default:connect 2700+800 =2   99.47% 
STAT 11311:default:connect 3500+1600=2  100.00% 
STAT 11311:default:connect 5100+3200=0  100.00% 
STAT 11311:default:reserved     100+100    =233381   3.60% **
STAT 11311:default:reserved     200+100    =784456  15.71% *********
STAT 11311:default:reserved     300+100    =1989744 46.42% ************************
STAT 11311:default:reserved     400+100    =1393035 67.92% ****************
STAT 11311:default:reserved     500+100    =643466  77.85% *******
STAT 11311:default:reserved     600+100    =332276  82.98% ****
STAT 11311:default:reserved     700+100    =124103  84.90% *
STAT 11311:default:reserved     800+100    =46716   85.62% 
STAT 11311:default:reserved     900+100    =26695   86.03% 
STAT 11311:default:reserved    1000+100    =18055   86.31% 
STAT 11311:default:reserved    1100+100    =15538   86.55% 
STAT 11311:default:reserved    1200+100    =12350   86.74% 
STAT 11311:default:reserved    1300+100    =9301    86.88% 
STAT 11311:default:reserved    1400+100    =6290    86.98% 
STAT 11311:default:reserved    1500+100    =4052    87.04% 
STAT 11311:default:reserved    1600+100    =3099    87.09% 
STAT 11311:default:reserved    1700+100    =2744    87.13% 
STAT 11311:default:reserved    1800+100    =2516    87.17% 
STAT 11311:default:reserved    1900+100    =2039    87.20% 
STAT 11311:default:reserved    2000+100    =1371    87.22% 
STAT 11311:default:reserved    2100+200    =3682    87.28% 
STAT 11311:default:reserved    2300+400    =3951    87.34% 
STAT 11311:default:reserved    2700+800    =3567    87.40% 
STAT 11311:default:reserved    3500+1600   =2818    87.44% 
STAT 11311:default:reserved    5100+3200   =2752    87.48% 
STAT 11311:default:reserved    8300+6400   =796     87.49% 
STAT 11311:default:reserved   14700+12800  =21      87.49% 
STAT 11311:default:reserved   27500+25600  =3353    87.55% 
STAT 11311:default:reserved   53100+51200  =76      87.55% 
STAT 11311:default:reserved  104300+102400 =358707  93.08% ****
STAT 11311:default:reserved  206700+204800 =331930  98.21% ****
STAT 11311:default:reserved  411500+409600 =96774   99.70% *
STAT 11311:default:reserved  821100+819200 =16168   99.95% 
STAT 11311:default:reserved 1640300+1638400=2674    99.99% 
STAT 11311:default:reserved 3278700+3276800=593    100.00% 
STAT 11311:default:reserved 6555500+6553600=0      100.00% 
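
The long tail in these histograms can be quantified with a short Python sketch. It assumes each line reads "start+width =count cumulative%" and that the bucket bounds are microseconds; both are readings of the output above rather than something confirmed in this thread, and the function names are hypothetical.

```python
import re

LINE_RE = re.compile(r"^STAT\s+(\S+)\s+(\d+)\+(\d+)\s*=\s*(\d+)\s+([\d.]+)%")


def parse_moxi_timings(text):
    """Return {stat_name: [(bucket_start, bucket_width, count), ...]}."""
    histograms = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, start, width, count, _cumulative = match.groups()
            histograms.setdefault(name, []).append((int(start), int(width), int(count)))
    return histograms


def tail_share(buckets, threshold):
    """Fraction of samples in buckets that start at or above `threshold`."""
    total = sum(count for _, _, count in buckets)
    slow = sum(count for start, _, count in buckets if start >= threshold)
    return slow / total if total else 0.0


# Four lines copied from the "reserved" histogram above (not the full data set).
SAMPLE = """\
STAT 11311:default:reserved     300+100    =1989744 46.42% ************************
STAT 11311:default:reserved     400+100    =1393035 67.92% ****************
STAT 11311:default:reserved  104300+102400 =358707  93.08% ****
STAT 11311:default:reserved  206700+204800 =331930  98.21% ****
"""
```

Run against the full "reserved" histogram, tail_share(buckets, 100000) should agree with the cumulative column: it reaches 87.55% at the 53100+51200 bucket, so about 12.45% of requests land in buckets starting at 104300 or higher.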

Andrey Nikishaev

Mar 22, 2012, 11:37:58 AM
to mem...@googlegroups.com
Hi Michaël,

It's bad practice to use persistent connections in PHP. In production they don't give a critical performance increase; instead they cause a bunch of problems. If you need a pool of connections, it's better to use a proxy.
Also, the Memcache module has no cas method, so it's better to use the Memcached module, or the Couchbase module (but it can't work through the moxi proxy).

Chad Kouse

Mar 22, 2012, 11:41:19 AM
to mem...@googlegroups.com
Fwiw we use the Memcached extension with persistent connections to
moxi across our enterprise. Seems to work ok. We will be investigating
the php couchbase client soon.

--chad

Андрей Никишаев

Mar 22, 2012, 11:50:45 AM
to mem...@googlegroups.com
Chad, but why are you using persistent connections to the proxy?

Kindly yours,
Andrey Nikishaev

Chad Kouse

Mar 22, 2012, 11:52:21 AM
to mem...@googlegroups.com
We saw a performance boost and the traffic patterns from moxi->couchbase smoothed out.

--chad

Андрей Никишаев

Mar 22, 2012, 1:54:58 PM
to mem...@googlegroups.com
And how much of a performance increase did you get from this?

Kindly yours,
Andrey Nikishaev

Chad Kouse

Mar 22, 2012, 3:10:34 PM
to mem...@googlegroups.com
It's a very small amount, but spread across hundreds of millions of connections per day it adds up.

One thing to keep in mind, at least with the Memcached extension: if you re-add existing servers it will cause lots of problems. Here is (basically) the code we use:

$membase = new Memcached($persistent_key);
if (!count($membase->getServerList()))
{
    //first time this persistent conn has been opened
    $membase->addServer($host, $port);
}

Андрей Никишаев

Mar 22, 2012, 3:15:15 PM
to mem...@googlegroups.com
We connect only to the local Moxi proxy, and we don't use persistent connections because Moxi handles that by itself.
But we have a problem with Couchbase, and we still don't know why.


Kindly yours,
Andrey Nikishaev

LinkedIn      http://ua.linkedin.com/in/creotiv
GitHub        http://github.com/creotiv
Skype         creotiv.in.ua 
Mobile        +380632410666



Chad Kouse

Mar 22, 2012, 3:27:56 PM
to mem...@googlegroups.com
Moxi handles persistent connections from moxi->couchbase but you will still see a performance increase keeping persistent connections from php to moxi as well...   I actually don't know any reason to NOT use persistent connections, and I'm fairly certain they are on by default in the php couchbase extension that's forthcoming.

Here's an example of traffic patterns smoothing out after we enabled persistent connections to moxi.

Андрей Никишаев

Mar 22, 2012, 3:35:29 PM
to mem...@googlegroups.com
We don't use persistent connections in PHP for a few reasons:
  • small performance increase
  • local connections are cheap and fast
  • I saw how persistent connections worked in the MySQLi library (not very well)
  • our servers can handle much more load (and they are cheap)
  • I don't want bugs

Kindly yours,
Andrey Nikishaev

Chad Kouse

Mar 22, 2012, 3:53:01 PM
to mem...@googlegroups.com
So you don't use persistent connections because you didn't like the way MySQL did. Ok. Your call :)

Before we made the change to start using them, moxi was rejecting connections under heavy load. Now we don't have that problem anymore and the clients are happier handling (as you can see) far fewer threads. 

--chad

Andrey Nikishaev

Mar 22, 2012, 5:30:59 PM
to mem...@googlegroups.com
Here are the Couchbase top stats:

top - 00:30:09 up 151 days, 12:47,  1 user,  load average: 1.71, 1.57, 1.48
Tasks: 159 total,   1 running, 158 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.6%us,  0.4%sy,  0.0%ni, 85.5%id, 11.5%wa,  0.0%hi,  2.0%si,  0.0%st
Mem:  24676524k total, 19862896k used,  4813628k free,   845020k buffers
Swap: 25164792k total,        0k used, 25164792k free, 12043180k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                         
25177 couchbas  15   0 4958m 4.7g 3228 S  6.4 20.0 253:43.58 memcached                                                                                       
25132 couchbas  25   0  320m 142m 2500 S  2.0  0.6 119:55.32 beam.smp                                                                                        
25188 couchbas  18   0  3872  508  420 S  0.1  0.0   0:46.56 sigar_port                                                                                      
25172 couchbas  15   0  170m  24m 1248 S  0.0  0.1   0:05.66 moxi                                                                                            
25170 couchbas  15   0 10536  476  380 S  0.0  0.0   0:03.28 inet_gethost                                                                                    
25168 couchbas  18   0  3804  520  444 S  0.0  0.0   0:02.10 memsup                                                                                          
25171 couchbas  18   0 10536  392  288 S  0.0  0.0   0:02.07 inet_gethost                                                                                    
25166 couchbas  18   0 63864 1120  928 S  0.0  0.0   0:01.06 sh                                                                                              
25169 couchbas  17   0  3800  380  308 S  0.0  0.0   0:00.00 cpu_sup                                                                                         
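
For reference, the two memory summary lines in the top output above can be broken down into page cache, buffers, free memory and memory held by processes. This is a small sketch under two assumptions: that this older top prints the page cache figure ("cached") at the end of the Swap line, and that "used" includes buffers and cache. The helper names are made up for the example.

```python
import re

# Copied from the top output above.
TOP_MEM = "Mem:  24676524k total, 19862896k used,  4813628k free,   845020k buffers"
TOP_SWAP = "Swap: 25164792k total,        0k used, 25164792k free, 12043180k cached"


def parse_kb(line):
    """Map field name -> value in KiB for one top summary line."""
    return {name: int(value) for value, name in re.findall(r"(\d+)k (\w+)", line)}


def memory_breakdown(mem_line, swap_line):
    """Return each memory category as a percentage of total RAM."""
    mem = parse_kb(mem_line)
    cached = parse_kb(swap_line)["cached"]  # assumption: page cache is on the Swap line
    total = mem["total"]
    # Assumption: "used" counts buffers and page cache, so subtract them.
    processes = mem["used"] - mem["buffers"] - cached
    return {
        "page cache %": 100.0 * cached / total,
        "buffers %": 100.0 * mem["buffers"] / total,
        "free %": 100.0 * mem["free"] / total,
        "processes %": 100.0 * processes / total,
    }


if __name__ == "__main__":
    for name, pct in memory_breakdown(TOP_MEM, TOP_SWAP).items():
        print("%-14s %5.1f" % (name, pct))
```

On this node that works out to roughly 49% page cache and 20% free. Page cache is reclaimable, so a large cache with little free memory is not a problem by itself.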

Андрей Никишаев

Mar 22, 2012, 5:35:00 PM
to mem...@googlegroups.com
How many connections did you have? Also, it is normal to reject connections under heavy load: a system must have an upper limit so that even under heavy load it keeps working with steady performance.

Kindly yours,
Andrey Nikishaev

Chad Kouse

Mar 22, 2012, 6:37:38 PM
to mem...@googlegroups.com
If I recall we were actually running out of network stack. The
persistent connections fixed that but we had an underlying network
issue that was the real culprit.

--chad

Andrey Nikishaev

Apr 6, 2012, 7:01:24 AM
to mem...@googlegroups.com
I found one thing that may be causing this problem. In the Couchbase cluster that is working very slowly, almost all memory is used for disk cache, about 60% (only 5% of memory is free if we count disk cache). Could this cause any problems with Couchbase performance? The second cluster has a much smaller disk cache and almost 50% free memory.

Aliaksey Kandratsenka

Apr 29, 2012, 2:56:30 PM
to mem...@googlegroups.com


On Fri, Apr 6, 2012 at 04:01, Andrey Nikishaev <cre...@gmail.com> wrote:
I found one thing that may be causing this problem. In the Couchbase cluster that is working very slowly, almost all memory is used for disk cache, about 60% (only 5% of memory is free if we count disk cache). Could this cause any problems with Couchbase performance? The second cluster has a much smaller disk cache and almost 50% free memory.

Sorry for the long delay. There's a lot going on in my day job. I'm not seeing anything obviously bad in your stats, but I don't work on that area of Couchbase.

No, a small amount of free RAM is completely fine. 50% free RAM on the second node is potentially odd, and it may be a clue to this issue. Check that item counts and IO stats are similar between these nodes.

I've created a Jira ticket for this issue and I suggest you continue the discussion there, as that's the most reliable way forward. Here's the link: http://www.couchbase.com/issues/browse/MB-5188

Андрей Никишаев

Apr 29, 2012, 3:27:23 PM
to mem...@googlegroups.com
I already found the problem and fixed it for myself.
The problem was in the Couchbase/Membase cluster server list. In our cluster we have two network interfaces: local (1Tbit) and global (100Mbit). All backend servers have a local moxi proxy that is used to connect to Couchbase. In the moxi cluster config the Couchbase servers have local IPs, but when moxi connects to Couchbase and gets the server list, Couchbase returns global IPs, and because the global network interfaces are heavily loaded we get problems with response time.

I think Couchbase needs to separate connections by interface or, even better, make this behavior configurable. This behavior should also be added to the docs, because it is not obvious.

Thanks for your help and time. 

Kindly yours,
Andrey Nikishaev




Matt Ingenthron

Apr 29, 2012, 3:44:01 PM
to mem...@googlegroups.com
On 4/29/12 12:27 PM, "Андрей Никишаев" <cre...@gmail.com> wrote:

I already found the problem and fixed it for myself.
The problem was in the Couchbase/Membase cluster server list. In our cluster we have two network interfaces: local (1Tbit) and global (100Mbit). All backend servers have a local moxi proxy that is used to connect to Couchbase. In the moxi cluster config the Couchbase servers have local IPs, but when moxi connects to Couchbase and gets the server list, Couchbase returns global IPs, and because the global network interfaces are heavily loaded we get problems with response time.

I think Couchbase needs to separate connections by interface or, even better, make this behavior configurable. This behavior should also be added to the docs, because it is not obvious.

It is configurable, but I agree that it needs to be better documented and easier to set up.  We try to discern what to do automatically for ease of use, but sometimes we need hints from the administrator if there are multiple routes between clients and servers.  If you'd clustered them with the IP/Hostname on the network you want to use, it'd have selected that.

If you need to later override where a cluster is running, see: http://www.couchbase.com/docs/couchbase-manual-1.8/couchbase-bestpractice-cloud.html

BUT don't make this change to a running cluster.  You'll need to rebalance out nodes so it's a cluster of one, then make the change, then re-cluster the nodes.

Hope that helps,

Matt




-- 
Matt Ingenthron
Couchbase, Inc.

Андрей Никишаев

Apr 29, 2012, 4:41:27 PM
to mem...@googlegroups.com
The problem is that for setup I have to log in to the Couchbase server from the web, and because of this it sets up the global interface. But in real life we use both interfaces, because a few servers are based in a different DC and can't use the local network.
I have already read about using domain names instead of IPs, but it's a pity that I didn't find this in the installation instructions when I first looked at Membase server :)


Kindly yours,
Andrey Nikishaev




Matt Ingenthron

Apr 29, 2012, 4:47:00 PM
to mem...@googlegroups.com
On 4/29/12 1:41 PM, "Андрей Никишаев" <cre...@gmail.com> wrote:

The problem is that for setup I have to log in to the Couchbase server from the web, and because of this it sets up the global interface. But in real life we use both interfaces, because a few servers are based in a different DC and can't use the local network.

When you say a different datacenter, do you mean across a wide area network?  Other than with cross datacenter replication in 2.0, that's not an intended deployment.  It may be okay if the latency/throughput are good between those datacenters, but everything you're saying seems to indicate a deployment we'd not expect and not a scenario we test.

The current scenario we test/support is one in which all of the clients/servers have a consistent way of getting to each other.  Couchbase Server does listen on all IPs in a given OS instance, but the config it gives to clients will tell it to use the clustered interface.

This could be related to the slowness you initially reported.

Can you describe your topology?  It'd be good to make it clear if we'd expect issues here.
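
One way to see which addresses a cluster hands out to clients is to read the bucket config from the REST port and compare the advertised hosts with the network you expect clients to use. The sketch below is a rough check, not an official tool. It assumes the config is served as JSON at `/pools/default/buckets/<bucket>` on port 8091 with `nodes[].hostname` and `vBucketServerMap.serverList` fields, and that no authentication is needed for the bucket. The sample addresses and the `10.0.0.0/8` local range are made up.

```python
import ipaddress
import json
from urllib.request import urlopen

# Made-up sample shaped like the assumed bucket config (only the fields used here).
SAMPLE_CONFIG = {
    "nodes": [
        {"hostname": "203.0.113.10:8091"},
        {"hostname": "203.0.113.11:8091"},
    ],
    "vBucketServerMap": {
        "serverList": ["203.0.113.10:11210", "203.0.113.11:11210"],
    },
}


def fetch_bucket_config(host, bucket="default", port=8091):
    """Fetch the bucket config from a reachable node (assumed REST path)."""
    url = "http://%s:%d/pools/default/buckets/%s" % (host, port, bucket)
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def advertised_hosts(config):
    """Collect the host part of every address the config tells clients to use."""
    entries = [node["hostname"] for node in config.get("nodes", [])]
    entries += config.get("vBucketServerMap", {}).get("serverList", [])
    return sorted({entry.rsplit(":", 1)[0] for entry in entries})


def outside_network(hosts, cidr):
    """Return advertised hosts that are not IP addresses inside `cidr`."""
    net = ipaddress.ip_network(cidr)
    flagged = []
    for host in hosts:
        try:
            if ipaddress.ip_address(host) not in net:
                flagged.append(host)
        except ValueError:
            # A DNS name rather than an IP literal: resolve and check by hand.
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    hosts = advertised_hosts(SAMPLE_CONFIG)
    print("advertised:", hosts)
    print("not on local network:", outside_network(hosts, "10.0.0.0/8"))
```

Against a real cluster, replace `SAMPLE_CONFIG` with `fetch_bucket_config("<node address>")`. If the flagged list is not empty, clients that bootstrap over the local network will still be directed to the other interface for data traffic.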

Андрей Никишаев

Apr 29, 2012, 4:51:50 PM
to mem...@googlegroups.com
All Couchbase servers are located in one DC, but some client servers may be in another. For me, Couchbase listens on both interfaces when I use the server-side moxi (port 11211), and I don't know why it does not work the same way when moxi connects to it through port 8091.


Kindly yours,
Andrey Nikishaev



