Couchbase 3.0 node goes down every weekend.
Version: 3.0.0 Enterprise Edition (build-1209)
Cluster State ID: 03B-020-217
Please see the logs below.
Please let me know how to avoid this failover.
Event log (newest first; format: Time | Module Code | Server Node, then the event text):

12:46:38 - Mon Nov 17, 2014 | menelaus_web_remote_clusters000 | ns_1@ec2-####104.compute-1.amazonaws.com
    Remote cluster reference "Virginia_to_OregonS" updated. New name is "VirginiaM_to_OregonS".

12:45:56 - Mon Nov 17, 2014 | menelaus_web102 | ns_1@ec2-####108 -.compute-1.amazonaws.com
    Client-side error-report for user undefined on node 'ns_1@ec2-####108 -.compute-1.amazonaws.com':
    User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0
    Got unhandled error: Script error.
    At: http://ph.couchbase.net/v2?callback=jQuery162012552191850461902_1416204362614&launchID=8eba0b18a4e965daf1c3a0baecec994c-1416208180553-3638&version=3.0.0-1209-rel-enterprise&_=1416208180556:0:0
    Backtrace: <generated>
    generateStacktrace@http://ec2-####108 -.compute-1.amazonaws.com:8091/js/bugsnag.js:411:7
    bugsnag@http://ec2-####108 -.compute-1.amazonaws.com:8091/js/bugsnag.js:555:13

12:38:49 - Mon Nov 17, 2014 | menelaus_web_xdc_replications000 | ns_1@ec2-####108 -.compute-1.amazonaws.com
    Replication from bucket "apro" to bucket "apro" on cluster "Virginia_to_OregonS" created.

12:38:40 - Mon Nov 17, 2014 | xdc_rdoc_replication_srv000 | ns_1@ec2-####108 -.compute-1.amazonaws.com
    Replication from bucket "apro" to bucket "apro" on cluster "Virginia_to_OregonS" removed.

11:53:17 - Mon Nov 17, 2014 | ns_orchestrator001 | ns_1@ec2-####107.compute-1.amazonaws.com
    Rebalance completed successfully.

11:53:04 - Mon Nov 17, 2014 | ns_vbucket_mover000 | ns_1@ec2-####107.compute-1.amazonaws.com
    Bucket "ifa" rebalance does not seem to be swap rebalance

11:53:02 - Mon Nov 17, 2014 | ns_rebalancer000 | ns_1@ec2-####107.compute-1.amazonaws.com
    Started rebalancing bucket ifa

11:49:58 - Mon Nov 17, 2014 | auto_failover001 | ns_1@ec2-####107.compute-1.amazonaws.com
    Could not automatically fail over node ('ns_1@ec2-####108 -.compute-1.amazonaws.com'). Rebalance is running.

11:48:02 - Mon Nov 17, 2014 | ns_vbucket_mover000 | ns_1@ec2-####107.compute-1.amazonaws.com
    Bucket "apro" rebalance does not seem to be swap rebalance

11:47:59 - Mon Nov 17, 2014 | ns_rebalancer000 | ns_1@ec2-####107.compute-1.amazonaws.com
    Started rebalancing bucket apro

11:47:58 - Mon Nov 17, 2014 | ns_memcached000 | ns_1@ec2-####108 -.compute-1.amazonaws.com
    Bucket "apro" loaded on node 'ns_1@ec2-####108 -.compute-1.amazonaws.com' in 366 seconds.

11:43:29 - Mon Nov 17, 2014 | ns_memcached000 | ns_1@ec2-####108 -.compute-1.amazonaws.com
    Bucket "ifa" loaded on node 'ns_1@ec2-####108 -.compute-1.amazonaws.com' in 96 seconds.

11:41:52 - Mon Nov 17, 2014 | ns_orchestrator004 | ns_1@ec2-####107.compute-1.amazonaws.com
    Starting rebalance, KeepNodes = ['ns_1@ec2-####104.compute-1.amazonaws.com', 'ns_1@ec2-####107.compute-1.amazonaws.com', 'ns_1@ec2-####108 -.compute-1.amazonaws.com'], EjectNodes = [], Failed over and being ejected nodes = [], Delta recovery nodes = ['ns_1@ec2-####108 -.compute-1.amazonaws.com'], Delta recovery buckets = all

21:19:54 - Sun Nov 16, 2014 | ns_memcached000 | ns_1@ec2-####108 -.compute-1.amazonaws.com
    Control connection to memcached on 'ns_1@ec2-####108 -.compute-1.amazonaws.com' disconnected: {badmatch, {error, closed}}

Node ('ns_1@ec2-####108 -.compute-1.amazonaws.com') was automatically failovered. The node status dump that accompanied this event:
[stale, {last_heard,{1416,152978,82869}}, {stale_slow_status,{1416,152863,60088}}, {now,{1416,152968,80503}}, {active_buckets,["apro","ifa"]}, {ready_buckets,["ifa"]}, {status_latency,5743}, {outgoing_replications_safeness_level,[{"apro",green},{"ifa",green}]}, {incoming_replications_conf_hashes, [{"apro", [{'ns_1ec2-####104.compute-1.amazonaws.com',126796989}, {'ns_1@ec2-####107.compute-1.amazonaws.com',41498822}]}, {"ifa", [{'ns_1ec2-####104.compute-1.amazonaws.com',126796989}, {'ns_1@ec2-####107.compute-1.amazonaws.com',41498822}]}]}, {local_tasks, [[{type,xdcr}, {id,<<"949dcce68db4b6d1add4c033ec4e32a9/apro/apro">>}, {errors, [<<"2014-11-16 19:35:03 [Vb Rep] Error replicating vbucket 201. Please see logs for details.">>]}, {changes_left,220}, {docs_checked,51951817}, {docs_written,51951817}, {active_vbreps,4}, {max_vbreps,4}, {waiting_vbreps,210}, {time_working,1040792.401734}, {time_committing,0.0}, {time_working_rate,0.9101340661254117}, {num_checkpoints,53490}, {num_failedckpts,1}, {wakeups_rate,11.007892659036528}, {worker_batches_rate,20.514709046386255}, {rate_replication,22.015785318073057}, {bandwidth_usage,880.6314127229223}, {rate_doc_checks,22.015785318073057}, {rate_doc_opt_repd,22.015785318073057}, {meta_latency_aggr,0.0}, {meta_latency_wt,0.0}, {docs_latency_aggr,1271.0828664152195}, {docs_latency_wt,20.514709046386255}], [{type,xdcr}, {id,<<"fc72b1b0e571e9c57671d6621cac6058/apro/apro">>}, {errors,[]}, {changes_left,278}, {docs_checked,51217335}, {docs_written,51217335}, {active_vbreps,4}, {max_vbreps,4}, {waiting_vbreps,269}, {time_working,1124595.930738}, {time_committing,0.0}, {time_working_rate,1.019751359238166}, {num_checkpoints,54571}, {num_failedckpts,3}, {wakeups_rate,6.50472893793788}, {worker_batches_rate,16.51200422707308}, {rate_replication,23.01673316501096}, {bandwidth_usage,936.6809670630547}, {rate_doc_checks,23.01673316501096}, {rate_doc_opt_repd,23.01673316501096}, {meta_latency_aggr,0.0}, {meta_latency_wt,0.0}, 
{docs_latency_aggr,1500.9621995190503}, {docs_latency_wt,16.51200422707308}], [{type,xdcr}, {id,<<"16b1afb33dbcbde3d075e2ff634d9cc0/apro/apro">>}, {errors, [<<"2014-11-16 19:21:55 [Vb Rep] Error replicating vbucket 258. Please see logs for details.">>, <<"2014-11-16 19:22:41 [Vb Rep] Error replicating vbucket 219. Please see logs for details.">>, <<"2014-11-16 19:23:04 [Vb Rep] Error replicating vbucket 315. Please see logs for details.">>, <<"2014-11-16 20:06:40 [Vb Rep] Error replicating vbucket 643. Please see logs for details.">>, <<"2014-11-16 20:38:20 [Vb Rep] Error replicating vbucket 651. Please see logs for details.">>]}, {changes_left,0}, {docs_checked,56060297}, {docs_written,56060297}, {active_vbreps,0}, {max_vbreps,4}, {waiting_vbreps,0}, {time_working,140073.119377}, {time_committing,0.0}, {time_working_rate,0.04649055712180432}, {num_checkpoints,103504}, {num_failedckpts,237}, {wakeups_rate,21.524796565643623}, {worker_batches_rate,22.52594989427821}, {rate_replication,22.52594989427821}, {bandwidth_usage,913.0518357147434}, {rate_doc_checks,22.52594989427821}, {rate_doc_opt_repd,22.52594989427821}, {meta_latency_aggr,0.0}, {meta_latency_wt,0.0}, {docs_latency_aggr,13.732319632216313}, {docs_latency_wt,22.52594989427821}], [{type,xdcr}, {id,<<"b734095ad63ea9832f9da1b1ef3449ac/apro/apro">>}, {errors, [<<"2014-11-16 19:36:22 [Vb Rep] Error replicating vbucket 260. Please see logs for details.">>, <<"2014-11-16 19:36:38 [Vb Rep] Error replicating vbucket 299. Please see logs for details.">>, <<"2014-11-16 19:36:43 [Vb Rep] Error replicating vbucket 205. Please see logs for details.">>, <<"2014-11-16 19:36:48 [Vb Rep] Error replicating vbucket 227. Please see logs for details.">>, <<"2014-11-16 20:26:19 [Vb Rep] Error replicating vbucket 175. Please see logs for details.">>, <<"2014-11-16 20:26:25 [Vb Rep] Error replicating vbucket 221. Please see logs for details.">>, <<"2014-11-16 21:16:40 [Vb Rep] Error replicating vbucket 293. 
Please see logs for details.">>, <<"2014-11-16 21:16:40 [Vb Rep] Error replicating vbucket 251. Please see logs for details.">>, <<"2014-11-16 21:17:06 [Vb Rep] Error replicating vbucket 270. Please see logs for details.">>]}, {changes_left,270}, {docs_checked,50418639}, {docs_written,50418639}, {active_vbreps,4}, {max_vbreps,4}, {waiting_vbreps,261}, {time_working,1860159.788732}, {time_committing,0.0}, {time_working_rate,1.008940755729142}, {num_checkpoints,103426}, {num_failedckpts,87}, {wakeups_rate,6.50782891818858}, {worker_batches_rate,16.01927118323343}, {rate_replication,23.027702325898055}, {bandwidth_usage,933.1225464233472}, {rate_doc_checks,23.027702325898055}, {rate_doc_opt_repd,23.027702325898055}, {meta_latency_aggr,0.0}, {meta_latency_wt,0.0}, {docs_latency_aggr,1367.9901922012182}, {docs_latency_wt,16.01927118323343}], [{type,xdcr}, {id,<<"e213600feb7ec1dfa0537173ad7f2e02/apro/apro">>}, {errors, [<<"2014-11-16 20:16:39 [Vb Rep] Error replicating vbucket 647. Please see logs for details.">>, <<"2014-11-16 20:17:31 [Vb Rep] Error replicating vbucket 619. 
Please see logs for details.">>]}, {changes_left,854}, {docs_checked,33371659}, {docs_written,33371659}, {active_vbreps,4}, {max_vbreps,4}, {waiting_vbreps,318}, {time_working,2421539.8537169998}, {time_committing,0.0}, {time_working_rate,1.7382361098734072}, {num_checkpoints,102421}, {num_failedckpts,85}, {wakeups_rate,3.0038659755104824}, {worker_batches_rate,7.009020609524459}, {rate_replication,30.539304084356573}, {bandwidth_usage,1261.6237097144026}, {rate_doc_checks,30.539304084356573}, {rate_doc_opt_repd,30.539304084356573}, {meta_latency_aggr,0.0}, {meta_latency_wt,0.0}, {docs_latency_aggr,1997.2249284829577}, {docs_latency_wt,7.009020609524459}]]}, {memory, [{total,752400928}, {processes,375623512}, {processes_used,371957960}, {system,376777416}, {atom,594537}, {atom_used,591741}, {binary,94783616}, {code,15355960}, {ets,175831736}]}, {system_memory_data, [{system_total_memory,64552329216}, {free_swap,0}, {total_swap,0}, {cached_memory,27011342336}, {buffered_memory,4885585920}, {free_memory,12694065152}, {total_memory,64552329216}]}, {node_storage_conf, [{db_path,"/data/couchbase"},{index_path,"/data/couchbase"}]}, {statistics, [{wall_clock,{552959103,4997}}, {context_switches,{8592101014,0}}, {garbage_collection,{2034857586,5985868018204,0}}, {io,{{input,270347194989},{output,799175854069}}}, {reductions,{833510054494,7038093}}, {run_queue,0}, {runtime,{553128340,5090}}, {run_queues,{0,0,0,0,0,0,0,0}}]}, {system_stats, [{cpu_utilization_rate,2.5316455696202533}, {swap_total,0}, {swap_used,0}, {mem_total,64552329216}, {mem_free,44590993408}]}, {interesting_stats, [{cmd_get,0.0}, {couch_docs_actual_disk_size,21729991305}, {couch_docs_data_size,11673379153}, {couch_views_actual_disk_size,0}, {couch_views_data_size,0}, {curr_items,30268090}, {curr_items_tot,60625521}, {ep_bg_fetched,0.0}, {get_hits,0.0}, {mem_used,11032659776}, {ops,116.0}, {vb_replica_curr_items,30357431}]}, {per_bucket_interesting_stats, [{"ifa", [{cmd_get,0.0}, 
{couch_docs_actual_disk_size,611617800}, {couch_docs_data_size,349385716}, {couch_views_actual_disk_size,0}, {couch_views_data_size,0}, {curr_items,1020349}, {curr_items_tot,2039753}, {ep_bg_fetched,0.0}, {get_hits,0.0}, {mem_used,307268040}, {ops,0.0}, {vb_replica_curr_items,1019404}]}, {"apro", [{cmd_get,0.0}, {couch_docs_actual_disk_size,21118373505}, {couch_docs_data_size,11323993437}, {couch_views_actual_disk_size,0}, {couch_views_data_size,0}, {curr_items,29247741}, {curr_items_tot,58585768}, {ep_bg_fetched,0.0}, {get_hits,0.0}, {mem_used,10725391736}, {ops,116.0}, {vb_replica_curr_items,29338027}]}]}, {processes_stats, [{<<"proc/(main)beam.smp/cpu_utilization">>,0}, {<<"proc/(main)beam.smp/major_faults">>,0}, {<<"proc/(main)beam.smp/major_faults_raw">>,0}, {<<"proc/(main)beam.smp/mem_resident">>,943411200}, {<<"proc/(main)beam.smp/mem_share">>,6901760}, {<<"proc/(main)beam.smp/mem_size">>,2951794688}, {<<"proc/(main)beam.smp/minor_faults">>,0}, {<<"proc/(main)beam.smp/minor_faults_raw">>,456714435}, {<<"proc/(main)beam.smp/page_faults">>,0}, {<<"proc/(main)beam.smp/page_faults_raw">>,456714435}, {<<"proc/beam.smp/cpu_utilization">>,0}, {<<"proc/beam.smp/major_faults">>,0}, {<<"proc/beam.smp/major_faults_raw">>,0}, {<<"proc/beam.smp/mem_resident">>,108077056}, {<<"proc/beam.smp/mem_share">>,2973696}, {<<"proc/beam.smp/mem_size">>,1113272320}, {<<"proc/beam.smp/minor_faults">>,0}, {<<"proc/beam.smp/minor_faults_raw">>,6583}, {<<"proc/beam.smp/page_faults">>,0}, {<<"proc/beam.smp/page_faults_raw">>,6583}, {<<"proc/memcached/cpu_utilization">>,0}, {<<"proc/memcached/major_faults">>,0}, {<<"proc/memcached/major_faults_raw">>,0}, {<<"proc/memcached/mem_resident">>,17016668160}, {<<"proc/memcached/mem_share">>,6885376}, {<<"proc/memcached/mem_size">>,17812746240}, {<<"proc/memcached/minor_faults">>,0}, {<<"proc/memcached/minor_faults_raw">>,4385001}, {<<"proc/memcached/page_faults">>,0}, {<<"proc/memcached/page_faults_raw">>,4385001}]}, 
{cluster_compatibility_version,196608}, {version, [{lhttpc,"1.3.0"}, {os_mon,"2.2.14"}, {public_key,"0.21"}, {asn1,"2.0.4"}, {couch,"2.1.1r-432-gc2af28d"}, {kernel,"2.16.4"}, {syntax_tools,"1.6.13"}, {xmerl,"1.3.6"}, {ale,"3.0.0-1209-rel-enterprise"}, {couch_set_view,"2.1.1r-432-gc2af28d"}, {compiler,"4.9.4"}, {inets,"5.9.8"}, {mapreduce,"1.0.0"}, {couch_index_merger,"2.1.1r-432-gc2af28d"}, {ns_server,"3.0.0-1209-rel-enterprise"}, {oauth,"7d85d3ef"}, {crypto,"3.2"}, {ssl,"5.3.3"}, {sasl,"2.3.4"}, {couch_view_parser,"1.0.0"}, {mochiweb,"2.4.2"}, {stdlib,"1.19.4"}]}, {supported_compat_version,[3,0]}, {advertised_version,[3,0,0]}, {system_arch,"x86_64-unknown-linux-gnu"}, {wall_clock,552959}, {memory_data,{64552329216,51966836736,{<13661.389.0>,147853368}}}, {disk_data, [{"/",10309828,38}, {"/dev/shm",31519692,0}, {"/mnt",154817516,1}, {"/data",1056894132,3}]}, {meminfo, <<"MemTotal: 63039384 kB\nMemFree: 12396548 kB\nBuffers: 4771080 kB\nCached: 26378264 kB\nSwapCached: 0 kB\nActive: 31481704 kB\nInactive: 17446048 kB\nActive(anon): 17750620 kB\nInactive(anon): 2732 kB\nActive(file): 13731084 kB\nInactive(file): 17443316 kB\nUnevictable: 0 kB\nMlocked: 0 kB\nSwapTotal: 0 kB\nSwapFree: 0 kB\nDirty: 13312 kB\nWriteback: 0 kB\nAnonPages: 17753376 kB\nMapped: 14516 kB\nShmem: 148 kB\nSlab: 1297976 kB\nSReclaimable: 1219296 kB\nSUnreclaim: 78680 kB\nKernelStack: 2464 kB\nPageTables: 39308 kB\nNFS_Unstable: 0 kB\nBounce: 0 kB\nWritebackTmp: 0 kB\nCommitLimit: 31519692 kB\nCommitted_AS: 19222984 kB\nVmallocTotal: 34359738367 kB\nVmallocUsed: 114220 kB\nVmallocChunk: 34359618888 kB\nHardwareCorrupted: 0 kB\nAnonHugePages: 17432576 kB\nHugePages_Total: 0\nHugePages_Free: 0\nHugePages_Rsvd: 0\nHugePages_Surp: 0\nHugepagesize: 2048 kB\nDirectMap4k: 6144 kB\nDirectMap2M: 63993856 kB\n">>}]

21:19:53 - Sun Nov 16, 2014 | auto_failover001 | ns_1@ec2-####107.compute-1.amazonaws.com
    (module/node/time for the automatic failover event and the status dump above)

21:19:53 - Sun Nov 16, 2014 | ns_rebalancer000 | ns_1@ec2-####107.compute-1.amazonaws.com
    Failed over 'ns_1@ec2-####108 -.compute-1.amazonaws.com': ok

21:19:53 - Sun Nov 16, 2014 | ns_rebalancer000 | ns_1@ec2-####107.compute-1.amazonaws.com
    Skipped vbucket activations and replication topology changes because not all remaining nodes were found to have healthy bucket "ifa": ['ns_1@ec2-####107.compute-1.amazonaws.com']

21:19:49 - Sun Nov 16, 2014 | ns_memcached000 | ns_1@ec2-####108 -.compute-1.amazonaws.com
    Shutting down bucket "ifa" on 'ns_1@ec2-####108 -.compute-1.amazonaws.com' for deletion

21:19:48 - Sun Nov 16, 2014 | ns_rebalancer000 | ns_1@ec2-####107.compute-1.amazonaws.com
    Starting failing over 'ns_1@ec2-####108 -.compute-1.amazonaws.com'

21:19:44 - Sun Nov 16, 2014 | ns_memcached000 | ns_1@ec2-####108 -.compute-1.amazonaws.com
    Bucket "apro" loaded on node 'ns_1@ec2-####108 -.compute-1.amazonaws.com' in 0 seconds.

21:19:44 - Sun Nov 16, 2014 | ns_memcached000 | ns_1@ec2-####108 -.compute-1.amazonaws.com
    Control connection to memcached on 'ns_1@ec2-####108 -.compute-1.amazonaws.com' disconnected:
    {{badmatch, {error, timeout}},
     [{mc_client_binary, cmd_vocal_recv, 5, [{file, "src/mc_client_binary.erl"}, {line, 151}]},
      {mc_client_binary, select_bucket, 2, [{file, "src/mc_client_binary.erl"}, {line, 346}]},
      {ns_memcached, ensure_bucket, 2, [{file, "src/ns_memcached.erl"}, {line, 1269}]},
      {ns_memcached, handle_info, 2, [{file, "src/ns_memcached.erl"}, {line, 744}]},
      {gen_server, handle_msg, 5, [{file, "gen_server.erl"}, {line, 604}]},
      {ns_memcached, init, 1, [{file, "src/ns_memcached.erl"}, {line, 171}]},
      {gen_server, init_it, 6, [{file, "gen_server.erl"}, {line, 304}]},
      {proc_lib, init_p_do_apply, 3, [{file, "proc_lib.erl"}, {line, 239}]}]}
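For reference, this is how I am checking (and could raise) the auto-failover timeout while investigating. This is only a sketch against the standard Couchbase REST API on port 8091; the hostname and the Administrator:password credentials are placeholders, not my real values.

```shell
# Show the current auto-failover settings (enabled flag, timeout in
# seconds, and the count of auto-failovers since the last reset).
curl -u Administrator:password \
  http://ec2-host.compute-1.amazonaws.com:8091/settings/autoFailover

# Raise the timeout (e.g. to 120 seconds) so that a memcached control
# connection that is briefly unresponsive under load does not trigger
# an immediate automatic failover.
curl -u Administrator:password -X POST \
  -d 'enabled=true' -d 'timeout=120' \
  http://ec2-host.compute-1.amazonaws.com:8091/settings/autoFailover
```

These commands need a live cluster, so I have only been running them against our own nodes; a longer timeout obviously only masks whatever is making memcached on the ####108 node time out every weekend.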