[…]
14:30:10.609 [error] Ranch listener {{0,0,0,0},1883} terminated with reason: bad argument in vmq_swc_store_batcher:add_to_batch/2 line 28
14:30:10.610 [error] CRASH REPORT Process <0.12525.0> with 0 neighbours crashed with reason: bad argument in vmq_swc_store_batcher:add_to_batch/2 line 28
14:30:10.611 [error] Ranch listener {{0,0,0,0},1883} terminated with reason: bad argument in vmq_swc_store_batcher:add_to_batch/2 line 28
14:30:10.611 [error] CRASH REPORT Process <0.12511.0> with 0 neighbours crashed with reason: bad argument in vmq_swc_store_batcher:add_to_batch/2 line 28
14:30:10.611 [error] Ranch listener {{0,0,0,0},1883} terminated with reason: bad argument in vmq_swc_store_batcher:add_to_batch/2 line 28
14:30:10.633 [error] vmq_queue process <0.13224.0> exit for subscriber {[],<<"68:02:b8:62:cf:d3">>} due to {badarg,[{vmq_swc_store_batcher,add_to_batch,2,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_store_batcher.erl"},{line,28}]},{vmq_swc_store,enqueue_op_sync,2,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_store.erl"},{line,672}]},{vmq_swc_metrics,timed_measurement,4,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_metrics.erl"},{line,102}]},{vmq_queue,handle_session_down,3,[{file,"/vernemq-build/apps/vmq_server/src/vmq_queue.erl"},{line,715}]},{gen_fsm,handle_msg,8,[{file,"gen_fsm.erl"},{line,486}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
14:30:10.645 [error] vmq_queue process <0.13226.0> exit for subscriber {[],<<"68:02:b8:62:e6:50">>} due to {badarg,[{vmq_swc_store_batcher,add_to_batch,2,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_store_batcher.erl"},{line,28}]},{vmq_swc_store,enqueue_op_sync,2,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_store.erl"},{line,672}]},{vmq_swc_metrics,timed_measurement,4,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_metrics.erl"},{line,102}]},{vmq_queue,handle_session_down,3,[{file,"/vernemq-build/apps/vmq_server/src/vmq_queue.erl"},{line,715}]},{gen_fsm,handle_msg,8,[{file,"gen_fsm.erl"},{line,486}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
14:30:10.652 [info] Replica meta9: Peers updated ['v...@vernemq-k8s-0.vernemq-k8s-service.mqtt.svc.cluster.local','v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local']
14:30:10.653 [error] vmq_queue process <0.13244.0> exit for subscriber {[],<<"68:02:b8:62:f7:66">>} due to {badarg,[{vmq_swc_store_batcher,add_to_batch,2,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_store_batcher.erl"},{line,28}]},{vmq_swc_store,enqueue_op_sync,2,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_store.erl"},{line,672}]},{vmq_swc_metrics,timed_measurement,4,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_metrics.erl"},{line,102}]},{vmq_queue,handle_session_down,3,[{file,"/vernemq-build/apps/vmq_server/src/vmq_queue.erl"},{line,715}]},{gen_fsm,handle_msg,8,[{file,"gen_fsm.erl"},{line,486}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
14:30:10.654 [error] vmq_queue process <0.13266.0> exit for subscriber {[],<<"68:02:b8:5e:00:c8">>} due to {badarg,[{vmq_swc_store_batcher,add_to_batch,2,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_store_batcher.erl"},{line,28}]},{vmq_swc_store,enqueue_op_sync,2,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_store.erl"},{line,672}]},{vmq_swc_metrics,timed_measurement,4,[{file,"/vernemq-build/apps/vmq_swc/src/vmq_swc_metrics.erl"},{line,102}]},{vmq_queue,handle_session_down,3,[{file,"/vernemq-build/apps/vmq_server/src/vmq_queue.erl"},{line,715}]},{gen_fsm,handle_msg,8,[{file,"gen_fsm.erl"},{line,486}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
[…]
14:30:10.685 [info] Replica meta10: Peers updated ['v...@vernemq-k8s-0.vernemq-k8s-service.mqtt.svc.cluster.local','v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local']
14:30:10.687 [info] Replica meta4: Peers updated ['v...@vernemq-k8s-0.vernemq-k8s-service.mqtt.svc.cluster.local','v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local']
14:30:10.692 [info] Replica meta2: Peers updated ['v...@vernemq-k8s-0.vernemq-k8s-service.mqtt.svc.cluster.local','v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local']
14:30:10.697 [info] Replica meta6: Peers updated ['v...@vernemq-k8s-0.vernemq-k8s-service.mqtt.svc.cluster.local','v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local']
14:30:10.698 [info] Replica meta3: Peers updated ['v...@vernemq-k8s-0.vernemq-k8s-service.mqtt.svc.cluster.local','v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local']
14:30:10.708 [info] Replica meta1: Peers updated ['v...@vernemq-k8s-0.vernemq-k8s-service.mqtt.svc.cluster.local','v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local']
14:30:10.722 [info] Replica meta7: Peers updated ['v...@vernemq-k8s-0.vernemq-k8s-service.mqtt.svc.cluster.local','v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local']
14:30:10.754 [info] Sent join request to: 'v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local'
14:30:10.759 [info] Unable to connect to 'v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local'
14:30:10.760 [error] Execute error: ["vmq-admin","cluster","join","discovery-node=v...@vernemq-k8s-1.vernemq-k8s-service.mqtt.svc.cluster.local"] "Couldn't join cluster due to not_reachable\n"
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
14:30:11.615 [warning] lager_error_logger_h dropped 991 messages in the last second that exceeded the limit of 100 messages/sec
{"Kernel pid terminated",application_controller,"{application_terminated,vmq_server,shutdown}"}
Kernel pid terminated (application_controller) ({application_terminated,vmq_server,shutdown})
Hi Eduardo,
what does "driving traffic" to the cluster mean? that is, how do you actually test?
Do you separate test runs so that you don't carry historical state in the cluster? Has the cluster seen restarts or rescheduling of nodes due to Kubernetes? (See https://github.com/vernemq/docker-vernemq/pull/264 for possible context.)
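If you're not sure whether that has happened, the pod restart counts and recent events should show it. A quick check could look like this (assuming your pods run in the "mqtt" namespace, as the node names in your log suggest):

    # restart counts per pod (RESTARTS column) and current node placement
    kubectl -n mqtt get pods -o wide
    # recent scheduler/kubelet events, newest last
    kubectl -n mqtt get events --sort-by=.lastTimestamp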
Have you also tested with the Plumtree metadata protocol instead of SWC?
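If I recall correctly, the metadata backend is selected via the metadata_plugin setting in vernemq.conf (please check the docs for your version). To rule SWC out, something like:

    metadata_plugin = vmq_plumtree

or, using the Docker image's environment-variable convention:

    DOCKER_VERNEMQ_METADATA_PLUGIN=vmq_plumtree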
All in all, this looks to me like SWC sync-state problems caused by terminated (i.e. "leave"'d) cluster nodes that re-join with existing state.
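One way to avoid that is to make a terminating pod leave the cluster cleanly before it goes away. As a rough sketch only (the node name and shell wiring here are illustrative; see the PR above for what docker-vernemq actually does), a Kubernetes preStop hook could run vmq-admin cluster leave:

    lifecycle:
      preStop:
        exec:
          # illustrative: assumes the default VerneMQ@<pod-fqdn> node naming
          command: ["/bin/sh", "-c", "vmq-admin cluster leave node=VerneMQ@$(hostname -f)"]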
Best,
Andréwhat does "driving traffic" to the cluster mean? that is, how do you actually test?
Do you separate test runs so that you don't have historical state in the cluster?
has the cluster seen restarts/rescheduling of nodes due to Kubernetes? (see https://github.com/vernemq/docker-vernemq/pull/264 for possible context).
have you also tested with plumtree protocol, instead of SWC?
All in all this looks like SWC sync state problems to me due to terminated (ie. "leave"'d) cluster nodes that re-join with existing state.
Best,
André--
You received this message because you are subscribed to the Google Groups "vernemq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vernemq-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vernemq-users/01be1412-1b3c-49d8-b69d-a71614a04a2cn%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.