Nodejs processes getting stuck on grpc mutex after upgrading to 14.1

16 views
Skip to first unread message

Abhishek Parmar

unread,
Oct 22, 2018, 8:49:30 PM10/22/18
to grp...@googlegroups.com
Hi,
On one of our servers we have a few grpc processes that talk to each other. Some processes are servers only (in C++) and others are both clients and servers(noodejs). The nodejs process are getting stuck after we reboot the server (the various processes get started about the same time by the init system). But if we restart/kill the processes they work fine.

Here is a stack trace and debug log. Please let me know if this looks familiar, otherwise I will keep digging.

Thanks
Abhishek

Backtrace from stuck nodejs process

#0  0x00007fc359f8626d in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0

#1  0x00007fc359f7fdbd in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0

#2  0x00007fc3445ad319 in gpr_mu_lock (mu=<optimized out>) at ../deps/grpc/src/core/lib/gpr/sync_posix.cc:47

#3  0x00007fc34453747f in connected (arg=0x2758090, error=0x28d70a0) at ../deps/grpc/src/core/ext/transport/chttp2/client/chttp2_connector.cc:172

#4  0x00007fc3445132c1 in exec_ctx_run (error=0x28d70a0, closure=<optimized out>) at ../deps/grpc/src/core/lib/iomgr/exec_ctx.cc:40

#5  grpc_core::ExecCtx::Flush (this=this@entry=0x7fffe53398e0) at ../deps/grpc/src/core/lib/iomgr/exec_ctx.cc:128

#6  0x00007fc34459fffe in ~ExecCtx (this=0x7fffe53398e0, __in_chrg=<optimized out>) at ../deps/grpc/src/core/lib/iomgr/exec_ctx.h:108

#7  custom_connect_callback (socket=<optimized out>, error=0x28d70a0) at ../deps/grpc/src/core/lib/iomgr/tcp_client_custom.cc:100

#8  0x00007fc34453716a in chttp2_connector_connect (con=0x2758090, args=0x7fffe5339a50, result=0x299cde0, notify=0x299cdf0) at ../deps/grpc/src/core/ext/transport/chttp2/client/chttp2_connector.cc:215

#9  0x00007fc3445457dc in continue_connect_locked (c=0x299cdb0) at ../deps/grpc/src/core/ext/filters/client_channel/subchannel.cc:408

#10 maybe_start_connecting_locked (c=0x299cdb0) at ../deps/grpc/src/core/ext/filters/client_channel/subchannel.cc:483

#11 0x00007fc344546415 in grpc_subchannel_notify_on_state_change (c=0x299cdb0, interested_parties=0xdeafbeef, state=0x2800bbc, notify=<optimized out>) at ../deps/grpc/src/core/ext/filters/client_channel/subchannel.cc:532

#12 0x00007fc344590e07 in grpc_core::SubchannelData<grpc_core::(anonymous namespace)::PickFirst::PickFirstSubchannelList, grpc_core::(anonymous namespace)::PickFirst::PickFirstSubchannelData>::StartConnectivityWatchLocked (

    this=<optimized out>) at ../deps/grpc/src/core/ext/filters/client_channel/lb_policy/subchannel_list.h:304

#13 0x00007fc344591c70 in StartPickingLocked (this=0x2800770) at ../deps/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc:249

#14 grpc_core::(anonymous namespace)::PickFirst::PickLocked (this=0x2800770, pick=0x28052a0) at ../deps/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc:270

#15 0x00007fc344540e2a in StartLocked (elem=<optimized out>) at ../deps/grpc/src/core/ext/filters/client_channel/client_channel.cc:2687

#16 process_service_config_and_start_lb_pick_locked (elem=0x2805170) at ../deps/grpc/src/core/ext/filters/client_channel/client_channel.cc:2815

#17 grpc_core::ResolverResultWaiter::DoneLocked (arg=0x28d26c0, error=0x0) at ../deps/grpc/src/core/ext/filters/client_channel/client_channel.cc:2920

#18 0x00007fc344559332 in grpc_combiner_continue_exec_ctx () at ../deps/grpc/src/core/lib/iomgr/combiner.cc:260

#19 0x00007fc3445132e4 in grpc_core::ExecCtx::Flush (this=this@entry=0x7fffe5339c10) at ../deps/grpc/src/core/lib/iomgr/exec_ctx.cc:131

#20 0x00007fc34455a64a in ~ExecCtx (this=0x7fffe5339c10, __in_chrg=<optimized out>) at ../deps/grpc/src/core/lib/iomgr/exec_ctx.h:108

#21 grpc_custom_resolve_callback (r=0x27f3b40, result=0x2730ac0, error=<optimized out>) at ../deps/grpc/src/core/lib/iomgr/resolve_address_custom.cc:85

#22 0x000000000143c395 in ?? ()

#23 0x000000000143e44b in ?? ()

#24 0x000000000144ffa8 in ?? ()

#25 0x000000000143edd6 in uv_run ()

#26 0x000000000122c1bd in node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) ()

#27 0x0000000001224d03 in node::Start(int, char**) ()

#28 0x00007fc359bcc830 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6

#29 0x00000000008aee41 in _start ()


verbose grpc log


2018-10-22T16:15:39,402380574-0700 E: D1022 16:15:39.401908859    2103 dns_resolver.cc:339]        Using native dns resolver

2018-10-22T16:15:39,402862016-0700 E: I1022 16:15:39.401936640    2103 timer_manager.cc:85]        Spawn timer thread

2018-10-22T16:15:39,403382056-0700 E: I1022 16:15:39.401955746    2103 init.cc:153]                grpc_init(void)

2018-10-22T16:15:39,403875574-0700 E: I1022 16:15:39.401960577    2198 timer_manager.cc:235]       timers not checked: expect another thread to

2018-10-22T16:15:39,404425511-0700 E: I1022 16:15:39.401969527    2198 timer_manager.cc:183]       sleep until kicked

2018-10-22T16:15:39,404906467-0700 E: I1022 16:15:39.402227144    2103 completion_queue.cc:431]    grpc_completion_queue_create_internal(completion_type=0, polling_type=0)

2018-10-22T16:15:40,582796872-0700 E: I1022 16:15:40.582192594    2103 server.cc:944]              grpc_server_create(0x2e29c70, (nil))

2018-10-22T16:15:40,583379934-0700 E: I1022 16:15:40.582212478    2103 server.cc:930]              grpc_server_register_completion_queue(server=0x2ddd2b0, cq=0x2de06b0, reserved=(nil))

2018-10-22T16:15:40,583888435-0700 E: Method handler requestAction for /c3d.AdminServiceInterface/RequestAction expected but not provided

2018-10-22T16:15:40,584442345-0700 E: I1022 16:15:40.582867963    2103 server_chttp2.cc:34]        grpc_server_add_insecure_http2_port(server=0x2ddd2b0, addr=0.0.0.0:1111)

2018-10-22T16:15:40,584957465-0700 E: I1022 16:15:40.582899401    2103 tcp_server_custom.cc:380]   SERVER 0x2dbb660 add_port [::]:1111 error="No Error"

2018-10-22T16:15:40,590308691-0700 E: I1022 16:15:40.589690840    2103 channel_create.cc:93]       grpc_insecure_channel_create(target=myteddy:1100, args=0x2eed230, reserved=(nil))

2018-10-22T16:15:40,591341636-0700 E: I1022 16:15:40.590835146    2103 channel.cc:267]             grpc_channel_get_target(channel=0x2e29600)

2018-10-22T16:15:40,591929944-0700 E: I1022 16:15:40.591224466    2103 metadata_array.cc:29]       grpc_metadata_array_init(array=0x2d3a048)

2018-10-22T16:15:40,592455185-0700 E: I1022 16:15:40.591235015    2103 metadata_array.cc:29]       grpc_metadata_array_init(array=0x2d7fd48)

2018-10-22T16:15:40,592997632-0700 E: I1022 16:15:40.591240219    2103 call.cc:1983]               grpc_call_start_batch(call=0x2ed3770, ops=0x2ea14d0, nops=2, tag=0x2d388f0, reserved=(nil))

2018-10-22T16:15:40,593465266-0700 E: I1022 16:15:40.591244610    2103 call.cc:1593]               ops[0]: SEND_INITIAL_METADATA(nil)

2018-10-22T16:15:40,593928046-0700 E: I1022 16:15:40.591247538    2103 call.cc:1593]               ops[1]: RECV_INITIAL_METADATA ptr=0x2d7fd48

2018-10-22T16:15:40,594908524-0700 E: I1022 16:15:40.591251816    2103 call_combiner.cc:67]        ==> grpc_call_combiner_start() [0x2ed3780] closure=0x2ed4690 [executing batch] error="No Error"

2018-10-22T16:15:40,595398831-0700 E: I1022 16:15:40.591254354    2103 call_combiner.cc:76]          size: 0 -> 1

2018-10-22T16:15:40,595937480-0700 E: I1022 16:15:40.591256477    2103 call_combiner.cc:85]          EXECUTING IMMEDIATELY

2018-10-22T16:15:40,596469538-0700 E: I1022 16:15:40.591263238    2103 call.cc:638]                OP[client-channel:0x2ed4110]:  SEND_INITIAL_METADATA{key=3a 70 61 74 68 ':path' value=2f 63 33 64 2e 7a 69 6e 63 2e 5a 69 6e 63 2f 41 63 74 69 6f 6e 53 74 72 65 61 6d '/c3d.zinc.Zinc/ActionStream'} RECV_INITIAL_METADATA

2018-10-22T16:15:40,597051866-0700 E: I1022 16:15:40.591266381    2103 client_channel.cc:1116]     chand=0x2e296c0 calld=0x2ed4130: adding pending batch at index 0

2018-10-22T16:15:40,597583780-0700 E: I1022 16:15:40.591268938    2103 client_channel.cc:3064]     chand=0x2e296c0 calld=0x2ed4130: entering client_channel combiner

2018-10-22T16:15:40,598104257-0700 E: I1022 16:15:40.591274987    2103 client_channel.cc:218]      chand=0x2e296c0: starting name resolution

2018-10-22T16:15:40,599053615-0700 E: D1022 16:15:40.591277545    2103 dns_resolver.cc:280]        Start resolving.

2018-10-22T16:15:40,599580345-0700 E: I1022 16:15:40.591284535    2103 client_channel.cc:2828]     chand=0x2e296c0 calld=0x2ed4130: deferring pick pending resolver result

2018-10-22T16:15:40,600090104-0700 E: I1022 16:15:40.591287019    2103 call_combiner.cc:165]       call_combiner=0x2ed3780: setting notify_on_cancel=0x2de2850

2018-10-22T16:15:40,600662796-0700 E: I1022 16:15:40.591299744    2103 metadata_array.cc:29]       grpc_metadata_array_init(array=0x2da7898)

2018-10-22T16:15:40,601247786-0700 E: I1022 16:15:40.591303804    2103 call.cc:1983]               grpc_call_start_batch(call=0x2ed3770, ops=0x2eed050, nops=1, tag=0x2d23af0, reserved=(nil))

2018-10-22T16:15:40,601765545-0700 E: I1022 16:15:40.591307696    2103 call.cc:1593]               ops[0]: RECV_STATUS_ON_CLIENT metadata=0x2da7898 status=0x2da78b0 details=0x2da78b8

2018-10-22T16:15:40,602248272-0700 E: I1022 16:15:40.591310486    2103 call_combiner.cc:67]        ==> grpc_call_combiner_start() [0x2ed3780] closure=0x2ed4790 [executing batch] error="No Error"

2018-10-22T16:15:40,602770292-0700 E: I1022 16:15:40.591312823    2103 call_combiner.cc:76]          size: 1 -> 2

2018-10-22T16:15:40,603249135-0700 E: I1022 16:15:40.591315028    2103 call_combiner.cc:91]          QUEUING

2018-10-22T16:15:40,603754515-0700 E: I1022 16:15:40.596813629    2103 server.cc:1024]             grpc_server_start(server=0x2ddd2b0)

2018-10-22T16:15:40,604252793-0700 E: I1022 16:15:40.596825047    2103 executor.cc:200]            EXECUTOR schedule 0x2ed5650 inline

2018-10-22T16:15:40,604735654-0700 E: I1022 16:15:40.596829255    2103 tcp_server_custom.cc:422]   SERVER_START 0x2dbb660

2018-10-22T16:15:40,605185438-0700 E: I1022 16:15:40.596835319    2103 call_details.cc:31]         grpc_call_details_init(cd=0x2dabdf0)

2018-10-22T16:15:40,605648237-0700 E: I1022 16:15:40.596838124    2103 metadata_array.cc:29]       grpc_metadata_array_init(array=0x2dabe50)

2018-10-22T16:15:40,606157918-0700 E: I1022 16:15:40.596843515    2103 server.cc:1352]             grpc_server_request_call(server=0x2ddd2b0, call=0x2dabde8, details=0x2dabdf0, initial_metadata=0x2dabe50, cq_bound_to_call=0x2de06b0, cq_for_notification=0x2de06b0, tag=0x2dbd1d0)

2018-10-22T16:15:40,639462920-0700 E: Mon, 22 Oct 2018 23:15:40 GMT sequelize deprecated String based operators are now deprecated. Please use Symbol based operators for better security, read more at http://docs.sequelizejs.com/manual/tutorial/querying.html#operators at ../../node_modules/sequelize/lib/sequelize.js:242:13

2018-10-22T16:15:40,644245283-0700 E: (node:2103) DeprecationWarning: grpc.load: Use the @grpc/proto-loader module with grpc.loadPackageDefinition instead

2018-10-22T16:15:40,644791789-0700 E: (node:2103) DeprecationWarning: Server#addProtoService: Use Server#addService instead

2018-10-22T16:15:40,645336094-0700 E: I1022 16:15:40.643930321    2103 call.cc:1983]               grpc_call_start_batch(call=0x2ed3770, ops=0x2ea14d0, nops=1, tag=0x2da4260, reserved=(nil))

2018-10-22T16:15:40,645932732-0700 E: I1022 16:15:40.643939646    2103 call.cc:1593]               ops[0]: RECV_MESSAGE ptr=0x2da4298

2018-10-22T16:15:40,646470991-0700 E: I1022 16:15:40.643952248    2103 call_combiner.cc:67]        ==> grpc_call_combiner_start() [0x2ed3780] closure=0x2ed4890 [executing batch] error="No Error"

2018-10-22T16:15:40,647038084-0700 E: I1022 16:15:40.643954807    2103 call_combiner.cc:76]          size: 2 -> 3

2018-10-22T16:15:40,647569655-0700 E: I1022 16:15:40.643956732    2103 call_combiner.cc:91]          QUEUING

2018-10-22T16:15:40,648086615-0700 E: I1022 16:15:40.647123516    2103 call.cc:1983]               grpc_call_start_batch(call=0x2ed3770, ops=0x2ea4370, nops=1, tag=0x2ee2140, reserved=(nil))

2018-10-22T16:15:40,648577680-0700 E: I1022 16:15:40.647131625    2103 call.cc:1593]               ops[0]: SEND_MESSAGE ptr=0x2d6bee0

2018-10-22T16:15:40,649081272-0700 E: I1022 16:15:40.647135137    2103 call_combiner.cc:67]        ==> grpc_call_combiner_start() [0x2ed3780] closure=0x2ed4990 [executing batch] error="No Error"

2018-10-22T16:15:40,649534561-0700 E: I1022 16:15:40.647136958    2103 call_combiner.cc:76]          size: 3 -> 4

2018-10-22T16:15:40,649982921-0700 E: I1022 16:15:40.647138389    2103 call_combiner.cc:91]          QUEUING

2018-10-22T16:15:40,650453200-0700 E: I1022 16:15:40.647217372    2103 completion_queue.cc:851]    grpc_completion_queue_next(cq=0x2de06b0, deadline=gpr_timespec { tv_sec: -9223372036854775808, tv_nsec: 0, clock_type: 0 }, reserved=(nil))

2018-10-22T16:15:40,650934867-0700 E: I1022 16:15:40.647225719    2103 completion_queue.cc:951]    RETURN_EVENT[0x2de06b0]: QUEUE_TIMEOUT

2018-10-22T16:15:40,651414350-0700 E: I1022 16:15:40.647566577    2103 client_channel.cc:485]      chand=0x2e296c0: got resolver result: resolver_result=0x2d80140 error="No Error"

2018-10-22T16:15:40,651904090-0700 E: I1022 16:15:40.647574780    2103 pick_first.cc:154]          Pick First 0x2db0d10 created.

2018-10-22T16:15:40,652397957-0700 E: I1022 16:15:40.647578030    2103 pick_first.cc:382]          Pick First 0x2db0d10 received update with 1 addresses

2018-10-22T16:15:40,652877632-0700 E: I1022 16:15:40.647580977    2103 subchannel_list.h:458]      [pick_first 0x2db0d10] Creating subchannel list 0x2ea39f0 for 1 subchannels

2018-10-22T16:15:40,653356788-0700 E: I1022 16:15:40.647619797    2103 subchannel_list.h:498]      [pick_first 0x2db0d10] subchannel list 0x2ea39f0 index 0: Created subchannel 0x2d86ed0 for address uri ipv4:10.12.61.201:1100

2018-10-22T16:15:40,653804302-0700 E: I1022 16:15:40.647624418    2103 client_channel.cc:398]      chand=0x2e296c0: created new LB policy "pick_first" (0x2db0d10)

2018-10-22T16:15:40,654245777-0700 E: I1022 16:15:40.647627929    2103 connectivity_state.cc:92]   CONWATCH: 0x2db0ef8 pick_first: get IDLE

2018-10-22T16:15:40,654759956-0700 E: I1022 16:15:40.647630687    2103 connectivity_state.cc:116]  CONWATCH: 0x2db0ef8 pick_first: from IDLE [cur=IDLE] notify=0x2d871a8

2018-10-22T16:15:40,655252548-0700 E: I1022 16:15:40.647634030    2103 client_channel.cc:177]      chand=0x2e296c0: setting connectivity state to IDLE

2018-10-22T16:15:40,655740066-0700 E: I1022 16:15:40.647636773    2103 connectivity_state.cc:164]  SET: 0x2e29748 client_channel: IDLE --> IDLE [resolver_result] error=(nil) "No Error"

2018-10-22T16:15:40,656232847-0700 E: I1022 16:15:40.647639358    2103 client_channel.cc:2917]     chand=0x2e296c0 calld=0x2ed4130: resolver returned, doing LB pick

2018-10-22T16:15:40,656696185-0700 E: I1022 16:15:40.647641353    2103 client_channel.cc:2756]     chand=0x2e296c0 calld=0x2ed4130: applying service config to call

2018-10-22T16:15:40,657147226-0700 E: I1022 16:15:40.647642836    2103 client_channel.cc:2665]     chand=0x2e296c0 calld=0x2ed4130: starting pick on lb_policy=0x2db0d10

2018-10-22T16:15:40,657601911-0700 E: I1022 16:15:40.647645073    2103 subchannel_list.h:292]      [pick_first 0x2db0d10] subchannel list 0x2ea39f0 index 0 of 1 (subchannel 0x2d86ed0): starting watch: requesting connectivity change notification (from IDLE)

2018-10-22T16:15:40,658032165-0700 E: I1022 16:15:40.647646950    2103 connectivity_state.cc:116]  CONWATCH: 0x2d86fa0 subchannel: from IDLE [cur=IDLE] notify=0x2d38578

2018-10-22T16:15:40,658462296-0700 E: I1022 16:15:40.647649274    2103 connectivity_state.cc:164]  SET: 0x2d86fa0 subchannel: IDLE --> CONNECTING [state_change] error=(nil) "No Error"

2018-10-22T16:15:40,658876797-0700 E: I1022 16:15:40.647650773    2103 connectivity_state.cc:190]  NOTIFY: 0x2d86fa0 subchannel: 0x2d38578

2018-10-22T16:15:40,659315104-0700 E: I1022 16:15:40.647657057    2103 tcp_client_custom.cc:139]   CLIENT_CONNECT: 0x2da77c0 ipv4:10.12.61.201:1100: asynchronously connecting

2018-10-22T16:15:40,659745439-0700 E: I1022 16:15:40.647676597    2103 tcp_client_custom.cc:69]    CLIENT_CONNECT: ipv4:10.12.61.201:1100: on_alarm: error="Cancelled"



Abhishek Parmar

unread,
Oct 22, 2018, 8:59:50 PM10/22/18
to grp...@googlegroups.com
It sounds similar to the bug reported here?


--
-Abhishek

Reply all
Reply to author
Forward
0 new messages