Deadlock in an elliptics callback thread


Denis Sotnikov

Nov 10, 2016, 5:15:55 PM
to reverbrain
Hello!
I have noticed a problem in the elliptics client (and in cocaine C++ apps that use the ServicePool helper class written by Polyakov, too).
When you are in an elliptics thread (i.e. a callback was called) and try to make another request to elliptics, you can get a deadlock.
It happens because the client tries to reconnect to the service (if reconnect_timeout has expired).
In the cocaine framework it hangs at service_client/manager.cpp:192 (v.0.11 branch):
service->connect().get();
The reconnect event was posted to the thread we are currently on, so nothing can set the value of the promise, and the wait in get() never returns.
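
To illustrate the pattern described above, here is a minimal sketch in plain C++. It is not elliptics or cocaine code: EventLoop and wait_inside_callback_times_out() are hypothetical names, and the single worker thread only stands in for the callback thread. A callback posts the task that would fulfil the promise onto its own thread and then waits on the future. A bounded wait_for() replaces get() so that the example terminates instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-threaded event loop standing in for one callback thread.
class EventLoop {
public:
    EventLoop() : worker_([this] { run(); }) {}

    ~EventLoop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and queue drained
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();  // tasks run one at a time on this single thread
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Returns true if a wait issued from inside a loop callback timed out,
// i.e. an unbounded get() in the same place would never have returned.
bool wait_inside_callback_times_out() {
    std::promise<bool> result;
    std::future<bool> outcome = result.get_future();
    EventLoop loop;  // destroyed (and joined) before result and outcome

    loop.post([&loop, &result] {
        auto connected = std::make_shared<std::promise<void>>();
        std::future<void> done = connected->get_future();

        // The "reconnect" is queued onto the very thread we are running on.
        loop.post([connected] { connected->set_value(); });

        // It cannot run until this callback returns, so the wait times out.
        auto status = done.wait_for(std::chrono::milliseconds(200));
        result.set_value(status == std::future_status::timeout);
    });

    return outcome.get();
}
```

Calling wait_inside_callback_times_out() from a different thread returns true here: the queued task only runs after the waiting callback has returned.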

Evgeniy Polyakov

Nov 11, 2016, 12:24:23 PM
to Denis Sotnikov, reverbrain
Hi Denis

11.11.2016, 01:15, "Denis Sotnikov" <deanso...@gmail.com>:
In general you cannot invoke blocking operations from elliptics callbacks, since each one occupies an IO thread.
If there are multiple blocking calls in different threads, the IO pool can be exhausted and the system will get stuck.

The solution is to call async connect and attach a completion callback.
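
A rough sketch of that shape, under assumptions: TaskQueue, async_connect() and run_callback_chain() are hypothetical names made up for illustration, not the elliptics or cocaine API, and the queue is drained on the calling thread to keep the example deterministic. The point is only that the callback returns immediately and the follow-up work happens in the completion callback.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a single IO thread: tasks run one at a time, in order.
class TaskQueue {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    void run_until_empty() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();  // a task may post further tasks while it runs
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical async connect: queue the work and report the result later
// through on_connected instead of returning a future to block on.
void async_connect(TaskQueue& io, std::function<void(bool)> on_connected) {
    io.post([on_connected] { on_connected(true); });
}

// Runs one callback that needs a connection and records the order of events.
std::vector<std::string> run_callback_chain() {
    TaskQueue io;
    std::vector<std::string> events;

    io.post([&io, &events] {
        events.push_back("callback started");
        // No blocking here: the next step lives in the completion callback.
        async_connect(io, [&events](bool ok) {
            events.push_back(ok ? "connected, sending request" : "connect failed");
        });
        events.push_back("callback returned");
    });

    io.run_until_empty();
    return events;
}
```

In this sketch the first callback returns before the connection completes, so the thread is free to run the queued connect and then the continuation.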

Evgeniy Polyakov

Nov 14, 2016, 5:27:14 PM
to Denis Sotnikov, reverbrain
Hi Denis

12.11.2016, 16:19, "Denis Sotnikov" <deanso...@gmail.com>:
> I do not make blocking calls from the elliptics thread.
> I call async write_data with an attached callback and get stuck.
> This happens if I set reconnect_timeout in the session to 1s.
> So during the 2nd call (from the elliptics thread) the client tries to reconnect by itself (I do nothing)
> and gets stuck.
> This happens because you are posting the reconnect event to the same elliptics thread and waiting on the future.

Reconnect does not start in the context of an IO thread, i.e. when you call write_data() it cannot end up stuck waiting
for reconnection. There is a dedicated reconnection thread which performs all the work.

write_data() merely allocates transaction structures and queues them into the appropriate remote states, or sometimes it calls
the completion callback directly, i.e. without postponing it to the IO completion threads.

Please show me your client log (at the highest possible log level) and your code, if you can.