Error in c++ Sync server: "...completion_queue.cc:1242] Too many outstanding grpc_completion_queue_pluck calls: maximum is 6"

37 views
Skip to first unread message

rick...@improbable.io

unread,
Apr 23, 2020, 2:24:53 PM4/23/20
to grpc.io
I am implementing a sync server in c++. The RPC is unary server -> client. When an RPC comes in, the server handler reads some protobuf messages from files on disk and sends them to the client (via _writer->Write()) as fast as it can.

My program crashes around the same place every time (but after a variable number of messages go out). The logs are full of the message "Too many outstanding grpc_completion_queue_pluck calls: maximum is 6".

Here is the bottom of the trace logging:

I0423 14:08:17.295448000 123145533829120 channel_stack.cc:226]         OP[message_compress:0x7faf47001c40]:  SEND_MESSAGE:flags=0x00000000:len=206
I0423 14:08:17.295493000 123145533829120 channel_stack.cc:226]         OP[connected:0x7faf47001c58]:  SEND_MESSAGE:flags=0x00000000:len=206
I0423 14:08:17.295943000 123145533829120 call.cc:615]                  OP[server:0x7faf47001be0]:  SEND_MESSAGE:flags=0x00000000:len=205
I0423 14:08:17.295956000 123145533829120 channel_stack.cc:226]         OP[message_size:0x7faf47001bf8]:  SEND_MESSAGE:flags=0x00000000:len=205
I0423 14:08:17.295961000 123145533829120 channel_stack.cc:226]         OP[deadline:0x7faf47001c10]:  SEND_MESSAGE:flags=0x00000000:len=205
I0423 14:08:17.296015000 123145533829120 channel_stack.cc:226]         OP[http-server:0x7faf47001c28]:  SEND_MESSAGE:flags=0x00000000:len=205
I0423 14:08:17.296038000 123145533829120 channel_stack.cc:226]         OP[message_compress:0x7faf47001c40]:  SEND_MESSAGE:flags=0x00000000:len=205
I0423 14:08:17.296044000 123145533829120 channel_stack.cc:226]         OP[connected:0x7faf47001c58]:  SEND_MESSAGE:flags=0x00000000:len=205
D0423 14:08:17.296051000 123145533829120 completion_queue.cc:1242]     Too many outstanding grpc_completion_queue_pluck calls: maximum is 6
E0423 14:08:17.296371000 123145533829120 call_op_set.h:943]            assertion failed: GRPC_CALL_OK == g_core_codegen_interface->grpc_call_start_batch( call_.call(), ops, nops, core_cq_tag(), nullptr)

=========

I've tried with various combinations of NUM_CQS, MIN/Max_POLLERS, and resource quota SetMaxThreads. The results are always the same.

As a curious side note, my app actually spawns numerous threads to read the protobuf messages from files as fast as possible. But only the main thread issues grpc->Write() calls. It basically sits in an infinite loop, while it visits each thread's queue of protobuf messages (locked via mutex for reading and writing) and streams them all out to the client via Write(), until there are no more messages to stream.

I worry that I may have stumbled into some memory trickery, ferrying these protobuf messages across thread boundaries. But what I've read says protobuf messages are thread-safe, and I'm never modifying them.

Indeed, when I disable the multithreading in my file readers, I no longer hit this error. But I'm confused, as my Write() calls are only happening from the main thread, and only in this one exact place.

Any thoughts on how I might debug this further?

Thanks in advance!

rick...@improbable.io

unread,
Apr 23, 2020, 4:59:50 PM4/23/20
to grpc.io
Found something very suspicious.

I commented out my gRPC Write() calls entirely. So, my RPC handler on the server now does this:
- Receive a request
- Read some protobuf messages from files _and throw them away_
- Quit

Despite the fact that I NEVER send a message over gRPC, I still see messages from gRPC saying ""Too many outstanding grpc_completion_queue_pluck calls: maximum is 6". Curiously, this only happens if I have 6(!) threads reading files.

Those threaded file readers use the protobuf::util::ParseDelimitedFromZeroCopyStream utility to read messages. It seems there is some sort of nasty global / singleton, such that I can't read >= 6 messages at a time using the ZeroCopyStream and also send one over gRPC.

This seems crazy - I get gRPC warnings when I'm reading files, _doing nothing with gRPC at all_. Except for the fact that the work is initiated from a gRPC handler.

Any clues?

rick...@improbable.io

unread,
Apr 25, 2020, 3:52:59 PM4/25/20
to grpc.io
I figured it out! In case anyone finds this and is wondering, we were checking the gRPC context via context->IsCancelled() from the background threads. Despite the documentation in the headers saying this is always safe to do in a sync server, it's actually not safe to do from multiple threads in a sync worker.

We added a mutex to our code to ensure no two threads call context->IsCancelled() at the same time. It fixed the problem nicely.
Reply all
Reply to author
Forward
0 new messages