gRPC Java server fails under load tester after a few days


Edvard Fagerholm

Mar 8, 2024, 10:01:36 AM
to grpc.io
Hi there,

I'm working on a proxy server for a set of services that use gRPC. I've been running a load tester with 1 million concurrent clients against 70 server tasks, so the number of sockets per server (roughly 14,000 if spread evenly) is nothing crazy. In terms of CPU capacity the servers are over-provisioned, running at about 30% CPU usage.

After the test has been running for a few days, some servers fail. Basically, they get stuck in GC. Logs are filled with the following:

ERROR io.netty.util.ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.

We also see the following in the logs. Our servers don't shut themselves down unless they receive a SIGTERM, at which point they have 60 seconds to clean up gracefully. The servers keep running, so nothing in our code is triggering a shutdown:

io.grpc.StatusRuntimeException: INTERNAL: Panic! This is a bug!
at io.grpc.Status.asRuntimeException(Status.java:539)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:576)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:757)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:736)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.util.concurrent.RejectedExecutionException: event executor terminated

Has anyone else had similar issues?

Best,
Edvard

Sergii Tkachenko

Mar 20, 2024, 12:16:29 AM
to grpc.io
Hey Edvard,

Take a look at this thread: https://github.com/grpc/grpc-java/issues/4544.

> Panic is legitimate here. If the executor queue is full, the channel can't submit any work to it. It can't even fail the RPC, because the executor can't even run ClientCall.Listener.onClose(). Logging alone won't be sufficient, because the RPC would be dropped by the channel and stuck in limbo.
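To make the quoted point concrete, here is a minimal sketch using only the JDK, not gRPC or Netty internals. It assumes the failure mode is simply that the executor refuses new work, which matches the RejectedExecutionException in the stack trace above. The class `RejectedCallbackDemo` and the helper `tryDeliver` are hypothetical names made up for illustration; they are not part of grpc-java.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class RejectedCallbackDemo {

    // Hypothetical helper: hands a completion callback to an executor and
    // reports whether the executor accepted it.
    static boolean tryDeliver(Executor executor, Runnable callback) {
        try {
            executor.execute(callback);
            return true;
        } catch (RejectedExecutionException e) {
            // The callback will never run, so whoever is waiting on it is
            // never told that the call ended.
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean closed = new AtomicBoolean(false);

        // While the executor is alive, the callback is accepted.
        boolean accepted = tryDeliver(executor, () -> closed.set(true));

        // shutdown() lets already-submitted tasks finish but rejects new ones.
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("before shutdown: accepted=" + accepted + ", ran=" + closed.get());

        // After shutdown, the same hand-off is rejected and the callback is dropped.
        closed.set(false);
        accepted = tryDeliver(executor, () -> closed.set(true));
        System.out.println("after shutdown: accepted=" + accepted + ", ran=" + closed.get());
    }
}
```

The second hand-off is the situation the quote describes: the only way to report the failure is to run a callback on the executor, and the executor is the thing that is refusing work.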

Let me know if this helps.

Best regards,
Sergii
