Glowroot central collector hangs waiting for semaphores in Session.java


Goldy Liang

Jun 21, 2021, 11:55:10 AM
to Glowroot
Hi,

Not sure if others have encountered something similar.

What we are seeing is that the Glowroot central collector occasionally hangs, with hundreds or even thousands of threads all waiting on the semaphores in Session.java, whose purpose is throttling.

Meanwhile, I cannot find any active threads in Cassandra doing reads, writes, or even flushes. CPU usage and GC activity in Cassandra are always low.

I attached the Glowroot agent to the central collector itself to monitor the trend of semaphore permits. I can see that the available permits fluctuate, and for long periods both the read and write permits drop to zero and then suddenly come back, as shown below:

glowroot_gauges_2.png
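
For anyone who wants to watch the same trend without an agent, here is a minimal sketch of a permit sampler. It is not how the Glowroot gauges are collected; the class name `PermitSampler`, the semaphore size of 4, and the one-second interval are assumptions made for illustration. It only uses `Semaphore.availablePermits()` and `Semaphore.getQueueLength()` from the JDK.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PermitSampler {

    // Formats one sample: free permits plus an estimate of threads blocked in acquire().
    static String sample(String name, Semaphore semaphore) {
        return name + " available=" + semaphore.availablePermits()
                + " waiting=" + semaphore.getQueueLength();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the write semaphore in Session.java.
        Semaphore writeSemaphore = new Semaphore(4);

        // Print one sample per second on a background thread.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> System.out.println(sample("write", writeSemaphore)),
                0, 1, TimeUnit.SECONDS);

        // Simulate in-flight queries holding permits for a while, then finishing.
        writeSemaphore.acquire(3);
        Thread.sleep(2500);
        writeSemaphore.release(3);
        Thread.sleep(1500);

        scheduler.shutdownNow();
    }
}
```

A `waiting` value that stays above zero while `available` sits at zero would match the hang described above: threads are parked in `acquire()` and nothing is releasing permits.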

Any idea what the issue could be?

Attached are links to the thread dumps from both the Glowroot collector and Cassandra, taken when the permits were down to almost zero.




Thank you!

Goldy Liang

Jun 21, 2021, 12:00:46 PM
to Glowroot
Could this be a hidden issue in the way Session.java acquires and releases the semaphores?

In the code below:

    private static ListenableFuture<ResultSet> throttle(DoUnderThrottle doUnderThrottle,
            Semaphore overallSemaphore) throws Exception {
        overallSemaphore.acquire();
        SettableFuture<ResultSet> outerFuture = SettableFuture.create();
        ResultSetFuture innerFuture;
        try {
            innerFuture = doUnderThrottle.execute();
        } catch (Throwable t) {
            overallSemaphore.release();
            throw t;
        }
        Futures.addCallback(innerFuture, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet result) {
                overallSemaphore.release();
                outerFuture.set(result);
            }
            @Override
            public void onFailure(Throwable t) {
                overallSemaphore.release();
                outerFuture.setException(t);
            }
        }, MoreExecutors.directExecutor());
        return outerFuture;
    }

The release of the semaphore normally relies on the callback. Is it possible that the callbacks are not called in time, or that the executor is not executing the async call at all for some reason?
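
To make the question concrete, here is a minimal, self-contained sketch of the same acquire/release pattern. It is not the Glowroot code: `CompletableFuture` stands in for the driver's `ResultSetFuture`, and the names `ThrottleSketch` and `stuck` are made up for illustration. It shows that a permit stays taken for as long as the inner future has not completed, and comes back as soon as it does.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class ThrottleSketch {

    // Same shape as throttle() above: acquire before issuing the query,
    // release only when the returned future completes (or if issuing it fails).
    static <T> CompletableFuture<T> throttle(Supplier<CompletableFuture<T>> query,
            Semaphore permits) throws InterruptedException {
        permits.acquire();
        CompletableFuture<T> inner;
        try {
            inner = query.get();
        } catch (RuntimeException | Error e) {
            permits.release();
            throw e;
        }
        // whenComplete runs on both success and failure, so the permit is released once.
        return inner.whenComplete((result, error) -> permits.release());
    }

    public static void main(String[] args) throws Exception {
        Semaphore permits = new Semaphore(2);

        // A "query" whose response never arrives until we complete it by hand.
        CompletableFuture<String> stuck = new CompletableFuture<>();
        throttle(() -> stuck, permits);

        // A query that completes immediately gives its permit straight back.
        throttle(() -> CompletableFuture.completedFuture("ok"), permits);
        System.out.println("while stuck: " + permits.availablePermits()); // prints 1

        // Once the pending future completes, its permit is released too.
        stuck.complete("late");
        System.out.println("after completion: " + permits.availablePermits()); // prints 2
    }
}
```

If this reading is right, permits dropping to zero for a long time and then suddenly returning would be consistent with many in-flight futures completing (or timing out) late and at about the same time, rather than with permits being leaked for good.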

Goldy Liang

Jun 25, 2021, 8:31:31 AM
to Glowroot
Further information:

The issue does not seem to be related to the semaphores. I modified the code to remove the semaphores, but soon after startup it fails with errors about running out of Cassandra connections.

Furthermore, I added the semaphores back and increased the Cassandra connection pool to a maximum of 10 connections (the default is only 1), set the maximum number of queries per connection to 1024, and set the total number of concurrent queries to 10 * 1024. Even with that, the available read and write permits drop to zero from time to time, and errors about Cassandra not responding occur occasionally. Meanwhile, both Glowroot and Cassandra remain pretty much idle.

It seems as if Cassandra alternates between serving requests and not serving them. However, I don't see any clues in the Cassandra logs.