URGENT PLZ: ISSUE WITH CLIENT THREAD POOL WAITING STATE

Harish Kumar Kandikattu

Jul 26, 2017, 10:32:13 AM
to quickserver
Hello,

We have been hitting a serious issue: an excess of waiting threads in the client thread pool. These accumulate over a period of time, which could be days, months and beyond.
We are using version 1.4.7, and I have taken thread dumps at intervals over two days to compute thread counts.

I have written a simple QuickServer local client/server test program and simulated the calls. I observed that the pool size (waiting + runnable threads) defaults to 25 and grows up to 50. If incoming requests suddenly spike to more than 50, say 150, it creates an extra 100 threads to serve all 150 requests at once; the pool size is then 150, but it shrinks back toward 50 after some time, depending on the number of threads in the waiting state.
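For reference, a minimal sketch of the kind of local test described above, assuming the standard QuickServer command-handler setup; the handler class, port 4123, the burst size and the sleep are illustrative and not the exact program I ran.

import java.io.*;
import java.net.*;
import org.quickserver.net.server.ClientCommandHandler;
import org.quickserver.net.server.ClientHandler;
import org.quickserver.net.server.QuickServer;

public class PoolGrowthTest {

    // Minimal echo-style handler so every connection occupies a ClientThread.
    public static class EchoHandler implements ClientCommandHandler {
        public void handleCommand(ClientHandler handler, String command)
                throws SocketTimeoutException, IOException {
            handler.sendClientMsg("Echo: " + command);
        }
    }

    public static void main(String[] args) throws Exception {
        QuickServer server = new QuickServer();
        server.setClientCommandHandler(EchoHandler.class.getName());
        server.setPort(4123); // illustrative port
        server.startServer();

        // Simulate a sudden spike of 150 concurrent clients to push the pool past 50.
        for (int i = 0; i < 150; i++) {
            new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", 4123);
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream()))) {
                    out.println("hello");
                    in.readLine();       // wait for the server's reply
                    Thread.sleep(5000);  // hold the connection so the pool stays busy
                } catch (Exception ignored) {
                }
            }).start();
        }
    }
}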

Below is the thread count data from many dumps of our application. The waiting threads are constantly growing; here they grew from 346 to 422 in one day, and from what I observed in my test sample above I cannot tell whether the waiting threads are actually being reclaimed. The pool count never comes back down to 50. This could also be a leak in our own source code that we don't know about and are still debugging. Any help from your end, if this is a known issue or pattern, would help us resolve it a lot faster.

This is a very serious issue for us. ANY HELP in addressing it is promptly appreciated.

wait - Threads are in WAITING state
runnable - Threads are in RUNNABLE state
total - current active thread count
time - dump snapshot timestamp.

DUMP  wait  runnable  total  time
1     346   135       481    Tue Jul 25 08:07:59 EDT 2017
2     377   104       481    Tue Jul 25 08:08:55 EDT 2017
3     372   109       481    Tue Jul 25 08:09:54 EDT 2017
4     358   123       481    Tue Jul 25 08:10:56 EDT 2017
5     352   129       481    Tue Jul 25 08:11:56 EDT 2017
6     370   109       479
7     383   80        463    Tue Jul 25 08:31:09 EDT 2017
8     380   75        455    Tue Jul 25 08:34:52 EDT 2017
9     373   36        412    Tue Jul 25 10:40:08 EDT 2017
10    379   30        409    Tue Jul 25 11:59:46 EDT 2017
11    377   156       533    Tue Jul 25 13:34:12 EDT 2017
12    379   154       533    Tue Jul 25 13:39:02 EDT 2017
13    389   58        447    Tue Jul 25 14:34:23 EDT 2017
14    405   35        440    Tue Jul 25 21:59:47 EDT 2017
15    408   42        450    Wed Jul 26 02:55:20 EDT 2017
16    422   36        458    Wed Jul 26 05:44:24 EDT 2017
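As a side note, counts like the ones in the table above could also be gathered from inside the JVM rather than from kill -3 dumps. The following is a small sketch of that idea using only the JDK's Thread.getAllStackTraces(); it is an illustration, not the script actually used to produce the table.

import java.util.Date;
import java.util.Map;

public class ThreadStateCounter {

    // Returns one snapshot line, e.g. "wait=346 runnable=135 total=481 time=...".
    public static String snapshot() {
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        int waiting = 0;
        int runnable = 0;
        for (Thread t : all.keySet()) {
            if (t.getState() == Thread.State.WAITING) {
                waiting++;
            } else if (t.getState() == Thread.State.RUNNABLE) {
                runnable++;
            }
        }
        return "wait=" + waiting + " runnable=" + runnable
                + " total=" + all.size() + " time=" + new Date();
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}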


Harish Kumar Kandikattu

Aug 9, 2017, 3:41:27 AM
to quickserver
I think I figured out the issue. It is a synchronization issue in the run() method of the ClientThread class that shows up especially on a busy system. I fixed it and started testing. It has been more than a week now, and the grow-and-shrink behaviour of the ClientPool is stable and consistent, which was not the case earlier.


I will share three things here:
1. The fix I made in the ClientThread class.
2. Sample monitoring code to get thread pool stats. (Run this piece in an infinite loop, at intervals, from a background thread; a sketch of such a loop follows the code in #2 below.)
3. What to look for to confirm the fix is working.

Note: First do only #2 and run your program to get some stats with the issue present, then do all the steps to get stats with the issue fixed (a before-and-after scenario).
Reach out to me if you need anything more.




#1  (ClientThread class with the fix; you can diff it against the original to find the changes)

/*
 * This file is part of the QuickServer library 
 * Copyright (C) QuickServer.org
 *
 * Use, modification, copying and distribution of this software is subject to
 * the terms and conditions of the GNU Lesser General Public License. 
 * You should have received a copy of the GNU LGP License along with this 
 * library; if not, you can download a copy from <http://www.quickserver.org/>.
 *
 * For questions, suggestions, bug-reports, enhancement-requests etc.
 * visit http://www.quickserver.org
 *
 */

package org.quickserver.util.pool.thread;

import java.util.logging.*;
import java.util.*;
import org.quickserver.util.MyString;
import org.quickserver.net.server.ClientHandler;
import org.quickserver.net.server.ClientEvent;

/**
 * This is the worker thread used to handle clients using 
 * {@link org.quickserver.net.server.ClientHandler}
 * @author Akshathkumar Shetty
 * @since 1.3
 */
public class ClientThread extends Thread {
    private static Logger logger = Logger.getLogger(ClientThread.class.getName());
    private static Map idMap = new HashMap();

    private String name = "<ClientThread-Pool#";
    private ClientPool pool;
    private Runnable client;
    private int id;
    private boolean ready = false;

    /**
     * Holds the current Thread state. <code><pre>
        U = Unknown
        S = Started
        R = Running a client
        I = Idle
        L = Looking for client
        P = Was sent back to pool
        W = Waiting in pool
        N = Was notified, Looking for client
        D = Dead
        </pre></code>
    */
    protected volatile char state = 'U';

    public boolean isReady() {
        return ready;
    }

    public void clean() {
        client = null;
    }

    public ClientThread(ClientPool pool) {
        this(pool, -1);
    }

    static class InstanceId {
        private int id = 0;
        public int getNextId() {
            return ++id;
        }
    };

    private static int getNewId(int instanceCount) {
        InstanceId instanceId = (InstanceId) idMap.get(""+instanceCount);
        if(instanceId==null) {
            instanceId = new InstanceId();
            idMap.put(""+instanceCount, instanceId);
        }
        return instanceId.getNextId();
    }

    public ClientThread(ClientPool pool, int instanceCount) {
        id = getNewId(instanceCount);
        name = name+instanceCount+"-ID:"+id+">";
        this.pool = pool;
        setName(name);
    }

    public int getInstanceId() {
        return id;
    }

    private void executeClient() {
        boolean niowriteFlag = false;
        state = 'R';

        if(ClientHandler.class.isInstance(client)) {
            niowriteFlag = ((ClientHandler) client).isClientEventNext(ClientEvent.WRITE);
            if(niowriteFlag) {
                pool.nioWriteStart();
            }
        } else {
            niowriteFlag = false;
        }

        try {
            client.run();
        } catch(RuntimeException e) {
            logger.warning("RuntimeException @ thread run() : "+getName()+": "+MyString.getStackTrace(e));
        } finally {
            if(niowriteFlag) {
                pool.nioWriteEnd();
            }
        }
        state = 'I';
    }

    public void run() {
        state = 'S';

        synchronized(pool) {
            if(pool.isClientAvailable()==true) {
                ready = true;
                pool.notify();
            }
        }

        boolean returnToPool = false;
        while(true) {
            if(ready) {
                state = 'L';
                client = pool.getClient();
                if(client==null) {
                    logger.fine("ClientPool returned a null client! Other Thread must have taken my client.. Ok");
                } else {
                    executeClient();
                    logger.finest("Client returned the thread: "+getName());
                    client = null;
                    if(pool==null) {
                        logger.fine("Could not return client thread "+getName()+", pool was null!");
                        state = 'D';
                        break;
                    }
                }

                /*if(pool.isClientAvailable()==true) {
                    state = 'L';
                    continue;
                }*/

                returnToPool = true;
            } //end if ready

            synchronized(this) {
                if(ready==false) ready = true;

                if(returnToPool) {
                    logger.finest("Returning client thread to pool: "+getName());
                    pool.returnObject(ClientThread.this);
                    returnToPool = false;
                    state = 'P';
                }

                try {
                    state = 'W';
                    wait();
                    state = 'N';
                } catch(InterruptedException e) {
                    logger.finest("Closing thread "+Thread.currentThread().getName()+" since interrupted.");
                    state = 'D';
                    break;
                }
            }
        }//end while
    }

    /**
     * Returns the {@link org.quickserver.net.server.ClientHandler} being 
     * run by the ClientThread.
     * @since 1.3.1
     */
    public Runnable getThread() {
        return client;
    }

    /**
     * [ThreadInPool[<Instance Count>]:<id>] - <state> - Client {ClientHandler:...}
     * @since 1.4.1
     */
    public String toString() {
        return super.toString()+" - "+state+" - Client "+client;
    }
}



#2  Sample monitoring code

    // Example call:
    System.out.println(monitorClientThreadPool());

    private String monitorClientThreadPool() {
        QuickServer server = myServer; // Assign your application's server object reference here
        int count = Thread.getAllStackTraces().size();
        if (server != null) {
            ClientPool clientpool = server.getClientPool();
            ObjectPool handlerpool = server.getClientHandlerPool();
            ObjectPool datapool = server.getClientDataPool();
            ObjectPool bytebufferpool = server.getByteBufferPool();
            int max_idle = server.getClientPool().getPoolConfig().getMaxIdle();
            int max_active = server.getClientPool().getPoolConfig().getMaxActive();
            long client_count = server.getClientCount();
            StringBuffer msg = new StringBuffer();

            msg.append("{ MonitoringServer : ")
            .append("{ ApproxTotalThreads:")
            .append(count)
            .append(", Max_Idle:")
            .append(max_idle)
            .append(", Max_Active:")
            .append(max_active)
            .append(", NumOfClientsConnected:")
            .append(client_count)
            .append(", [ ");

            if (clientpool != null) {
                int cp_active = clientpool.getNumActive();
                int cp_idle = clientpool.getNumIdle();
                msg.append("{ Type:ClientPool, Total:")
                .append(cp_active + cp_idle)
                .append(", Active:")
                .append(cp_active)
                .append(", Idle:")
                .append(cp_idle)
                .append(" }");
            }

            if (handlerpool != null) {
                int h_active = handlerpool.getNumActive();
                int h_idle = handlerpool.getNumIdle();
                msg.append(", { Type:ClientHandlerPool, Total:")
                .append(h_active + h_idle)
                .append(", Active:")
                .append(h_active)
                .append(", Idle:")
                .append(h_idle)
                .append(" }");
            }

            if (datapool != null) {
                int d_active = datapool.getNumActive();
                int d_idle = datapool.getNumIdle();
                msg.append(", { Type:ClientDataPool, Total:")
                .append(d_active + d_idle)
                .append(", Active:")
                .append(d_active)
                .append(", Idle:")
                .append(d_idle)
                .append(" }");
            }

            if (bytebufferpool != null) {
                int b_active = bytebufferpool.getNumActive();
                int b_idle = bytebufferpool.getNumIdle();
                msg.append(", { Type:ByteBufferPool, Total:")
                .append(b_active + b_idle)
                .append(", Active:")
                .append(b_active)
                .append(", Idle:")
                .append(b_idle)
                .append(" }");
            }

            msg.append(" ] } }");
            return msg.toString();
        }
        return "{ MonitoringServer : NoDataFound }";
    }
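
As mentioned in #2, this method is meant to be called periodically from a background thread. A minimal sketch of such a loop, assuming it lives in the same class as monitorClientThreadPool() above, is below; the thread name, daemon flag and 30-second interval are illustrative choices.

    // Start this once, e.g. right after the server has started.
    private void startPoolMonitor() {
        Thread monitor = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    // Log the stats so they can later be grepped for "MonitoringServer".
                    System.out.println(monitorClientThreadPool());
                    try {
                        Thread.sleep(30 * 1000); // illustrative interval
                    } catch (InterruptedException e) {
                        return; // stop monitoring if interrupted
                    }
                }
            }
        }, "ClientPool-Monitor");
        monitor.setDaemon(true); // don't keep the JVM alive just for monitoring
        monitor.start();
    }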



#3  Validating the fix (with monitor code in place)


Steps to fetch data

To get more accurate data, perform steps 1 and 2 with minimal delay between them; 20 to 50 seconds should be good.

1. Open the log file, search for the text "MonitoringServer" and make a note of the monitor message. Below is a sample monitor message.

"{ MonitoringServer : { ApproxTotalThreads:105, Max_Idle:50, Max_Active:-1, NumOfClientsConnected:11, [ { Type:ClientPool, Total:60, Active:11, Idle:49 }, { Type:ClientHandlerPool, Total:60, Active:11, Idle:49 } ] } }"

2. Get the thread dump data:

kill -3 <PID of your program>

3. Open the thread dump file, select all and copy.

4. Use a free online thread dump analyzer tool, e.g. https://spotify.github.io/threaddump-analyzer

5. After opening the link, paste the content copied in step 3 into the text area and click the Analyze button. You should see similar thread stack traces grouped together with counts.

Validating fetched data

All of the validations below indicate that QuickServer's thread pool behaviour is stable.

1. ClientPool and ClientHandlerPool should always report identical Total, Active and Idle counts.

    { Type:ClientPool, Total:60, Active:11, Idle:49 }

    { Type:ClientHandlerPool, Total:60, Active:11, Idle:49 }

2. The number of clients connected and the Active counts of ClientPool and ClientHandlerPool should always be the same.

    NumOfClientsConnected:11 

   { Type:ClientPool, Total:60, Active:11, Idle:49 } 

   { Type:ClientHandlerPool, Total:60, Active:11, Idle:49 }

3. The Idle counts of ClientPool and ClientHandlerPool should always be <= 50 (the Max_Idle value reported in the monitor message).

4. Compare the ClientPool data with the thread dump data. The ClientPool's Idle count and the number of threads awaiting notification at the line "org.quickserver.util.pool.thread.ClientThread.run(ClientThread.java:166)" should match almost exactly.

Monitor data

 { Type:ClientPool, Total:60, Active:11, Idle:49 }

Data from the thread dump analyzer tool:

49 threads with this stack:

"<ClientThread-Pool#1-ID:9>": awaiting notification on [0x00000000b080fe70]

at java.lang.Object.wait(Native Method)

at java.lang.Object.wait(Object.java:502)

at org.quickserver.util.pool.thread.ClientThread.run(ClientThread.java:166)

5. The ClientPool's Active count and the number of actively running client threads should almost match. (A small sketch automating checks 4 and 5 appears after the sample data below.)

Monitor data

 { Type:ClientPool, Total:60, Active:11, Idle:49 } 

Data from the thread dump analyzer tool:

11 threads with this stack:

"<ClientThread-Pool#1-ID:25>": running, holding [0x00000000b19aeed8]

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)

at java.net.SocketInputStream.read(SocketInputStream.java:171)

at java.net.SocketInputStream.read(SocketInputStream.java:141)

at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)

at java.io.BufferedInputStream.read(BufferedInputStream.java:265)

at org.quickserver.net.server.impl.BasicClientHandler.readInputStream(BasicClientHandler.java:895)

at org.quickserver.net.server.impl.BlockingClientHandler.readInputStream(BlockingClientHandler.java:484)

at org.quickserver.net.server.impl.BasicClientHandler.readBinary(BasicClientHandler.java:1111)

at org.quickserver.net.server.impl.BlockingClientHandler.processRead(BlockingClientHandler.java:355)

at org.quickserver.net.server.impl.BlockingClientHandler.run(BlockingClientHandler.java:193)

at org.quickserver.util.pool.thread.ClientThread.executeClient(ClientThread.java:107)

at org.quickserver.util.pool.thread.ClientThread.run(ClientThread.java:136)
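
For anyone who prefers to automate checks 4 and 5 instead of pasting dumps into the analyzer, here is a rough in-process sketch of the same idea; it compares the ClientPool's Idle and Active counts with the live threads parked in, or running inside, ClientThread.run(). The method name is illustrative, and it assumes the same QuickServer reference and imports as the monitoring code in #2.

    private String validateClientPool(QuickServer server) {
        int waitingInPool = 0;   // ClientThreads parked in wait() inside run()
        int runningClients = 0;  // ClientThreads currently executing a client
        for (java.util.Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            boolean inClientThreadRun = false;
            for (StackTraceElement frame : e.getValue()) {
                if ("org.quickserver.util.pool.thread.ClientThread".equals(frame.getClassName())
                        && "run".equals(frame.getMethodName())) {
                    inClientThreadRun = true;
                    break;
                }
            }
            if (!inClientThreadRun) continue;
            if (e.getKey().getState() == Thread.State.WAITING) {
                waitingInPool++;
            } else {
                runningClients++;
            }
        }
        ClientPool clientpool = server.getClientPool();
        return "PoolIdle=" + clientpool.getNumIdle() + " vs WaitingThreads=" + waitingInPool
            + ", PoolActive=" + clientpool.getNumActive() + " vs RunningThreads=" + runningClients;
    }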

Akshath

Aug 12, 2017, 7:21:23 AM
to quickserver
Thanks for reporting the issue and providing the possible fix.

We have investigated this issue, and based on your feedback and our investigation we have released a new version, QuickServer 2.1.0.
