"Single shoot" threads

Jarek Poświata

Sep 12, 2014, 11:44:00 AM
to nx...@googlegroups.com
I am trying to run nxweb on VirtualBox (with CentOS 7.0). By default it starts a single network worker.

First, I found messages saying "job not done in XXXXX steps". What do they mean? Also, is there any reason for the two __sync_synchronize() calls at nx_workers.c:184, and are they connected to these messages?

Second, while digging I noticed that threads are "sometimes" started just to handle a single job. The queue always holds 127 workers ("free workers"?), and yet new ones are created?
I added counters to track the number of threads and the number of running threads, and it looks like no thread is processing requests.

Yaroslav

Sep 12, 2014, 12:38:31 PM
to nx...@googlegroups.com
Hi Jarek,

On Fri, Sep 12, 2014 at 7:44 PM, Jarek Poświata <pajak...@gmail.com> wrote:
I am trying to run nxweb on VirtualBox (with CentOS 7.0). By default it starts a single network worker.

The number of network threads should equal the number of CPU cores. Does your virtual machine have a single CPU?
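To see how many cores the guest OS actually exposes, a small check like the following can help. This is a minimal sketch, assuming a Linux/glibc system where sysconf(_SC_NPROCESSORS_ONLN) is available; the helper names are hypothetical, and the thread does not say which call nxweb itself uses to size its network thread count.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: number of CPUs currently online as reported by the OS.
   Falls back to 1 if the value cannot be determined. */
long count_online_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical helper: print the value so it can be compared with
   the number of network threads the server reports at startup. */
void print_online_cpus(void) {
    printf("online CPUs: %ld\n", count_online_cpus());
}
```

If this prints 1 inside the VM while the VM is configured with more cores, the extra cores are not reaching the guest, which would explain a single network thread.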
 

First, I found messages saying "job not done in XXXXX steps". What do they mean? Also, is there any reason for the two __sync_synchronize() calls at nx_workers.c:184, and are they connected to these messages?

"job not done" is a consequence of mutex-free implementation. Memory view is not updated immediately between CPU cores. I could not find better solution than just loop until it is synchronized (sort of spin-lock). This should not happen often under stable workload. Usually these messages appear at the very start when launching many workers at once (during benchmarking).

It is strange that this happens on a single-core system, though. Very strange. Maybe it is a consequence of running in a virtual machine...
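The "loop until it is synchronized" idea can be sketched as a bounded spin-wait. This is an illustration only: demo_worker, demo_wait_job_done and the max_steps limit are hypothetical names, and the exact condition under which nxweb prints its message is an assumption, since the thread does not show that code.

```c
#include <stdio.h>

/* Hypothetical worker record; the real nxweb struct is not shown in this thread. */
typedef struct {
    volatile int job_done;  /* set to 1 by the worker thread when it finishes */
} demo_worker;

/* Polls w->job_done until it is set or max_steps polls have failed.
   Returns the number of failed polls on success, or -1 after giving up.
   Assumption: the log line is printed when the limit is reached. */
long demo_wait_job_done(demo_worker *w, long max_steps) {
    long steps = 0;
    while (!w->job_done) {
        if (++steps >= max_steps) {
            fprintf(stderr, "job not done in %ld steps\n", steps);
            return -1;
        }
        __sync_synchronize();  /* full barrier before re-reading the flag */
    }
    return steps;
}
```

A return value of 0 means the flag was already visible on the first poll, which is the expected case under a stable workload.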

__sync_synchronize() is a memory barrier. It ensures that the job_done flag is not set before the other fields in the worker struct are written, and that the eventfd is only fired after job_done is set. This is important for thread synchronization in the absence of mutexes.
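The ordering described above can be sketched as follows. This is not the code from nx_workers.c: demo_job, demo_publish and demo_try_consume are hypothetical names, the payload is reduced to one int, and the eventfd notification is only marked by a comment. __sync_synchronize() is the GCC full-barrier builtin mentioned in the thread.

```c
/* Hypothetical job record shared between a worker thread and its owner. */
typedef struct {
    int result;             /* payload written by the worker */
    volatile int job_done;  /* completion flag read by the owner */
} demo_job;

/* Worker side: write the payload, then the flag, then notify. */
void demo_publish(demo_job *j, int result) {
    j->result = result;
    __sync_synchronize();   /* barrier 1: payload is visible before the flag */
    j->job_done = 1;
    __sync_synchronize();   /* barrier 2: flag is visible before any notification */
    /* a real implementation would signal the owner here, e.g. via an eventfd */
}

/* Owner side: returns 1 and stores the payload in *out if the job is done,
   otherwise returns 0 and leaves *out untouched. */
int demo_try_consume(demo_job *j, int *out) {
    if (!j->job_done)
        return 0;
    __sync_synchronize();   /* do not read the payload ahead of the flag */
    *out = j->result;
    return 1;
}
```

The two barriers on the worker side correspond to the two guarantees named above: fields before flag, and flag before notification.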
 

Second, while digging I noticed that threads are "sometimes" started just to handle a single job. The queue always holds 127 workers ("free workers"?), and yet new ones are created?

The worker thread pool is meant to be dynamic. More threads are created on demand, and extra threads are killed and destroyed by the garbage collector when the load goes down. This is controlled by #defines in nx_workers.h. Note that each network thread has its own thread pool, so that we avoid synchronization issues between network threads. Defaults for each thread pool:

#define NXWEB_MAX_WORKERS 512 // max. running+idle workers
#define NXWEB_MAX_IDLE_WORKERS_IN_QUEUE 16 // idle workers beyond that get garbage collected
#define NXWEB_MAX_WORKERS_IN_QUEUE 128 // max idle workers queue capacity; idle workers beyond that get killed immediately
#define NXWEB_START_WORKERS_IN_QUEUE 0 // thread pool starts without any threads (saves stack memory especially on CentOS)

Note that idle workers beyond NXWEB_MAX_IDLE_WORKERS_IN_QUEUE are not killed immediately. They are garbage collected (in portions), and the garbage collector is only invoked when the event loop has been idle for at least 1 second. Under constant peak load (e.g. during benchmarking) the garbage collector has no chance to run, so the queue can stay fully loaded (127 items) for a long time. That should not mean that a thread is created for every new job: a new thread is only created when the queue is empty (see the nxw_get_worker() code).
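The policy described above can be modelled with plain counters, without any real threads. This is a simplified sketch, not the nxweb implementation: all demo_ names are hypothetical, the GC batch size is a parameter chosen by the caller, and the usable queue capacity of 127 (queue size minus one) is an assumption made only to match the "127 items" figure reported in this thread.

```c
#define DEMO_QUEUE_SIZE 128  /* mirrors NXWEB_MAX_WORKERS_IN_QUEUE */
#define DEMO_MAX_IDLE   16   /* mirrors NXWEB_MAX_IDLE_WORKERS_IN_QUEUE */

/* Counter-only model of one per-network-thread pool. */
typedef struct {
    int idle;       /* workers parked in the idle queue */
    int created;    /* threads created so far */
    int destroyed;  /* threads destroyed so far */
} demo_pool;

/* Take a worker for a job: reuse an idle one, create only if the queue is empty. */
void demo_get_worker(demo_pool *p) {
    if (p->idle > 0)
        p->idle--;
    else
        p->created++;
}

/* A worker finished its job: park it if the queue has room, else kill it at once.
   Assumption: the queue holds at most DEMO_QUEUE_SIZE - 1 entries. */
void demo_release_worker(demo_pool *p) {
    if (p->idle < DEMO_QUEUE_SIZE - 1)
        p->idle++;
    else
        p->destroyed++;
}

/* One garbage-collector pass: destroy up to `batch` idle workers above
   DEMO_MAX_IDLE. Returns how many were destroyed in this pass. */
int demo_gc(demo_pool *p, int batch) {
    int n = 0;
    while (p->idle > DEMO_MAX_IDLE && n < batch) {
        p->idle--;
        p->destroyed++;
        n++;
    }
    return n;
}
```

In this model a burst of 200 concurrent jobs creates 200 threads, leaves 127 parked afterwards, and the parked count only drops back to 16 once demo_gc gets a chance to run repeatedly, which matches the behaviour described for benchmarking runs.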
 
I added counters to track the number of threads and the number of running threads, and it looks like no thread is processing requests.

You can add a counter to the worker struct that counts the jobs this particular worker has done, then print its value in nxw_destroy_worker(). This way you can check whether the pool is working as intended. If you find something suspicious, please write to me and we will investigate further.
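A minimal sketch of such a counter, assuming a stand-in struct rather than the real nxweb worker struct (demo_counted_worker, demo_run_job and demo_destroy_worker are hypothetical names; in nxweb the print would go into nxw_destroy_worker()):

```c
#include <stdio.h>

/* Hypothetical stand-in for the worker struct, with the added counter. */
typedef struct {
    int id;
    unsigned long jobs_done;  /* incremented once per completed job */
} demo_counted_worker;

/* Called each time the worker completes a job. */
void demo_run_job(demo_counted_worker *w) {
    /* ... the actual job would run here ... */
    w->jobs_done++;
}

/* Called when the worker is torn down: report how much work it did.
   Returns the counter so callers can also aggregate it. */
unsigned long demo_destroy_worker(demo_counted_worker *w) {
    fprintf(stderr, "worker %d destroyed after %lu job(s)\n", w->id, w->jobs_done);
    return w->jobs_done;
}
```

If most workers report exactly one job at destruction, threads really are being created per job; if the counts are much higher, the pool is reusing them.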

Yaroslav

Jarek Poświata

Sep 19, 2014, 8:07:47 AM
to nx...@googlegroups.com
The VM was configured with multiple cores, but only one was available (one network thread started). When I changed the number of cores to 1, nothing changed; everything worked as before.
I did some tests on real hardware and everything works fine, so the issue appears to be on the VirtualBox side.
I need to run some tests with other VMs (or even in the cloud) to see whether threads are implemented correctly.

Yaroslav

Sep 19, 2014, 8:22:16 AM
to nx...@googlegroups.com
Virtual machines are used everywhere nowadays, so nxweb should work on them.

From what you described earlier I did not see any real issues, though. It should be investigated further.
