Spawn threads only as needed; retire them if idle for a long time (issue708001)

9 views
Skip to first unread message

mdst...@google.com

unread,
Mar 6, 2012, 4:37:46 PM3/6/12
to aoa...@google.com, bmcq...@google.com, mod-...@googlegroups.com, re...@page-speed-codereview.appspotmail.com
Reviewers: aoates, bmcquade,

Description:
I'm not really sure how to unit test this new behavior in a reliable way
(although I can observe with gdb that it does seem to work). I am open
to suggestions. (-:

Please review this at http://page-speed-codereview.appspot.com/708001/

Affected files:
M mod_spdy/apache/config_commands.cc
M mod_spdy/common/spdy_server_config.cc
M mod_spdy/common/spdy_server_config.h
M mod_spdy/common/thread_pool.cc
M mod_spdy/common/thread_pool.h
M mod_spdy/common/thread_pool_test.cc
M mod_spdy/mod_spdy.cc


bmcq...@google.com

unread,
Mar 7, 2012, 1:50:03 PM3/7/12
to mdst...@google.com, aoa...@google.com, mod-...@googlegroups.com, re...@page-speed-codereview.appspotmail.com
Overall looks good. I am glad to have this change in. It will allow
server owners to set aggressive max thread pool sizes w/o having to pay
that price when it's not needed.


http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool.cc
File mod_spdy/common/thread_pool.cc (right):

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool.cc#newcode197
mod_spdy/common/thread_pool.cc:197: if (master_->workers_.size() ==
master_->min_threads_) {
might as well make this test safe against cases where workers goes below
min threads as well, rather than using strict equality, just in case.

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool.cc#newcode204
mod_spdy/common/thread_pool.cc:204: delete this;
yikes. can we be sure that all parent functions also do not reference
the this variable as we return? also are there any scoped variables that
would attempt to modify deleted data as we go out of scope (e.g.
autolocks referencing locks that are members of this object)? Let's add
a short comment explaining why we believe this is safe.

You might also consider doing something like having ThreadMain like
this:

ThreadMain() {
base::AutoLock autolock(master_->lock_);

// ThreadMainImpl returns true (or some enum value - up
// to you) when it has decided to terminate on its own.
// In that case, we need to clean up after it.
if (ThreadMainImpl()) {
// Now that we are about to exit, remove ourselves from
// the worker queue and clean up. It is safe to delete
// this because...
DCHECK_EQ(1, master_->workers_.count(this));
master_->workers_.erase(this);
delete this;
}
}

This moves the delete this call to a very safe place. If someone adds
some new scoped variables inside ThreadMainImpl they will continue to
work since delete this only gets called once that method returns and
they go out of scope.

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool.cc#newcode225
mod_spdy/common/thread_pool.cc:225: master_->StartNewWorkerIfNeeded();
see my comment in StartNewWorkerIfNeeded - I am hoping this isn't needed
if we implement the change suggested there.

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool.cc#newcode315
mod_spdy/common/thread_pool.cc:315: if (num_idle_workers_ == 0 &&
!task_queue_.empty() &&
what if we say

if (task_queue_.size() - num_idle_workers_ > 0 && workers_.size() <
max_threads_)

that way if we add 10 tasks and can have up to 10 threads, we'll create
a thread per added task even if some of the new tasks haven't entered
their thread yet.

That should allow you to get rid of the other call to
StartNewWorkerIfNeeded that is a special case to address the case where
many tasks are added while a single thread is idle.

Hope I am understanding correctly.

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool_test.cc
File mod_spdy/common/thread_pool_test.cc (right):

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool_test.cc#newcode218
mod_spdy/common/thread_pool_test.cc:218: mod_spdy::ThreadPool
thread_pool(1, 1);
let's add a test case where min and max are not the same and exercise
that the threads are reaped. you may be able to use a mock time class
(does chromium provide one) to advance the clock 60s w/o actually having
to wait.

http://page-speed-codereview.appspot.com/708001/

mdst...@google.com

unread,
Mar 7, 2012, 3:39:21 PM3/7/12
to aoa...@google.com, bmcq...@google.com, mod-...@googlegroups.com, re...@page-speed-codereview.appspotmail.com

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool.cc#newcode197
mod_spdy/common/thread_pool.cc:197: if (master_->workers_.size() ==
master_->min_threads_) {

On 2012/03/07 18:50:03, bmcquade wrote:
> might as well make this test safe against cases where workers goes
below min
> threads as well, rather than using strict equality, just in case.

Done.

Done. The code should now be a little clearer. Note that Valgrind and
ThreadSanitizer are both happy, which helps confirm that this is in fact
safe.

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool.cc#newcode225
mod_spdy/common/thread_pool.cc:225: master_->StartNewWorkerIfNeeded();

On 2012/03/07 18:50:03, bmcquade wrote:
> see my comment in StartNewWorkerIfNeeded - I am hoping this isn't
needed if we
> implement the change suggested there.

Done.

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool.cc#newcode315
mod_spdy/common/thread_pool.cc:315: if (num_idle_workers_ == 0 &&
!task_queue_.empty() &&

On 2012/03/07 18:50:03, bmcquade wrote:
> what if we say

> if (task_queue_.size() - num_idle_workers_ > 0 && workers_.size() <
> max_threads_)

> that way if we add 10 tasks and can have up to 10 threads, we'll
create a thread
> per added task even if some of the new tasks haven't entered their
thread yet.

> That should allow you to get rid of the other call to
StartNewWorkerIfNeeded
> that is a special case to address the case where many tasks are added
while a
> single thread is idle.

> Hope I am understanding correctly.

Ah, that is a much nicer way of doing things. Done.

http://page-speed-codereview.appspot.com/708001/diff/1/mod_spdy/common/thread_pool_test.cc#newcode218
mod_spdy/common/thread_pool_test.cc:218: mod_spdy::ThreadPool
thread_pool(1, 1);

On 2012/03/07 18:50:03, bmcquade wrote:
> let's add a test case where min and max are not the same and exercise
that the
> threads are reaped. you may be able to use a mock time class (does
chromium
> provide one) to advance the clock 60s w/o actually having to wait.

Done.

http://page-speed-codereview.appspot.com/708001/

mdst...@google.com

unread,
Mar 7, 2012, 3:55:48 PM3/7/12
to aoa...@google.com, bmcq...@google.com, mod-...@googlegroups.com, re...@page-speed-codereview.appspotmail.com
Clarified StartNewWorkerIfNeeded a bit.

http://page-speed-codereview.appspot.com/708001/

bmcq...@google.com

unread,
Mar 7, 2012, 4:17:11 PM3/7/12
to mdst...@google.com, aoa...@google.com, mod-...@googlegroups.com, re...@page-speed-codereview.appspotmail.com
LGTM, just a few test fixes to reduce flakiness.


http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool.cc
File mod_spdy/common/thread_pool.cc (right):

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool.cc#newcode301
mod_spdy/common/thread_pool.cc:301: workers.assign(workers_.begin(),
workers_.end());
Let's also call workers_.clear() right after stealing the workers into
the local workers vector.

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool.h
File mod_spdy/common/thread_pool.h (right):

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool.h#newcode68
mod_spdy/common/thread_pool.h:68: int DebugGetNumWorkers();
I like to call test-only methods "FooForTest" so in this case
GetNumWorkersForTest.

You can alternatively declare this class a friend of ThreadPoolTestPeer
and then declare a thin class ThreadPoolTestPeer in your test cc file
that reaches into private data and provides an implementation of this
method. Either is fine with me.

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc
File mod_spdy/common/thread_pool_test.cc (right):

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode268
mod_spdy/common/thread_pool_test.cc:268:
base::PlatformThread::Sleep(25);
so technically this will be a flaky test. let's busy-wait for a max
period of time until DebugGetNumWorkers reaches 2, and fail the test if
the max time limit expires. something like:
max_sleep_time = 500ms
per_iter_sleep_time = 10ms
sleep_time_so_far = 0ms;
while (sleep_time_so_far < max_sleep_time &&
thread_pool.DebugGetNumWorkers() < 2) {
sleep(per_iter_sleep_time);
}
ASSERT_EQ(2, thread_pool.DebugGetNumWorkers());

etc

Since you need this in a few other places you could hvae a test:
AssertNumDebugWorkersWithTimeout(int expected_num_workers);

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode278
mod_spdy/common/thread_pool_test.cc:278:
base::PlatformThread::Sleep(25);
i dont think you should need this sleep; AddTask increases the number of
debug workers synchronously, right?

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode286
mod_spdy/common/thread_pool_test.cc:286:
base::PlatformThread::Sleep(25);
should not need this either

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode294
mod_spdy/common/thread_pool_test.cc:294: base::PlatformThread::Sleep(2 *
idle_time_millis);
you could use AssertNumDebugWorkersWithTimeout here also

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode302
mod_spdy/common/thread_pool_test.cc:302: base::PlatformThread::Sleep(2 *
idle_time_millis);
same

http://page-speed-codereview.appspot.com/708001/

mdst...@google.com

unread,
Mar 8, 2012, 11:39:42 AM3/8/12
to aoa...@google.com, bmcq...@google.com, mod-...@googlegroups.com, re...@page-speed-codereview.appspotmail.com

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool.cc#newcode301
mod_spdy/common/thread_pool.cc:301: workers.assign(workers_.begin(),
workers_.end());

On 2012/03/07 21:17:11, bmcquade wrote:
> Let's also call workers_.clear() right after stealing the workers into
the local
> workers vector.

Done.

On 2012/03/07 21:17:11, bmcquade wrote:
> I like to call test-only methods "FooForTest" so in this case
> GetNumWorkersForTest.

> You can alternatively declare this class a friend of
ThreadPoolTestPeer and then
> declare a thin class ThreadPoolTestPeer in your test cc file that
reaches into
> private data and provides an implementation of this method. Either is
fine with
> me.

Done.

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode268
mod_spdy/common/thread_pool_test.cc:268:
base::PlatformThread::Sleep(25);

On 2012/03/07 21:17:11, bmcquade wrote:
> so technically this will be a flaky test. let's busy-wait for a max
period of
> time until DebugGetNumWorkers reaches 2, and fail the test if the max
time limit
> expires. something like:
> max_sleep_time = 500ms
> per_iter_sleep_time = 10ms
> sleep_time_so_far = 0ms;
> while (sleep_time_so_far < max_sleep_time &&
thread_pool.DebugGetNumWorkers() <
> 2) {
> sleep(per_iter_sleep_time);
> }
> ASSERT_EQ(2, thread_pool.DebugGetNumWorkers());

> etc

> Since you need this in a few other places you could hvae a test:
> AssertNumDebugWorkersWithTimeout(int expected_num_workers);

Done.

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode278
mod_spdy/common/thread_pool_test.cc:278:
base::PlatformThread::Sleep(25);

On 2012/03/07 21:17:11, bmcquade wrote:
> i dont think you should need this sleep; AddTask increases the number
of debug
> workers synchronously, right?

Done.

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode286
mod_spdy/common/thread_pool_test.cc:286:
base::PlatformThread::Sleep(25);

On 2012/03/07 21:17:11, bmcquade wrote:
> should not need this either

Done.

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode294
mod_spdy/common/thread_pool_test.cc:294: base::PlatformThread::Sleep(2 *
idle_time_millis);

On 2012/03/07 21:17:11, bmcquade wrote:
> you could use AssertNumDebugWorkersWithTimeout here also

Done.

http://page-speed-codereview.appspot.com/708001/diff/1008/mod_spdy/common/thread_pool_test.cc#newcode302
mod_spdy/common/thread_pool_test.cc:302: base::PlatformThread::Sleep(2 *
idle_time_millis);

On 2012/03/07 21:17:11, bmcquade wrote:
> same

Done.

http://page-speed-codereview.appspot.com/708001/

bmcq...@google.com

unread,
Mar 8, 2012, 1:01:39 PM3/8/12
to mdst...@google.com, aoa...@google.com, mod-...@googlegroups.com, re...@page-speed-codereview.appspotmail.com
LGTM, but let's please switch to incrementing num_idle_workers in
WorkerThread::Start(). If you have questions/concerns let's chat about
it in person.


http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool.cc
File mod_spdy/common/thread_pool.cc (right):

http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool.cc#newcode361
mod_spdy/common/thread_pool.cc:361: task_queue_.size() <=
num_idle_workers_) {
It strikes me that we might create too many workers here due to the fact
that num_idle_workers_ isn't incremented until the worker starts.
Consider:
1. we have 1 idle worker
2. a task is added. we determine that no additional workers should be
created. task queue is now size 1
3. a task is added before the worker has taken the first pending task,
so we are at queuesize=2, idle=1. create another worker.
4. the originally idle worker takes one of the tasks. now we are at
queuesize=1, idle=0.
5. the originally idle worker finishes the task and takes the next entry
in the queue, but the originally idle worker still hasn't started. so we
are now at queuesize=0, idle=0.
6. a task is added. num_idle is 0 because the second worker hasn't
started. we start another worker unnecessarily.

This would not happen if we incremented the idle count when
WorkerThread::Start() was called. incrementing idle count in that method
would also make some of the test logic less flaky. So I think we should
do that. As soon as ThreadMainImpl() starts you could set it to no
longer idle. What do you think?

http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool_test.cc
File mod_spdy/common/thread_pool_test.cc (right):

http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool_test.cc#newcode295
mod_spdy/common/thread_pool_test.cc:295: ExpectWorkersWithinTimeout(2,
2, &thread_pool, 20);
might as well dial this up to a larger timeout. I think 20ms would cover
most cases but I'd rather wait longer for the tail and have no test
flakiness than fail some tests early. How about 100ms here? Same for the
other call sites.

http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool_test.cc#newcode302
mod_spdy/common/thread_pool_test.cc:302: ExpectWorkersWithinTimeout(3,
0, &thread_pool, 20);
once you change over to incrementing num_idle_workers in the
workerthread Start(), these tests can be plain old EXPECT_EQ without a
timeout, since the data will all be updated synchronously on the main
test thread.

http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool_test.cc#newcode316
mod_spdy/common/thread_pool_test.cc:316: ExpectWorkersWithinTimeout(4,
1, &thread_pool, 20);
this series of events looks flaky to me. on a machine with high-priority
load i bet this fails some of the time. if num_idle_workers got
incremented synchronously at worker construction time i think this'd be
safe to do synchronously and it would remove the races.

the specific issue i see is that each thread may or may not get
scheduled to run/finish in the time you've specified. it should, but it
is not guaranteed.

for the sake of the tests and to fix another race (mentioned in a
comment in the cc file) i think it's worth incrementing the number of
idle workers at worker construction time rather than only once the
worker starts running. If you do this, this first test can be 2 plain
old EXPECT_EQ and then you would have an ExpectWorkersWithinTimeout(3,
0, &thread_pool, 100); after that. I prefer that, as the test will no
longer be flaky and it removes a small but real race in the cc file.

http://page-speed-codereview.appspot.com/708001/

bmcq...@google.com

unread,
Mar 8, 2012, 1:04:28 PM3/8/12
to mdst...@google.com, aoa...@google.com, mod-...@googlegroups.com, re...@page-speed-codereview.appspotmail.com
Where I say "but the originally idle worker still hasn't started" that
should read "but the newly created worker still hasn't started"

http://page-speed-codereview.appspot.com/708001/

mdst...@google.com

unread,
Mar 8, 2012, 1:37:03 PM3/8/12
to aoa...@google.com, bmcq...@google.com, mod-...@googlegroups.com, re...@page-speed-codereview.appspotmail.com
PTAL; I think the thread spawning, and the unit test, should be more
robust now.

Note that although the test could theoretically still be flaky, I've yet
to see it ever actually fail spuriously (either the new revision or even
my earlier revision), even when running under Valgrind or
ThreadSanitizer (which I find tend to break overly timing-sensitive
tests).

http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool.cc#newcode361
mod_spdy/common/thread_pool.cc:361: task_queue_.size() <=
num_idle_workers_) {

I've switched from tracking num_idle_workers to tracking
num_busy_workers. So instead of incrementing/decrementing around the
condvar.Wait(), I increment/decrement around running the task. This
means that a thread is counted as "idle" from the moment it is created
and added to workers_. I think that should fix the issue you're worried
about, and I think it makes the code easier to reason about.

http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool_test.cc#newcode295
mod_spdy/common/thread_pool_test.cc:295: ExpectWorkersWithinTimeout(2,
2, &thread_pool, 20);

On 2012/03/08 18:01:39, bmcquade wrote:
> might as well dial this up to a larger timeout. I think 20ms would
cover most
> cases but I'd rather wait longer for the tail and have no test
flakiness than
> fail some tests early. How about 100ms here? Same for the other call
sites.

Done.

http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool_test.cc#newcode302
mod_spdy/common/thread_pool_test.cc:302: ExpectWorkersWithinTimeout(3,
0, &thread_pool, 20);

On 2012/03/08 18:01:39, bmcquade wrote:
> once you change over to incrementing num_idle_workers in the
workerthread
> Start(), these tests can be plain old EXPECT_EQ without a timeout,
since the
> data will all be updated synchronously on the main test thread.

Done.

http://page-speed-codereview.appspot.com/708001/diff/6001/mod_spdy/common/thread_pool_test.cc#newcode316
mod_spdy/common/thread_pool_test.cc:316: ExpectWorkersWithinTimeout(4,
1, &thread_pool, 20);

On 2012/03/08 18:01:39, bmcquade wrote:
> this series of events looks flaky to me. on a machine with
high-priority load i
> bet this fails some of the time. if num_idle_workers got incremented
> synchronously at worker construction time i think this'd be safe to do
> synchronously and it would remove the races.

> the specific issue i see is that each thread may or may not get
scheduled to
> run/finish in the time you've specified. it should, but it is not
guaranteed.

> for the sake of the tests and to fix another race (mentioned in a
comment in the
> cc file) i think it's worth incrementing the number of idle workers at
worker
> construction time rather than only once the worker starts running. If
you do
> this, this first test can be 2 plain old EXPECT_EQ and then you would
have an
> ExpectWorkersWithinTimeout(3, 0, &thread_pool, 100); after that. I
prefer that,
> as the test will no longer be flaky and it removes a small but real
race in the
> cc file.

Done.

http://page-speed-codereview.appspot.com/708001/

Reply all
Reply to author
Forward
0 new messages