HOST MACHINE and OPERATING SYSTEM:
Linux Debian 6.0 on x86_64
COMPILER NAME AND VERSION (AND PATCHLEVEL):
g++-4.4.5
THE $ACE_ROOT/ace/config.h FILE:
#include "config-linux.h"
THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
buildbits ?= 64
include ${ACE_ROOT}/include/makeinclude/platform_linux.GNU
CONTENTS OF $ACE_ROOT/bin/MakeProjectCreator/config/default.features: (unmodified)
AREA/CLASS/EXAMPLE AFFECTED:
ACE_Logging_Strategy
DOES THE PROBLEM AFFECT:
EXECUTION? YES
SYNOPSIS: Race condition in ACE_Logging_Strategy::handle_timeout results in occasional core dumps, and random file lengths
DESCRIPTION:
We use ACE_Logging_Strategy to limit log file size to 100MB, and default 10 minute file-size-polling interval.
Our program is multi-threaded with very low activity - there are couple of timers running, and they keep invoking ACE_TRACE. We notice that files are prematurely being rotated, with sizes much smaller than the 100MB configured size.
Also, there are occasional core dumps (see stack trace attached).
Looking at ACE_Logging_Strategy::handle_timeout function, I notice that the test for the file length is invoked while Log_Msg is *not* being locked.
401#if defined (ACE_LACKS_IOSTREAM_TOTALLY)
402 if ((size_t) ACE_OS::ftell (this->log_msg_->msg_ostream ()) > this->max_size_)
403#else
404 if ((size_t) this->log_msg_->msg_ostream ()->tellp () > this->max_size_)
405#endif /* ACE_LACKS_IOSTREAM_TOTALLY */
406 {
407 // Lock out any other logging.
Line 404: invoking tellp() on ostream is not thread safe - it actually invokes setpos. tellp() returns implementation specific streampos, which may not be an integer, but can be constructed from (-1), since in case tellp() fails, it will return streampos(-1).
WORKAROUND:
Derive MyLoggingStrategy from ACE_Logging_Strategy, enclose Logging_Straregy::handle_timeout into MyLoggingStrategy::handle_timeout, and invoke this->log_msg_->acquire() before invoking ACE_Logging_Strategy::handle_timeout
REPEAT BY:
Sample test program shows that tellp() is not thread safe.
One thread writes messages to a file, other thread runs tellp(). Must be run on a machine with more than one CPU. It core dumps in less than 5 sec.
PROPOSED FIX:
Attachments:
- stack trace of our application with ace-6.0.1
- sample code for tellp not being thread safe
- attached patch for Logging_Strategy.cpp for ace-6.1.5
Thanks for the PRF form. Thanks for all input, could you take the unit test and rewrite it as an ACE test (just take ACE/tests/OS_Test.cpp) and add your new test code and just run it for some time, also just call activate explicitly from the main? Than we can add that as new unit test with your patch to ACE.
> SYNOPSIS: Race condition in ACE_Logging_Strategy::handle_timeout
> results in occasional core dumps, and random file lengths
> DESCRIPTION:
> We use ACE_Logging_Strategy to limit log file size to 100MB, and default
> 10 minute file-size-polling interval.
> Our program is multi-threaded with very low activity - there are couple
> of timers running, and they keep invoking ACE_TRACE. We notice that
> files are prematurely being rotated, with sizes much smaller than the
> 100MB configured size.
> Also, there are occasional core dumps (see stack trace attached).
> Looking at ACE_Logging_Strategy::handle_timeout function, I notice that
> the test for the file length is invoked while Log_Msg is *not* being
> locked.
> 401#if defined (ACE_LACKS_IOSTREAM_TOTALLY)
> 402 if ((size_t) ACE_OS::ftell (this->log_msg_->msg_ostream ()) >
> this->max_size_)
> 403#else
> 404 if ((size_t) this->log_msg_->msg_ostream ()->tellp () >
> this->max_size_)
> 405#endif /* ACE_LACKS_IOSTREAM_TOTALLY */
> 406 {
> 407 // Lock out any other logging.
> 408 if (this->log_msg_->acquire ())
> 409 ACE_ERROR_RETURN ((LM_ERROR,
> 410 ACE_TEXT ("Cannot acquire lock!\n")),
> 411 -1);
> Line 404: invoking tellp() on ostream is not thread safe - it actually
> invokes setpos. tellp() returns implementation specific streampos, which
> may not be an integer, but can be constructed from (-1), since in case
> tellp() fails, it will return streampos(-1).
> WORKAROUND:
> Derive MyLoggingStrategy from ACE_Logging_Strategy, enclose
> Logging_Straregy::handle_timeout into MyLoggingStrategy::handle_timeout,
> and invoke this->log_msg_->acquire() before invoking
> ACE_Logging_Strategy::handle_timeout
> REPEAT BY:
> Sample test program shows that tellp() is not thread safe.
> One thread writes messages to a file, other thread runs tellp(). Must be
> run on a machine with more than one CPU. It core dumps in less than 5 sec.
> PROPOSED FIX:
> Attachments:
> - stack trace of our application with ace-6.0.1
> - sample code for tellp not being thread safe
> - attached patch for Logging_Strategy.cpp for ace-6.1.5
Well, the patch I sent earlier is not quite good, since it does not properly unlock the mutex.
I am resending the patch for Logging_Strategy.cpp. This time I used ACE_GUARD_RETURN to make sure the lock really gets released.
Also, I rewrote test so it uses ACE_Logging_Strategy. To make it fail, it was enough to try to log 100000 lines while invoking tellp() very fast in another thread. Log_Rotation.cpp should be placed in $ACE_ROOT/tests/Log_Rotation.cpp. You may have to tweak the file parameters passed to ACE_Logging_Strategy::init, in particular '-s' currently creates files in current working directory, rather than $ACE_ROOT/tests/log.
> Thanks for the PRF form. Thanks for all input, could you take the unit
> test and rewrite it as an ACE test (just take ACE/tests/OS_Test.cpp) and
> add your new test code and just run it for some time, also just call
> activate explicitly from the main? Than we can add that as new unit test
> with your patch to ACE.
> Best regards,
> Johnny
> On 11/13/2012 01:16 AM, Milan Cvetkovic wrote:
>> ACE VERSION: 6.0.1
>> ACE VERSION: 6.1.5
>> HOST MACHINE and OPERATING SYSTEM:
>> Linux Debian 6.0 on x86_64
>> COMPILER NAME AND VERSION (AND PATCHLEVEL):
>> g++-4.4.5
>> THE $ACE_ROOT/ace/config.h FILE:
>> #include "config-linux.h"
>> THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
>> buildbits ?= 64
>> include ${ACE_ROOT}/include/makeinclude/platform_linux.GNU
>> CONTENTS OF
>> $ACE_ROOT/bin/MakeProjectCreator/config/default.features: (unmodified)
>> SYNOPSIS: Race condition in ACE_Logging_Strategy::handle_timeout
>> results in occasional core dumps, and random file lengths
>> DESCRIPTION:
>> We use ACE_Logging_Strategy to limit log file size to 100MB, and default
>> 10 minute file-size-polling interval.
>> Our program is multi-threaded with very low activity - there are couple
>> of timers running, and they keep invoking ACE_TRACE. We notice that
>> files are prematurely being rotated, with sizes much smaller than the
>> 100MB configured size.
>> Also, there are occasional core dumps (see stack trace attached).
>> Looking at ACE_Logging_Strategy::handle_timeout function, I notice that
>> the test for the file length is invoked while Log_Msg is *not* being
>> locked.
>> 401#if defined (ACE_LACKS_IOSTREAM_TOTALLY)
>> 402 if ((size_t) ACE_OS::ftell (this->log_msg_->msg_ostream ()) >
>> this->max_size_)
>> 403#else
>> 404 if ((size_t) this->log_msg_->msg_ostream ()->tellp () >
>> this->max_size_)
>> 405#endif /* ACE_LACKS_IOSTREAM_TOTALLY */
>> 406 {
>> 407 // Lock out any other logging.
>> 408 if (this->log_msg_->acquire ())
>> 409 ACE_ERROR_RETURN ((LM_ERROR,
>> 410 ACE_TEXT ("Cannot acquire lock!\n")),
>> 411 -1);
>> Line 404: invoking tellp() on ostream is not thread safe - it actually
>> invokes setpos. tellp() returns implementation specific streampos, which
>> may not be an integer, but can be constructed from (-1), since in case
>> tellp() fails, it will return streampos(-1).
>> WORKAROUND:
>> Derive MyLoggingStrategy from ACE_Logging_Strategy, enclose
>> Logging_Straregy::handle_timeout into MyLoggingStrategy::handle_timeout,
>> and invoke this->log_msg_->acquire() before invoking
>> ACE_Logging_Strategy::handle_timeout
>> REPEAT BY:
>> Sample test program shows that tellp() is not thread safe.
>> One thread writes messages to a file, other thread runs tellp(). Must be
>> run on a machine with more than one CPU. It core dumps in less than 5
>> sec.
>> PROPOSED FIX:
>> Attachments:
>> - stack trace of our application with ace-6.0.1
>> - sample code for tellp not being thread safe
>> - attached patch for Logging_Strategy.cpp for ace-6.1.5
> Well, the patch I sent earlier is not quite good, since it does not properly unlock the mutex.
> I am resending the patch for Logging_Strategy.cpp. This time I used ACE_GUARD_RETURN to make sure the lock really gets released.
> Also, I rewrote test so it uses ACE_Logging_Strategy. To make it fail, it was enough to try to log 100000 lines while invoking tellp() very fast in another thread. Log_Rotation.cpp should be placed in $ACE_ROOT/tests/Log_Rotation.cpp. You may have to tweak the file parameters passed to ACE_Logging_Strategy::init, in particular '-s' currently creates files in current working directory, rather than $ACE_ROOT/tests/log.
> Milan.
> Johnny Willemsen wrote:
>> Hi,
>> Thanks for the PRF form. Thanks for all input, could you take the unit
>> test and rewrite it as an ACE test (just take ACE/tests/OS_Test.cpp) and
>> add your new test code and just run it for some time, also just call
>> activate explicitly from the main? Than we can add that as new unit test
>> with your patch to ACE.
>> Best regards,
>> Johnny
>> On 11/13/2012 01:16 AM, Milan Cvetkovic wrote:
>>> ACE VERSION: 6.0.1
>>> ACE VERSION: 6.1.5
>>> HOST MACHINE and OPERATING SYSTEM:
>>> Linux Debian 6.0 on x86_64
>>> COMPILER NAME AND VERSION (AND PATCHLEVEL):
>>> g++-4.4.5
>>> THE $ACE_ROOT/ace/config.h FILE:
>>> #include "config-linux.h"
>>> THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
>>> buildbits ?= 64
>>> include ${ACE_ROOT}/include/makeinclude/platform_linux.GNU
>>> CONTENTS OF
>>> $ACE_ROOT/bin/MakeProjectCreator/config/default.features: (unmodified)
>>> SYNOPSIS: Race condition in ACE_Logging_Strategy::handle_timeout
>>> results in occasional core dumps, and random file lengths
>>> DESCRIPTION:
>>> We use ACE_Logging_Strategy to limit log file size to 100MB, and default
>>> 10 minute file-size-polling interval.
>>> Our program is multi-threaded with very low activity - there are couple
>>> of timers running, and they keep invoking ACE_TRACE. We notice that
>>> files are prematurely being rotated, with sizes much smaller than the
>>> 100MB configured size.
>>> Also, there are occasional core dumps (see stack trace attached).
>>> Looking at ACE_Logging_Strategy::handle_timeout function, I notice that
>>> the test for the file length is invoked while Log_Msg is *not* being
>>> locked.
>>> 401#if defined (ACE_LACKS_IOSTREAM_TOTALLY)
>>> 402 if ((size_t) ACE_OS::ftell (this->log_msg_->msg_ostream ()) >
>>> this->max_size_)
>>> 403#else
>>> 404 if ((size_t) this->log_msg_->msg_ostream ()->tellp () >
>>> this->max_size_)
>>> 405#endif /* ACE_LACKS_IOSTREAM_TOTALLY */
>>> 406 {
>>> 407 // Lock out any other logging.
>>> 408 if (this->log_msg_->acquire ())
>>> 409 ACE_ERROR_RETURN ((LM_ERROR,
>>> 410 ACE_TEXT ("Cannot acquire lock!\n")),
>>> 411 -1);
>>> Line 404: invoking tellp() on ostream is not thread safe - it actually
>>> invokes setpos. tellp() returns implementation specific streampos, which
>>> may not be an integer, but can be constructed from (-1), since in case
>>> tellp() fails, it will return streampos(-1).
>>> WORKAROUND:
>>> Derive MyLoggingStrategy from ACE_Logging_Strategy, enclose
>>> Logging_Straregy::handle_timeout into MyLoggingStrategy::handle_timeout,
>>> and invoke this->log_msg_->acquire() before invoking
>>> ACE_Logging_Strategy::handle_timeout
>>> REPEAT BY:
>>> Sample test program shows that tellp() is not thread safe.
>>> One thread writes messages to a file, other thread runs tellp(). Must be
>>> run on a machine with more than one CPU. It core dumps in less than 5
>>> sec.
>>> PROPOSED FIX:
>>> Attachments:
>>> - stack trace of our application with ace-6.0.1
>>> - sample code for tellp not being thread safe
>>> - attached patch for Logging_Strategy.cpp for ace-6.1.5
> Thanks, can you put this in bugzilla, than I can integrate this later
> Johnny
> Op 13 nov. 2012 om 21:56 heeft Milan Cvetkovic<milan.cvetko...@mpathix.com> het volgende geschreven:
>> Well, the patch I sent earlier is not quite good, since it does not properly unlock the mutex.
>> I am resending the patch for Logging_Strategy.cpp. This time I used ACE_GUARD_RETURN to make sure the lock really gets released.
>> Also, I rewrote test so it uses ACE_Logging_Strategy. To make it fail, it was enough to try to log 100000 lines while invoking tellp() very fast in another thread. Log_Rotation.cpp should be placed in $ACE_ROOT/tests/Log_Rotation.cpp. You may have to tweak the file parameters passed to ACE_Logging_Strategy::init, in particular '-s' currently creates files in current working directory, rather than $ACE_ROOT/tests/log.
>> Milan.
>> Johnny Willemsen wrote:
>>> Hi,
>>> Thanks for the PRF form. Thanks for all input, could you take the unit
>>> test and rewrite it as an ACE test (just take ACE/tests/OS_Test.cpp) and
>>> add your new test code and just run it for some time, also just call
>>> activate explicitly from the main? Than we can add that as new unit test
>>> with your patch to ACE.
>>> Best regards,
>>> Johnny
>>> On 11/13/2012 01:16 AM, Milan Cvetkovic wrote:
>>>> ACE VERSION: 6.0.1
>>>> ACE VERSION: 6.1.5
>>>> HOST MACHINE and OPERATING SYSTEM:
>>>> Linux Debian 6.0 on x86_64
>>>> COMPILER NAME AND VERSION (AND PATCHLEVEL):
>>>> g++-4.4.5
>>>> THE $ACE_ROOT/ace/config.h FILE:
>>>> #include "config-linux.h"
>>>> THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
>>>> buildbits ?= 64
>>>> include ${ACE_ROOT}/include/makeinclude/platform_linux.GNU
>>>> CONTENTS OF
>>>> $ACE_ROOT/bin/MakeProjectCreator/config/default.features: (unmodified)
>>>> SYNOPSIS: Race condition in ACE_Logging_Strategy::handle_timeout
>>>> results in occasional core dumps, and random file lengths
>>>> DESCRIPTION:
>>>> We use ACE_Logging_Strategy to limit log file size to 100MB, and default
>>>> 10 minute file-size-polling interval.
>>>> Our program is multi-threaded with very low activity - there are couple
>>>> of timers running, and they keep invoking ACE_TRACE. We notice that
>>>> files are prematurely being rotated, with sizes much smaller than the
>>>> 100MB configured size.
>>>> Also, there are occasional core dumps (see stack trace attached).
>>>> Looking at ACE_Logging_Strategy::handle_timeout function, I notice that
>>>> the test for the file length is invoked while Log_Msg is *not* being
>>>> locked.
>>>> 401#if defined (ACE_LACKS_IOSTREAM_TOTALLY)
>>>> 402 if ((size_t) ACE_OS::ftell (this->log_msg_->msg_ostream ())>
>>>> this->max_size_)
>>>> 403#else
>>>> 404 if ((size_t) this->log_msg_->msg_ostream ()->tellp ()>
>>>> this->max_size_)
>>>> 405#endif /* ACE_LACKS_IOSTREAM_TOTALLY */
>>>> 406 {
>>>> 407 // Lock out any other logging.
>>>> 408 if (this->log_msg_->acquire ())
>>>> 409 ACE_ERROR_RETURN ((LM_ERROR,
>>>> 410 ACE_TEXT ("Cannot acquire lock!\n")),
>>>> 411 -1);
>>>> Line 404: invoking tellp() on ostream is not thread safe - it actually
>>>> invokes setpos. tellp() returns implementation specific streampos, which
>>>> may not be an integer, but can be constructed from (-1), since in case
>>>> tellp() fails, it will return streampos(-1).
>>>> WORKAROUND:
>>>> Derive MyLoggingStrategy from ACE_Logging_Strategy, enclose
>>>> Logging_Straregy::handle_timeout into MyLoggingStrategy::handle_timeout,
>>>> and invoke this->log_msg_->acquire() before invoking
>>>> ACE_Logging_Strategy::handle_timeout
>>>> REPEAT BY:
>>>> Sample test program shows that tellp() is not thread safe.
>>>> One thread writes messages to a file, other thread runs tellp(). Must be
>>>> run on a machine with more than one CPU. It core dumps in less than 5
>>>> sec.
>>>> PROPOSED FIX:
>>>> Attachments:
>>>> - stack trace of our application with ace-6.0.1
>>>> - sample code for tellp not being thread safe
>>>> - attached patch for Logging_Strategy.cpp for ace-6.1.5