I've found a workaround for this problem - but would still like to
understand why this is happening.
Thanks
TAO VERSION: 1.6.6
ACE VERSION: 5.6.6
HOST MACHINE and OPERATING SYSTEM:
Linux x86 64 bit (Fedora 11)
TARGET MACHINE and OPERATING SYSTEM, if different from HOST:
COMPILER NAME AND VERSION (AND PATCHLEVEL):
THE $ACE_ROOT/ace/config.h FILE [if you use a link to a platform-
specific file, simply state which one]:
ace/config-linux.h
THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE [if you
use a link to a platform-specific file, simply state which one
(unless this isn't used in this case, e.g., with Microsoft Visual
C++)]:
platform_linux.GNU
CONTENTS OF $ACE_ROOT/bin/MakeProjectCreator/config/default.features
(used by MPC when you generate your own makefiles):
AREA/CLASS/EXAMPLE AFFECTED:
Naming Service
DOES THE PROBLEM AFFECT:
EXECUTION
SYNOPSIS:
Client programs are obtaining and holding several references to
the naming service.
After obtaining about 235 references, further attempts to resolve
the naming service hang.
DESCRIPTION:
Start up the naming service using the following command:
Naming_Service -ORBEndPoint iiop://andre:11004 -m 0
The svc.conf file contains:
dynamic Advanced_Resource_Factory Service_Object * TAO_Strategies:_make_TAO_Advanced_Resource_Factory () "-ORBReactorType select_mt"
static Client_Strategy_Factory "-ORBClientConnectionHandler MT"
static Server_Strategy_Factory "-ORBConcurrency thread-per-connection"
Start up the client programs. Each client attempts to obtain
several references to the naming service. After
obtaining about 235 references, further attempts to resolve the
naming service hang.
The client can be pstack-ed to examine what call is hanging:
#0 0x00000038bd6d6f53 in __select_nocancel () from
/lib64/libc.so.6
#1 0x00007f623f78ea20 in
ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token>
>::wait_for_multiple_events(ACE_Select_Reactor_Handle_Set&,
ACE_Time_Value*) ()
#2 0x00007f623f804848 in
ACE_TP_Reactor::dispatch_i(ACE_Time_Value*, ACE_TP_Token_Guard&) () from
/home/andre/SDU/bin/../lib/libACE.so
#3 0x00007f623f804986 in
ACE_TP_Reactor::handle_events(ACE_Time_Value*) ()
#4 0x00007f623fb16a79 in
TAO_Leader_Follower::wait_for_event(TAO_LF_Event*, TAO_Transport*,
ACE_Time_Value*) () from /home/andre/SDU/bin/../lib/libTAO.so
#5 0x00007f623fb4eb9d in
TAO::Synch_Twoway_Invocation::wait_for_reply(ACE_Time_Value*,
TAO_Synch_Reply_Dispatcher&, TAO_Bind_Dispatcher_Guard&) ()
#6 0x00007f623fb4f9b7 in
TAO::Synch_Twoway_Invocation::remote_twoway(ACE_Time_Value*) () from
/home/andre/SDU/bin/../lib/libTAO.so
#7 0x00007f623fb0feb8 in
TAO::Invocation_Adapter::invoke_twoway(TAO_Operation_Details&,
TAO_Pseudo_Var_T<CORBA::Object>&, TAO::Profile_Transport_Resolver&,
ACE_Time_Value*&) () from /home/andre/SDU/bin/../lib/libTAO.so
#8 0x00007f623fb0fa42 in
TAO::Invocation_Adapter::invoke_remote_i(TAO_Stub*,
TAO_Operation_Details&, TAO_Pseudo_Var_T<CORBA::Object>&,
ACE_Time_Value*&) ()
#9 0x00007f623fb10480 in
TAO::Invocation_Adapter::invoke_i(TAO_Stub*, TAO_Operation_Details&) ()
from /home/andre/SDU/bin/../lib/libTAO.so
#10 0x00007f623fb10092 in
TAO::Invocation_Adapter::invoke(TAO::Exception_Data*, unsigned long) ()
from /home/andre/SDU/bin/../lib/libTAO.so
#11 0x00007f623fb4671c in
TAO::Remote_Object_Proxy_Broker::_is_a(CORBA::Object*, char const*) ()
from /home/andre/SDU/bin/../lib/libTAO.so
#12 0x00007f623fb23a55 in CORBA::Object::_is_a(char const*)
()
#13 0x00007f62424279aa in
CosNaming::NamingContext::_narrow(CORBA::Object*) ()
#14 0x00007f6242677679 in
naming_assistant::resolve_object(char const*, char, int) () from
/home/andre/SDU/bin/../lib/libipa.so
#15 0x000000000040879b in main ()
At this point, pstack-ing the server reveals 235 threads with a
trace such as the following:
Thread 2 (Thread 0x7f715a536910 (LWP 16801)):
#0 0x00000038bd6d6fa2 in select () from /lib64/libc.so.6
#1 0x00007f72699a3fc2 in ACE::handle_ready(int,
ACE_Time_Value const*, int, int, int) () from
/home/andre/SDU/lib/libACE.so
#2 0x00007f72699a42e7 in ACE::enter_recv_timedwait(int,
ACE_Time_Value const*, int&) () from /home/andre/SDU/lib/libACE.so
#3 0x00007f72699a455c in ACE::recv(int, void*, unsigned
long, ACE_Time_Value const*) () from /home/andre/SDU/lib/libACE.so
#4 0x00007f7269d317e2 in TAO_IIOP_Transport::recv(char*,
unsigned long, ACE_Time_Value const*) () from
/home/andre/SDU/lib/libTAO.so
#5 0x00007f7269d883df in
TAO_Transport::handle_input_parse_data(TAO_Resume_Handle&,
ACE_Time_Value*) () from /home/andre/SDU/lib/libTAO.so
#6 0x00007f7269d88e9e in
TAO_Transport::handle_input(TAO_Resume_Handle&, ACE_Time_Value*) () from
/home/andre/SDU/lib/libTAO.so
#7 0x00007f7269d09e06 in TAO_Connection_Handler::svc_i() ()
#8 0x00007f7269a1e877 in ACE_Task_Base::svc_run(void*) ()
#9 0x00007f7269a1f805 in ACE_Thread_Adapter::invoke() ()
#10 0x00000038be20686a in start_thread () from
/lib64/libpthread.so.0
#11 0x00000038bd6de25d in clone () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()
as well as the initial thread:
Thread 1 (Thread 0x7f72698d2710 (LWP 14623)):
#0 0x00000038bd6d6fa2 in select () from /lib64/libc.so.6
#1 0x00007f72699b1a20 in
ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token>
>::wait_for_multiple_events(ACE_Select_Reactor_Handle_Set&,
ACE_Time_Value*) ()
#2 0x00007f7269a27848 in
ACE_TP_Reactor::dispatch_i(ACE_Time_Value*, ACE_TP_Token_Guard&) () from
/home/andre/SDU/lib/libACE.so
#3 0x00007f7269a27986 in
ACE_TP_Reactor::handle_events(ACE_Time_Value*) ()
#4 0x00007f7269d51283 in TAO_ORB_Core::run(ACE_Time_Value*,
int) ()
#5 0x0000000000401bcb in TAO_Naming_Service::run() ()
#6 0x00000000004018be in main ()
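The per-thread tally above can be done mechanically rather than by eye. The following is a minimal Python sketch, not part of the original report, that counts how many threads in saved pstack output contain a given frame. The function name and the abbreviated sample text are illustrative assumptions; the frames in the sample are shortened copies of the ones shown above.

```python
import re

# pstack/gdb print one "Thread N (...)" header per thread.
THREAD_HEADER = re.compile(r"^Thread \d+ \(")

def count_threads_with_frame(pstack_text, symbol):
    """Return (total_threads, threads_whose_trace_mentions_symbol)."""
    total = 0
    matching = 0
    seen_in_current = False
    for line in pstack_text.splitlines():
        if THREAD_HEADER.match(line.strip()):
            total += 1
            seen_in_current = False
        elif total and not seen_in_current and symbol in line:
            # Count each thread at most once, even if the symbol repeats.
            matching += 1
            seen_in_current = True
    return total, matching

# Abbreviated sample based on the traces in this report.
SAMPLE = """Thread 2 (Thread 0x7f715a536910 (LWP 16801)):
#0 0x00000038bd6d6fa2 in select () from /lib64/libc.so.6
#7 0x00007f7269d09e06 in TAO_Connection_Handler::svc_i() ()
Thread 1 (Thread 0x7f72698d2710 (LWP 14623)):
#0 0x00000038bd6d6fa2 in select () from /lib64/libc.so.6
#4 0x00007f7269d51283 in TAO_ORB_Core::run(ACE_Time_Value*,
"""

if __name__ == "__main__":
    total, per_conn = count_threads_with_frame(
        SAMPLE, "TAO_Connection_Handler::svc_i")
    print(f"{total} threads, {per_conn} in TAO_Connection_Handler::svc_i")
```

To use it on the real server, save the pstack output to a file and pass its contents in place of SAMPLE.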
With an empty service config file for the naming service,
this problem does not occur, so I suspect the server is reaching some
OS limit related to the number of threads spawned
when using thread-per-connection.
It doesn't appear to be a process limit: increasing the
user process limit (ulimit -u) to 4000
had no effect on the problem.
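One way to narrow down the OS-limit hypothesis is to print the other limits in effect, not only ulimit -u. The sketch below is an illustration added to this report, not something the poster ran. It uses Python's standard resource module, and the choice of limits to show is an assumption about what might matter for a thread-per-connection server. It reports the limits of the process that runs it, so it should be started from the same shell that launches Naming_Service.

```python
import resource

# Limits that can cap thread creation or connection handling on Linux.
# Which of these (if any) is responsible here is an assumption to verify.
# Note: stack and virtual memory values are in bytes, while the shell's
# ulimit -s and ulimit -v report kilobytes.
LIMITS = {
    "max user processes/threads (ulimit -u)": resource.RLIMIT_NPROC,
    "max open files (ulimit -n)": resource.RLIMIT_NOFILE,
    "stack size (ulimit -s)": resource.RLIMIT_STACK,
    "virtual memory (ulimit -v)": resource.RLIMIT_AS,
}

def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def read_limits():
    """Return {description: (soft, hard)} for the limits listed above."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}

if __name__ == "__main__":
    for name, (soft, hard) in read_limits().items():
        print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```

If the stack size is large and virtual memory is capped, a few hundred threads could plausibly exhaust the address-space limit before the process limit is reached, but that is only a possibility to check, not a diagnosis.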
SAMPLE FIX/WORKAROUND:
Don't use -ORBConcurrency thread-per-connection in the svc.conf
file for the naming service.
> I've found a workaround for this problem
Great!
> - but would still like to understand why this is happening.
I suspect there's some thread limit being reached in the Naming
Service. In addition to switching to the thread pool model, I
recommend you try doing the following:
. Upgrade to ACE+TAO+CIAO x.7.4 (i.e., ACE 5.7.4, TAO 1.7.4, and CIAO
0.7.4), which you can download from
http://download.dre.vanderbilt.edu
under the heading: "Latest Micro Release Kit."
. Run valgrind on the Naming Service to see where it's leaking resources.
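As a lighter-weight complement to the steps above, the thread and descriptor counts of the running Naming Service can be read from /proc while clients connect. This is a rough, Linux-only Python sketch added for illustration. The helper names are hypothetical, and it only shows whether the counts climb toward a ceiling, not why.

```python
import os
import sys

def thread_count(pid):
    """Read the 'Threads:' field from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError("no Threads: field found")

def open_fd_count(pid):
    """Count entries in /proc/<pid>/fd, i.e. open descriptors (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def snapshot(pid):
    """Return a one-line summary of thread and descriptor usage for pid."""
    return f"pid={pid} threads={thread_count(pid)} fds={open_fd_count(pid)}"

if __name__ == "__main__":
    # Pass the Naming_Service pid as the first argument; with no argument
    # the script inspects itself, which is only useful as a smoke test.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(snapshot(pid))
```

Running it repeatedly (for example once per new client) shows whether the thread count stalls at the point where clients begin to hang.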
Thanks,
Doug
--
Dr. Douglas C. Schmidt Professor and Associate Chair
Electrical Engineering and Computer Science TEL: (615) 343-8197
Vanderbilt University WEB: www.dre.vanderbilt.edu/~schmidt
Nashville, TN 37203 NET: d.sc...@vanderbilt.edu