Instance not terminating (stuck connections & high CPU usage)


Vasil Labovský

Mar 22, 2025, 7:47:29 AM
to Event-Driven Servers
Hello,


I’m experiencing an issue where some tac_plus-ng instances do not terminate as expected.
These instances remain active indefinitely and consume 100% of a CPU thread.
Over time, multiple such instances accumulate and eventually overwhelm the server.


It appears that some connections are stuck. Below is a snippet from ps aux:
ps aux | grep tac_plus
root       31279  0.9  0.0  18160  8224 ?        Ss   01:41   0:00 tac_plus-ng: 1 connection
root       34940  0.0  0.0   3396   648 pts/0    S+   01:42   0:00 grep --color=auto tac_plus
root      651800  0.2  0.0  16060  6884 ?        Ss   Mar21   2:20 tac_plus-ng: 1 connection, accepting up to 1919 more
root     3114283 98.9  0.0  18164  8140 ?        Rs   00:06  94:49 tac_plus-ng: 1 connection left, dying when idle
root     3530463 99.5  0.0  18160  8272 ?        Rs   00:14  87:39 tac_plus-ng: 1 connection left, dying when idle

As you can see, two of these instances have been stuck at roughly 100% CPU for over 90 minutes (see the TIME column).
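
For reference, a quick way to enumerate such runaway workers is a ps one-liner like the one below. This is only a sketch: the 600-second CPU-time threshold is arbitrary, and the etimes/times field names assume a procps-style ps.

# list tac_plus-ng processes that have burnt more than 10 minutes of CPU time
ps -eo pid,etimes,times,pcpu,args | awk 'NR==1 || (/tac_plus-ng:/ && $3 > 600)'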



The configuration I use is as follows:
id = spawnd {
    background = yes
    listen = { port = 49 }
    spawn {
        instances min = 1
        instances max = 32
    }
}


id = tac_plus-ng {

    log authzlog {
        destination = /var/log/tac_plus-ng/authorization.log.
    }
    log authclog {
        destination = /var/log/tac_plus-ng/authentication.log.
    }
    log acctlog {
        destination = /var/log/tac_plus-ng/accounting.log.
    }

    accounting log = acctlog
    authentication log = authclog
    authorization log = authzlog

    connection timeout = 2   # Terminate a connection to a NAS after an idle period of at least this many seconds. Default: 600
    context timeout = 60     # Clear context cache entries after this many seconds of inactivity. Default: 3600 seconds
    max-rounds = 64          # Upper limit on the number of packet exchanges per session. Default: 40, acceptable range 1 to 127

    last-recently-used limit = 1500  # Prioritise new connections (slightly lower than max users == spawn instances max x 60 = 32 x 60 = 1920)
    retire limit = 1000              # This daemon instance will terminate after processing n requests; spawnd will spawn a new instance if necessary
    retire timeout = 600             # Terminate the spawnd instance after 10 min (in seconds)

    # ... mavis module, device, profiles, users & ruleset blocks

}


According to the documentation, it seems only idle timeouts are supported.


Could you please assist us with the following questions:
- Is there a way to define a maximum TCP session duration, after which the connection is forcibly closed?
- How can we debug these stuck connections? Logging everything is problematic, as we manage more than 1,000 devices via TACACS+ (one low-impact approach is sketched below).
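
One way to see where a spinning worker is looping, without turning on full logging, is to attach to a single stuck PID. This is only a sketch: the PID is taken from the ps output above, the syscall filter is just a starting point, and the gdb step needs gdb (and ideally debug symbols) installed.

# sample the system calls the stuck worker is making (Ctrl-C to stop)
strace -tt -f -p 3114283 -e trace=select,poll,epoll_wait,read,write -o /tmp/tac_plus-ng.strace

# or grab stack backtraces of all threads, then detach again
gdb -p 3114283 -batch -ex 'thread apply all bt' -ex detach

If the trace shows a tight loop around a single file descriptor, that points at the event loop itself rather than at a particular device.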

Thanks!

Vasil Labovský

Mar 22, 2025, 8:18:40 AM
to Event-Driven Servers
Roughly 12 hours after the previous processes were killed:

tacscs-1:/var/log# ps aux | grep tac_plus
root     3034024  0.3  0.0  16036  5872 ?        Ss   Mar21   3:50 tac_plus-ng: 22 connections, accepting up to 1898 more
root     3394374 99.9  0.0  18280  8144 ?        Rs   02:55 611:38 tac_plus-ng: 1 connection left, dying when idle
root     3399226 99.9  0.0  18280  8088 ?        Rs   03:02 604:16 tac_plus-ng: 1 connection left, dying when idle
root     3422107 99.9  0.0  18140  8208 ?        Rs   03:42 565:12 tac_plus-ng: 1 connection left, dying when idle
root     3424917 99.9  0.0  18416  8128 ?        Rs   03:47 559:45 tac_plus-ng: 1 connection left, dying when idle
root     3506390 99.8  0.0  18420  8136 ?        Rs   06:08 418:53 tac_plus-ng: 1 connection left, dying when idle
root     3759642 96.7  0.0  18284  8232 ?        Rs   13:02   4:48 tac_plus-ng: 20 connections
root     3762396  0.3  0.0  18140  8148 ?        Ss   13:07   0:00 tac_plus-ng: 2 connections
root     3762692  0.0  0.0   6320  2340 pts/1    S+   13:07   0:00 grep --color=auto tac_plus

Five instances are stuck, each pinning one CPU thread at 100%.



On Saturday, March 22, 2025, at 12:47:29 UTC+1, Vasil Labovský wrote:

Marc Huber

Mar 22, 2025, 9:26:38 AM
to event-driv...@googlegroups.com

Hi,

I don't yet know what's causing this, but I hoped that

commit 49927890623e624dcff6a4c215befde7f3970e66
Author: Marc Huber <Marc....@web.de>
Date:   Thu Mar 20 17:11:21 2025 +0100

    misc/io_sched.c: auto-unregister file descriptors without callback function

would provide a work-around. Is your installation at the current GIT level?

Cheers,

Marc


Vasil Labovský

Mar 23, 2025, 5:54:45 AM
to Event-Driven Servers
Hello,


The current version we are using is:
tacscs-1:/var/log# tac_plus-ng -v
tac_plus-ng version ae93eb949861196e9a4bbb62986ec9f0906f5dcb/PCRE2/CURL

Thanks for your quick response and for looking into the potential bug.

We will try the latest version and will let you know the results
(deploying the latest version to production will take a few days).
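
One way to double-check whether a given build already contains the commit Marc referenced is to test commit ancestry in a local clone of the sources. This is only a sketch; both hashes are the ones quoted earlier in this thread:

# exits 0 if the fix commit is an ancestor of the commit the running build was made from
git merge-base --is-ancestor \
    49927890623e624dcff6a4c215befde7f3970e66 \
    ae93eb949861196e9a4bbb62986ec9f0906f5dcb \
  && echo "fix is included" \
  || echo "build predates the fix"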

On Saturday, March 22, 2025, at 14:26:38 UTC+1, Marc Huber wrote: