Hello,
I’m experiencing an issue where some tac_plus-ng instances do not terminate as expected.
These instances remain active indefinitely and consume 100% of a CPU thread.
Over time, multiple such instances accumulate and eventually overwhelm the server.
It appears that some connections are stuck. Below is a snippet from ps aux:
ps aux | grep tac_plus
root 31279 0.9 0.0 18160 8224 ? Ss 01:41 0:00 tac_plus-ng: 1 connection
root 34940 0.0 0.0 3396 648 pts/0 S+ 01:42 0:00 grep --color=auto tac_plus
root 651800 0.2 0.0 16060 6884 ? Ss Mar21 2:20 tac_plus-ng: 1 connection, accepting up to 1919 more
root 3114283 98.9 0.0 18164 8140 ? Rs 00:06 94:49 tac_plus-ng: 1 connection left, dying when idle
root 3530463 99.5 0.0 18160 8272 ? Rs 00:14 87:39 tac_plus-ng: 1 connection left, dying when idle
As you can see, two of these instances are pinned at close to 100% CPU and have each accumulated roughly 90 minutes of CPU time (94:49 and 87:39).
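To pick out the affected processes without reading the full listing, a small filter over the ps output can help. This is only a sketch under assumptions: the function name find_stuck_tac and the default threshold of 90% are hypothetical choices, and it relies on the process titles and the standard ps aux column order shown above.

```shell
# Hypothetical helper (name and default threshold are assumptions).
# Reads `ps aux` output on stdin and prints "PID %CPU TIME" for tac_plus-ng
# instances that are retiring ("dying when idle") but still using a lot of CPU.
find_stuck_tac() {
    awk -v min_cpu="${1:-90}" '
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        /tac_plus-ng:/ && /dying when idle/ && ($3 + 0) >= (min_cpu + 0) {
            print $2, $3, $10
        }'
}

# Usage: ps aux | find_stuck_tac 90
```

Matching on the "dying when idle" title keeps healthy instances that are still accepting connections out of the result, even if they are briefly busy.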
The configuration I use is as follows:
id = spawnd {
    background = yes
    listen = { port = 49 }
    spawn {
        instances min = 1
        instances max = 32
    }
}
id = tac_plus-ng {
    log authzlog {
        destination = /var/log/tac_plus-ng/authorization.log.
    }
    log authclog {
        destination = /var/log/tac_plus-ng/authentication.log.
    }
    log acctlog {
        destination = /var/log/tac_plus-ng/accounting.log.
    }
    accounting log = acctlog
    authentication log = authclog
    authorization log = authzlog
    connection timeout = 2 # Terminate a connection to a NAS after an idle period of at least s seconds. Default: 600
    context timeout = 60 # Clears context cache entries after s seconds of inactivity. Default: 3600 seconds.
    max-rounds = 64 # This sets an upper limit on the number of packet exchanges per session. Default: 40, acceptable range is from 1 to 127.
    last-recently-used limit = 1500 # Prioritise new connections (slightly lower than max users == spawn instances max x 60 = 32 x 60 = 1920)
    retire limit = 1000 # The particular daemon instance will terminate after processing n requests. The spawnd instance will spawn a new instance if necessary
    retire timeout = 600 # Terminate spawnd instance after 10min (in seconds)
    #... mavis module, device, profiles, users & ruleset blocks
}
According to the documentation, it seems that only idle timeouts are supported.
Could you please assist us with the following questions:
- Is there a way to define a maximum TCP session duration, after which the connection will be forcibly closed?
- How can we debug these stuck connections? Logging everything is problematic, as we manage more than 1,000 devices via TACACS+.
Thanks!