Instance not terminating (stuck connections & high CPU usage)


Vasil Labovský

Mar 22, 2025, 7:47:29 AM
to Event-Driven Servers
Hello,


I’m experiencing an issue where some tac_plus-ng instances do not terminate as expected.
These instances remain active indefinitely and consume 100% of a CPU thread.
Over time, multiple such instances accumulate and eventually overwhelm the server.


It appears that some connections are stuck. Below is a snippet from ps aux:
ps aux | grep tac_plus
root       31279  0.9  0.0  18160  8224 ?        Ss   01:41   0:00 tac_plus-ng: 1 connection
root       34940  0.0  0.0   3396   648 pts/0    S+   01:42   0:00 grep --color=auto tac_plus
root      651800  0.2  0.0  16060  6884 ?        Ss   Mar21   2:20 tac_plus-ng: 1 connection, accepting up to 1919 more
root     3114283 98.9  0.0  18164  8140 ?        Rs   00:06  94:49 tac_plus-ng: 1 connection left, dying when idle
root     3530463 99.5  0.0  18160  8272 ?        Rs   00:14  87:39 tac_plus-ng: 1 connection left, dying when idle

As you can see, two of these instances have been stuck at roughly 100% CPU for over 90 minutes (see the TIME column).
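
For reference, a quick way to enumerate such runaway workers is a ps one-liner like the one below. This is only a sketch: the 600-second CPU-time threshold is arbitrary, and the etimes/times field names assume a procps-style ps.

# list tac_plus-ng processes that have burnt more than 10 minutes of CPU time
ps -eo pid,etimes,times,pcpu,args | awk 'NR==1 || (/tac_plus-ng:/ && $3 > 600)'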



The configuration I use is as follows:
id = spawnd {
    background = yes
    listen = { port = 49 }
    spawn {
        instances min = 1
        instances max = 32
    }
}


id = tac_plus-ng {

    log authzlog {
        destination = /var/log/tac_plus-ng/authorization.log.
    }
    log authclog {
        destination = /var/log/tac_plus-ng/authentication.log.
    }
    log acctlog {
        destination = /var/log/tac_plus-ng/accounting.log.
    }

    accounting log = acctlog
    authentication log = authclog
    authorization log = authzlog

    connection timeout = 2   # Terminate a connection to a NAS after an idle period of at least this many seconds. Default: 600
    context timeout = 60     # Clear context cache entries after this many seconds of inactivity. Default: 3600 seconds
    max-rounds = 64          # Upper limit on the number of packet exchanges per session. Default: 40, acceptable range 1 to 127

    last-recently-used limit = 1500  # Prioritise new connections (slightly lower than max users == spawn instances max x 60 = 32 x 60 = 1920)
    retire limit = 1000              # This daemon instance will terminate after processing n requests; spawnd will spawn a new instance if necessary
    retire timeout = 600             # Terminate the spawnd instance after 10 min (in seconds)

    # ... mavis module, device, profiles, users & ruleset blocks

}


According to the documentation, it seems only idle timeouts are supported.


Could you please assist us with the following questions:
- Is there a way to define a maximum TCP session duration, after which the connection is forcibly closed?
- How can we debug these stuck connections? Logging everything is problematic, as we manage more than 1,000 devices via TACACS+ (one low-impact approach is sketched below).
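
One way to see where a spinning worker is looping, without turning on full logging, is to attach to a single stuck PID. This is only a sketch: the PID is taken from the ps output above, the syscall filter is just a starting point, and the gdb step needs gdb (and ideally debug symbols) installed.

# sample the system calls the stuck worker is making (Ctrl-C to stop)
strace -tt -f -p 3114283 -e trace=select,poll,epoll_wait,read,write -o /tmp/tac_plus-ng.strace

# or grab stack backtraces of all threads, then detach again
gdb -p 3114283 -batch -ex 'thread apply all bt' -ex detach

If the trace shows a tight loop around a single file descriptor, that points at the event loop itself rather than at a particular device.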

Thanks!

Vasil Labovský

Mar 22, 2025, 8:18:40 AM
to Event-Driven Servers
Roughly 12 hours after the previous processes were killed:

tacscs-1:/var/log# ps aux | grep tac_plus
root     3034024  0.3  0.0  16036  5872 ?        Ss   Mar21   3:50 tac_plus-ng: 22 connections, accepting up to 1898 more
root     3394374 99.9  0.0  18280  8144 ?        Rs   02:55 611:38 tac_plus-ng: 1 connection left, dying when idle
root     3399226 99.9  0.0  18280  8088 ?        Rs   03:02 604:16 tac_plus-ng: 1 connection left, dying when idle
root     3422107 99.9  0.0  18140  8208 ?        Rs   03:42 565:12 tac_plus-ng: 1 connection left, dying when idle
root     3424917 99.9  0.0  18416  8128 ?        Rs   03:47 559:45 tac_plus-ng: 1 connection left, dying when idle
root     3506390 99.8  0.0  18420  8136 ?        Rs   06:08 418:53 tac_plus-ng: 1 connection left, dying when idle
root     3759642 96.7  0.0  18284  8232 ?        Rs   13:02   4:48 tac_plus-ng: 20 connections
root     3762396  0.3  0.0  18140  8148 ?        Ss   13:07   0:00 tac_plus-ng: 2 connections
root     3762692  0.0  0.0   6320  2340 pts/1    S+   13:07   0:00 grep --color=auto tac_plus

Five instances are stuck, each pinning one CPU thread at 100%.



On Saturday, March 22, 2025, at 12:47:29 UTC+1, Vasil Labovský wrote:

Marc Huber

Mar 22, 2025, 9:26:38 AM
to event-driv...@googlegroups.com

Hi,

I don't yet know what's causing this, but I hoped that

commit 49927890623e624dcff6a4c215befde7f3970e66
Author: Marc Huber <Marc....@web.de>
Date:   Thu Mar 20 17:11:21 2025 +0100

    misc/io_sched.c: auto-unregister file descriptors without callback function

would provide a work-around. Is your installation at the current GIT level?

Cheers,

Marc


Vasil Labovský

Mar 23, 2025, 5:54:45 AM
to Event-Driven Servers
Hello,


The current version we are using is:
tacscs-1:/var/log# tac_plus-ng -v
tac_plus-ng version ae93eb949861196e9a4bbb62986ec9f0906f5dcb/PCRE2/CURL

Thanks for your quick response and for looking into the potential bug.

We will try the latest version and will let you know the results
(deploying the latest version to production will take a few days).
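
One way to double-check whether a given build already contains the commit Marc referenced is to test commit ancestry in a local clone of the sources. This is only a sketch; both hashes are the ones quoted earlier in this thread:

# exits 0 if the fix commit is an ancestor of the commit the running build was made from
git merge-base --is-ancestor \
    49927890623e624dcff6a4c215befde7f3970e66 \
    ae93eb949861196e9a4bbb62986ec9f0906f5dcb \
  && echo "fix is included" \
  || echo "build predates the fix"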

On Saturday, March 22, 2025, at 14:26:38 UTC+1, Marc Huber wrote: