Tracked this down to a credential store. Our configuration uses LDAP as the primary store, with legacy Kerberos (via JAAS) as a fallback (to go away some day). Authentication usually falls through to Kerberos because of a bad password. On rare occasions it is because the user doesn’t have LDAP credentials.
With debug output turned all the way up on the JAAS Krb5LoginModule, we found occasional socket timeouts to the KDC that correlated with the login timeouts.
The default KDC request timeout, buried down in the old Sun Java code (sun.security.krb5.KdcComm), is 30 seconds. By the time that timeout expires, the AJP connector has already timed out and aborted the POST.
Options at this point: reduce the aggregate KDC timeout (by default 3 tries * the socket timeout) to less than the AJP proxy timeout; switch the Kerberos configuration to use TCP (the default is “unreliable” UDP); maybe both; or remove the AJP timeout altogether. The downside of removing the AJP timeout might be the risk of a rare connection “hang” that sits with no response for a long time (not that we’ve seen one other than what’s described here).
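To put numbers on the first option, here is a rough Java sketch of the budget arithmetic: the worst-case wait is the number of tries times the per-try timeout, and that has to come in under the proxy timeout. The class and method names are made up for illustration, the 60-second AJP timeout is only a placeholder (substitute the real value), and it assumes a single KDC; with several KDCs configured the total can be longer. As far as I know, the relevant [libdefaults] keys in krb5.conf for the JDK are kdc_timeout, max_retries, and udp_preference_limit (setting it to 1 forces TCP). Check how your JDK version interprets the units of kdc_timeout before relying on it.

```java
import java.time.Duration;

/** Illustrative only: worst-case time a KDC lookup can block before giving up. */
final class KdcTimeoutBudget {

    /** Worst case for a single KDC: every try waits out the full per-try timeout. */
    static Duration worstCase(Duration perTryTimeout, int maxTries) {
        if (perTryTimeout.isNegative() || maxTries < 1) {
            throw new IllegalArgumentException("timeout must be >= 0 and maxTries >= 1");
        }
        return perTryTimeout.multipliedBy(maxTries);
    }

    /** True when the Kerberos fallback gives up strictly before the proxy does. */
    static boolean fitsWithin(Duration perTryTimeout, int maxTries, Duration proxyTimeout) {
        return worstCase(perTryTimeout, maxTries).compareTo(proxyTimeout) < 0;
    }

    public static void main(String[] args) {
        // Placeholder value: the real AJP proxy timeout is site-specific.
        Duration ajpTimeout = Duration.ofSeconds(60);

        // JDK defaults as described above: 30 seconds per try, 3 tries.
        Duration defaultWorst = worstCase(Duration.ofSeconds(30), 3);
        System.out.println("default worst case (s): " + defaultWorst.getSeconds());
        System.out.println("defaults fit: "
                + fitsWithin(Duration.ofSeconds(30), 3, ajpTimeout));

        // A candidate setting: 10 seconds per try, still 3 tries.
        System.out.println("10s x 3 fits: "
                + fitsWithin(Duration.ofSeconds(10), 3, ajpTimeout));
    }
}
```

With the defaults (30 seconds * 3 tries) the worst case is 90 seconds, so any proxy timeout at or below that can fire first. Whatever per-try timeout and retry count get picked, the product should leave some headroom under the AJP timeout for the rest of the login request.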