Hi there,
We have a Production (and a separate DR) WildFly 17 installation running in domain mode. It has been problem-free for months. We recently noticed that some transactions were failing, and the individual server/log/server.log files showed us which HC was having trouble (essentially, there were NO log entries for a while). We decided to bounce that HC. After that, we kept getting failure-to-connect-to-DC errors: "WFLYPRT0023: Could not connect to remote+http://[dc.ip.address.here]:9990. The connection timed out"
We then noticed that we couldn't even connect to the DC using jboss-cli FROM the same IP that the DC is running on (it gave the same error as above). We've confirmed that NO firewalld, SELinux or iptables is running or enabled; however, any attempt to connect to the DC gives the same error. The strange thing is that we are able to download (wget) the index.html from the DC with no problem. We can even telnet to the DC IP and management port, but WF just refuses to connect. We've added the system property "jboss.as.management.blocking.timeout" and set it to 3600 on both the DC and in the domain.conf of all HCs, but it still times out and fails to connect.
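For anyone who wants to repeat the two checks above (plain TCP reachability versus an actual HTTP response from the management port) without wget or telnet, here is a minimal Python sketch. The function names are my own, nothing here is WildFly-specific, and it only shows whether the port answers at the TCP and HTTP level; it does not exercise the management protocol that jboss-cli and the HCs use.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers refused connections and timeouts
        return None


def http_status_line(host, port, path="/", timeout=5.0):
    """Send a bare HTTP/1.1 GET and return the first response line, or None."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            # Read only until the end of the status line.
            while b"\r\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
    except OSError:
        return None
    line = data.split(b"\r\n", 1)[0].decode("latin-1")
    return line or None
```

Calling `tcp_connect_time("dc.ip.address.here", 9990)` and `http_status_line("dc.ip.address.here", 9990)` from the DC itself and from an HC would show whether both levels respond and how long the connect takes; the host string is a placeholder for the real DC address.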
This being a Production environment makes it even more worrying, as we believe that if the DC goes down, it will stay down for good. Google searches have recommended taking thread dumps, but we do not have jcmd or jstack on these VMs, as they have JREs installed, not JDKs. I also don't know whether to run it on the DC or the HC. I've seen posts saying kill -3 may be used. I presume I would do so on the HC; however, I do not know at what point I should issue the kill -3 command.
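As far as I understand it, kill -3 sends SIGQUIT, which a HotSpot JVM answers by printing a thread dump to its own standard output while continuing to run, so no JDK tools are needed. Below is a minimal Python sketch of how I imagine taking a few dumps some seconds apart. The function name, the count and the interval are my own assumptions, and where the dump ends up depends on where that process's stdout is redirected.

```python
import os
import signal
import time


def request_thread_dumps(pid, count=3, interval=10.0,
                         send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (kill -3) to `pid` `count` times, `interval` seconds apart.

    `send` and `sleep` are injectable only so the function can be tried
    without signalling a real process. Returns the number of signals sent.
    """
    sent = 0
    for i in range(count):
        send(pid, signal.SIGQUIT)  # same signal as `kill -3 <pid>`
        sent += 1
        if i < count - 1:
            sleep(interval)  # no wait after the final signal
    return sent
```

My assumption is that this would be run against the Java process in question while a connection attempt is hanging, for example `request_thread_dumps(12345)` with the real PID in place of the placeholder, but I would appreciate confirmation of which process (DC or HC) and when.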
Please help me get this environment stable again, or at least diagnose what the problem is.