remote+http connection to DC times out in Production WF


Watchuta Awkin

Apr 24, 2021, 1:17:30 PM
to WildFly
Hi there,
We have a production (and a separate DR) WildFly 17 installation running in domain mode. It has been problem-free for months. We recently noticed that some transactions were failing, and the individual server/log/server.log files showed us which HC was having trouble (essentially, there were no logs at all for a while). We decided to bounce that HC. After that, we kept getting errors when connecting to the DC: "WFLYPRT0023: Could not connect to remote+http://[dc.ip.address.here]:9990. The connection timed out"

We then found that we could not even connect to the DC using jboss-cli from the same IP the DC is running on (same error as above). We have confirmed that no firewalld, SELinux or iptables is running or enabled, yet every attempt to connect to the DC gives the same error. The strange thing is that we can download (wget) the index.html from the DC with no problem, and we can even telnet to the DC IP and management port, but WF just refuses to connect. We added the system property "jboss.as.management.blocking.timeout", set to 3600, on the DC and in the domain.conf of all HCs, but the connection still times out.

This being a production environment makes it even more worrying, as we believe that if the DC goes down, it will stay down for good. Google searches recommend taking thread dumps, but there is no jcmd or jstack on these VMs because they have JREs, not JDKs. I also don't know whether to take the dump on the DC or on the HC. I've seen posts saying kill -3 may be used. I presume I would run it on the HC, but I do not know at what point I should issue the kill -3 command.
Please help me get this environment stable again, or diagnose what the problem is.

Yeray Borges Santana

Apr 26, 2021, 8:33:08 AM
to WildFly
Hello,

You mentioned you are unable to connect via jboss-cli to your Domain Controller (DC) from the same machine. That suggests the problem is on the DC and not on any slave HC. So, at first glance, kill -3 against your slave HC process would not fix the situation. Note that you can try to connect via jboss-cli to your slave HC instead of your DC. That would be a good test to verify that your slave HC is fine. Check the --controller option of jboss-cli.sh; it lets you specify a different host and port to connect to another instance.
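
As a rough sketch of that test, the snippet below prints two jboss-cli.sh invocations, one against a slave HC and one against the DC, so they can be reviewed before being run. Everything specific in it is an assumption: the install path, the hostnames, and the slave's protocol and port (remote:// on 9999 is only a guess at a native management interface; check the management-interfaces section of that host's host.xml). The --timeout value is the CLI's connection timeout in milliseconds.

```shell
#!/bin/sh
# Sketch only: JBOSS_HOME, hostnames and ports are placeholders, not values
# taken from this thread. Adjust them to your installation.
JBOSS_HOME="${JBOSS_HOME:-/opt/wildfly}"

# Print the jboss-cli.sh command line for one management endpoint.
#   $1 = controller address (protocol://host:port)
#   $2 = connection timeout in milliseconds
cli_probe_cmd() {
    printf '%s/bin/jboss-cli.sh --connect --controller=%s --timeout=%s --command="ls /host"\n' \
        "$JBOSS_HOME" "$1" "$2"
}

# Slave HC first (assumed native interface), then the DC (HTTP management port).
cli_probe_cmd "remote://hc1.example.internal:9999" 10000
cli_probe_cmd "remote+http://dc.example.internal:9990" 10000
```

If the first command connects and lists the host while the second times out, that is consistent with the slave HC being healthy and the DC being the one that is stuck.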

Have you checked the status of your DC process via jconsole?

If there is nothing relevant there, you could try to reload your DC (I'm assuming you have managed servers on your DC side and you don't want to shut them down). The reload operation allows you to reload an HC without restarting the managed servers. In the MBeans tab of jconsole, connected to your DC process, navigate to jboss.as/master (the name depends on your DC name). Some management operations are exposed there that could be useful, and you could try the reload from there.

You can also try a more aggressive option: killing the DC directly (the DC is itself an HC process). In domain mode, there is also a process called the "Process Controller" (PC), which monitors the HC process, and when the HC process dies, the PC will start it again. Killing the HC process should not affect the running applications, since they run on the servers managed by the HC.
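
If you want thread dumps from a JRE-only machine before killing anything, kill -3 (SIGQUIT) is enough: a HotSpot JVM prints a thread dump to its standard output and keeps running. The sketch below assumes Linux with pgrep, and assumes the JVMs carry the usual -D[Host Controller] marker on their command lines; both helper names are made up for this example. The dump typically shows up wherever the console output of domain.sh goes, since the PC relays the HC's output. Taking a few dumps on the DC host while a connection attempt is hanging is usually the most informative.

```shell
#!/bin/sh
# Sketch for the DC host. Helper names are hypothetical; nothing runs until
# you call the functions yourself.

# Print the PID(s) of the Host Controller JVM (on the DC host, that is the DC).
hc_pids() {
    pgrep -f -- '-D\[Host Controller\]'
}

# Send SIGQUIT ("kill -3") to a JVM several times.
#   $1 = PID, $2 = number of dumps (default 3), $3 = seconds between dumps (default 10)
thread_dumps() {
    pid=$1
    count=${2:-3}
    pause=${3:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        kill -3 "$pid" || return 1      # stop if the process is gone
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$pause"
        fi
    done
    return 0
}

# Example session (run by hand, on the DC host):
#   pid=$(hc_pids | head -n 1)
#   thread_dumps "$pid" 3 10    # three dumps, ten seconds apart
#   kill "$pid"                 # last resort; the PC is expected to restart the HC
```

Which signal to use for the final kill, and whether the PC respawns the HC afterwards in your setup, is worth trying on the DR environment first.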


Hope it helps