Understanding why Lucee stopped responding


Simon Goldschmidt

Oct 4, 2016, 9:45:25 PM
to Lucee
We run a couple of load-balanced AWS servers in separate data centres, both running Windows and Lucee 5.0.0.254. Our session data is stored in an RDS MySQL database. This morning, both servers stopped responding at approximately the same time. They were still responding on port 8888, and we could use the Lucee server admin pages, but our applications were not responding. Applications that don't use session management also kept working. Restarting Lucee restored services to normal.

The only related logging we could find was Lucee's timeout log, attached.

We noticed the same issue on a Test server later this morning running Windows 8.1 and the same version of Lucee. We have no idea what triggered the issue in any of these cases.

I don't think we're doing anything special... has anyone else noticed this behaviour? Any suggestions as to what may have caused the issue, and what we can do to limit the impact if it happens again?

Thanks,
Simon

error.txt

Andrew Dixon

Oct 5, 2016, 3:20:23 AM
to lu...@googlegroups.com
Hi Simon,

Did MySQL get restarted, either by yourself or by the RDS maintenance window? If you're running a multi-AZ RDS instance, it would have rebooted with failover to the standby instance, which would be somewhere else. Maybe Lucee was using a connection from the pool that could no longer reach the RDS instance? What connection timeout do you have set on the datasource you are using for session management? If it was set to, say, 20 minutes, it could just be holding on to a dead connection.

Kind regards,

Andrew


Simon Goldschmidt

Oct 5, 2016, 5:52:20 PM
to Lucee
Hi Andrew,

No restart and the zone doesn't appear to have changed for ages. The connection timeout is set to one minute. The best simulation I have mustered is to switch off MySQL, but that returns the expected "Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server." and comes back to life when MySQL is restarted. Perhaps something to do with the maximum connections (10 for us) at the time? I notice an "auto reconnect" option for data sources, but this option doesn't seem to be encouraged.

Also, the timeout.log file shows records immediately after the issue started, but the servers were unresponsive for a good 15 minutes until we rebooted the servers. Would the fact that port 8888 responded normally be a sign that the issue may have started with a session hiccup, but manifested as a problem with the Boncode connector (v 1.0.28)?

Simon

Simon Goldschmidt

Nov 21, 2016, 4:49:20 AM
to Lucee
My best guess at the cause is an invalid configuration of a Connector in the server.xml file. We had specified MaxThreads="1000" (with incorrect capitalisation; it should have been maxThreads). If the issue was triggered by the number of concurrent threads passing a threshold, it stands to reason that the load would then have been redirected to the other server, which failed in the same way soon after the first. Since correcting the configuration, we haven't seen a repeat of this issue.
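
For anyone wanting to check their own server.xml for the same kind of slip, here is a minimal sketch in Python. It flags Connector attributes that match a known attribute name only when case is ignored. The KNOWN_ATTRS list is a partial, illustrative selection, and the sample XML (including the AJP connector on port 8009) is hypothetical and not taken from our actual configuration. It only spots the spelling difference; whether Tomcat really ignores such an attribute has not been confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Partial, illustrative list of Tomcat Connector attribute names.
# Assumption: extend this with whatever attributes your connectors use.
KNOWN_ATTRS = [
    "port", "protocol", "maxThreads", "minSpareThreads",
    "connectionTimeout", "redirectPort", "acceptCount",
    "maxConnections", "URIEncoding",
]


def find_miscased_attributes(server_xml_text, known=KNOWN_ATTRS):
    """Return (port, found_name, expected_name) for every Connector
    attribute that matches a known name only when case is ignored."""
    by_lower = {name.lower(): name for name in known}
    problems = []
    root = ET.fromstring(server_xml_text)
    for connector in root.iter("Connector"):
        for attr in connector.attrib:
            expected = by_lower.get(attr.lower())
            if expected is not None and expected != attr:
                problems.append((connector.get("port"), attr, expected))
    return problems


# Hypothetical sample, not our real server.xml.
SAMPLE = """<Server>
  <Service name="Catalina">
    <Connector port="8888" protocol="HTTP/1.1" connectionTimeout="20000" />
    <Connector port="8009" protocol="AJP/1.3" MaxThreads="1000" />
  </Service>
</Server>"""

if __name__ == "__main__":
    for port, found, expected in find_miscased_attributes(SAMPLE):
        print(f"Connector on port {port}: '{found}' should probably be '{expected}'")
```

To run it against a real file, read the file contents and pass them to find_miscased_attributes in place of SAMPLE.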
Simon
