Is this a bug from Railo 4.2.1.008 and is it fixed in Lucee 4.5?


Shane Curless

May 19, 2016, 9:48:11 AM
to Lucee
Hi,

My team currently has Railo 4.2.1.008 running on a production server. We are experiencing an issue with the server locking up and requiring a restart of Jetty.

Right now it happens roughly every 9 hours, but the interval has been getting shorter as the service gains more users.

I believe I have tracked the issue down to logging. Our stderr log file shows this:

Wed May 18 19:49:23 EDT 2016-107 timeout after 10006 ms (10000 ms) occured while accessing file [/var/www/html/v2.ims-login.com/WEB-INF/railo/logs/login.log]
Wed May 18 19:49:23 EDT 2016-107 conflict in same thread: on /var/www/html/v2.ims-login.com/WEB-INF/railo/logs/login.log
java.lang.NullPointerException
        at railo.commons.io.retirement.RetireOutputStreamFactory$RetireThread.run(RetireOutputStreamFactory.java:43)
Wed May 18 19:49:23 EDT 2016-108 conflict in same thread: on /var/www/html/v2.ims-login.com/WEB-INF/railo/logs/login.log
Wed May 18 19:49:23 EDT 2016-108 conflict in same thread: on /var/www/html/v2.ims-login.com/WEB-INF/railo/logs/login.log

I'm not sure if the timeout and conflict messages are related in any way to the RetireOutputStreamFactory messages, but they do happen around the same time.

When this happens, the server is unresponsive: it accepts a connection but never sends a response, and the connection eventually times out. Only restarting Jetty gets it going again, so we have a monitor set up to restart it automatically whenever a connection times out. That is far from an ideal solution for a production system.
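For anyone wanting to reproduce the detection side of such a monitor, here is a minimal sketch of a probe that treats "accepts the connection but never answers" as a failure. It is only an illustration, not the monitor described above: the class name HungServerProbe, the method isResponsive, and the URL in main are hypothetical, and the restart step is left out entirely.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical probe for illustration; names and the URL below are assumptions.
final class HungServerProbe {

    // Returns true only if the server sends back a valid HTTP status line
    // within the timeout. A server that accepts the TCP connection but never
    // responds trips the read timeout and is reported as unresponsive.
    static boolean isResponsive(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis); // limit on establishing the connection
            conn.setReadTimeout(timeoutMillis);    // limit on waiting for response data
            conn.setRequestMethod("GET");
            // getResponseCode() returns -1 when the reply is not valid HTTP.
            return conn.getResponseCode() > 0;
        } catch (IOException | IllegalArgumentException e) {
            // Timeouts, refused connections, and malformed URLs all count as "not responsive".
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical address; substitute the real application URL.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/";
        System.out.println(url + " responsive: " + isResponsive(url, 10_000));
    }
}
```

Setting the read timeout separately from the connect timeout matters here, because in the failure described the connection itself succeeds and only the response never arrives.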

Based on this post: https://groups.google.com/forum/#!topic/railo/bzW-clxkb44, it seems someone else had the same issue some time ago, but it was never resolved for him.

Is this a bug that existed in Railo 4.2.1.008, and has it been fixed in Lucee 4.5?

P.S. I am also a Java developer, so any answers related to the Railo/Lucee source won't be a problem for me.

Andrew Dixon

May 19, 2016, 10:08:38 AM
to lu...@googlegroups.com
Hi Shane,

The first port of call would be to check the Lucee JIRA to see whether the bug has been logged and, if so, what the status of the ticket is.

If not, then please raise a ticket for it and it can be progressed from there. If you are able to find and fix the bug yourself then a pull request would also be appreciated.

Kind regards,

Andrew

Shane Curless

May 19, 2016, 10:18:40 AM
to Lucee
I found LDEV-750, a ticket about a very similar problem in Lucee 5. The error messages are the same, except that for the ticket creator it happens on startup, and the ticket also references another "failed to flush writer" error.

From both my issue and his, the problem does seem to stem from an NPE in RetireOutputStream/RetireOutputStreamFactory.

Shane Curless

May 19, 2016, 12:29:57 PM
to Lucee
What is the purpose of the following check in ResourceLockImpl?

if(t==Thread.currentThread()) {
    //aprint.err(path);
    Config config = ThreadLocalPageContext.getConfig();
    if(config!=null)
        SystemOut.printDate(config.getErrWriter(),"conflict in same thread: on "+path);
    //SystemOut.printDate(config.getErrWriter(),"conflict in same thread: on "+path+"\nStacktrace:\n"+StringUtil.replace(ExceptionUtil.getStacktrace(new Throwable(), false),"java.lang.Throwable\n","",true));
    return;
}

The problem may be related to this check, because it runs in the process of getting a RetireOutputStream for logging.
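One possible reading of a check like this, offered as an assumption rather than a statement about the Railo source: a thread that already holds the lock for a path must not wait on itself, so re-entry is detected and reported instead of blocking. The sketch below illustrates that idea with java.util.concurrent's ReentrantLock. PathLockSketch and its methods are hypothetical and are not the ResourceLockImpl code; only the log message text is borrowed from the stderr output above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-path lock for illustration; NOT the Railo/Lucee ResourceLockImpl.
final class PathLockSketch {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    PathLockSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Returns true once the calling thread holds the lock for the path,
    // false if the timeout expires or the thread is interrupted while waiting.
    boolean lock(String path) {
        ReentrantLock lock = locks.computeIfAbsent(path, p -> new ReentrantLock());
        if (lock.isHeldByCurrentThread()) {
            // Same-thread re-entry: with a non-reentrant lock, waiting here
            // would block the thread on itself until the timeout.
            System.err.println("conflict in same thread: on " + path);
        }
        try {
            // ReentrantLock lets the owning thread re-acquire immediately and counts the holds.
            return lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    // Releases one hold; does nothing if the calling thread does not hold the lock.
    void unlock(String path) {
        ReentrantLock lock = locks.get(path);
        if (lock != null && lock.isHeldByCurrentThread()) {
            lock.unlock();
        }
    }
}
```

In this sketch every successful lock() must be paired with an unlock(). A design that logs the conflict and returns without counting the re-entry, as the quoted snippet appears to do, would need its callers to skip the matching release, which is one place where lock state could get out of step.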
 

Grant Griffith

Sep 19, 2016, 8:34:58 AM
to Lucee
Any luck fixing this issue? I am seeing this most mornings when the load picks up. Usually a Lucee restart resolves it for that day, but sometimes it happens multiple times a day here.

Lucee Version: 4.5.3.020

Grant

Shane Curless

Sep 19, 2016, 10:32:59 AM
to Lucee
As far as I am aware, this issue hasn't been fixed. My team had to resort to commenting out all cflog tags and script calls.