After a 503 Crash -- What to Look for?


sjwoo

Nov 23, 2015, 12:38:35 PM
to FusionReactor

So after upgrading from CF8 to CF11 for my new production box, everything was hunky-dory -- until this past Friday.  I received a call around 2:45pm, and when I clicked on anything on the site, I got a "503 service unavailable".  Just like that, all in lowercase letters, FYI.  Going to CFAdmin gave me the same issue.


However, FR was running just fine.  And although I don't have a screencap of memory usage from that day, it looked very much like this (the green arrow is what I added).  During a normal day my memory usage is the jagged kind, like what you see below between 8am and 10am.  But leading up to the 503 crash, the memory looked more like the stretch I've marked with the arrow.  I'm sure this means something, but I don't know what.


The server was only using ~1GB of the 2GB it is allotted, so it wasn't anywhere close to running out of memory.  And yet CF was as dead as a doornail.


FYI, here's what a typical day looks like, via the summary email I get every day.  It isn't exactly a server that's in heavy use.  It's a VM instance, running on a Xeon E5-2660 @ 2.20GHz, quad core with 8GB of memory.  The web response figure is not really accurate, as it counts all the scheduled tasks, which can run for a long time; on average, user transactions complete in under 200ms.



I've pulled up the FR log files for that day, in particular the stretch between noon and 2pm.  Looking at request.log, I can see that 1:44:31 PM was the last time a request was completed.  From then until 2:42 PM, when the CF service was recycled, no requests went through because every one of them got a 503.


I've scoured the Windows logs, IIS logs, and CF logs.  Right before the 503s started happening, I see the login page threw a number of errors.  Deciphering my IIS logs, I see a bunch of entries with a Windows error code of 5 -- access denied.  But then there are 2 which have a Windows error code of 2148074248!


sc-status  sc-substatus  sc-win32-status  time-taken
401        1             2148074248       327
401        2             5                1
401        2             5                694
401        2             5                7502
401        2             5                3
401        1             2148074248       1
401        2             5                1
401        2             5                0
401        2             5                0
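
A note on reading the sc-win32-status column: 5 is the plain Win32 code for access denied, while 2148074248 is easier to interpret in hex.  The sketch below is only an illustration in Python; the helper name decode_win32_status is made up for this example.  It splits a value into the usual HRESULT fields (failure bit, facility, code).  2148074248 comes out as 0x80090308, which the Windows headers define as SEC_E_INVALID_TOKEN, an error from the security package (SSPI) layer.  Decoding the number does not by itself say whether these 401s have anything to do with the 503s.

```python
def decode_win32_status(value: int) -> dict:
    """Split a 32-bit status value into HRESULT-style fields.

    Hypothetical helper for illustration; not part of IIS or FusionReactor.
    """
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),        # top bit set means failure
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility field
        "code": value & 0xFFFF,              # low 16 bits
    }


# The two distinct values seen in the IIS log excerpt above.
for status in (2148074248, 5):
    print(status, decode_win32_status(status))
```

Facility 9 in the first result is the SSPI facility; the second value is a plain Win32 error, so its facility field is simply 0.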


For the life of me, I can't figure out what happened.  I don't know the FR logs as well -- any advice on how to figure out what happened here?


- Sung


Brad Wood

Nov 23, 2015, 3:18:58 PM
to fusionreactor
That memory graph is probably just the JVM being bored.  The more that's happening, the more often GC tends to run, partly just because young objects accumulating in the heap trigger minor collections.  When load eases up on a server, the GC kicks back and doesn't do much.

On to the error: that's coming from IIS -- probably because it can't reach CF.  I'd check when the last request shows up in FR's request history.  Enabling detailed error messages in IIS might give you more details.  It could be related to the IIS-CF connectors, but I'm just guessing.  Do you have the latest updates installed for CF, and can you try reconfiguring the IIS connectors?

Thanks!

~Brad

ColdBox Platform Evangelist
Ortus Solutions, Corp 

ColdBox Platform: http://www.coldbox.org 



sjwoo

Nov 23, 2015, 4:03:22 PM
to FusionReactor
Thanks, Brad.  Looking at request.log in the FR logs folder, the last request looks like this, memory-wise:



It completed with a 200.  It was a PDF-generating page, but not anything that would crash the server, at least not that I can see.

With my CF8 install, I'd get JVM crashes from time to time, and the crash reports would be saved in the bin folder.  I'm guessing it'd be the same for CF11 -- D:\ColdFusion11\cfusion\bin.  But I don't see any "hs_err_pidnnnnn.log" files, so I guess the JVM did not crash.  This is the first time something like this has happened since we cut over to the new boxes, which was almost a month ago.
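
For anyone who wants to script that check, here is a minimal sketch in Python.  The function name find_jvm_crash_logs is made up for this example, and the folder is just the one mentioned above.  Keep in mind that the JVM writes hs_err files to the process's working directory unless -XX:ErrorFile points somewhere else, so an empty result for the bin folder alone is not conclusive.

```python
from pathlib import Path


def find_jvm_crash_logs(folder):
    """Return hs_err_pid*.log files found directly in folder, newest first.

    Hypothetical helper for illustration. Returns an empty list if the
    folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    logs = list(root.glob("hs_err_pid*.log"))
    # Sort by modification time so the most recent crash report comes first.
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Folder taken from the post; adjust for the actual install.
    for log in find_jvm_crash_logs(r"D:\ColdFusion11\cfusion\bin"):
        print(log)
```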



- Sung

John Sieber

Nov 24, 2015, 12:21:27 PM
to FusionReactor
If this is the same situation we've been running into with CF10, your application pool is crashing in IIS, and restarting the application pool should bring the site back online.  ColdFusion is still running fine when this happens, but the application pool has crashed or stopped taking requests, so it is unable to pass any requests on to the application server.  I'm not sure what is causing the application pool crashes, but I suspect the connector could be to blame.  I'm also not sure whether the latest CF10 and CF11 updates, which do include connector fixes, were meant to address this issue.

Look at your httperr log, under C:\Windows\System32\LogFiles\, to see what errors you were getting around the time the site went down.

I have more information posted here on our situation, which sounds very similar to yours: http://serverfault.com/questions/737619/application-pool-failling-with-client-reset-errors-in-httperr-followed-by-503-2
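
If the httperr log is large, a small script can pull out just the entries around the outage.  This is a rough sketch in Python, not a tested tool: the function name httperr_entries is made up, and it assumes the log has a "#Fields:" header line naming "date" and "time" columns, as W3C-style logs normally do.  Timestamps in these logs are normally UTC, so the window may need shifting from local time.

```python
from datetime import datetime


def httperr_entries(lines, start, end):
    """Yield log rows (as dicts) whose date+time falls within [start, end].

    Hypothetical helper for illustration. Column names come from the
    log's own "#Fields:" header, so no field order is hard-coded.
    """
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        try:
            stamp = datetime.strptime(
                f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M:%S"
            )
        except (KeyError, ValueError):
            continue  # skip rows without a parseable timestamp
        if start <= stamp <= end:
            yield row
```

It would be called with the open log file and a start/end datetime, for example `httperr_entries(open(path), start, end)`, and each matching row can then be printed or inspected for its status and reason columns.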

sjwoo

Nov 24, 2015, 2:45:02 PM
to FusionReactor
Thanks for this, John.  I just checked C:\Windows\System32\LogFiles\HTTPERR but there are no issues I can see -- the logfile doesn't have any errors during that time.  So I guess my issue is something else, though from what you describe, my issue does sound awfully similar.

One thing I'm having done on this box is a weekly reboot.  I've been doing daily CF/WWW recycles, but maybe the weekly reboot will help.  It's what we did with our old box, so no reason not to with this one.

- Sung