Zerogw crashes sometimes

Edward Surov

Feb 9, 2014, 3:30:54 AM
to zer...@googlegroups.com
Hello! Sometimes zerogw goes down unpredictably with the following record in the log file:

2014-02-09 11:51:52 [ALRT] http.c:29: (e11) ws_statusline(&req->ws, status): Resource temporarily unavailable  

What could be the reason for such a failure?

Paul Colomiets

Feb 9, 2014, 8:28:01 AM
to zer...@googlegroups.com
Hi Edward,

This assertion is triggered when something is wrong with the request state machine.

1. Are you using the "master" version?
2. Can you get a traceback or a core image? (If you can, send the latter to me privately.)
3. Are there any warnings preceding the assertion?

Looking at the code, the error is probably caused by http.c:117, which is triggered by sending more than three message parts in a response. According to the docs [1], a reply may contain only one, two, or three parts. This assertion should be fixed anyway, so I've created an issue [2]. It would be nice if you could confirm that this is the cause.
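Until the assertion is fixed, a backend can avoid tripping it by refusing to send a reply with a part count outside the documented range. A minimal sketch, assuming the backend assembles each reply as a list of byte frames before handing it to its ZeroMQ socket; `check_reply_parts` is a hypothetical helper, not part of zerogw or any library.

```python
# Hypothetical backend-side guard; not part of zerogw itself.
# Per the zerogw docs cited above, a reply may carry one, two,
# or three message parts; anything else is rejected before sending.
MIN_REPLY_PARTS = 1
MAX_REPLY_PARTS = 3


def check_reply_parts(frames):
    """Return `frames` unchanged, or raise ValueError if the count is out of range."""
    if not MIN_REPLY_PARTS <= len(frames) <= MAX_REPLY_PARTS:
        raise ValueError(
            f"reply has {len(frames)} parts; expected "
            f"{MIN_REPLY_PARTS}..{MAX_REPLY_PARTS}")
    return frames


if __name__ == "__main__":
    check_reply_parts([b"hello"])  # a single-part reply passes
    try:
        check_reply_parts([b"a", b"b", b"c", b"d"])
    except ValueError as exc:
        print(exc)  # reply has 4 parts; expected 1..3
```

With pyzmq, the validated list would then be passed to `socket.send_multipart(frames)`, so an oversized reply fails loudly in the backend instead of reaching the gateway.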

--
Paul

Edward Surov

Feb 9, 2014, 11:27:24 AM
to zer...@googlegroups.com

Hello!

1. I'm using this version: https://github.com/tailhook/zerogw.git
2. The failures are happening on a production server, where core dumps were switched off for security reasons. We've made an exception for zerogw and are now waiting for the next crash.
3. Generally, no. I've seen only one case where this alert was preceded by another record within the same second:

2013-11-14 00:15:26 [DEBG] resolve.c:79: Matching ``%%mydomain%%.ru'' by 18
2013-11-14 00:15:26 [ALRT] http.c:29: (e11) ws_statusline(&req->ws, status): Resource temporarily unavailable

But I think it's just a coincidence.

I'm still checking whether any response could be invalid, but at first glance I don't see any abnormalities.
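Regarding item 2: one way to enable core dumps for zerogw alone, while leaving them off system-wide, is to start it from a small launcher that raises the core-file size limit and then execs the binary. A minimal sketch using Python's standard `resource` module, assuming the hard `RLIMIT_CORE` limit already permits core files (an unprivileged process may raise its soft limit only up to the hard limit); `launcher.py` and the zerogw path are placeholders. Where core files are written is also governed by the kernel's `core_pattern` setting, which this sketch does not touch.

```python
import os
import resource
import sys


def enable_core_dumps():
    """Raise the soft RLIMIT_CORE to the hard limit and return (soft, hard).

    Resource limits are preserved across exec, so they also apply to
    the program the launcher replaces itself with.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    # Hypothetical usage: launcher.py /path/to/zerogw [zerogw args...]
    if len(sys.argv) < 2:
        sys.exit("usage: launcher.py /path/to/zerogw [args...]")
    soft, _hard = enable_core_dumps()
    if soft == 0:
        sys.exit("hard RLIMIT_CORE is 0; core dumps would stay disabled")
    # Replace this process with zerogw; the raised limit is inherited.
    os.execv(sys.argv[1], sys.argv[1:])
```

Raising only the soft limit keeps the change scoped to this one process tree, which matches the "exception for zerogw" approach described above.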

Paul Colomiets

Feb 9, 2014, 5:43:54 PM
to zer...@googlegroups.com
Hi Edward,
Strange. In the case I suspected, the alert should be preceded by a "Too many message parts" warning. So I'm looking forward to the core dump or backtrace. Also check that you have the latest libwebsite dependency.

If you don't get the backtrace, I'll try to brute-force all invocations of http_static_response (the function where the assertion happens) for errors tomorrow.

P.S.: I'm curious which project or company runs zerogw in production, if it's not a secret.

--
Paul

Edward Surov

Feb 10, 2014, 2:12:01 AM
to zer...@googlegroups.com
Hello!

No, there isn't a single "Too many message parts" warning in a ~3 GB log, and we've run zerogw with a log level of at least 7 for some time. I've failed to detect any special conditions preceding the crash apart from the "Resource temporarily unavailable" alert itself; it seems to appear absolutely randomly.
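The log check described here (searching for "Too many message parts" and for records logged in the same second as an alert) can be scripted. A minimal sketch, assuming every record starts with a `YYYY-MM-DD HH:MM:SS [LEVEL]` prefix as in the excerpts quoted in this thread; it also extracts the `(eN)` tag, which appears to be an errno value (on Linux, errno 11 is EAGAIN, whose message is "Resource temporarily unavailable"). The function name `scan` is hypothetical.

```python
import re
import sys

# Assumed record layout, based on the excerpts in this thread:
# "2014-02-09 11:51:52 [ALRT] http.c:29: (e11) expr: message"
RECORD = re.compile(r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) \[(\w+)\] (.*)$")
ERRNO_TAG = re.compile(r"\(e(\d+)\)")


def scan(lines):
    """Return (too_many_parts_count, alerts).

    Each alert is a dict with its timestamp, text, errno (or None), and
    the records logged earlier within the same second.
    """
    too_many = 0
    alerts = []
    current_ts, same_second = None, []
    for raw in lines:
        line = raw.rstrip()
        m = RECORD.match(line)
        if not m:
            continue  # skip lines without the assumed prefix
        ts, level, text = m.groups()
        if ts != current_ts:
            current_ts, same_second = ts, []  # a new second has started
        if "Too many message parts" in text:
            too_many += 1
        if level == "ALRT":
            e = ERRNO_TAG.search(text)
            alerts.append({
                "time": ts,
                "text": text,
                "errno": int(e.group(1)) if e else None,
                "preceding": list(same_second),
            })
        same_second.append(line)
    return too_many, alerts


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        count, alerts = scan(f)
    print(f"'Too many message parts' warnings: {count}")
    for a in alerts:
        print(a["time"], "errno", a["errno"],
              "preceded by", len(a["preceding"]), "record(s)")
```

Because the file is read line by line, the scan runs in constant memory, which matters for a log of several gigabytes.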
 
The system has been stable since yesterday, but I guess we'll have a core dump in a couple of days.

I'll write you a private message about the project.

Edward Surov

Feb 10, 2014, 3:01:06 AM
to zer...@googlegroups.com
And one more weird case I recalled recently: zerogw suddenly started failing to resolve paths to static content (while sockets continued to work fine). Zerogw runs under chroot on our server, so we thought it could be a chroot bug or something similar.

Paul Colomiets

Feb 11, 2014, 3:10:25 PM
to zer...@googlegroups.com
Hi,

For anybody using zerogw and following this thread: as investigated privately, the issue seems to be with the two-year-old version of zerogw that Edward uses. So if you are using a recent enough zerogw, you are not affected.

--
Paul