abrupt hanging / blocking of mongoose web server

swati joshi

Sep 23, 2014, 7:41:12 AM
to mongoos...@googlegroups.com
Hi There,

I am using mongoose.c & mongoose.h embedded in my application, which acts as a localhost server serving 2 purposes:

1. To print & handle all the incoming websocket requests
2. To act as the local host that loads index.html and the other related .js & .css files.

For this I have referred to the server.c and websocket_echo_server.c example code.
I have used mongoose.c with time stamp $Date: 2014-09-16 06:47:40 UTC $

Following is my code,

time_t current_timer = 0, last_timer = time(NULL);

static int ev_handler(struct mg_connection *conn, enum mg_event ev) {
    if (conn->is_websocket) {
        printf("-->URI %s \n", conn->uri);
        printf("-->Header Content -->%s<--\n\n", conn->content);
        return MG_TRUE;
    }
    return MG_FALSE;
}
static void push_message(struct mg_server *server, time_t current_time) {
    struct mg_connection *c;
    char *buf = "";
    for (c = mg_next(server, NULL); c != NULL; c = mg_next(server, c)) {
        if (c->is_websocket) {
            mg_websocket_write(c, 1, buf, strlen(buf));
        }
    }
}

static void *serving_thread_func(void *param) {
    struct mg_server *srv = (struct mg_server *) param;
    while (exit_flag == 0) {

        mg_poll_server(srv,300);

        current_timer = time(NULL);
        if (current_timer - last_timer > 0) {
            last_timer = current_timer;
            push_message(server, current_timer);
        }
    }
    return NULL;
}

int main(int argc, char *argv[]) {
    
    // Initialize Mongoose Server and Start
    init_server_name();
    start_mongoose(argc, argv);
    
    printf("%s serving [%s] on port %s\n",
               server_name, mg_get_option(server, "document_root"),
               mg_get_option(server, "listening_port"));

    serving_thread_func(server);
    
    printf("Exiting on signal %d ...", exit_flag);
    fflush(stdout);
    
    mg_destroy_server(&server);
    return 0;
}

The problem I am facing is that when more than 20 websocket requests come in at once, httpserver hangs / blocks inside ev_handler. Polling also stops, and nothing further happens.

Please suggest what could possibly be wrong and what could be done to fix this.

Thanks & Regards,
Swati

swati joshi

Sep 24, 2014, 5:32:06 AM
to mongoos...@googlegroups.com
Hi,
To add to the problem details above, I have the following observations:
1. The polling thread blocks when a huge number of requests come in, e.g. 100+ per second. Also, a response has to be sent for each request.
2. Unlike in the code snippet above, if ev_handler() has logic to frame a response and send it using mg_websocket_write(), then the process blocks. Otherwise, when it just returns MG_TRUE / MG_FALSE without sending anything, it doesn't block.
3. Clearly, the flaw is the delay in sending responses from the server while the frequency of incoming requests is too high. This seems to end in a deadlock.

To support the above, I have the following to mention:
1. When my process (httpserver, the mongoose server) is working fine, the following is observed:
/tmp/root # netstat -t -p | grep httpserver
netstat: showing only processes with your user ID
tcp        0      0 localhost:8888          localhost:49824         ESTABLISHED 2321/httpserver
tcp        0      0 localhost:8888          localhost:49819         CLOSE_WAIT  2321/httpserver
tcp        0      0 localhost:8888          localhost:49818         CLOSE_WAIT  2321/httpserver
tcp        0      0 localhost:8888          localhost:49822         ESTABLISHED 2321/httpserver
tcp        0      0 localhost:8888          localhost:49825         ESTABLISHED 2321/httpserver
tcp        0      0 localhost:8888          localhost:49823         ESTABLISHED 2321/httpserver
tcp        0      0 localhost:8888          localhost:49817         CLOSE_WAIT  2321/httpserver
tcp        0      0 localhost:8888          localhost:49826         CLOSE_WAIT  2321/httpserver


2. When it blocks as explained above, the following is observed:

/tmp/root # netstat -t -p | grep httpserver
netstat: showing only processes with your user ID
tcp    16341      0 localhost:8888          localhost:49824         ESTABLISHED 2321/httpserver
tcp      410      0 localhost:8888          localhost:49819         CLOSE_WAIT  2321/httpserver
tcp      410      0 localhost:8888          localhost:49818         CLOSE_WAIT  2321/httpserver
tcp        0      0 localhost:8888          localhost:49822         ESTABLISHED 2321/httpserver
tcp        0      0 localhost:8888          localhost:49825         ESTABLISHED 2321/httpserver
tcp        0      0 localhost:8888          localhost:49823         ESTABLISHED 2321/httpserver
tcp      410      0 localhost:8888          localhost:49817         CLOSE_WAIT  2321/httpserver
tcp      410      0 localhost:8888          localhost:49826         CLOSE_WAIT  2321/httpserver

Note that the Recv-Q column (the second column) now shows non-zero values, which are the bytes not yet read / copied by the server program.

What could be the solution in this case?

Kindly reply!

Sergey Lyubka

Sep 24, 2014, 9:01:44 AM
to mongoose-users

Thanks. What's the OS?

swati joshi

Sep 24, 2014, 11:35:57 PM
to mongoos...@googlegroups.com
The OS is Linux (Ubuntu).

Scott M

Oct 21, 2014, 8:17:40 PM
to mongoos...@googlegroups.com
I also see hangs when using websockets. However, I'm on Windows, and my volume of use is MUCH lower. I have a handful of clients (often just 1 or 2) that make a very low volume of websocket requests. Sometimes the socket closes without warning from the client's perspective; sometimes the client doesn't notice a close, but the server is frozen and will not serve requests or react to new messages.

Sergey Lyubka

Oct 22, 2014, 2:59:23 PM
to mongoose-users
Okay, let me set up a long-running stress test to catch that.

In the meantime, you might want to try net skeleton, which has a much lighter
implementation of the http/websocket functionality; examples are at

(note that net skeleton currently does not automatically PING client connections).

Scott M

Oct 22, 2014, 10:35:00 PM
to mongoos...@googlegroups.com
For what it is worth, I confirmed that the hang causes mg_poll_server not to return (after running happily for many hours). I was using web sockets for small and fairly infrequent messages at the time.

I have only one call to mg_websocket_write and it's under a mutex. So while several threads call it, access is strictly serialized. 

While (probably) unrelated, I'm made nervous by the fact that ns_is_error is looking at errno after a call to recv() *on Windows*. errno could have a value of EAGAIN for all sorts of reasons, but recv() isn't one of them. And if you're going to check errno on unix you need to set it to 0 before calling recv...
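
For illustration, one way to avoid acting on a stale errno is to look at the error code only when recv() has actually failed, and to read it from WSAGetLastError() on Windows. The sketch below assumes a hypothetical helper, classify_recv(), which is not part of Mongoose or Net Skeleton; it has to be called immediately after recv(), with recv()'s return value, before any other call can overwrite the error code.

```c
#include <errno.h>
#ifdef _WIN32
#include <winsock2.h>
#endif

/* Hypothetical result codes for this sketch; not a Mongoose/Net Skeleton API. */
enum recv_status { RECV_DATA, RECV_CLOSED, RECV_RETRY, RECV_FAILED };

/*
 * Classify the return value of a recv() call that has just returned.
 * The error code is consulted only when n < 0, so a leftover errno value
 * from some earlier, unrelated call cannot be mistaken for a recv() error.
 */
static enum recv_status classify_recv(long n) {
    if (n > 0) return RECV_DATA;    /* n bytes were read */
    if (n == 0) return RECV_CLOSED; /* orderly shutdown by the peer */
#ifdef _WIN32
    {
        /* Winsock reports socket errors here, not through errno. */
        int err = WSAGetLastError();
        return (err == WSAEWOULDBLOCK || err == WSAEINTR) ? RECV_RETRY
                                                          : RECV_FAILED;
    }
#else
    /* POSIX: errno is meaningful because recv() returned -1. */
    return (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
               ? RECV_RETRY
               : RECV_FAILED;
#endif
}
```

A caller would write something like `classify_recv(recv(sock, buf, sizeof(buf), 0))` and close the connection only on RECV_CLOSED or RECV_FAILED.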

Sergey Lyubka

Oct 23, 2014, 2:53:28 AM
to mongoose-users
Why are you using a mutex for mg_websocket_write()?
Do you pass connection object to the other threads?

Scott M

Oct 23, 2014, 5:53:31 AM
to mongoos...@googlegroups.com
My application is multithreaded, yes. Several different threads have the ability to send websock messages to a client. It's unlikely but possible that two would try it at the same time, so the mutex makes sure they don't.

I don't "pass connection objects" in the sense of making copies. But I do store the pointers to mg_connection in a carefully managed list that several different threads make use of. Why? Are connection objects only valid within the ev_handler callback of mg_create_server?

Sergey Lyubka

Oct 23, 2014, 6:13:01 AM
to mongoose-users
You can pass the pointers to the connection objects to other threads,
but you should know what you're doing. Any mg_poll_server() call
can destroy any of the connections, invalidating the pointer.
I suppose you're taking care of that.

The "best practice" way to push from multiple threads is to pass a server
object rather than connection objects, and call mg_broadcast().

Or, your threads could push messages to some sort of global queue.
After each mg_poll_server() call, you can check that queue and
push messages to the connections -- no locking would be required since
the message push is going to be done from the IO thread.
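
A minimal sketch of that queue approach, assuming POSIX threads. The names out_queue, out_queue_push, out_queue_drain and count_bytes are hypothetical and not part of Mongoose. Note that the queue itself still needs its own small lock, because worker threads and the IO thread both touch it; what no longer needs locking is the connection objects, since only the IO thread writes to them.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One queued outbound message (hypothetical type for this sketch). */
struct out_msg {
    struct out_msg *next;
    size_t len;
    char data[]; /* payload, copied at push time */
};

struct out_queue {
    pthread_mutex_t lock;
    struct out_msg *head, *tail;
};

static void out_queue_init(struct out_queue *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Called from any worker thread. Returns 0 on success, -1 if out of memory. */
static int out_queue_push(struct out_queue *q, const char *data, size_t len) {
    struct out_msg *m = malloc(sizeof(*m) + len);
    if (m == NULL) return -1;
    m->next = NULL;
    m->len = len;
    memcpy(m->data, data, len);
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL) q->tail->next = m; else q->head = m;
    q->tail = m;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

typedef void (*out_send_fn)(const char *data, size_t len, void *arg);

/*
 * Called only from the IO thread, right after the poll call returns.
 * The whole list is detached under the lock, then sent without holding it,
 * so worker threads are never blocked while messages are being written.
 * Returns the number of messages handed to send_cb.
 */
static int out_queue_drain(struct out_queue *q, out_send_fn send_cb, void *arg) {
    struct out_msg *m, *next;
    int n = 0;
    pthread_mutex_lock(&q->lock);
    m = q->head;
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    for (; m != NULL; m = next) {
        next = m->next;
        send_cb(m->data, m->len, arg);
        free(m);
        n++;
    }
    return n;
}

/* Stand-in callback that only totals the bytes it is given. */
static void count_bytes(const char *data, size_t len, void *arg) {
    (void) data;
    *(size_t *) arg += len;
}
```

In the polling loop, out_queue_drain() would be called right after mg_poll_server(), with count_bytes replaced by a callback that walks the connections with mg_next() and calls mg_websocket_write() on each websocket connection.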

Scott M

Oct 23, 2014, 10:07:11 AM
to mongoos...@googlegroups.com
Hm. I don't see an mg_broadcast. I probably have an old version of mongoose.

I am managing my connection list correctly (as far as I can tell) - when the ev_handler announces MG_CLOSE and if is_websocket is nonzero, I lock my connection list and remove the dead connection. Any attempt to send involves locking that list and looking up the connection, so I'm fairly certain I can't ever get a dead one. 

Sergey Lyubka

Oct 23, 2014, 10:20:31 AM
to mongoose-users
Yes that'll work.
Also make sure to lock on any IO too.
mg_poll_server() should be guarded, as well as pushes from other threads.
Otherwise simultaneous mg_write()s will produce garbage.

Scott M

Oct 23, 2014, 10:42:22 AM
to mongoos...@googlegroups.com
Wait. Putting a lock around mg_poll_server is a non-starter. The lock would be held 99.999% of the time and other threads would never get to own the mutex and call ...write().

I think I see the problem. mongoose was designed for a single-threaded world in which all server work is done between calls to mg_poll_server, or in the ev_handler. That's not what I did. So while a lot of my code is running in the ev_handler callbacks and is safe, a few threads call ...write() whenever they want to, and the mutex locking I added will keep them from stepping on each other but not from stepping on things happening inside ...poll. It's amazing I haven't seen crashes.

I am going to have to take your suggestion and queue all my intended writes, to be picked up by the loop doing the polling. 

I don't think my misdesign is causing the hangs. Clobbering memory can do anything, but I'd expect garbled output and the occasional crash, not a lockup, from mishandling writes...

Sergey Lyubka

Oct 23, 2014, 12:01:43 PM
to mongoose-users
Given the low intensity of your IO, I doubt that unprotected writes caused the issue you're talking about.
The lockup is due to something else. Would you be able to make a snapshot of the thread stacks when lockup happens?

Scott M

Oct 23, 2014, 1:39:22 PM
to mongoos...@googlegroups.com
Not easily - I've only gotten it to happen in production, and it generally happens maybe once a day when I'm not nearby. And my code detects the condition and aborts the process so it can be auto-restarted. If you know of a way to force a "core dump" from inside a Windows process on demand, I could code that in as part of the abort.

My workaround for the race condition at the moment is to create a critical section (reentrant mutex), and hack ns_poll_server to lock it on entry, drop it before the select, acquire it again after the select, and drop it on exit. I also acquire it around calls to mg_websocket_write. I don't like this solution because it depends on the critical section being recursive (which is evil), and because it meant touching mongoose code. And I don't know if it's a complete solution, but the ...write function and ...poll functions are all I ever call once I'm up and running so it's likely to be complete enough for me.

It will be interesting to see if this eliminates the hangs and the occasional disconnects that clients sometimes saw.

It would probably be nice if mongoose had two additional callback functions - one right before the select() and one right after. People doing what I'm doing, trying to get more thread safety, could use them to manage locking.

Sergey Lyubka

Oct 23, 2014, 2:48:50 PM
to mongoose-users
Disconnects could happen because either the server closes the connection, or the server hangs and doesn't send PINGs.
So the server hanging could be the root cause of all the faulty behavior you see.

Now, hangups could be caused by many things; having a stack dump would be very helpful.
You can resort to something like this:

Your detector could invoke an external utility to create a dump, then shoot the process.

Scott M

Oct 25, 2014, 12:32:29 PM
to mongoos...@googlegroups.com
So I'm not easily able to get userdump installed on the production system, but instead I did the trick of recording __LINE__ at various points in ns_server_poll, and printing the last recorded value when another thread detects a hang. The last line visited is the first one noted here:

  lastAt = __LINE__; //************ LAST LINE RECORDED

  for (conn = server->active_connections; conn != NULL; conn = tmp_conn) {
    tmp_conn = conn->next;
    ns_call(conn, NS_POLL, &current_time);
    if (!(conn->flags & NSF_WANT_WRITE)) {
      //DBG(("%p read_set", conn));
  lastAt = __LINE__;
      ns_add_to_set(conn->sock, &read_set, &max_fd);
    }
    if (((conn->flags & NSF_CONNECTING) && !(conn->flags & NSF_WANT_READ)) ||
        (conn->send_iobuf.len > 0 && !(conn->flags & NSF_CONNECTING) &&
         !(conn->flags & NSF_BUFFER_BUT_DONT_SEND))) {
      //DBG(("%p write_set", conn));
  lastAt = __LINE__;
      ns_add_to_set(conn->sock, &write_set, &max_fd);
    }
    if (conn->flags & NSF_CLOSE_IMMEDIATELY) {
  lastAt = __LINE__;
      ns_close_conn(conn);
    }
  }

  lastAt = __LINE__;

From this it looks like the connection linked list got into a loop, and whatever connection(s) it is looping on didn't have any interesting flags set. I've glanced over the linked list code and it seems ok, so I'm adding a counter to see if the list is really looped. Unfortunately at about one crash a day, the printf method of debugging is slow.
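
As an alternative to a plain iteration counter, a loop in a singly linked list can be detected with Floyd's two-pointer check, which needs no upper bound on the list length. This is a sketch only: struct node is a hypothetical stand-in for the connection record, and only its next pointer matters here.

```c
#include <stddef.h>

/* Hypothetical list node; stands in for any record linked through `next`. */
struct node {
    struct node *next;
};

/*
 * Floyd's cycle check: `slow` advances one node per step and `fast` two.
 * If the list ends in NULL, `fast` reaches the end and 0 is returned.
 * If the list loops back on itself, `fast` eventually catches up with
 * `slow` inside the loop and 1 is returned.
 */
static int list_has_cycle(const struct node *head) {
    const struct node *slow = head, *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) return 1;
    }
    return 0;
}
```

Calling such a check at the top of the polling function, and logging when it returns 1, would distinguish a genuinely looped list from a loop body that is merely slow.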

Scott M

Oct 27, 2014, 1:06:44 AM
to mongoos...@googlegroups.com
After more digging, my issue appears to have been memory corruption in my own code, causing an infinite loop. I don't think I have a mongoose issue.

Sergey Lyubka

Oct 27, 2014, 9:04:54 AM
to mongoose-users
Thanks for following up on that.
For reference, could you elaborate on how you hunted down the corruption, please?

Scott M

Oct 27, 2014, 11:16:44 AM
to mongoos...@googlegroups.com
I managed to reproduce the problem in a non-production server, under a debugger. I put a breakpoint on the code that exited the server when it hangs, and when it was hit I looked at what the threads were up to. One was in a tight loop and had smashed memory.

Scott M

Oct 28, 2014, 11:03:45 PM
to mongoos...@googlegroups.com
Under some circumstances I'd like to return a 403 to anyone making any webpage request. I've tried hooking a call to send_http_error in after on_accept in the case for NS_ACCEPT, but it doesn't reliably show the permission denied message - usually the browser complains about an aborted connection. 

How do I?

Sergey Lyubka

Oct 29, 2014, 3:51:00 AM
to mongoose-users
case MG_REQUEST:
   mg_printf(conn, "%s", "HTTP/1.1 403 Forbidden\r\nContent-Length: 3\r\n\r\n403");
   return MG_TRUE;

Scott M

Oct 29, 2014, 9:05:37 AM
to mongoos...@googlegroups.com
Thanks!