So far, a lot of people are reporting the stopping issue [1], both
with 0.3.3 and with the development version that uses restkit as the
HTTP client. Can you also try checking out my fork at
https://github.com/raikage/hookbox.git and let me know any issues you
encounter.
[1] http://groups.google.com/group/hookbox/browse_thread/thread/8528f6f5ffa756f2
Regards,
Dax
I've never worked with Ubuntu, so I don't know how close these instructions would be for you, but when you get into this situation, find the PID for your hookbox process, then issue the command:
lsof -p <pid> | wc
The first number is the number of lines (each line represents an open file, and a TCP socket counts as an open file).
If that's over 1000, you can try increasing the file limit.
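For example, something along these lines should do it (the pgrep pattern is just a guess - match it to however you actually launch hookbox):
pgrep -f hookbox                  # find the PID of the hookbox process
lsof -p <pid> | wc -l             # count of open files (sockets included) for that PID
cat /proc/<pid>/limits            # the "Max open files" row shows the per-process limit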
Also, I believe the default for the ulimit command is to show the soft limit, not the hard limit. Try issuing both ulimit -Sn and ulimit -Hn to verify that each of them shows the 64K value.
And, the ulimit is the per-process limit. I think there's also a system-wide limit in /proc/sys/fs/file-max, which can be set directly (e.g. echo "20000" > /proc/sys/fs/file-max) or through /etc/sysctl.conf by adding the line fs.file-max=20000. (Note: this setting does affect kernel memory usage, so I wouldn't just arbitrarily set it to some huge number - maybe increase it by 50% just to see if it changes the behavior of the system.)
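Something like the following should show both per-process limits and the system-wide cap (run the ulimit commands as the user that starts hookbox; the sysctl change needs root):
ulimit -Sn                        # soft per-process open-file limit
ulimit -Hn                        # hard per-process open-file limit
cat /proc/sys/fs/file-max         # system-wide limit
sysctl -w fs.file-max=20000       # or edit /etc/sysctl.conf and run sysctl -p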
Ken
Inside the eventlet library (used by hookbox), there's a hard-coded limit of 1000 greenlet threads. It's possible that _that_ is the boundary being hit.
I'm having a hard time figuring out the "proper" way to change the value, but if you're willing to edit the module files directly, you could just give it a try -
There are two instances in eventlet/greenpool.py (lines 17 and 188) where a default parameter of 1000 is set in the __init__ method.
@ line 17:
def __init__(self, size=1000):
You could try changing that to 10000, likewise on line 188.
Then, in convenience.py on line 56, change the value of concurrency:
def serve(sock, handle, concurrency=1000):
Finally, in wsgi.py, change the line:
DEFAULT_MAX_SIMULTANEOUS_REQUESTS = 1024
WARNING: I have no idea whether or not this will work - I'm just looking for instances of the magic number that appears to be the limit being hit.
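If you're not sure where your eventlet install actually lives, something along these lines should locate the files to edit (untested, and the line numbers above may differ between eventlet versions):
python -c "import os, eventlet; print(os.path.dirname(eventlet.__file__))"
grep -n "1000\|1024" <that directory>/greenpool.py <that directory>/convenience.py <that directory>/wsgi.py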
(First, I've got to do something to create the problem myself. I haven't been able to recreate it yet.)
Ken
Run strace on the hookbox process to track all the system calls it is
making. It's quite possible that when it hangs, it is due to a deadlock
on a mutex or such. I encountered a similar scenario recently when
writing my own eventlet network server.
strace -p <pid of hookbox process>
If it does indeed deadlock, attach gdb to the process and inspect the
backtrace for clues:
gdb -p <pid of hookbox process>
bt
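If the live output scrolls by too fast, you can also follow threads and log to a file (these are standard strace options, nothing hookbox-specific):
strace -f -tt -o hookbox.strace -p <pid of hookbox process>
And once inside gdb, "thread apply all bt" will dump a backtrace for every thread rather than just the current one.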
Thanks,
Salman
Thanks for sharing this!
The output does NOT suggest a deadlock.
Once it hangs, are the 'gettimeofday' messages the only ones that are
printed?
Let's go back to basics. How about you put log statements in every
function in server.py, protocol.py, channel.py, and user.py, and trace
the program execution.
I realize this is tedious but we'll probably end up learning a lot even
if the cause of the bug doesn't become apparent.
Thanks,
Salman
ps: How are you reproducing this issue?
Specifically, the issue here is that I'm seeing references to limits in the "select" call that restrict it to file descriptors < 1024. If Hookbox/Eventlet is using the "select" hub, this might be the problem.
Oops - I see from your log messages below that you're probably using epoll.
If you have a kernel version >= 2.6.28, you can look at /proc/sys/fs/epoll/max_user_instances to see if it's there and has a value of 1024. If so, you can try increasing that. (I haven't yet found a way to change this on earlier kernels.)
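For example (needs root, and the epoll entries only exist on some kernel versions):
cat /proc/sys/fs/epoll/max_user_instances
echo 4096 > /proc/sys/fs/epoll/max_user_instances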
The other possibility that I can think of deals with the dynamic "ephemeral" ports assigned to the connection.
Please look at the contents of /proc/sys/net/ipv4/ip_local_port_range
It should have a range other than "1024 5000" (mine shows "32768 61000" - you may have different values). This can be changed either by echo-ing new values into that file:
echo "35000 65000" >/proc/sys/net/ipv4/ip_local_port_range
or by adding this line to /etc/sysctl.conf:
net.ipv4.ip_local_port_range = 35000 65000
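To check the current range and apply the sysctl.conf change without a reboot:
cat /proc/sys/net/ipv4/ip_local_port_range
sysctl -p                         # re-reads /etc/sysctl.conf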
Unfortunately though, that leaves me stuck. We've gone through everything that I can think to check. I'll keep digging - and will keep trying to recreate the problem myself, which might help me find the root cause.
Shame really - it seems like a very good piece of software that has been
well thought out. I hope we get some more updates from the developers
soon!