Hookbox hangs


Valery Visnakov

Feb 14, 2011, 9:45:44 AM
to Hookbox User Group
I'm currently running hookbox 0.3.3 on Ubuntu 10.04. Usage is quite
intensive: more than 1000 connected clients. Recently hookbox started
to hang after this exception:

<quote>
ValueError: need more than 1 value to unpack
2011-02-14 14:27:17,247 - hookbox - WARNING - Exception with webhook
http://mydomain.com/hookbox/unsubscribe
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.3-py2.6.egg/
hookbox/server.py", line 179, in http_request
gaierror: [Errno -3] Temporary failure in name resolution
2011-02-14 14:27:17,248 - hookbox - WARNING - Exception with webhook
http://mydomain.com/hookbox/destroy_channel
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.3-py2.6.egg/
hookbox/server.py", line 179, in http_request
gaierror: [Errno -3] Temporary failure in name resolution
2011-02-14 14:27:17,248 - hookbox - WARNING - Exception with webhook
http://mydomain.com/hookbox/unsubscribe
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.3-py2.6.egg/
hookbox/server.py", line 179, in http_request
gaierror: [Errno -3] Temporary failure in name resolution
2011-02-14 14:27:17,249 - hookbox - WARNING - Exception with webhook
http://mydomain.com/hookbox/destroy_channel
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.3-py2.6.egg/
hookbox/server.py", line 179, in http_request
gaierror: [Errno -3] Temporary failure in name resolution
2011-02-14 14:27:17,249 - hookbox - WARNING - Exception with webhook
http://mydomain.com/hookbox/unsubscribe
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.3-py2.6.egg/
hookbox/server.py", line 179, in http_request
gaierror: [Errno -3] Temporary failure in name resolution
2011-02-14 14:27:17,250 - hookbox - WARNING - Exception with webhook
http://mydomain.com/hookbox/destroy_channel
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.3-py2.6.egg/
hookbox/server.py", line 179, in http_request
gaierror: [Errno -3] Temporary failure in name resolution
2011-02-14 14:27:17,250 - hookbox - WARNING - Exception with webhook
http://mydomain.com/hookbox/disconnect
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.3-py2.6.egg/
hookbox/server.py", line 179, in http_request
gaierror: [Errno -3] Temporary failure in name resolution
</quote>

Has anybody faced the same problem? Could it be caused by low memory?


Raikage

Feb 15, 2011, 12:12:08 AM
to hoo...@googlegroups.com
Hi Valery,

So far, a lot of people are reporting the stopping issue [1], both
with 0.3.3 and with the development version that uses restkit as the
HTTP client. Can you also try checking out my fork,
https://github.com/raikage/hookbox.git, and let me know what issue you
encounter?

[1] http://groups.google.com/group/hookbox/browse_thread/thread/8528f6f5ffa756f2

Regards,
Dax

Valery Visnakov

Feb 15, 2011, 4:24:40 AM
to Hookbox User Group
I see your "restkit 3.0.4" commit on master at
https://github.com/hookbox/hookbox. Is this the same commit?

Do you have instructions on how to build and install it from source?
I'm not familiar with Python.




Dax

Feb 15, 2011, 6:11:57 AM
to Hookbox User Group
Hi Valery,

Yes, https://github.com/hookbox/hookbox is the same commit. To
install, go to the hookbox directory; you will see a setup.py inside.
As root, run "python setup.py install". This installs an egg into
/usr/local/lib/python2.6/dist-packages/.

If you do not want to install system-wide, you can go the virtualenv
route (http://pypi.python.org/pypi/virtualenv). Since you are on Ubuntu
Lucid (10.04), the instructions below should work.

Go to a directory where you have write access and where you want to
create your virtual environment:

sudo apt-get install python-virtualenv
virtualenv --no-site-packages hookboxenv
source hookboxenv/bin/activate

cd to the hookbox directory (the one downloaded from GitHub), then run:

python setup.py install

This also installs the required restkit package as a dependency.

You can now run hookbox.

Hope this helps.

Regards,
Dax


Valery Visnakov

Feb 18, 2011, 8:09:22 AM
to Hookbox User Group
Hello Dax,



I've installed the new version, 0.3.4dev. The problem still exists.

Here is the error log; after these messages, everything crashes:


Exception: Not Connected
2011-02-18 13:07:30,138 - HookboxConn - WARNING - Error reading frame
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.4dev-
py2.6.egg/hookbox/protocol.py", line 62, in run
fid, fname, fargs= self._rtjp_conn.recv_frame().wait()
ValueError: need more than 1 value to unpack
2011-02-18 13:07:32,433 - hookbox - WARNING - Exception with webhook
http://ask.fm
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.4dev-
py2.6.egg/hookbox/server.py", line 211, in http_request
File "/usr/local/lib/python2.6/dist-packages/restkit-3.0.4-py2.6.egg/
restkit/resource.py", line 188, in request
File "/usr/local/lib/python2.6/dist-packages/restkit-3.0.4-py2.6.egg/
restkit/client.py", line 581, in request
File "/usr/local/lib/python2.6/dist-packages/restkit-3.0.4-py2.6.egg/
restkit/client.py", line 547, in perform
RequestError: [Errno -5] No address associated with hostname
2011-02-18 13:07:32,434 - hookbox - WARNING - Exception with webhook
http://ask.fm
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.4dev-
py2.6.egg/hookbox/server.py", line 211, in http_request
File "/usr/local/lib/python2.6/dist-packages/restkit-3.0.4-py2.6.egg/
restkit/resource.py", line 188, in request
File "/usr/local/lib/python2.6/dist-packages/restkit-3.0.4-py2.6.egg/
restkit/client.py", line 581, in request
File "/usr/local/lib/python2.6/dist-packages/restkit-3.0.4-py2.6.egg/
restkit/client.py", line 547, in perform
RequestError: [Errno -5] No address associated with hostname
2011-02-18 13:07:32,567 - hookbox - WARNING - Exception with webhook
http://ask.fm
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.4dev-
py2.6.egg/hookbox/server.py", line 211, in http_request
File "/usr/local/lib/python2.6/dist-packages/restkit-3.0.4-py2.6.egg/
restkit/resource.py", line 188, in request
File "/usr/local/lib/python2.6/dist-packages/restkit-3.0.4-py2.6.egg/
restkit/client.py", line 581, in request
File "/usr/local/lib/python2.6/dist-packages/restkit-3.0.4-py2.6.egg/
restkit/client.py", line 547, in perform
RequestError: [Errno -5] No address associated with hostname






Valery Visnakov

Feb 18, 2011, 8:32:17 AM
to Hookbox User Group
Hey Dax,

Is there any way I could help you find the cause of the problem?

The crash usually happens when the instance is using around 74-76 MB of
RAM with an average of 770 users connected.

Here are some stats

| users | memory used |

| 760 | 73928 |
| 768 | 71772 |
| 766 | 74152 |
| 753 | 74996 |

It looks like there is some kind of limit on the number of users
connected simultaneously.

Hope that helps.



Whitesell, Ken

Feb 18, 2011, 8:52:28 AM
to hoo...@googlegroups.com
Just off the top of my head, this is sounding like a system problem and not specifically a "hookbox" issue. It seems like you might be getting suspiciously close to the 1K file limit per process - especially if some of these connections are transient and not being fully closed quickly.

I've never worked with Ubuntu, so I don't know how close these instructions would be for you, but when you get into this situation, find the PID for your hookbox process, then issue the command:

lsof -p <pid> | wc

The first number is the number of lines (each line representing an open file - which is what a TCP socket is considered to be).

If that's over 1000, you can try increasing the file limit.
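
The same count can be taken from a script. This is a minimal sketch, assuming a Linux host where /proc/<pid>/fd is readable (you need to be root or the owner of the process); count_open_fds is a hypothetical helper name, not part of hookbox:

```python
import os


def count_open_fds(pid):
    """Return the number of file descriptors currently open in process `pid`.

    Assumes Linux: each entry in /proc/<pid>/fd is one open descriptor
    (sockets included). Unlike `lsof -p`, this does not count the header
    line or memory-mapped libraries, so the number will be somewhat lower.
    """
    return len(os.listdir("/proc/%d/fd" % pid))


if __name__ == "__main__":
    # Demonstration on the current process; pass the hookbox PID instead.
    print(count_open_fds(os.getpid()))
```

Because it counts descriptors only, this number is the one to compare directly against the per-process open-file limit.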

Valery Visnakov

Feb 18, 2011, 9:56:08 AM
to Hookbox User Group
Yes, it looks like this could be the problem.

Here's a log:
==================

deploy@li139-33:~$ sudo lsof -p 4536 | wc
1003 10076 129418
deploy@li139-33:~$ sudo lsof -p 4536 | wc
995 9982 125720
deploy@li139-33:~$ sudo lsof -p 4536 | wc
987 9913 127250
deploy@li139-33:~$ sudo lsof -p 4536 | wc
1020 10237 128645
deploy@li139-33:~$ sudo lsof -p 4536 | wc
1045 10499 134698
deploy@li139-33:~$ sudo lsof -p 4536 | wc
981 9827 124854
deploy@li139-33:~$ sudo lsof -p 4536 | wc
925 9268 117692

After it hits 1045, hookbox goes down. But I've configured
/etc/security/limits.conf and now `ulimit -n` says 65535. The problem
is still the same.






Whitesell, Ken

Feb 18, 2011, 10:15:29 AM
to hoo...@googlegroups.com
Did you reboot? The changes in limits.conf aren't dynamic - the system needs to be rebooted before it takes effect.

Also, I believe the default for the ulimit command is to show the soft limit and not the hard limit. Try issuing both the ulimit -Sn and ulimit -Hn commands to verify that they're each showing the 64K.

And, the ulimit is the per-process limit. I think there's also a system-wide limit in /proc/sys/fs/file-max which can be set directly (e.g. echo "20000" >/proc/sys/fs/file-max) or through the sysctl.conf file by adding the line fs.file-max=20000. (Note, this setting does affect the kernel memory usage, so I wouldn't just arbitrarily set it to some huge number - maybe increase it by 50% just to see if it changes the behavior of the system.)
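
Since `ulimit` in a login shell reports that shell's limits, which are not necessarily the limits of a daemon started some other way (as a different user, or from an init script), it can help to ask from inside the process itself. This is a minimal sketch using Python's standard resource module; nofile_limits is a hypothetical helper name:

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # A value equal to resource.RLIM_INFINITY means "no limit".
    print("soft=%s hard=%s" % (soft, hard))
```

Logging these two numbers once at server startup shows which limits the running process actually inherited.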

Ken

Valery Visnakov

Feb 18, 2011, 11:22:10 AM
to Hookbox User Group
Well, it's quite strange. I have all those settings in place and have rebooted.

deploy@li139-33:~$ ulimit -Sn
65535

deploy@li139-33:~$ ulimit -Hn
65535


deploy@li139-33:~$ cat /proc/sys/fs/file-max
76866


deploy@li139-33:~$ sysctl -a | grep fs.file-max
fs.file-max = 76866





Whitesell, Ken

Feb 18, 2011, 3:11:48 PM
to hoo...@googlegroups.com
There's yet another possibility -

Inside the eventlet library (used by hookbox), there's a hard-coded limit of 1000 greenlet threads. It's possible that _that_ is the boundary being hit.

I'm having a hard time figuring out the "proper" way to change the value, but if you're willing to edit the module files, you could just give it a try -

There are two instances in eventlet/greenpool.py (lines 17 and 188) where a default parameter of 1000 is set in the __init__ method.
@ line 17:
def __init__(self, size=1000):
You could try changing that to 10000, likewise on line 188.

Then, in convenience.py on line 56, change the value of concurrency:
def serve(sock, handle, concurrency=1000):

Finally, in wsgi.py, change the line:
DEFAULT_MAX_SIMULTANEOUS_REQUESTS = 1024

WARNING: I have no idea whether or not this will work - I'm just looking for instances of the magic number that is appearing to be reached.

Whitesell, Ken

Feb 18, 2011, 3:13:40 PM
to hoo...@googlegroups.com
Note to all: Yes, I know the following is extremely bad form - I'm just trying to do some basic debugging here. If it turns out to fix the problem, I'll try to find a more appropriate solution.

(First, I've got to do something to create the problem myself. I haven't been able to recreate it yet.)

Ken

Salman Haq

Feb 18, 2011, 4:21:45 PM
to hoo...@googlegroups.com
Here's another basic debugging tip to complement the other ideas:

Run strace on the hookbox process to track all the system calls it is
making. It's quite possible that when it hangs, it is due to a deadlock
on a mutex or such. I encountered a similar scenario recently when
writing my own eventlet network server.

strace -p <pid of hookbox process>

If it does indeed deadlock, attach gdb to the process and inspect the
backtrace for clues:

gdb -p <pid of hookbox process>
bt

Thanks,
Salman

Valery Visnakov

Feb 21, 2011, 8:22:12 AM
to Hookbox User Group
I've tried setting those parameters, restarted, and rebuilt hookbox.
Still the same issue.


Valery Visnakov

Feb 21, 2011, 8:31:25 AM
to Hookbox User Group
The process itself doesn't hang. Here is what happens.

After reaching about 700 connected users, the process is still alive,
and strace continues to output log messages:

<quote>
send(5, "action=disconnect&name=867d8174e"..., 55, 0) = 55
recv(5, 0xa5279dc, 8192, 0) = -1 EAGAIN (Resource
temporarily unavailable)
epoll_ctl(8, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP,
{u32=5, u64=579066045105438725}}) = 0
gettimeofday({1298294706, 801006}, NULL) = 0
gettimeofday({1298294706, 801133}, NULL) = 0
epoll_wait(8, {{EPOLLIN, {u32=840, u64=579066045105439560}}}, 1023,
71) = 1
epoll_ctl(8, EPOLL_CTL_DEL, 840, {EPOLLOUT|EPOLLRDBAND|EPOLLWRNORM|
EPOLLMSG|EPOLLERR|EPOLLET|0x3ffd3020, {u32=0,
u64=579066045105438720}}) = 0
recv(840, "GET /csp/comet?s=0b90517315b2434"..., 8192, 0) = 964
getsockname(840, {sa_family=AF_INET, sin_port=htons(8001),
sin_addr=inet_addr("109.74.192.33")}, [16]) = 0
gettimeofday({1298294706, 802390}, NULL) = 0
gettimeofday({1298294706, 802688}, NULL) = 0
gettimeofday({1298294706, 802873}, NULL) = 0
gettimeofday({1298294706, 803051}, NULL) = 0
gettimeofday({1298294706, 803237}, NULL) = 0
gettimeofday({1298294706, 803401}, NULL) = 0
epoll_wait(8, {}, 1023, 69) = 0
gettimeofday({1298294706, 872751}, NULL) = 0
gettimeofday({1298294706, 872890}, NULL) = 0
epoll_wait(8, {}, 1023, 6) = 0
</quote>

But the hookbox error log stops producing any messages and nobody can
connect to hookbox anymore.

And I can see in the admin interface that the number of users goes down.

Valery Visnakov

Feb 21, 2011, 8:34:26 AM
to Hookbox User Group
I've attached strace to the process, and here's what happens.

After hitting the limit of ~700 connected users, strace still
continues to output messages:

<quote>
send(143, "HTTP/1.1 200 Ok\r\nContent-Type: a"..., 266, 0) = 266
gettimeofday({1298295038, 145967}, NULL) = 0
gettimeofday({1298295038, 146116}, NULL) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
recv(143, 0xaf2e9d4, 8192, 0) = -1 EAGAIN (Resource
temporarily unavailable)
epoll_ctl(8, EPOLL_CTL_ADD, 143, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP,
{u32=143, u64=579066045105438863}}) = 0
gettimeofday({1298295038, 147033}, NULL) = 0
epoll_wait(8, {}, 1023, 28) = 0
gettimeofday({1298295038, 174672}, NULL) = 0
gettimeofday({1298295038, 174822}, NULL) = 0
epoll_wait(8, {}, 1023, 0) = 0
gettimeofday({1298295038, 175114}, NULL) = 0
gettimeofday({1298295038, 175246}, NULL) = 0
epoll_wait(8, {}, 1023, 0) = 0
gettimeofday({1298295038, 175525}, NULL) = 0
gettimeofday({1298295038, 175665}, NULL) = 0
epoll_wait(8, {}, 1023, 7) = 0
gettimeofday({1298295038, 182709}, NULL) = 0
gettimeofday({1298295038, 182798}, NULL) = 0
epoll_wait(8, {}, 1023, 0) = 0
gettimeofday({1298295038, 182984}, NULL) = 0
gettimeofday({1298295038, 183070}, NULL) = 0
epoll_wait(8, {}, 1023, 0) = 0
gettimeofday({1298295038, 183238}, NULL) = 0
gettimeofday({1298295038, 183324}, NULL) = 0
epoll_wait(8, {}, 1023, 0) = 0
gettimeofday({1298295038, 183603}, NULL) = 0
gettimeofday({1298295038, 183769}, NULL) = 0
epoll_wait(8, {{EPOLLIN, {u32=143, u64=579066045105438863}}}, 1023,
721) = 1
epoll_ctl(8, EPOLL_CTL_DEL, 143, {EPOLLOUT|EPOLLRDBAND|EPOLLWRNORM|
EPOLLMSG|EPOLLERR|EPOLLET|0x3ffd3020, {u32=0,
u64=579066045105438720}}) = 0
recv(143, "POST /admin/csp/comet?s=6d6a523f"..., 8192, 0) = 727
getsockname(143, {sa_family=AF_INET, sin_port=htons(8001),
sin_addr=inet_addr("109.74.192.33")}, [16]) = 0
gettimeofday({1298295038, 233014}, NULL) = 0
gettimeofday({1298295038, 233252}, NULL) = 0
gettimeofday({1298295038, 233406}, NULL) = 0
gettimeofday({1298295038, 233576}, NULL) = 0
gettimeofday({1298295038, 233737}, NULL) = 0
gettimeofday({1298295038, 233881}, NULL) = 0
</quote>

But hookbox error log doesn't produce anything and users cannot
connect to the server.





Salman Haq

Feb 21, 2011, 10:40:46 AM
to hoo...@googlegroups.com
Valery,

Thanks for sharing this!

The output does NOT suggest a deadlock.

Once it hangs, are the 'gettimeofday' messages the only ones that are
printed?

Let's go back to basics. How about you put log statements in every
function in server.py, protocol.py, channel.py, user.py and trace the
program execution.

I realize this is tedious but we'll probably end up learning a lot even
if the cause of the bug doesn't become apparent.

Thanks,
Salman
ps: How are you reproducing this issue?


Whitesell, Ken

Feb 21, 2011, 1:33:49 PM
to hoo...@googlegroups.com
There are two other possible limits you might be encountering - the first is still about the limits that may exist in the Python libraries. In hookbox/server.py, in the "HookboxServer.run" method, right before the "return ev" statement (line 129 in my copy), please print or log the value of ev.hubs.get_hub().

Specifically, the issue here is that I'm seeing references to limits in the "select" call that restrict it to file descriptors < 1024. If Hookbox/Eventlet are using the "select" hub, it might be possible that this is the problem.

Oops - I see from your log messages below that you're probably using epoll.

If you have a kernel version >= 2.6.28, you can look at /proc/sys/fs/epoll/max_user_instances to see if it's there and has a value of 1024. If so, you can try increasing that. (I haven't yet found a way to change this on earlier kernels.)

The other possibility that I can think of deals with the dynamic "ephemeral" ports assigned to the connection.

Please look at the contents of /proc/sys/net/ipv4/ip_local_port_range
It should have a range other than "1024 5000" (mine shows "32768 61000" - you may have different values). This can be changed either by echo-ing new values into that file:
echo "35000 65000" >/proc/sys/net/ipv4/ip_local_port_range

or making the change to /etc/sysctl.conf:
net.ipv4.ip_local_port_range="35000 65000"
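
To see how many ephemeral ports a given range actually provides, the file can be parsed with a few lines of Python. This is a minimal sketch; parse_port_range is a hypothetical helper name, and the /proc path is the one quoted above:

```python
def parse_port_range(text):
    """Parse the contents of ip_local_port_range into (low, high, count)."""
    low, high = (int(part) for part in text.split())
    # The range is inclusive at both ends.
    return low, high, high - low + 1


if __name__ == "__main__":
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        print(parse_port_range(f.read()))
```

For the two ranges mentioned above, "1024 5000" gives 3977 ports and "32768 61000" gives 28233.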

Valery Visnakov

Feb 22, 2011, 6:45:39 AM
to Hookbox User Group
Hello Ken,


On Feb 21, 8:33 pm, "Whitesell, Ken" <Ken.Whites...@Transamerica.com>
wrote:
> There are two other possible limits you might be encountering - the first are still about the limits that may exist in the python libraries. In the hookbox/server.py, in the "HookboxServer.run" method, right before the "return ev" statement, (line 129 in my copy) please print or log the value of ev.hubs.get_hub().
>

Here is the error hookbox reports after I added the log statement:

File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.4dev-
py2.6.egg/hookbox/server.py", line 130, in run
logger.info("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" + ev.hubs.get_hub())
AttributeError: 'Event' object has no attribute 'hubs'


> Specifically, the issue here is that I'm seeing references to limits in the "select" call that restrict it to file descriptors < 1024. If Hookbox/Eventlet are using the "select" hub, it might be possible that this is the problem.
>
> Oops - I see from your log messages below that you're probably using epoll.
>
> If you have a kernel version >= 2.6.28, you can look at /proc/sys/fs/epoll/max_user_instances to see if it's there and has a value of 1024. If so, you can try increasing that. (I haven't yet found a way to change this on earlier kernels.)

I don't have this param at all. What I have is

deploy@li139-33:~$ cat /proc/sys/fs/epoll/max_user_watches
284396




>
> The other possibility that I can think of deals with the dynamic "ephemeral" ports assigned to the connection.
>
> Please look at the contents of /proc/sys/net/ipv4/ip_local_port_range
> It should have a range other than "1024 5000" (mine shows "32768 61000" - you may have different values). This can be changed either by echo-ing new values into that file:
> echo "35000 65000" >/proc/sys/net/ipv4/ip_local_port_range
>
> or making the change to /etc/sysctl.conf:
> net.ipv4.ip_local_port_range="35000 65000"


Here is the port range:

deploy@li139-33:~$ cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000

Whitesell, Ken

Feb 22, 2011, 10:51:12 AM
to hoo...@googlegroups.com
Ok, the max_user_watches and ip_local_port_range both look good.

Oops, sorry about that first request - I mis-read the code. It's eventlet.hubs.get_hub() instead of ev.hubs.get_hub().

Valery Visnakov

Mar 7, 2011, 7:06:48 AM
to Hookbox User Group
What is the proper way to log it?

Here's how I do it


logger.warn('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
logger.warn(eventlet.hubs.get_hub())

logger.warn('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')

And here is what I get.

2011-03-07 12:05:43,654 - hookbox - WARNING - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2011-03-07 12:05:43,654 - hookbox - WARNING - <eventlet.hubs.epolls.Hub object at 0xb73778cc>
2011-03-07 12:05:43,655 - hookbox - WARNING - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!






Valery Visnakov

Mar 7, 2011, 7:27:31 AM
to Hookbox User Group
I've tried installing it on another node; the issue is the same. What OS
do you use for testing? Can you handle more than 1k users on it?

Whitesell, Ken

Mar 8, 2011, 9:33:45 AM
to hoo...@googlegroups.com
That's exactly what I wanted to see - I wanted to verify that you're using the epolls hub and not one of the others.

Unfortunately though, that leaves me stuck. We've gone through everything that I can think to check. I'll keep digging - and will keep trying to recreate the problem myself, which might help me find the root cause.

Michael Carter

Mar 8, 2011, 5:10:33 PM
to hoo...@googlegroups.com, Whitesell, Ken
Use the pyevent hub. It works far better, I've found, and ultimately uses epoll directly.

Valery Visnakov

Mar 9, 2011, 6:37:37 AM
to Hookbox User Group
Is it possible to tell hookbox to use the pyevent hub?

Valery Visnakov

Mar 9, 2011, 6:42:08 AM
to Hookbox User Group
Ken,

What environment do you use for testing hookbox (OS, params)? I could
help with debugging too.

Another question: is there any way I can scale hookbox? E.g., I could
distribute the load between multiple servers, but there would need to
be a way for them to communicate with each other.

What I currently want to understand is whether we can move forward with
hookbox. The userbase is growing really fast and I need to figure out a
way to handle their responses.

Thanks in advance.







marie_dk

Mar 9, 2011, 8:37:05 AM
to Hookbox User Group


On 9 Mar., 12:42, Valery Visnakov <bal...@gmail.com> wrote:
> Ken,
>
> What environment do you use for testing hookbox (OS, params)? I could
> help you debugging too.
>
> The other questions, if there any way I can scale hookbox. Eg, I could
> distribute the load between multiple servers, bet there should be a
> way to communicate between that.
>
> Currently what I want to understand is if we can move forward with
> hookbox. The userbase grows really fast and I need to figure out a way
> to handle their responses.
>
> Thanks in advance.
>

Have a look at this thread:
https://groups.google.com/group/hookbox/browse_thread/thread/11ce8edd5189528e?hl=da

A bit down the thread the talk is about replication between servers.
Maybe it can give you an idea of how it's done.

I have a setup of 5 servers each with hookbox installed, and they
replicate data via the REST interface.

I will be happy to answer your question....

/marie_dk

Valery Visnakov

Mar 16, 2011, 6:17:47 AM
to Hookbox User Group
Well, I've finally found the bug.

You were right about the soft and hard limits.

These wildcard entries were not being applied to root:

* soft nofile 65535
* hard nofile 65535

So I've added

root hard nofile 65535
root soft nofile 65535

And it started working. I'm now running 0.3.3 and everything seems to
be working fine. One thing you could take a look at: I only saw the
error below when running hookbox directly; the message doesn't appear
in error_log or access_log, and that was confusing.

error: [Errno 24] Too many open files
Removing descriptor: 5
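
One way to make this kind of problem visible in the log is to record the open-file limits at startup and raise the soft limit to the hard limit if they differ. This is a minimal sketch using Python's standard resource and logging modules; raise_nofile_soft_limit is a hypothetical helper, the logger name "hookbox" is assumed from the log lines in this thread, and none of this is existing hookbox code:

```python
import logging
import resource

logger = logging.getLogger("hookbox")  # assumed logger name


def raise_nofile_soft_limit():
    """Try to raise the soft open-file limit to the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            # A process may raise its own soft limit up to the hard limit.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError) as exc:
            logger.warning("could not raise open-file limit: %s", exc)
    limits = resource.getrlimit(resource.RLIMIT_NOFILE)
    logger.warning("open-file limits: soft=%s hard=%s", *limits)
    return limits
```

This cannot lift the hard limit itself; that still has to come from limits.conf entries like the ones above.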


Thanks everybody for help.



Valery Visnakov

Mar 16, 2011, 11:03:10 AM
to Hookbox User Group
Here is a new bug I've run into. After running for some time, hookbox
stops with the following exception.



2011-03-16 14:30:07,618 - access - INFO - Incoming CSP connection
78.84.73.140 h.ask.fm:8001
2011-03-16 14:30:07,708 - access - INFO - Incoming WebSocket
connection 88.17.59.253 h.ask.fm:8001
2011-03-16 14:30:07,780 - access - INFO - Incoming CSP connection
79.153.169.128 h.ask.fm:8001
2011-03-16 14:30:07,784 - access - INFO - Incoming CSP connection
95.125.189.111 h.ask.fm:8001
2011-03-16 14:30:07,854 - hookbox - INFO - Hookbox Daemon Stopped
Traceback (most recent call last):
File "/usr/local/bin/hookbox", line 9, in <module>
load_entry_point('hookbox==0.3.3', 'console_scripts', 'hookbox')()
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.3-py2.6.egg/
hookbox/start.py", line 33, in main
server.run().wait()
File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.14-
py2.6.egg/eventlet/event.py", line 116, in wait
return hubs.get_hub().switch()
File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.14-
py2.6.egg/eventlet/hubs/hub.py", line 177, in switch
return self.greenlet.switch()
File "/usr/local/lib/python2.6/dist-packages/hookbox-0.3.3-py2.6.egg/
hookbox/server.py", line 110, in _run
rtjp_conn._sock.environ.get('REMOTE_ADDR', ''),
AttributeError: 'NoneType' object has no attribute 'environ'


What could cause this?
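
The last frame shows rtjp_conn._sock being None at the moment the address is read, which suggests the socket was already gone by then (an assumption; the thread does not confirm the cause). A defensive sketch of how such a lookup could tolerate a missing socket is shown here; remote_addr is a hypothetical helper, not hookbox code, and it relies only on the attribute names visible in the traceback:

```python
def remote_addr(rtjp_conn):
    """Return the peer address of a connection, or '' if it is unavailable."""
    sock = getattr(rtjp_conn, "_sock", None)
    # getattr on None falls through to the default, so a missing socket is safe.
    environ = getattr(sock, "environ", None)
    if environ is None:
        return ""
    return environ.get("REMOTE_ADDR", "")
```

A guard like this would only stop the server from exiting on that line; it does not explain why the socket is missing.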

Andy K

Mar 17, 2011, 3:20:32 AM
to Hookbox User Group
Exactly the same error that causes mine to stop working. Same version
of hookbox (0.3.3) running on Ubuntu 10.04.1 LTS.

I posted my issue not long ago in this thread.
https://groups.google.com/group/hookbox/browse_thread/thread/b656b4f4f21159b8?hl=en

How often is this happening on yours?

Valery Visnakov

Mar 18, 2011, 9:37:42 AM
to Hookbox User Group
Quite often. Actually it cannot work more than an hour without
failing.

On Mar 17, 9:20 am, Andy K <a...@airslash.net> wrote:
> Exactly the same error that causes mine to stop working. Same version
> of hookbox (0.3.3) running on Ubuntu 10.04.1 LTS.
>
> I posted my issue not long ago in this thread. https://groups.google.com/group/hookbox/browse_thread/thread/b656b4f4...

marie_dk

Apr 3, 2011, 12:48:43 PM
to Hookbox User Group
On Mar 18, 3:37 pm, Valery Visnakov <bal...@gmail.com> wrote:
> Quite often. Actually it cannot work more than an hour without
> failing.

Hi Valery

Initially I went for a setup with multiple servers, but replicating
data via the REST interface was too much for the hookbox instances to
handle. Therefore we now have a setup with one server.

But this means that I am now facing the same issue... Hookbox stops
responding every hour and has to be restarted. I need hookbox to
handle 1000-1500 users, but at the moment I can't see how this is
possible.

I am desperate at the moment... Did you manage to fix this or come up
with a workaround?


/marie_dk

Valery Visnakov

Apr 4, 2011, 1:09:58 PM
to Hookbox User Group
Hello,

Not really. I have the same problem. The first 800-900 users are
handled fine; after that there is a big slowdown (the handshake takes
about a minute), and then the hookbox instance just hangs.

I'm now diving into the hookbox source code and will try to figure out
where the bottleneck is.

krimson

Apr 4, 2011, 5:49:20 AM
to hoo...@googlegroups.com
Messages like these make me a little uncertain whether hookbox is the
right way to go for my project. I know there are at least two people
working on the project at the moment, but it's a pity that mailing list
traffic is very low and we get very few updates from the developers.
IRC is also very quiet :(

Shame really, it seems like a very good piece of software that has been
well thought out. I hope we get some more updates from the developers
soon!

Rob Weiss

Apr 5, 2011, 8:52:53 AM
to hoo...@googlegroups.com
We are trying. I just put together a team of developers and part of our responsibility is to fix hookbox and ensure that it is following the CSP, HTTP, SSL/TLS protocols properly. I hope to start to put together a plan and estimated release schedule soon. Ergo^ and I volunteered to take it over after hookbox's creator stepped aside. Hopefully we can breathe life back into the project soon.

I am in the same boat as you, we need this for our project and changing it out at this point is a no go. So I *have* to fix it.

That being said, I am on the IRC all day, and ping me if you get no response.

Thanks,
Rob.

marie_dk

Apr 5, 2011, 11:10:15 AM
to Hookbox User Group
On 5 Apr., 14:52, Rob Weiss <j105....@gmail.com> wrote:
> We are trying. I just put together a team of developers and part of our
> responsibility is to fix hookbox and ensure that it is following the CSP,
> HTTP, SSL/TLS protocols properly. I hope to start to put together a plan and
> estimated release schedule soon. Ergo^ and I volunteered to take it over
> after hookbox's creator stepped aside. Hopefully we can breathe life back
> into the project soon.
>
> I am in the same boat as you, we need this for our project and changing it
> out at this point is a no go. So I *have* to fix it.
>
> That being said, I am on the IRC all day, and ping me if you get no
> response.
>
> Thanks,
> Rob.
>
>

We finally released the chat to our website users this afternoon. I
just hope it is not going to be too popular ;-)

I couldn't hold it back any longer... The boss was breathing down my
neck, so I had to just come up with a workaround.

So this message from you makes me very very happy :-D

Hookbox has given me some sleepless nights, but I can't imagine what
else to use... Hookbox is simple but powerful in its features, and the
way it is able to communicate with existing software is just
brilliant.


/marie_dk