Upgraded to RabbitMQ 3.5.6 and continue to get erl_crash.dump and rabbitmq is unresponsive

vitaly numenta

unread,

Oct 18, 2015, 7:41:17 AM10/18/15

to rabbitmq-users

Rabbitmq became completely unresponsive on my Mac OS X Yosemite machine today. I was running 3.5.1 at the time. `rabbitmqctl stop` hangs. Browser can't connect to the rabbitmq Web GUI. The log appears to get stuck on "

msg_store_persistent: rebuilding indices from scratch". So, I thought I'd upgrade to 3.5.6 (via homebrew). The upgrade went without any problems. However, rabbitmq is still "stuck". I rebooted the Mac and restarted rabbitmq, but the problem just won't go away.

Please help. The crash and log files are on google drive:

Many thanks,

Vitaly

erl_crash.dump

rabbit@localhost.log

vitaly numenta

unread,

Oct 18, 2015, 7:48:58 AM10/18/15

to rabbitmq-users

The erl_crash.dump and rab...@localhost.log are attached with my prior post. The sasl log is on google drive at https://drive.google.com/file/d/0B826DlZRam76OTBrZ3V3NmNKcW8/view?usp=sharing

Alvaro Videla

unread,

Oct 18, 2015, 7:58:27 AM10/18/15

to rabbitm...@googlegroups.com

The sasl-log shows that RabbitMQ is not able to obtain information on how much disk space is available on your machine.

What's the output of this command on your machine:

/bin/df -kP /path/to/mnesia/dir

Also, is the user running rabbitmq able to un said command against that folder?

On Sun, Oct 18, 2015 at 1:48 PM, vitaly numenta <vitaly.kru...@gmail.com> wrote:

The erl_crash.dump and rab...@localhost.log are attached with my prior post. The sasl log is on google drive at https://drive.google.com/file/d/0B826DlZRam76OTBrZ3V3NmNKcW8/view?usp=sharing

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Klishin

unread,

Oct 18, 2015, 8:08:35 AM10/18/15

to rabbitm...@googlegroups.com, vitaly numenta

On 18 Oct 2015 at 14:41:24, vitaly numenta (vitaly.kru...@gmail.com) wrote:

> However, rabbitmq is still "stuck". I rebooted the Mac and restarted
> rabbitmq, but the problem just won't go away.

Check hostname resolution: does your machine’s hostname resolve to 127.0.0.1, via DNS or /etc/hosts?

On OS X it is very common to see similar issues (servers or clients “get stuck”) when VPN is in effect

because it changes the resolver.

Another possible cause for broker not finishing boot is various security mechanisms preventing

it from binding to ports, although that should much more quickly than hostname lookups .

--
MK

Staff Software Engineer, Pivotal/RabbitMQ

vitaly numenta

unread,

Oct 18, 2015, 4:15:19 PM10/18/15

to rabbitmq-users

Hi Alvaro, Michael, et al,

here are the results:

1. This problem is happening with the VPN OFF. I am aware of issues when running with VPN, as experienced personally and discussed on numerous threads here. RabbtMQ would slow down severely with VPN ON, but then would recover once the VPN was turned OFF. Every now and then, it wouldn't recover after turning VPN OFF, and I would have to kill it, and then restart rabbit to get back to normal. This time is different: RabbitMQ seems "stuck" all the time regardless of VPN ON/OFF and rebooting the Mac OS X 10.10.5 doesn't help at all.

2. This all has been running fine for a long time with 3.5.1, except the pesky erl_crash.dump discussed in the thread https://groups.google.com/forum/#!searchin/rabbitmq-users/erl_crash.dump%7Csort:date/rabbitmq-users/3oYTb9feQr8/iSBiRGy8tZ4J. The problem in the current thread started suddenly this night, after successfully running tests and code for much of the preceding day. Now, every time I execute `rabbitmq-server -detached`, erl_crash.dump shows up in the current directory. Per referenced thread, the regular erl_crash dumps were going to be addressed. I installed 3.5.6 in hopes that it would help recover from the stuck state (before posting this thread), but it didn't help. However, I can't tell whether the erl_crash dump in the other thread is related to the one in this thread.

3.

$ /bin/df -kP /usr/local/var/lib/rabbitmq/mnesia/

Filesystem 1024-blocks Used Available Capacity Mounted on

/dev/disk1 487385236 154144276 332984960 32% /

4.

I start rabbitmq via `rabbitmq-server -detached` from my own login user account - the same account from which I ran the above `/bin/df ...` command.

5. I believe that my hostname resolves via /etc/hosts or /private/etc/hosts, which both have the exact same content on my machine:

=====

$ cat /etc/hosts

##

# Host Database

#

# localhost is used to configure the loopback interface

# when the system is booting. Do not change this entry.

##

127.0.0.1 localhost vkruglikovs-MacBook-Pro.local

255.255.255.255 broadcasthost

::1 localhost vkruglikovs-MacBook-Pro.local

=====

$ hostname

vkruglikovs-MacBook-Pro.local

This shows that the pure DNS resolver doesn't see it:

$ host `hostname`

Host vkruglikovs-MacBook-Pro.local not found: 3(NXDOMAIN)

But the "official" system resolver on Mac OS X sees it just fine (aided by the hosts file, I guess):

$ dscacheutil -q host -a name `hostname`

name: localhost

alias: vkruglikovs-macbook-pro.local

ipv6_address: ::1

name: localhost

alias: vkruglikovs-macbook-pro.local

ip_address: 127.0.0.1

And ping works, too:

$ ping `hostname`

PING localhost (127.0.0.1): 56 data bytes

64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.065 ms

And in Python:

In [7]: socket.gethostname()

Out[7]: 'vkruglikovs-MacBook-Pro.local'

In [8]: socket.gethostbyname('vkruglikovs-MacBook-Pro.local')

Out[8]: '127.0.0.1'

vitaly numenta

unread,

Oct 18, 2015, 4:39:35 PM10/18/15

to rabbitmq-users

When I start rabbitmq without `-detached`, the console output gets stuck here:

$ rabbitmq-server

## ## Licensed under the MPL. See http://www.rabbitmq.com/

## ##

########## Logs: /usr/local/var/log/rabbitmq/rab...@localhost.log

###### ## /usr/local/var/log/rabbitmq/rab...@localhost-sasl.log

##########

Starting broker...

Michael Klishin

unread,

Oct 18, 2015, 4:54:07 PM10/18/15

to rabbitm...@googlegroups.com, vitaly numenta

Vitaly,

This is not a particularly productive way of investigating what blocks
a process. Tracing with dtrace (or strace, or similar) is.

Michael Klishin

unread,

Oct 18, 2015, 4:56:01 PM10/18/15

to rabbitm...@googlegroups.com, vitaly numenta

On 18 October 2015 at 23:39:39, vitaly numenta (vitaly.kru...@gmail.com) wrote:

> 2. This all has been running fine for a long time with 3.5.1, except
> the pesky erl_crash.dump discussed in the thread https://groups.google.com/forum/#!searchin/rabbitmq-users/erl_crash.dump%7Csort:date/rabbitmq-users/3oYTb9feQr8/iSBiRGy8tZ4J.
> The problem in the current thread started suddenly this night,
> after successfully running tests and code for much of the preceding
> day. Now, every time I execute `rabbitmq-server -detached`,
> erl_crash.dump shows up in the current directory. Per referenced
> thread, the regular erl_crash dumps were going to be addressed.
> I installed 3.5.6 in hopes that it would help recover from the
> stuck state (before posting this thread), but it didn't help.
> However, I can't tell whether the erl_crash dump in the other
> thread is related to the one in this thread.

The issue in the linked thread is in no way related to RabbitMQ server being blocked
on start. Dumps there come from rabbitmqctl, not rabbitmq-server.

Have you tried copying RabbitMQ's database directory and deleting it at its original location
to see if it can start as a virgin node?

Alvaro Videla

unread,

Oct 18, 2015, 5:01:32 PM10/18/15

to rabbitm...@googlegroups.com

You seem to have two different errors on sasl logs:

1) the disk monitor is not able to tell how much disk space is available, crashes, and then restarts, and crashes and so on.

2) The rabbit.erl file (which is the file that starts the broker and all its subcomponets) fails here: https://github.com/rabbitmq/rabbitmq-server/blob/rabbitmq_v3_5_6/src/rabbit.erl#L943 which begs the question if you by chance didn't end up with a garbled update at some point? I ask that because your error logs say:

... omitted lines ...

exception exit: {{badmatch,{error,enoent}},

[{filename,absname,1,[{file,"filename.erl"},{line,67}]},

{rabbit,config_files,0,

[{file,"src/rabbit.erl"},{line,863}]},

... omitted lines ...

line 863 on rabbit.erl on the 3.5.6 tag doesn't call the config_files/0 function, nor the filename:absname/1 function. That happens on line 943: https://github.com/rabbitmq/rabbitmq-server/blob/stable/src/rabbit.erl#L943, but on the other hand, on version 3.5.1 the call to file:absname/1 was indeed on line 863 https://github.com/rabbitmq/rabbitmq-server/blob/rabbitmq_v3_5_1/src/rabbit.erl#L863

So perhaps make sure you have a clean install/upgrade?

BTW, the file:absname seems to fail to find the rabbitmq config files.

--

vitaly numenta

unread,

Oct 18, 2015, 5:13:56 PM10/18/15

to rabbitmq-users

Being concerned about bad sw, I uninstalled rabbitmq and erlang, and then re-installed rabbitmq. (`brew install rabbitmq`). Rebooted the machine. The problem didn't go away. I subsequently got rid of the mnesia repository/logs (didn't care about contents on this development machine) and restarted rabbitmq. After this, rabbitmq was able to complete the startup sequence and appears to be functional.

Could it be that the log files became corrupted and rabbitmq couldn't deal with that in a graceful way? Makes me wish I kept a copy of the log files for debugging. I will save them if I run into this again.

Michael, thanks for the dtrace suggestion. I will try it next time things go awry.

Thank you,

Vitaly

vitaly numenta

unread,

Nov 5, 2015, 9:56:09 PM11/5/15

to rabbitmq-users

The problem is back, now entirely on 3.5.6. If you recall, last time rabbitmq got into this predicament, deleting the mnesia database files "fixed" the problem. Of course, this is not a desirable solution. I don't recall ever having this problem with 3.4.x, so it may be new to the 3.5.x series.

Here is how things got to this point today:

Running on Mac OS X Yosemite 10.10.5

I had both rabbitmq and VPN running for many hours today. I wasn't using rabbitmq for most of the day. Then, I tried running a test, and it was timing out while trying to connect to rabbitmq.

At this point, trying to load http://localhost:15672/ displays "This webpage is not available".

I waited for a while, then ran `rabbitmqctl stop`. It completed this time, and didn't seem to complain.

I disconnected from VPN.

I restarted rabbitmq via `rabbitmq-server -detatched`, but connecting to rabbitmq continued to timet out over and over.

I ran `rabbitmqctl stop`, but this time it hung (didn't return). After a while, I exited it via CTL-C.

vkruglikov@vkruglikovs-MacBook-Pro-2:~$ ps aux | grep -i rabbit

root 19212 1.9 0.6 2672648 97024 s004 S 6:00PM 0:34.58 /usr/local/Cellar/rabbitmq/3.5.6/erts-6.4/bin/../../erts-6.4/bin/beam.smp -W w -A 64 -P 1048576 -K true -B i -- -root /usr/local/Cellar/rabbitmq/3.5.6/erts-6.4/bin/../.. -progname erl -- -home /Users/vkruglikov -- -pa /usr/local/Cellar/rabbitmq/3.5.6/ebin -noshell -noinput -s rabbit boot -sname rabbit@localhost -boot /usr/local/Cellar/rabbitmq/3.5.6/releases/3.5.6/start_sasl -kernel inet_default_connect_options [{nodelay,true}] -rabbit tcp_listeners [{"127.0.0.1",5672}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/usr/local/var/log/rabbitmq/rab...@localhost.log"} -rabbit sasl_error_logger {file,"/usr/local/var/log/rabbitmq/rab...@localhost-sasl.log"} -rabbit enabled_plugins_file "/usr/local/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/local/Cellar/rabbitmq/3.5.6/plugins" -rabbit plugins_expand_dir "/usr/local/var/lib/rabbitmq/mnesia/rabbit@localhost-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/usr/local/var/lib/rabbitmq/mnesia/rabbit@localhost" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672

root 19209 0.0 0.0 2467612 388 ?? S 6:00PM 0:00.04 /usr/local/Cellar/rabbitmq/3.5.6/erts-6.4/bin/../../erts-6.4/bin/epmd -daemon

I killed both rabbit processes via `kill PID`, and restarted `rabbitmq-server -detached` again. Ended up in the same unresponsive state.

LOGS:

dtrace output of running `sudo dtruss rabbitmq-server` after killing both rabbit processes: https://gist.github.com/vitaly-krugl/0e069eecf19db31bd24c

dtrace output for /usr/local/Cellar/rabbitmq/3.5.6/erts-6.4/bin/../../erts-6.4/bin/beam.smp log: https://gist.github.com/vitaly-krugl/739ab8b4e146c31741e1

dtrace output for /usr/local/Cellar/rabbitmq/3.5.6/erts-6.4/bin/../../erts-6.4/bin/epmd -daemon: https://gist.github.com/vitaly-krugl/fc63a0d4d336f7ebc7d7

rab...@localhost-sasl.log: https://gist.github.com/vitaly-krugl/865beb80fff5a9c060d4

rab...@localhost.log: https://gist.github.com/vitaly-krugl/a137daddeff175522254

vitaly numenta

unread,

Nov 5, 2015, 10:06:36 PM11/5/15

to rabbitmq-users

Got this crash dump on subsequent attempt to restart rabbitmq via `rabbitmq-server -detached`: https://gist.github.com/vitaly-krugl/6f091d1c84e078f4303a. This restart was performed after killing both rabbit processes (using kill, since `rabbitmqctl stop` just hang).

ps shows that only one rabbit process remains running after attempted restart:

vkruglikov@vkruglikovs-MacBook-Pro-2:~$ ps aux | grep -i rabbit

vkruglikov 20213 0.0 0.0 2432772 680 s003 S+ 6:59PM 0:00.00 grep -i rabbit

vkruglikov 20165 0.0 0.0 2459420 372 ?? S 6:57PM 0:00.00 /usr/local/Cellar/rabbitmq/3.5.6/erts-6.4/bin/../../erts-6.4/bin/epmd -daemon

vkruglikov@vkruglikovs-MacBook-Pro-2:~$

vitaly numenta

unread,

Nov 5, 2015, 10:10:44 PM11/5/15

to rabbitmq-users

Just as before, after renaming /usr/local/var/lib/rabbitmq/mnesia, I was able to restart rabbitmq and start using it again successfully. I saved the old mnesia directory. Please let me know if you need to see any parts of it.

Michael Klishin

unread,

Nov 5, 2015, 10:58:53 PM11/5/15

to rabbitm...@googlegroups.com

SASL log is full of nxdomain errors which is a domain resolution failure. This may be a red herring but is specific to your environment (VPN affects hostname resolution)

Plus the node tries to rebuild indices from scratch.

How much data was there on disk at the time of restart?

--

Michael Klishin

unread,

Nov 5, 2015, 10:59:29 PM11/5/15

to rabbitm...@googlegroups.com

All of it.

vitaly numenta

unread,

Nov 6, 2015, 1:21:37 AM11/6/15

to rabbitmq-users

There was very little in mnesia at the time that rabbitmq was unresponsive:

vkruglikov@vkruglikovs-MacBook-Pro-2:/usr/local/var/lib/rabbitmq (master)$ du -sh UNRESPONSIVE-MNESIA-OLD/

4.1M UNRESPONSIVE-MNESIA-OLD/

vkruglikov@vkruglikovs-MacBook-Pro-2:/usr/local/var/lib/rabbitmq (master)$ ls UNRESPONSIVE-MNESIA-OLD/

rabbit@localhost rabbit@localhost-plugins-expand rab...@localhost.pid

vkruglikov@vkruglikovs-MacBook-Pro-2:/usr/local/var/lib/rabbitmq (master)$

And there is a ton of free disk space on the system:

$ df -h /

Filesystem Size Used Avail Capacity iused ifree %iused Mounted on

/dev/disk1 465Gi 153Gi 312Gi 33% 40125538 81720769 33% /

Michael Klishin

unread,

Nov 6, 2015, 1:30:45 AM11/6/15

to rabbitm...@googlegroups.com, vitaly numenta

On 6 November 2015 at 09:21:40, vitaly numenta (vitaly.kru...@gmail.com) wrote:
> There was very little in mnesia at the time that rabbitmq was unresponsive:
>
> vkruglikov@vkruglikovs-MacBook-Pro-2:/usr/local/var/lib/rabbitmq
> (master)$ du -sh UNRESPONSIVE-MNESIA-OLD/
>
>
> 4.1M UNRESPONSIVE-MNESIA-OLD/

Can you please archive it and send it to us (off list, if necessary)?

vitaly numenta

unread,

Nov 6, 2015, 1:37:45 AM11/6/15

to rabbitmq-users, vitaly.kru...@gmail.com

The archived mnesia directory is here: https://github.com/vitaly-krugl/publicly-shared-files/blob/master/unresponsive.mnesia.tar

Alvaro Videla

unread,

Nov 6, 2015, 4:41:18 AM11/6/15

to rabbitm...@googlegroups.com, vitaly numenta

You have the error {error,nxdomain} which might be caused by hostname resolution. This could be related to using the VPN, or it could be related to a lack of file descriptors. Erlang starts a new connection in order to see if it can resolve the hostname, so if you don't have enough file descriptors, then it will fail with the nxdomain error.

About the VPN, if the VPN changes the machine name, and then you switch the VPN off, and the machine name goes back to normal, then it's quite possible that it won't start.

On Fri, Nov 6, 2015 at 7:37 AM, vitaly numenta <vitaly.kru...@gmail.com> wrote:

The archived mnesia directory is here: https://github.com/vitaly-krugl/publicly-shared-files/blob/master/unresponsive.mnesia.tar

--

You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.

Michael Klishin

unread,

Nov 6, 2015, 5:06:54 AM11/6/15

to rabbitm...@googlegroups.com, Alvaro Videla, vitaly numenta

Álvaro,

Can we try restoring from the database archive

provided by Vitaly? I think I've seen at least one

similar report.

In particular, rebuilding indices that takes forever

on a small data set doesn't sound right.

MK

Alvaro Videla

unread,

Nov 6, 2015, 6:35:04 AM11/6/15

to rabbitm...@googlegroups.com

I can reproduce the issue, but I need to get some things ready before diving into this.

vitaly numenta

unread,

Nov 6, 2015, 11:55:08 AM11/6/15

to rabbitmq-users, vitaly.kru...@gmail.com

Alvaro,

once it gets into this state, the unresponsive state persists even after stopping VPN; and even after rebooting the machine and not starting VPN at all.

Alvaro Videla

unread,

Nov 9, 2015, 8:39:54 PM11/9/15

to rabbitm...@googlegroups.com, vitaly numenta

Hi,

After some debugging, this is what I've found:

The server is blocked during startup because the index is trying to recover, but it fails since it can't open the index journal due to lack of file descriptors.

What's your ulimit -n value?

In my Mac that value is 256 by default, but once I bumped it to 1024, then I could start RabbitMQ using the Mnesia folder you shared above.

I'll try to dig deeper into this so we are able to report the right error message back, and most probably make the broker start fail, instead of just hang.

Regards,

Alvaro

vitaly numenta

unread,

Nov 9, 2015, 8:46:54 PM11/9/15

to rabbitmq-users, vitaly.kru...@gmail.com

Hi Alvaro,

+10 for reporting/failing and not hanging.

My `ulimit -n` is 256 as well. I will restore the amnesia files and try both with the current as well as bumped up limits, and post my results here.

vitaly numenta

unread,

Nov 9, 2015, 8:54:50 PM11/9/15

to rabbitmq-users, vitaly.kru...@gmail.com

Alvaro, what is the appropriate way to set and persist the desired `ulimit -n` setting on Mac OS X? I did some googling, and there appears to be be confusion about how to do this on the Mac.

Thx.

Alvaro Videla

unread,

Nov 9, 2015, 9:11:32 PM11/9/15

to rabbitm...@googlegroups.com, vitaly numenta

See our guide here: https://www.rabbitmq.com/install-homebrew.html#osx-max-open-files-limit

--

vitaly numenta

unread,

Nov 10, 2015, 1:42:23 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

Hi Alvaro, I have good news and bad news - when I set ulimit to 1024 via rabbitmq-env.conf and restart RabbitMQ with the "unresponsive" mnesia database, it appears to start up okay, with the exception of these two WARNINGS:

=WARNING REPORT==== 9-Nov-2015::22:18:24 ===

msg_store_persistent: rebuilding indices from scratch

=WARNING REPORT==== 9-Nov-2015::22:18:24 ===

Mnesia(rabbit@localhost): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}

However, if I gracefully stop RabbitMQ (`rabbitmqctl stop`), then set `ulimit -n` back to the default (256) and restart RabbitMq (`rabbitmq-server -detached`), it *hangs* again.

My expectation was that once Mnesia indices are rebuilt, RabbitMQ would be able to work fine with the default `ulimit -n` of 256 after that.

The logs with `ulimit -n`=1024 and subsequent graceful restart with `ulimit -n`=256 are below

Starting with "UNRESPONSIVE" mnesia database and `ulimit -n`=1024 ==> NO HANG

=INFO REPORT==== 9-Nov-2015::22:17:58 ===

Halting Erlang VM

=INFO REPORT==== 9-Nov-2015::22:18:23 ===

Starting RabbitMQ 3.5.6 on Erlang 17.5

Licensed under the MPL. See http://www.rabbitmq.com/

=INFO REPORT==== 9-Nov-2015::22:18:23 ===

node : rabbit@localhost

home dir : /Users/vkruglikov

config file(s) : /usr/local/etc/rabbitmq/rabbitmq.config (not found)

cookie hash : weyu9iv/l8eIOTGaIPFcYw==

log : /usr/local/var/log/rabbitmq/rab...@localhost.log

sasl log : /usr/local/var/log/rabbitmq/rab...@localhost-sasl.log

database dir : /usr/local/var/lib/rabbitmq/mnesia/rabbit@localhost

=INFO REPORT==== 9-Nov-2015::22:18:23 ===

Memory limit set to 6006MB of 15015MB total.

=INFO REPORT==== 9-Nov-2015::22:18:23 ===

Disk free limit set to 50MB

=INFO REPORT==== 9-Nov-2015::22:18:23 ===

Limiting to approx 924 file handles (829 sockets)

=INFO REPORT==== 9-Nov-2015::22:18:23 ===

FHC read buffering: ON

FHC write buffering: ON

=INFO REPORT==== 9-Nov-2015::22:18:23 ===

Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 9-Nov-2015::22:18:23 ===

Management plugin: using rates mode 'basic'

=INFO REPORT==== 9-Nov-2015::22:18:24 ===

msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 9-Nov-2015::22:18:24 ===

msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=WARNING REPORT==== 9-Nov-2015::22:18:24 ===

msg_store_persistent: rebuilding indices from scratch

=WARNING REPORT==== 9-Nov-2015::22:18:24 ===

Mnesia(rabbit@localhost): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}

=INFO REPORT==== 9-Nov-2015::22:18:24 ===

started TCP Listener on 127.0.0.1:5672

=INFO REPORT==== 9-Nov-2015::22:18:24 ===

rabbit_stomp: default user 'guest' enabled

=INFO REPORT==== 9-Nov-2015::22:18:24 ===

started STOMP TCP Listener on [::]:61613

=INFO REPORT==== 9-Nov-2015::22:18:24 ===

started MQTT TCP Listener on [::]:1883

=INFO REPORT==== 9-Nov-2015::22:18:24 ===

Management plugin started. Port: 15672

=INFO REPORT==== 9-Nov-2015::22:18:24 ===

Statistics database started.

=INFO REPORT==== 9-Nov-2015::22:18:24 ===

Server startup complete; 10 plugins started.

Restarting gracefully after above, but with `ulimit -n` = 256 ==> HANG AGAIN

=INFO REPORT==== 9-Nov-2015::22:32:13 ===

Halting Erlang VM

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

Starting RabbitMQ 3.5.6 on Erlang 17.5

Licensed under the MPL. See http://www.rabbitmq.com/

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

node : rabbit@localhost

home dir : /Users/vkruglikov

config file(s) : /usr/local/etc/rabbitmq/rabbitmq.config (not found)

cookie hash : weyu9iv/l8eIOTGaIPFcYw==

log : /usr/local/var/log/rabbitmq/rab...@localhost.log

sasl log : /usr/local/var/log/rabbitmq/rab...@localhost-sasl.log

database dir : /usr/local/var/lib/rabbitmq/mnesia/rabbit@localhost

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

Memory limit set to 6023MB of 15059MB total.

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

Disk free limit set to 50MB

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

Limiting to approx 156 file handles (138 sockets)

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

FHC read buffering: ON

FHC write buffering: ON

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

Management plugin: using rates mode 'basic'

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 9-Nov-2015::22:32:16 ===

msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=WARNING REPORT==== 9-Nov-2015::22:32:16 ===

Mnesia(rabbit@localhost): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}

=WARNING REPORT==== 9-Nov-2015::22:32:16 ===

Mnesia(rabbit@localhost): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}

=WARNING REPORT==== 9-Nov-2015::22:32:16 ===

Mnesia(rabbit@localhost): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}

=WARNING REPORT==== 9-Nov-2015::22:32:16 ===

Mnesia(rabbit@localhost): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}

vitaly numenta

unread,

Nov 10, 2015, 1:51:30 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

After rebuilding the indices on this database with `ulimit -n` = 1024, the number of file descriptors remains at over 600 *without any AMQP connections* besides the Web GUI management plugin. It's also over 600 on subsequent graceful restarts. However, when running on a recently-created dataset that doesn't exhibit this issue, the WebGUI displays the number of file descriptors as a little over 80.

Alvaro Videla

unread,

Nov 10, 2015, 1:53:19 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

The problem is not related to index rebuild, but to RabbitMQ running out of file descriptors. Each queue in the system needs one FD for the queue index journal. So what you see is expected behavior so to speak.

--

Alvaro Videla

unread,

Nov 10, 2015, 1:54:54 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

i.e.: the problem is about the number of queues in the system vs. the number of available FDs.

vitaly numenta

unread,

Nov 10, 2015, 2:11:36 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

Alvaro, thank you for the clarification. I recall reading somewhere in the docs that RabbitMQ maintains a cache of open file descriptors, but is able to close them as needed in order to make resources available for other operations. Did I get it wrong? If this many queue (I had almost 600) requires so many file descriptors (far above 256 that my system allowed originally), then how was RabbitMQ even able to create so many queue in the first place without running out of file descriptors?

Alvaro Videla

unread,

Nov 10, 2015, 2:20:06 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

The queue index journal FD is not created right away during queue creation.

On Tue, Nov 10, 2015 at 8:11 AM vitaly numenta <vitaly.kru...@gmail.com> wrote:

Alvaro, thank you for the clarification. I recall reading somewhere in the docs that RabbitMQ maintains a cache of open file descriptors, but is able to close them as needed in order to make resources available for other operations. Did I get it wrong? If this many queue (I had almost 600) requires so many file descriptors (far above 256 that my system allowed originally), then how was RabbitMQ even able to create so many queue in the first place without running out of file descriptors?

vitaly numenta

unread,

Nov 10, 2015, 2:25:49 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

Great, thank you! Indeed, after deleting a bunch of queues, I was able to restart RabbitMQ successfully with `ulimit -n` = 256.

It would be awesome if RabbitMQ could log something really informative and fail, instead of hanging in this scenario.

Many thanks!

Vitaly

Michael Klishin

unread,

Nov 10, 2015, 2:44:58 AM11/10/15

to rabbitm...@googlegroups.com, vitaly.kru...@gmail.com

What if RabbitMQ needs more than 256 file handles (which includes sockets) overall?

Sorry but this expectation is unreasonable.
No developer environment should have such low limits: it is dirt cheap to bump that to 100K and forget about this issue forever.

ulimit defaults in modern OS X and Linux
versions are absolutely insane, in particular for running data services. Use them at your own peril, in development environments included.

Michael Klishin

unread,

Nov 10, 2015, 2:47:01 AM11/10/15

to rabbitm...@googlegroups.com, vitaly.kru...@gmail.com

Logging is a good idea if we can do it.
I'm much less convinced about nodes stopping.

vitaly numenta

unread,

Nov 10, 2015, 2:50:52 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

Hi Michael, thanks for chiming in. I only mentioned 256 as confirmation that things work as expected, and there aren't any other problems. I get it now. I bumped up the limit to a much higher number. What caught me off-guard is that I've been running RabbitMQ for about 4 years now with lots of queues and have never run into this issue until recently.

Best regards,

Vitaly

Michael Klishin

unread,

Nov 10, 2015, 2:51:45 AM11/10/15

to rabbitm...@googlegroups.com, vitaly.kru...@gmail.com

Having a cache does not mean you would never need more than 256 file descriptors per process.

Please bump up the number to 100-500K in all environments you have and move on.

Not having enough file descriptors is not a problem data services such as RabbitMQ should be solving for developers and ops.

PostgreSQL, Riak, Cassandra docs all talk about bumping kernel limits. So does RabbitMQ's. And it is not a coincidence.

On 10 nov 2015, at 10:11, vitaly numenta <vitaly.kru...@gmail.com> wrote:

Alvaro, thank you for the clarification. I recall reading somewhere in the docs that RabbitMQ maintains a cache of open file descriptors, but is able to close them as needed in order to make resources available for other operations. Did I get it wrong? If this many queue (I had almost 600) requires so many file descriptors (far above 256 that my system allowed originally), then how was RabbitMQ even able to create so many queue in the first place without running out of file descriptors?

vitaly numenta

unread,

Nov 10, 2015, 2:53:10 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

Yes, an informative log message would be awesome all by itself.

vitaly numenta

unread,

Nov 10, 2015, 2:59:01 AM11/10/15

to rabbitmq-users, vitaly.kru...@gmail.com

Thanks Michael. I get it. I don't expect RabbitMQ to optimize for low file descriptor limits. You guys are doing an amazing job. I am personally always amazed, pleased, and surprised that you and Alvaro can be so responsive on this list.

Michael Klishin

unread,

Nov 10, 2015, 3:18:36 AM11/10/15

to rabbitm...@googlegroups.com, vitaly numenta

On 10 November 2015 at 10:53:20, vitaly numenta (vitaly.kru...@gmail.com) wrote:
> Yes, an informative log message would be awesome all by itself.

Issue filed:
https://github.com/rabbitmq/rabbitmq-server/issues/413

Reply all

Reply to author

Forward