Help needed with server down issue.


lazar kanev

Feb 22, 2015, 5:47:22 PM
to rabbitm...@googlegroups.com
Hi all,

I recently integrated a RabbitMQ server into our application suite, and it mostly works well. But if I send a bunch of messages, the RabbitMQ server goes down and is out of control. After I stopped the AWS EC2 instance, waited a while, and rebooted it, I could open the log file, which says:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 

=ERROR REPORT==== 18-Feb-2015::18:21:01 ===
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"/var/lib/rabbitmq/mnesia/rabbit@ip-172-31-26-2",
                               2000000000,5928431616,100,10000,
                               #Ref<0.0.22.11681>,false}
** Reason for termination ==
** {unparseable,[]}

=INFO REPORT==== 18-Feb-2015::18:21:02 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{unparseable,[]}},3951349760}

=ERROR REPORT==== 18-Feb-2015::18:21:06 ===
** Generic server rabbit_mgmt_external_stats terminating
** Last message in was emit_update
** When Server state == {state,1024}
** Reason for termination ==
** {unsupported_platform,{gen_server,call,
                                     [rabbit_disk_monitor,get_disk_free_limit,
                                      infinity]}}

=INFO REPORT==== 18-Feb-2015::18:21:11 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{badarg,[{erlang,port_command,
                          [#Port<0.156531>,
                           [40,
                            "/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@ip-172-31-26-2",
                            10,41,32,60,47,100,101,118,47,110,117,108,108,59,
                            32,101,99,104,111,32,32,34,4,34,10]]},
                  {os,'-unix_cmd/1-fun-0-',2}]}},
 3951349760}

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
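The integer lists in the last report are character codes: the node logs the text it tried to send to a shell as a mix of integers and strings. A minimal Python sketch (not part of RabbitMQ; the helper name `decode_iolist` is made up for illustration) that turns them back into readable text:

```python
# Values copied from the badarg report above. The helper name is hypothetical.
command = "/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@ip-172-31-26-2"
logged = [40, command,
          10, 41, 32, 60, 47, 100, 101, 118, 47, 110, 117, 108, 108, 59,
          32, 101, 99, 104, 111, 32, 32, 34, 4, 34, 10]


def decode_iolist(parts):
    """Flatten a (possibly nested) list of ints and strings into one string."""
    out = []
    for part in parts:
        if isinstance(part, int):
            out.append(chr(part))      # integer -> character with that code
        elif isinstance(part, str):
            out.append(part)           # strings are kept as they are
        else:
            out.append(decode_iolist(part))  # nested list
    return "".join(out)


shell_input = decode_iolist(logged)
print(repr(shell_input))
```

The decoded text is the `df` command wrapped in parentheses with stdin redirected from `/dev/null`, followed by an `echo` of an end-of-transmission character (code 4). Since the `badarg` is reported inside `erlang:port_command`, this particular report suggests the failure happened while sending the command to the shell port rather than while parsing `df` output.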


So, to sum up, I have an issue with the RabbitMQ disk free space monitor. I'm running the RabbitMQ server on the Amazon Linux AMI (64-bit).

Is there anyone who can assist me with this issue?

Michael Klishin

Feb 22, 2015, 6:05:38 PM
to rabbitm...@googlegroups.com, lazar kanev
On 23 February 2015 at 01:47:24, lazar kanev (lazar...@gmail.com) wrote:
> I recently integrated a RabbitMQ server into our application
> suite, and it mostly works well. But if I send a bunch of
> messages, the RabbitMQ server goes down and is out of control.

This is not particularly specific.

How many is "a bunch"? What is average message size? What exactly does
"out of control" mean? What else is in the log?

Not having disk monitoring has only one downside: your publishers may run
the machine out of disk space and RabbitMQ cannot block them because it has no
idea how much disk space is left on the partition where its database is located.

> After I stopped the AWS EC2 instance, waited a while, and rebooted it

Does your node have the same hostname between boots? RabbitMQ database
location currently includes hostname and some parts of the database store
hostnames as part of cluster member names.
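A quick way to check this after a reboot is to compare the directory the current hostname would map to with what is actually on disk. A minimal sketch, assuming the `rabbit@<short hostname>` naming and the `/var/lib/rabbitmq/mnesia` base directory seen in the log above (the function name is made up for illustration):

```python
import os
import socket

# Assumption: base directory taken from the path in the posted log.
MNESIA_BASE = "/var/lib/rabbitmq/mnesia"


def expected_node_dir(hostname, base=MNESIA_BASE):
    """Build the database path for a node named rabbit@<short hostname>."""
    short = hostname.split(".")[0]  # drop any domain part
    return f"{base}/rabbit@{short}"


if __name__ == "__main__":
    path = expected_node_dir(socket.gethostname())
    state = "exists" if os.path.isdir(path) else "is missing"
    print(f"{path} {state}")
```

If the printed path is missing while a directory for a different hostname exists under the same base, the hostname has probably changed between boots.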
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

lazar kanev

Feb 23, 2015, 2:40:23 AM
to rabbitm...@googlegroups.com, lazar...@gmail.com
Hi,

Thanks for your quick reply. 

I'm sending the data records from our CSV files through a RabbitMQ queue. Each file is around 90 MB and is split into around 1,500 messages, so each message is around 60-70 KB.

I had uploaded 2 or 3 files before the server crashed, and 2,500 messages were queued at that time. When the server crashed, monitoring showed disk read operations increasing rapidly (120,000).

After the crash, I couldn't access the server through SSH because it timed out, nor the RabbitMQ management console in the browser. I stopped the instance and started it again, but it didn't work immediately. I waited several hours (5 or 6) and then I could access the server again.

I also set higher limits in the configuration, as below:

  1. vm_memory_high_watermark, 0.6
  2. vm_memory_high_watermark_paging_ratio, 0.6
  3. disk_free_limit, 2000000000

My server has 4 GB of RAM.
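As a rough sanity check of these figures, the arithmetic can be written out. This is only a back-of-the-envelope sketch: it assumes decimal units (1 MB = 1000 KB), takes the 4 GB as exact, and ignores per-message overhead.

```python
# Figures quoted in this post; decimal units are an assumption.
file_size_kb = 90 * 1000        # one CSV file, about 90 MB
messages_per_file = 1500
queued_messages = 2500          # queued when the server went down
ram_bytes = 4 * 1000**3         # 4 GB of RAM
watermark = 0.6                 # vm_memory_high_watermark

avg_message_kb = file_size_kb / messages_per_file
queued_mb = queued_messages * avg_message_kb / 1000
memory_limit_gb = watermark * ram_bytes / 1000**3

print(f"average message: {avg_message_kb:.0f} KB")
print(f"queued payload:  {queued_mb:.0f} MB")
print(f"memory alarm at: {memory_limit_gb:.1f} GB")
```

Under these assumptions the queued payload comes to about 150 MB, well below the roughly 2.4 GB at which the memory watermark would apply, so the payload volume alone does not obviously explain the outage.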

I also found other warnings and errors in my log file:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>

=WARNING REPORT==== 21-Feb-2015::02:06:07 ===
Mnesia('rabbit@ip-172-31-26-2'): ** WARNING ** Mnesia is overloaded: {dump_log,
                                                                      time_threshold}

=ERROR REPORT==== 21-Feb-2015::02:06:17 ===
** Generic server rabbit_mgmt_db terminating
.........
.........
** Reason for termination ==
** {badarith,[{rabbit_mgmt_db,append_sample,6},
              {rabbit_mgmt_db,'-append_samples/6-lc$^1/1-0-',6},
              {rabbit_mgmt_db,append_samples,6},
              {rabbit_mgmt_db,handle_stats,6},
              {rabbit_mgmt_db,handle_cast,2},
              {gen_server2,handle_msg,2},
              {proc_lib,wake_up,3}]}

>>>>>>>>>>>>>>>>>>>>>>>>>>>>

I didn't set any configuration options other than the memory-related ones above. I checked the node name after the reboot, which is <rabbit@ip-172-31-26-2>, and the node name before the crash in the log file, which is the same <rabbit@ip-172-31-26-2>. The Mnesia DB file name was also <rabbit@ip-172-31-26-2>.

Can you please explain what has happened?

Thanks in advance.

Michael Klishin

Feb 23, 2015, 3:44:22 AM
to rabbitm...@googlegroups.com, lazar kanev
On 23 February 2015 at 10:40:24, lazar kanev (lazar...@gmail.com) wrote:
> After the crash, I couldn't access the server through SSH because
> it timed out, nor the RabbitMQ management console in the browser.
> I stopped the instance and started it again, but it didn't work
> immediately. I waited several hours (5 or 6) and then I could
> access the server again.

RabbitMQ cannot affect sshd: even if it uses about 100% of every available core, you should
still be able to ssh in.

> I also set higher limits in the configuration, as below:
>
> vm_memory_high_watermark, 0.6
> vm_memory_high_watermark_paging_ratio, 0.6
> disk_free_limit, 2000000000

It's nice that you have a higher-than-default disk limit. As you can see in the log, the RabbitMQ disk
monitor cannot run on your distribution and is thus disabled. disk_free_limit has no effect
after that.

RabbitMQ disk monitor uses `/bin/df -kP [database directory]` to monitor disk space.
That command outputs a value that the disk monitor cannot parse (seemingly no output at all).
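To make the failure mode concrete, here is a small Python sketch of the kind of check described above: run `/bin/df -kP` on a directory and read the "Available" column from the second line. It is an illustration only, not RabbitMQ's actual code, and the function names are made up. It also assumes the filesystem name contains no spaces.

```python
import subprocess


def parse_df_available_kb(output):
    """Return the 'Available' column (1024-byte blocks) from `df -kP` output."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) < 2:
        # No data line at all, e.g. completely empty output.
        raise ValueError(f"unparseable df output: {output!r}")
    fields = lines[1].split()
    # -P columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
    if len(fields) < 6 or not fields[3].isdigit():
        raise ValueError(f"unparseable df output: {output!r}")
    return int(fields[3])


def disk_free_bytes(path):
    """Run /bin/df -kP on path and convert the available space to bytes."""
    result = subprocess.run(["/bin/df", "-kP", path],
                            capture_output=True, text=True, check=True)
    return parse_df_available_kb(result.stdout) * 1024
```

With empty input, `parse_df_available_kb("")` raises `ValueError`, which is the analogue of the `{unparseable,[]}` reason in the log: there was nothing to parse.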

I suspect that your machine simply runs out of disk space, which affects everything until for some
reason some space is cleared (e.g. some temp files are cleaned up).

df -h and VM monitoring should clarify this.
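If watching `df -h` by hand is impractical, free space can also be sampled on a timer while messages are being published. A minimal sketch using Python's standard `shutil.disk_usage`; the function names, the sampling interval, and the sample count are arbitrary choices for illustration:

```python
import shutil
import time


def free_space_report(path):
    """Return (free_bytes, percent_used) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total
    return usage.free, percent_used


def watch(path, limit_bytes, interval_s=10, samples=6):
    """Print free space a few times, flagging samples below limit_bytes."""
    for _ in range(samples):
        free, pct = free_space_report(path)
        flag = "BELOW LIMIT" if free < limit_bytes else "ok"
        print(f"{time.strftime('%H:%M:%S')} free={free} used={pct:.1f}% {flag}")
        time.sleep(interval_s)
```

For example, `watch("/var/lib/rabbitmq", 2_000_000_000)` would compare each sample against the 2000000000-byte limit mentioned earlier in the thread.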

Please provide

 * the entire log
 * `rabbitmqctl status` output
 * "/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@ip-172-31-26-2" output (this is the 2nd time we are asking for this)
 * `lsb_release -a` output.

Having your AMI UID would be helpful, too. 

andre...@gmail.com

Jul 8, 2015, 10:33:56 AM
to rabbitm...@googlegroups.com, lazar...@gmail.com
Hi,

I am having exactly the same issue on multiple servers.

Errors in the logs


=INFO REPORT==== 8-Jul-2015::14:29:51 ===

Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{unparseable,[]}},1962668032}

=INFO REPORT==== 8-Jul-2015::14:29:51 ===

Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{unparseable,[]}},1962668032}

=INFO REPORT==== 8-Jul-2015::14:29:52 ===

Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{unparseable,[]}},1962668032}

=ERROR REPORT==== 8-Jul-2015::14:29:52 ===
Error in process <0.19927.945> on node 'rabbit@rabbitmq-staging-02' with exit value: {badarg,[{erlang,port_command,[#Port<0.21065126>,[40,<<63 bytes>>,[10,41,32,60,47,100,101,118,47,110,117,108,108,59,32,101,99,104,111,32,32,34,4,34,10]]],[]},{os,'-unix_cmd/1-fun-0-',2,[{file,"os.erl"},{line,219}]}]}


=INFO REPORT==== 8-Jul-2015::14:29:52 ===

Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{badarg,[{erlang,port_command,
                          [#Port<0.21065126>,
                           [40,
                            <<"/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-staging-02">>,
                            [10,41,32,60,47,100,101,118,47,110,117,108,108,59,
                             32,101,99,104,111,32,32,34,4,34,10]]],
                          []},
                  {os,'-unix_cmd/1-fun-0-',2,[{file,"os.erl"},{line,219}]}]}},
 1962668032}

=INFO REPORT==== 8-Jul-2015::14:29:52 ===

Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{unparseable,[]}},1962668032}

=INFO REPORT==== 8-Jul-2015::14:29:52 ===

Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{unparseable,[]}},1962668032}

=INFO REPORT==== 8-Jul-2015::14:29:52 ===

Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{unparseable,[]}},1962668032}

=INFO REPORT==== 8-Jul-2015::14:29:52 ===

Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{unparseable,[]}},1962668032}

=INFO REPORT==== 8-Jul-2015::14:29:52 ===

Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{unparseable,[]}},1962668032}



Here is the output of the commands, run as the rabbitmq user:

bash-4.1$ whoami
rabbitmq
bash-4.1$ /bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-staging-02
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/xvda1        41278544 17095100  22086920      44% /
bash-4.1$


Status

bash-4.1$ rabbitmqctl status
Status of node 'rabbit@rabbitmq-staging-02' ...
[{pid,8849},
 {running_applications,
     [{rabbitmq_management_visualiser,"RabbitMQ Visualiser","3.4.4"},
      {rabbitmq_management,"RabbitMQ Management Console","3.4.4"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.4.4"},
      {webmachine,"webmachine","1.10.3-rmq3.4.4-gite9359c7"},
      {mochiweb,"MochiMedia Web Server","2.7.0-rmq3.4.4-git680dba8"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.4.4"},
      {rabbit,"RabbitMQ","3.4.4"},
      {os_mon,"CPO  CXC 138 46","2.3.1"},
      {inets,"INETS  CXC 138 49","5.10.6"},
      {mnesia,"MNESIA  CXC 138 12","4.12.5"},
      {amqp_client,"RabbitMQ AMQP Client","3.4.4"},
      {xmerl,"XML parser","1.3.7"},
      {sasl,"SASL  CXC 138 11","2.4.1"},
      {stdlib,"ERTS  CXC 138 10","2.4"},
      {kernel,"ERTS  CXC 138 10","3.2"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 17 [erts-6.4] [source-2e19e2f] [64-bit] [smp:2:2] [async-threads:30] [hipe] [kernel-poll:true]\n"},
 {memory,
     [{total,45585256},
      {connection_readers,0},
      {connection_writers,0},
      {connection_channels,0},
      {connection_other,5616},
      {queue_procs,44968},
      {queue_slave_procs,86208},
      {plugins,86544},
      {other_proc,13846008},
      {mnesia,127040},
      {mgmt_db,11912},
      {msg_index,157368},
      {other_ets,1146496},
      {binary,158328},
      {code,19948934},
      {atom,2795937},
      {other_system,7169897}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,785067212},
 {file_descriptors,
     [{total_limit,924},{total_used,4},{sockets_limit,829},{sockets_used,1}]},
 {processes,[{limit,1048576},{used,188}]},
 {run_queue,0},
 {uptime,4067521}]
bash-4.1$




Release:

bash-4.1$ lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch
Distributor ID:    CentOS
Description:    CentOS release 6.6 (Final)
Release:    6.6
Codename:    Final
bash-4.1$



Any help appreciated.