MQTT Federation plugin fails to start with the error below

杜昱萱

Jun 27, 2018, 6:15:59 AM
to rabbitmq-users

2018-06-27 08:39:22.355 [info] <0.3327.0> connection <0.3327.0> (192.168.1.25:36789 -> 192.168.1.242:5672): user 'impact' authenticated and granted access to vhost '/'
2018-06-27 08:39:22.378 [info] <0.3327.0> closing AMQP connection <0.3327.0> (192.168.1.25:36789 -> 192.168.1.242:5672, vhost: '/', user: 'impact')
2018-06-27 08:39:36.975 [error] <0.3287.0> ** Generic server <0.3287.0> terminating
** Last message in was {'$gen_cast',maybe_go}
** When Server state == {not_started,upstream,[<<"amqp://impact:impa...@rabbitmqedge.default.svc.cluster.local.">>],<<"amq.topic">>,<<"amq.topic">>,1000,1,60,none,none,true,'on-confirm',none,<<"worker-federation-upstream">>,false},{upstream_params,<<"amqp://impact:impa...@rabbitmqedge.default.svc.cluster.local.">>,{amqp_params_network,<<"impact">>,<<"impact123">>,<<"/">>,"rabbitmqedge.default.svc.cluster.local.",undefined,0,0,10,60000,none,Fun<amqp_uri.11.130013139>,#Fun<amqp_uri.11.130013139>,[],[]},{exchange,{resource,<<"/">>,exchange,<<"amq.topic">>},topic,true,false,false,[],[{federation,[]}],[{vhost,<<"/">>},{name,<<"workerforwardpolicy">>},{pattern,<<"amq.topic">>},{'apply-to',<<"exchanges">>},{definition,[{<<"federation-upstream-set">>,<<"all">>}]},{priority,0}],undefined,{[],[rabbit_federation_exchange]},#{user => <<"rmq-internal">>,<<"amqp://rabbitmqedge.default.svc.cluster.local.">>,[{<<"uri">>,longstr,<<"amqp://rabbitmqedge.default.svc.cluster.local.">>},{<<"exchange">>,longstr,<<"amq.topic">>}]},{resource,<<"/">>,exchange,<<"amq.topic">>}}}
** Reason for termination ==
** {timeout,{gen_server,call,[<0.3303.0>,connect,60000]}}
2018-06-27 08:39:36.976 [error] <0.3287.0> CRASH REPORT Process <0.3287.0> with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.3303.0>,connect,60000]}} in gen_server2:terminate/3 line 1161
2018-06-27 08:39:36.976 [error] <0.2344.0> Supervisor {<0.2344.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqp://impact:impa...@rabbitmqedge.default.svc.cluster.local.">>],
<<"amq.topic">>,<<"amq.topic">>,1000,1,60,none,none,true,
'on-confirm',none,<<"worker-federation-upstream">>,false} started with rabbit_federation_exchange_link:start_link(upstream,[<<"amqp://impact:impa...@rabbitmqedge.default.svc.cluster.local.">>],<<"amq.topic">>,...},...}) at <0.3287.0> exit with reason {timeout,{gen_server,call,[<0.3303.0>,connect,60000] in context child_terminated
2018-06-27 08:39:36.976 [warning] <0.3297.0> Channel (<0.3297.0>): Unregistering confirm handler <0.3287.0> because it died. Reason: {timeout,{gen_server,call,[<0.3303.0>,connect,60000]}}
2018-06-27 08:39:36.986 [error] <0.3307.0> ** Generic server <0.3307.0> terminating
** Last message in was {inet_async,#Port<0.35539>,346,{error,timeout}}
** When Server state == {state,#Port<0.35539>,<0.3303.0>,<0.3305.0>,{method,rabbit_framing_amqp_0_9_1},{expecting_header,<<>>}}
** Reason for termination ==
** {socket_error,timeout}
2018-06-27 08:39:36.986 [error] <0.3307.0> CRASH REPORT Process <0.3307.0> with 0 neighbours exited with reason: {socket_error,timeout} in gen_server:handle_common_reply/8 line 726
2018-06-27 08:39:36.986 [error] <0.3302.0> Supervisor {<0.3302.0>,amqp_connection_type_sup} had child main_reader started with amqp_main_reader:start_link(#Port<0.35539>, <0.3303.0>, <0.3305.0>, {method,rabbit_framing_amqp_0_9_1}, <<"client 192.168.1.242:52698 -> 192.168.1.177:5672">>) at <0.3307.0> exit with reason {socket_error,timeout} in context child_terminated
2018-06-27 08:39:36.986 [error] <0.3302.0> Supervisor {<0.3302.0>,amqp_connection_type_sup} had child main_reader started with amqp_main_reader:start_link(#Port<0.35539>, <0.3303.0>, <0.3305.0>, {method,rabbit_framing_amqp_0_9_1}, <<"client 192.168.1.242:52698 -> 192.168.1.177:5672">>) at <0.3307.0> exit with reason reached_max_restart_intensity in context shutdown

Michael Klishin

Jun 27, 2018, 6:40:16 AM
to rabbitm...@googlegroups.com
All this says that a federation link failed to connect, and the root cause seems to be a `{socket_error,timeout}`.

There is no such thing as "MQTT federation". Federation can involve queues used by other protocols,
but the federation links themselves are AMQP 0-9-1 connections. See [1].


    --
    You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
    To post to this group, send email to rabbitmq-users@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.



    --
    MK

    Staff Software Engineer, Pivotal/RabbitMQ

    杜昱萱

    Jun 27, 2018, 7:10:16 AM
    to rabbitmq-users
    I am using exchange federation.
    When I run `rabbitmqctl list_connections` on the upstream node, it hangs.

    Michael Klishin

    Jun 27, 2018, 7:16:43 AM
    to rabbitm...@googlegroups.com
    I reiterate the recommendation to use the process and tools covered in [1]. Do not guess, collect evidence instead.

    See server logs on the upstream node as well (192.168.1.177). They may contain clues.


    杜昱萱

    Jun 27, 2018, 7:54:09 AM
    to rabbitmq-users
    The server log shows a timeout error:


    2018-06-27 11:27:14.887 [info] <0.1292.0> Connection <0.1292.0> (192.168.1.242:48532 -> 192.168.1.177:5672) has a client-provided name: Federation link (upstream: worker-federation-upstream, policy: workerforwardpolicy)
    2018-06-27 11:27:21.500 [error] <0.1285.0> ** Generic server <0.1285.0> terminating
    ** Last message in was {inet_async,#Port<0.35128>,304,{ok,<<16,88,0,6,77,81,73,115,100,112,3,194,0,60,0,17,111,115,98,111,120,101,115,95,49,53,53,56,51,54,50,53,54,0,9,77,81,84,84,84,69,83,84,50,0,44,104,98,87,67,121,52,78,81,75,84,50,51,81,110,104,67,81,56,73,54,86,101,74,84,43,120,54,57,110,119,111,87,121,88,68,116,54,82,65,117,75,115,85,61>>}}
    ** When Server state == {state,#Port<0.35128>,"192.168.1.24:56884 -> 192.168.1.177:1883",true,undefined,false,running,{none,none},<0.1284.0>,false,none,{proc_state,#Port<0.35128>,#{},{undefined,undefined},{0,nil},{0,nil},undefined,1,undefined,undefined,undefined,{undefined,undefined},undefined,<<"amq.topic">>,{amqp_adapter_info,{0,0,0,0,0,65535,49320,433},1883,{0,0,0,0,0,65535,49320,280},56884,<<"192.168.1.24:56884 -> 192.168.1.177:1883">>,{'MQTT',"N/A"},[{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>}]},{ssl,false}]},none,undefined,undefined,#Fun<rabbit_mqtt_processor.0.7917564>},undefined,{state,fine,5000,undefined}}
    ** Reason for termination ==
    ** {timeout,{gen_server,call,[<0.1288.0>,connect,60000]}}
    2018-06-27 11:27:21.500 [error] <0.1285.0> CRASH REPORT Process <0.1285.0> with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.1288.0>,connect,60000]}} in gen_server2:terminate/3 line 1161
    2018-06-27 11:27:21.500 [error] <0.1283.0> Supervisor {<0.1283.0>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.1284.0>, {acceptor,{0,0,0,0,0,0,0,0},1883}, #Port<0.35128>) at <0.1285.0> exit with reason {timeout,{gen_server,call,[<0.1288.0>,connect,60000]}} in context child_terminated
    2018-06-27 11:27:21.500 [error] <0.1283.0> Supervisor {<0.1283.0>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.1284.0>, {acceptor,{0,0,0,0,0,0,0,0},1883}, #Port<0.35128>) at <0.1285.0> exit with reason reached_max_restart_intensity in context shutdown
    2018-06-27 11:27:22.112 [warning] <0.1268.0> closing AMQP connection <0.1268.0> (192.168.1.242:39616 -> 192.168.1.177:5672 - Federation link (upstream: worker-federation-upstream, policy: workerforwardpolicy)):
    {handshake_timeout,frame_header}
    2018-06-27 11:27:27.741 [info] <0.1299.0> MQTT vhost picked using plugin configuration or default
    2018-06-27 11:28:27.744 [error] <0.1299.0> ** Generic server <0.1299.0> terminating
    ** Last message in was {inet_async,#Port<0.35244>,312,{ok,<<16,86,0,4,77,81,84,84,4,194,0,60,0,17,111,115,98,111,120,101,115,95,49,53,53,56,51,54,50,53,54,0,9,77,81,84,84,84,69,83,84,50,0,44,104,98,87,67,121,52,78,81,75,84,50,51,81,110,104,67,81,56,73,54,86,101,74,84,43,120,54,57,110,119,111,87,121,88,68,116,54,82,65,117,75,115,85,61>>}}
    ** When Server state == {state,#Port<0.35244>,"192.168.1.176:62205 -> 192.168.1.177:1883",true,undefined,false,running,{none,none},<0.1298.0>,false,none,{proc_state,#Port<0.35244>,#{},{undefined,undefined},{0,nil},{0,nil},undefined,1,undefined,undefined,undefined,{undefined,undefined},undefined,<<"amq.topic">>,{amqp_adapter_info,{0,0,0,0,0,65535,49320,433},1883,{0,0,0,0,0,65535,49320,432},62205,<<"192.168.1.176:62205 -> 192.168.1.177:1883">>,{'MQTT',"N/A"},[{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>}]},{ssl,false}]},none,undefined,undefined,#Fun<rabbit_mqtt_processor.0.7917564>},undefined,{state,fine,5000,undefined}}
    ** Reason for termination ==
    ** {timeout,{gen_server,call,[<0.1302.0>,connect,60000]}}
    2018-06-27 11:28:27.744 [error] <0.1299.0> CRASH REPORT Process <0.1299.0> with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.1302.0>,connect,60000]}} in gen_server2:terminate/3 line 1161
    2018-06-27 11:28:27.744 [error] <0.1297.0> Supervisor {<0.1297.0>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.1298.0>, {acceptor,{0,0,0,0,0,0,0,0},1883}, #Port<0.35244>) at <0.1299.0> exit with reason {timeout,{gen_server,call,[<0.1302.0>,connect,60000]}} in context child_terminated
    2018-06-27 11:28:27.745 [error] <0.1297.0> Supervisor {<0.1297.0>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.1298.0>, {acceptor,{0,0,0,0,0,0,0,0},1883}, #Port<0.35244>) at <0.1299.0> exit with reason reached_max_restart_intensity in context shutdown
    2018-06-27 11:28:27.867 [info] <0.1306.0> MQTT vhost picked using plugin configuration or default

    杜昱萱

    Jun 27, 2018, 7:58:41 AM
    to rabbitmq-users
    The two clusters can ping each other.

    Michael Klishin

    Jun 27, 2018, 8:36:17 AM
    to rabbitm...@googlegroups.com
    There are two separate things going on in this log. In one case

    >2018-06-27 11:27:22.112 [warning] <0.1268.0> closing AMQP connection <0.1268.0> (192.168.1.242:39616 -> 192.168.1.177:5672 - Federation link (upstream: worker-federation-upstream, policy: workerforwardpolicy)):
    {handshake_timeout,frame_header}

    a federation link timed out when it tried to connect (`{timeout,{gen_server,call,[<0.1288.0>,connect,60000]}}` is the same thing in a different stack frame).

    I suspect that could trip up an MQTT connection in turn but we don't have enough information about the system to come up with a hypothesis as to how that could happen.

    I'm not sure what you mean by "ping" but what happened some time ago reflects the state of the system then, and not now.
    `ping` the command line tool is nearly useless in investigating issues with TCP-based protocols since `ping` does not use TCP.

    Again, use the tools (such as telnet, traceroute and so on) to narrow down the issue if it is still ongoing.
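
Since `ping` does not exercise TCP, a quick way to test what the federation link actually needs is to attempt a TCP connection to the AMQP port. Below is a minimal Python sketch of such a check, roughly what `telnet host 5672` does. The function name `tcp_port_open` and the 5-second default timeout are illustrative choices, not something from this thread.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection performs the full TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from the downstream node, `tcp_port_open("192.168.1.177", 5672)` (the upstream address from the logs above) returning `False` would point at a network or listener problem. A result of `True` only shows that the TCP handshake completes, not that the AMQP handshake does.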

    杜昱萱

    Jun 28, 2018, 7:58:27 AM
    to rabbitmq-users
    According to tcpdump, the network works fine, but the connection has not been set up:



    Michael Klishin

    Jun 28, 2018, 9:33:39 AM
    to rabbitm...@googlegroups.com
    What Erlang version is used? Versions up to 19.3.6.4 are known to have severe bugs that prevent
    nodes from accepting any TCP connections [1].

    Check the server logs, as all authentication failures are logged [2], including those from other nodes and CLI tools, and thus also those from the downstream "direct" connections that exchange federation uses.


    杜昱萱

    Jun 28, 2018, 10:48:00 AM
    to rabbitmq-users
    The RabbitMQ server version is 3.7.0 and the Erlang version is 20.x.

    杜昱萱

    Jun 28, 2018, 11:21:03 AM
    to rabbitmq-users
    I ran `list_vhosts` and `list_permissions` on both the upstream cluster and the downstream cluster and got the same result on each, as follows:
    bash-4.2# rabbitmqctl list_vhosts
    warning: the VM is running with native name encoding of latin1 which may cause Elixir to malfunction as it expects utf8. Please ensure your locale is set to UTF-8 (which can be verified by running "locale" in your shell)
    Listing vhosts ...
    /
    bash-4.2#
    bash-4.2# rabbitmqctl list_permissions -p /
    warning: the VM is running with native name encoding of latin1 which may cause Elixir to malfunction as it expects utf8. Please ensure your locale is set to UTF-8 (which can be verified by running "locale" in your shell)
    Listing permissions for vhost "/" ...
    impact  .*      .*      .*

    From the tcpdump capture I found the following:

    Up to and including packet No. 1408, 192.168.1.242 uses port 46663 to send AMQP traffic to 192.168.1.177:5672, and everything seems to be OK.

    From packet No. 1964 onward, port 55331 is used and the failures begin. I am curious why port 55331 is used and whether it is related to the AMQP protocol.

    Michael Klishin

    Jun 28, 2018, 4:45:28 PM
    to rabbitm...@googlegroups.com
    46663 and 55331 are client TCP ports. They are allocated by the client OS from the ephemeral port range.

    See server logs around the time of the events. If some clients can connect but others can't there has to be a reason,
    such as authentication failure, that is logged.
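
To illustrate the point about client ports, here is a small Python sketch that opens two connections to a throwaway local listener and reports the local port the OS picked for each. Nothing in it is RabbitMQ-specific; the listener on 127.0.0.1 and the helper names are assumptions made for the demo.

```python
import socket


def client_port_for_new_connection(server_host, server_port):
    """Connect to the server and return the local (client-side) port the OS chose."""
    with socket.create_connection((server_host, server_port), timeout=5.0) as conn:
        # getsockname() is the client's own (address, port) pair.
        return conn.getsockname()[1]


def demo_two_connections():
    """Open two connections to a local listener; return (server_port, first, second)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free server port
    listener.listen(5)
    server_port = listener.getsockname()[1]
    try:
        first = client_port_for_new_connection("127.0.0.1", server_port)
        second = client_port_for_new_connection("127.0.0.1", server_port)
    finally:
        listener.close()
    return server_port, first, second
```

Calling `demo_two_connections()` returns the fixed server port plus two client ports chosen by the OS. The client ports usually differ between connections, which is why a new connection attempt from a federation link shows up with a new source port in a capture.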

    杜昱萱

    Jun 29, 2018, 1:52:02 AM
    to rabbitmq-users
    So the two ports belong to two independent sessions.
    The server log shows only the timeout, with no authentication-related entries.
    In the tcpdump capture, for the connection between port 46663 and port 5672, the AMQP response [4879] to [1408] arrives too late, more than 100 seconds afterwards.
    I do not know the reason for the delay.

    Michael Klishin

    Jun 29, 2018, 2:19:20 AM
    to rabbitm...@googlegroups.com
    So the link sends connection.start-ok and then nothing, so the server times out the handshake.
    I don’t think I’ve seen such scenarios before.

    Does federation connect successfully from other nodes?
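
One way to see whether the broker answers at the protocol level at all is to send the AMQP 0-9-1 protocol header by hand and wait for the first frame, which a healthy broker sends as `connection.start`. The Python sketch below does only that first step; it does not send `connection.start-ok` or complete the handshake, so it cannot reproduce the exact stall described above. The helper names and the 10-second timeout are assumptions for illustration.

```python
import socket
import struct

# Protocol header defined by AMQP 0-9-1: "AMQP" followed by 0, 0, 9, 1.
AMQP_0_9_1_HEADER = b"AMQP\x00\x00\x09\x01"


def parse_frame_header(header):
    """Split a 7-byte AMQP 0-9-1 frame header into (type, channel, payload_size)."""
    if len(header) != 7:
        raise ValueError("an AMQP 0-9-1 frame header is exactly 7 bytes")
    # 1-byte frame type, 2-byte channel, 4-byte payload size, big-endian.
    return struct.unpack(">BHI", header)


def recv_exactly(sock, n):
    """Read exactly n bytes from the socket or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def probe_amqp(host, port=5672, timeout=10.0):
    """Send the protocol header and return the header of the first frame received."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(AMQP_0_9_1_HEADER)
        return parse_frame_header(recv_exactly(sock, 7))
```

A reply of the form `(1, 0, n)` means a method frame on channel 0, which is what `connection.start` looks like. A timeout exception instead would suggest that the broker accepted the TCP connection but never started the AMQP handshake.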

    杜昱萱

    Jun 29, 2018, 6:47:06 AM
    to rabbitmq-users
    This is a bidirectional federation design:
    192.168.1.177 is in Cluster1 and 192.168.1.242 is in Cluster2.
    Cluster1 is a federation upstream of Cluster2;
    Cluster2 is a federation upstream of Cluster1.

    Seen from Cluster1, the federation link is in the "starting" state;
    seen from Cluster2, the federation link is in the "running" state.

    Michael Klishin

    Jun 29, 2018, 3:47:49 PM
    to rabbitm...@googlegroups.com
    That doesn't really answer my question. Are there any other clients that successfully connect besides one-off experiments,
    in particular federation links from other hosts?

    杜昱萱

    Jun 29, 2018, 9:41:42 PM
    to rabbitmq-users
    No, no client can successfully establish a federation link.
    I am collecting debug-level logs for more clues.
