Minions stuck on 'Initializing new AsyncZeroMQReqChannel'

1,827 views
Skip to first unread message

Artur Ergashev

unread,
Feb 2, 2016, 2:31:12 AM2/2/16
to Salt-users
I have freshly setup minions that refuse to communicate to my salt-master post system restart (worked prior to restart). I have verified that the relevant ports are open (from the minion): 

# nc -v -z 10.0.10.2 4505
Connection to 10.0.10.2 4505 port [tcp/*] succeeded!

# nc -v -z 10.0.10.2 4506
Connection to 10.0.10.2 4506 port [tcp/*] succeeded!

# nc -v -z salt 4505
Connection to salt 4505 port [tcp/*] succeeded!

# nc -v -z salt 4506
Connection to salt 4506 port [tcp/*] succeeded!

The salt name points to 10.0.10.2. There are hits on the IPTables firewall log (either for Port 4505/4506 or for the IPs of either minion or master).

Attempting to run even a salt-call command on the minion:
# salt-call -l debug test.ping
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: pr-ord-all-003
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Please install 'virt-what' to improve results of the 'virtual' grain.
[DEBUG   ] Initializing new SAuth for ('/etc/salt/pki/minion', 'pr-ord-all-003', 'tcp://10.0.10.2:4506')
[DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (3678)
[DEBUG   ] Setting zmq_reconnect_ivl to '3678ms'
[DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'pr-ord-all-003', 'tcp://10.0.10.2:4506', 'aes')
[DEBUG   ] Initializing new SAuth for ('/etc/salt/pki/minion', 'pr-ord-all-003', 'tcp://10.0.10.2:4506')
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'pr-ord-all-003', 'tcp://10.0.10.2:4506', 'clear')


or the salt-minion itself:
# salt-minion -l debug
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: pr-ord-all-003
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[INFO    ] Setting up the Salt Minion "pr-ord-all-003"
[DEBUG   ] Created pidfile: /var/run/salt-minion.pid
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Please install 'virt-what' to improve results of the 'virtual' grain.
[INFO    ] The salt minion is starting up
[INFO    ] Minion is starting as user 'root'
[DEBUG   ] AsyncEventPublisher PUB socket URI: ipc:///var/run/salt/minion/minion_event_4513dcab56_pub.ipc
[DEBUG   ] AsyncEventPublisher PULL socket URI: ipc:///var/run/salt/minion/minion_event_4513dcab56_pull.ipc
[INFO    ] Starting pub socket on ipc:///var/run/salt/minion/minion_event_4513dcab56_pub.ipc
[INFO    ] Starting pull socket on ipc:///var/run/salt/minion/minion_event_4513dcab56_pull.ipc
[DEBUG   ] Minion 'pr-ord-all-003' trying to tune in
[DEBUG   ] Initializing new SAuth for ('/etc/salt/pki/minion', 'pr-ord-all-003', 'tcp://10.0.10.2:4506')
[DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (10299)
[DEBUG   ] Setting zmq_reconnect_ivl to '10299ms'
[DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'pr-ord-all-003', 'tcp://10.0.10.2:4506', 'clear')

... results in no activity past the last line last line posted above. Nothing happens past this point, I have left this running for about twenty minutes now with no progress. 

Terminating the salt-call process results in:
^C
Exiting gracefully on Ctrl-c
[DEBUG   ] Error deleting fd from IOLoop
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 554, in remove_handler
    self._impl.unregister(fd)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py", line 125, in unregister
    return self._poller.unregister(fd)
  File "/usr/lib/python2.7/dist-packages/zmq/sugar/poll.py", line 80, in unregister
    idx = self._map.pop(socket)
KeyError: None


And the salt-minion results in:
^C[ERROR   ] No connection to master found. Scheduled jobs will not run.
[CRITICAL] The beacon errored:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1741, in handle_beacons
    beacons = self.process_beacons(self.functions)
AttributeError: 'Minion' object has no attribute 'functions'
[CRITICAL] The beacon errored:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1741, in handle_beacons
    beacons = self.process_beacons(self.functions)
AttributeError: 'Minion' object has no attribute 'functions'
[CRITICAL] The beacon errored:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1741, in handle_beacons
    beacons = self.process_beacons(self.functions)
AttributeError: 'Minion' object has no attribute 'functions'
[CRITICAL] The beacon errored:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1741, in handle_beacons
    beacons = self.process_beacons(self.functions)
AttributeError: 'Minion' object has no attribute 'functions'
[CRITICAL] The beacon errored:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1741, in handle_beacons
    beacons = self.process_beacons(self.functions)
AttributeError: 'Minion' object has no attribute 'functions'
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.7/multiprocessing/util.py", line 325, in _exit_function
    p.join()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 145, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 154, in wait
    return self.poll(0)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 135, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Error in sys.exitfunc:
[INFO    ] The salt minion is shut down


I am running Ubuntu 14.04.3 LTS:
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty


While attempting to run either command I see the following (on the minion) when I run netstat (removed non-related entries):

# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 pr-ord-all-003:41184    salt:4506               TIME_WAIT
tcp        0      0 pr-ord-all-003:41185    salt:4506               TIME_WAIT


What's even more interesting is the master sees the non-working minions as connected:
# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 *:4505                  *:*                     LISTEN
tcp        0      0 *:4506                  *:*                     LISTEN
tcp        0      0 *:1822                  *:*                     LISTEN
tcp        0      0 ip6-localhost:4505      ip6-localhost:35832     ESTABLISHED
tcp        0      0 ip6-localhost:38506     ip6-localhost:4506      TIME_WAIT
tcp        0      0 pr-ord-sal-001:4506     pr-ord-all-002:59864    ESTABLISHED    THIS INSTANCE DOES NOT CONNECT
tcp        0      0 pr-ord-sal-001:1822     66.185.106.232:35386    ESTABLISHED
tcp        0      0 ip6-localhost:38505     ip6-localhost:4506      TIME_WAIT
tcp        0      0 pr-ord-sal-001:4506     pr-ord-all-003:41200    ESTABLISHED    THIS INSTANCE DOES NOT CONNECT
tcp        0      0 pr-ord-sal-001:1822     66.185.106.232:33849    ESTABLISHED
tcp        0      0 ip6-localhost:35832     ip6-localhost:4505      ESTABLISHED
tcp        0      0 pr-ord-sal-001:4505     pr-ord-fs-002:42404     ESTABLISHED
tcp        0      0 pr-ord-sal-001:4506     pr-ord-all-001:37336    ESTABLISHED    THIS INSTANCE DOES NOT CONNECT
tcp        0      0 ip6-localhost:38504     ip6-localhost:4506      TIME_WAIT
tcp        0      0 pr-ord-sal-001:4505     pr-ord-fs-001:38153     ESTABLISHED
tcp        0      0 ip6-localhost:38507     ip6-localhost:4506      TIME_WAIT
tcp6       0      0 [::]:1822               [::]:*                  LISTEN
tcp6       0      0 pr-ord-sal-001:41513    agent-endpoint-lo:https ESTABLISHED
tcp6       0      0 pr-ord-sal-001:59848    agent-endpoint-or:https ESTABLISHED
tcp6       0      0 pr-ord-sal-001:34671    agent-endpoint-df:https ESTABLISHED



Additional information on my installation: 

Master:
# salt --versions-report
Salt Version:
           Salt: 2015.8.3

Dependency Versions:
         Jinja2: 2.7.2
       M2Crypto: Not Installed
           Mako: 0.9.1
         PyYAML: 3.10
          PyZMQ: 14.0.1
         Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
           RAET: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.4
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 1.5
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
        libnacl: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.3.0
   mysql-python: 1.2.3
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
   python-gnupg: 0.3.6
          smmap: 0.8.2
        timelib: Not Installed

System Versions:
           dist: Ubuntu 14.04 trusty
        machine: x86_64
        release: 3.13.0-58-generic
         system: Ubuntu 14.04 trusty


Minion:
# salt-call --versions-report
Salt Version:
           Salt: 2015.8.4

Dependency Versions:
         Jinja2: 2.7.2
       M2Crypto: Not Installed
           Mako: 0.9.1
         PyYAML: 3.10
          PyZMQ: 14.0.1
         Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
           RAET: Not Installed
        Tornado: 3.2
            ZMQ: 4.0.4
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 1.5
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
        libgit2: Not Installed
        libnacl: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.3.0
   mysql-python: 1.2.3
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
   python-gnupg: 0.3.6
          smmap: 0.8.2
        timelib: Not Installed

System Versions:
           dist: Ubuntu 14.04 trusty
        machine: x86_64
        release: 3.13.0-58-generic
         system: Ubuntu 14.04 trusty



What's particularly strange is that I provisioned six instances today. Half of them work, can be restarted, re-restablish connection to the master no problem. The others exhibit this behavior. Their configurations are similar but not identical. The three non-working instances were provisioned and highstate-d within five minutes of each other. The OS, Networking and Firewall configurations are identical. 


There is absolutely nothing interesting in /var/log/salt/master. 

This seems like it's a problem in 0MQ but restarting both minions, master, and finally the machines does not seem to remedy the problem. 

Any pointers would be very appreciated. 

Thanks,
Artur

Artur Ergashev

unread,
Feb 2, 2016, 2:39:47 AM2/2/16
to Salt-users
It occurred to me that my initial post did not clarify that I was able to run a highstate on all of the now non-connecting minions through my master. The keys were accepted, the highstate was ran. It was only after restarting the minions that they no longer connect. 

# salt-key -L
Accepted Keys:
pr-ord-all-001
pr-ord-all-002
pr-ord-all-003
pr-ord-fs-001
pr-ord-fs-002
pr-ord-sal-001
Denied Keys:
Unaccepted Keys:
Rejected Keys:

Megan Wilhite

unread,
Feb 2, 2016, 2:15:59 PM2/2/16
to Salt-users
This is definitely weird since the network communication is up and running. Is there anything in the master debug log? Maybe the master process is not running correctly.

My other thought is it looks like your minion is a higher version of salt then your master. According to the documentation: When upgrading Salt, the master(s) should always be upgraded first. Backward compatibility for minions running newer versions of salt than their masters is not guaranteed. 

Maybe upgrade your master or downgrade your minions?

Artur Ergashev

unread,
Feb 2, 2016, 8:56:27 PM2/2/16
to Salt-users
Didn't even notice the version difference, I upgraded so both are on 2015.8.4, there is nothing in my master log. I attempt to run the salt-master with the debug flag and when restarting the broken minions or attempting to issue a command w/ salt-call I get... nothing. 

When I try the same against a working minion I get lots of chattiness in the log (and the command works). 

I attempted to remove the minion from the master (salt-key -d), as well as reinstall the minion. Also no dice. There's no indication that there's anything wrong which is frustrating. Also attempting restarting all servers. 

I took a TCPDump of both Salt + Master, it's available here: http://hastebin.com/vopigafafe.tex

This seems like an outright bug right? Any idea on a workaround? 

Thomas Phipps

unread,
Feb 2, 2016, 9:10:49 PM2/2/16
to salt-...@googlegroups.com
try using salt-call -l trace
 on the minion to find exactly where in the code it is stopping. 

--
You received this message because you are subscribed to the Google Groups "Salt-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to salt-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Thomas Phipps
Linux/Openstack/Logstash

Artur Ergashev

unread,
Feb 2, 2016, 9:23:49 PM2/2/16
to Salt-users
Same, lots of more data but just freezes on: 

[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'pr-ord-all-003', 'tcp://10.0.10.2:4506', 'clear')


I also attempted to upgrade PyZMQ following the recommendation in https://github.com/saltstack/salt/issues/29453#issuecomment-162096727

So now running 14.5.0 of PyZMQ (and verified in versions report), and... same story. 

Megan Wilhite

unread,
Feb 3, 2016, 7:20:57 PM2/3/16
to Salt-users
hmmmm maybe run tcpdump on your master to make sure you are getting the entire SYN/SYNACK communications between the master and minion when you startup your minion? 

And here is a far fetch idea but possibly try the following
1. delete key on salt-master if pre-accepted before
2. stop salt-minion
3. rm -rf /etc/salt/pki/minion/*
4. start salt-minion

Pierre R

unread,
Apr 27, 2016, 5:23:02 AM4/27/16
to Salt-users
For the record, I have had the exact same problem using the same version 2015.8.8.2 on the minion and the master with RHEL.

The minion was using a very old version of RHEL (6.4 with no updates applied). Updating the OS on the minion solves the problem.

Cheers,

Colton Myers

unread,
Apr 27, 2016, 11:35:08 AM4/27/16
to salt-...@googlegroups.com
One of my coworkers ran into a similar issue. It was caused by an out of date version of ZMQ/PyZMQ. You might try upgrading those, especially if the master has newer versions.

--
Colton Myers

l.ski...@elifesciences.org

unread,
Jul 14, 2016, 6:44:51 AM7/14/16
to Salt-users
Getting the exact same problems here, but with 2016.3

# salt-minion --versions
Salt Version:
           Salt: 2016.3.1
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 1.5
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 0.9.1
   msgpack-pure: Not Installed
 msgpack-python: 0.3.0
   mysql-python: 1.2.3
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
   python-gnupg: Not Installed
         PyYAML: 3.10
          PyZMQ: 14.0.1
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5
 
System Versions:
           dist: Ubuntu 14.04 trusty
        machine: x86_64
        release: 3.13.0-86-generic
         system: Linux
        version: Ubuntu 14.04 trusty

and on the master:


# salt --versions
Salt Version:
           Salt: 2016.3.1
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 1.5
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 0.9.1
   msgpack-pure: Not Installed
 msgpack-python: 0.3.0
   mysql-python: 1.2.3
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
   python-gnupg: Not Installed
         PyYAML: 3.10
          PyZMQ: 14.0.1
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5
 
System Versions:
           dist: Ubuntu 14.04 trusty
        machine: x86_64
        release: 3.13.0-86-generic
         system: Linux
        version: Ubuntu 14.04 trusty



the tailend of the minion call with -l trace looks like:

[DEBUG   ] Connecting to master. Attempt 1 of 1
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'api-gateway--end2end', 'tcp://172.31.24.151:4506')
[DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (5536)
[DEBUG   ] Setting zmq_reconnect_ivl to '5536ms'
[DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'api-gateway--end2end', 'tcp://172.31.24.151:4506', 'clear')
[TRACE   ] Inserted key into loop_instance_map id 139898824015744 for key ('/etc/salt/pki/minion', 'api-gateway--end2end', 'tcp://172.31.24.151:4506', 'clear') and process 3403
[DEBUG   ] SaltReqTimeoutError, retrying. (1/7)
[DEBUG   ] SaltReqTimeoutError, retrying. (2/7)


will try upgrading PyZMQ now

l.ski...@elifesciences.org

unread,
Jul 14, 2016, 6:49:07 AM7/14/16
to Salt-users
huh - I've just noticed I have an salt-master orphaned from upstart.

..

ok, "killall salt-master" as root and then a salt call on the minion has unblocked it this time, but as the OP said, it is intermittent and inconsistent. Will keep an eye on it. I did *not* upgrade PyZMQ.
Reply all
Reply to author
Forward
0 new messages