Test.ping returns nothing

David Kleiner

Oct 22, 2012, 10:45:59 PM
to salt-...@googlegroups.com
Hello,

On a freshly installed 0.10.3 cluster, the master has registered the minion keys but fails to ping any minion except itself. salt-call test.ping reports no errors.

The environment is hosted with SoftLayer. The only setting in each minion's /etc/salt/minion file is "master: master.host.name". No errors are reported in the log files.

Where should I look next? 

Thanks,

David

Thomas S Hatch

Oct 22, 2012, 11:03:10 PM
to salt-...@googlegroups.com
I would check the firewall settings on the master first; it needs to have TCP ports 4505 and 4506 open. Then I would try running the minions in the foreground with "salt-minion -l debug" to see what they are doing.
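
For the firewall part, here is a minimal sketch of a reachability check, meant to be run from a minion. It only attempts a plain TCP connection to the master on 4505 and 4506. The function name check_ports and the 3-second timeout are my own choices for illustration, not anything Salt provides, and master.host.name is the placeholder hostname from the original post. A successful connect shows the port is reachable through the firewall; it does not prove that Salt itself is healthy.

```python
import socket

# Publish and return ports named in the post above.
SALT_PORTS = (4505, 4506)


def check_ports(host, ports=SALT_PORTS, timeout=3.0):
    """Return {port: bool}, True where a TCP connection to host succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results[port] = False
    return results
```

Calling check_ports("master.host.name") on a minion returns a dict keyed by port; a False entry suggests a firewall rule or a master that is not listening on that port.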

David Kleiner

Oct 23, 2012, 1:05:09 AM
to salt-...@googlegroups.com
Thank you Thomas -

There is nothing interesting in the logs:

[root@minion bin]# ./salt-minion -l debug
[INFO    ] Loaded configuration file: /etc/salt/minion
[WARNING ] Setting up the Salt Minion "minion"
[DEBUG   ] Attempting to authenticate with the Salt Master at master
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion.pem
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion.pem
[INFO    ] Authentication with master successful!
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion.pem
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion.pem
[INFO    ] Minion is starting as user 'root'
[DEBUG   ] Minion "minion" trying to tune in
[DEBUG   ] Minion PUB socket URI: ipc:///var/run/salt/minion_event_hex-string_pub.ipc
[DEBUG   ] Minion PULL socket URI: ipc:///var/run/salt/minion_event_hex_stringpull.ipc

On the master:

master:/usr/local/python/bin# ./salt -v 'minion' test.ping
Executing job with jid 20121023000147216024
-------------------------------------------

The following minions did not return:
minion
#

And:

# ./salt-key -L 
Unaccepted Keys:
Accepted Keys:
minion

Sean Channel

Oct 23, 2012, 1:40:39 AM
to salt-...@googlegroups.com, David Kleiner
Hi David.

Setting "pub_refresh: False" in the master config and re-starting the
master could be worth a try if nothing else is working.

_S.

Matt Black

Oct 23, 2012, 1:43:09 AM
to salt-...@googlegroups.com, David Kleiner
What does the pub_refresh setting do exactly?

I've found that a restart of the master almost always resolves any kind of "no response" state I end up in. I've been meaning to investigate why it happens occasionally, but I've not had the time.

Sean Channel

Oct 23, 2012, 1:51:15 AM
to salt-...@googlegroups.com, Matt Black, David Kleiner
pub_refresh, so far as I understand, causes the master to aggressively
try to 'restore' connections to minions in a way that has turned out to
be more disruptive than helpful in some situations. It was added as a
fix for some VM-based minions that were losing connectivity too easily.

In 0.10.3 pub_refresh is True (on) by default, but it will again be
False by default in 0.10.4, which is about set for release, perhaps
within the next day.
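
If you would rather script the change than edit by hand, the following is a rough sketch that forces "pub_refresh: False" in the text of a master config. The helper name disable_pub_refresh is hypothetical, and it treats the file as plain text with simple patterns rather than parsing YAML, so check the result before restarting the master.

```python
import re

# An active setting, e.g. "pub_refresh: True".
_ACTIVE = re.compile(r"^[ \t]*pub_refresh[ \t]*:.*$", re.MULTILINE)
# A commented-out setting, e.g. "#pub_refresh: True".
_COMMENTED = re.compile(r"^[ \t]*#[ \t]*pub_refresh[ \t]*:.*$", re.MULTILINE)


def disable_pub_refresh(config_text):
    """Return config_text with pub_refresh set to False."""
    new_line = "pub_refresh: False"
    if _ACTIVE.search(config_text):
        # Rewrite every active occurrence so a later duplicate cannot win.
        return _ACTIVE.sub(new_line, config_text)
    if _COMMENTED.search(config_text):
        # No active setting: turn the first commented one into a real line.
        return _COMMENTED.sub(new_line, config_text, count=1)
    # Nothing to rewrite: append the setting on its own line.
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + new_line + "\n"
```

To use it, read the master config into a string, pass it through disable_pub_refresh, write it back, and restart the master. The config is commonly at /etc/salt/master, but that path is an assumption about your install; this thread only mentions /etc/salt/minion.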

_S.

Matt Black

Oct 23, 2012, 2:37:57 AM
to Sean Channel, salt-...@googlegroups.com, David Kleiner
Thanks Sean. That would explain the instability I've seen since 0.10.3. I did assume it would be ironed out at some point, though.

David Kleiner

Oct 23, 2012, 7:15:49 PM
to salt-...@googlegroups.com
A follow-up:

I moved the master from an older Debian instance to another box running CentOS 5.8 and pointed the minions at the new master. It took some wiping of master keys in /etc/salt/pki, but they started responding to test.ping.

I kept the pub_refresh setting at False.

Onward with the dev cluster config.

Thank you everyone.

David

David Ward

Oct 23, 2012, 9:48:15 PM
to salt-...@googlegroups.com
So is the issue here an OS level issue?

I am having the same problem on Ubuntu 12.04 with both 0.10.2 and now 0.10.3.

Restarting the salt-master daily is my short term solution.
I have an strace of both the salt-master and salt-minion in this state.

Thomas S Hatch

Oct 23, 2012, 9:50:00 PM
to salt-...@googlegroups.com
I think what we actually need is a tcpdump; I suspect there is an issue in ZeroMQ here.

David Ward

Oct 23, 2012, 10:00:09 PM
to salt-...@googlegroups.com
Hey Thomas.

No worries. I'll do that tomorrow morning when I find it in this state. Which port should I be listening on? 4506? Or both?

Thanks.

PS
libzmq1                                         2.2.0-1chl1~precise
python-zmq                                      2.1.11-1

Thomas S Hatch

Oct 23, 2012, 10:20:42 PM
to salt-...@googlegroups.com
If you can recreate this on a regular basis, that will be great! The port we need to look at is 4505, and we need to get this data back to the ZeroMQ guys. I will also look into getting you some updated debs with ZeroMQ 3 to see if that helps.
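
As a rough sketch of how the capture could be scripted, the snippet below builds a tcpdump command line for TCP traffic on port 4505 and runs it until interrupted. The function names, the output filename salt-4505.pcap, and the choice of the "any" interface (Linux only) are assumptions for illustration; running it requires root and an installed tcpdump.

```python
import subprocess


def tcpdump_command(port=4505, outfile="salt-4505.pcap", interface="any"):
    """Build a tcpdump argument list that saves TCP traffic on one port."""
    return [
        "tcpdump",
        "-i", interface,   # "any" captures on all interfaces (Linux)
        "-s", "0",         # capture full packets rather than truncated ones
        "-w", outfile,     # write raw packets to a pcap file
        "tcp", "port", str(port),
    ]


def capture(port=4505, outfile="salt-4505.pcap"):
    """Run the capture until interrupted with Ctrl-C. Needs root."""
    try:
        subprocess.run(tcpdump_command(port, outfile), check=False)
    except KeyboardInterrupt:
        # Ctrl-C also stops tcpdump, which closes the pcap file on exit.
        pass
```

Start capture() on the master before the minions stop responding, interrupt it once test.ping has failed, and the resulting pcap file is what can be passed along.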

David Ward

Oct 23, 2012, 10:24:43 PM
to salt-...@googlegroups.com
Great. Thanks Thomas. This sounds like a plan.

Sean Channel

Oct 23, 2012, 10:37:30 PM
to salt-...@googlegroups.com, David Ward
Hi David.

There are two pernicious bugs that have been dogging 0.10.3 and earlier
releases, on Debian in particular. Note that 0.10.4 was just released today
and will be in the Ubuntu PPA soon (I'm shooting for tonight).

The two work-arounds are:
1. Setting "pub_refresh: False" in the master config. Please add that to
your master's config file if it's not there.
2. The file /etc/init/salt-master.conf (and minion, etc.) contains a line
with the keyword "respawn" that should be commented out or deleted.

You'll need to restart the master daemon after the above. If you find
Salt is unresponsive to restarting, try "initctl stop salt-master"
followed by "initctl start salt-master" (likewise for the minion, etc.;
this is also part of #2).

Explanations:
#1 was added to address a niche situation of VM minions losing
connections, but turned out to be disruptive in a wider range of situations.
#2 "respawn" is inappropriate for the type of daemon that Salt is. It's
really intended for something like a login program ('getty'). It affects
the daemon's relationship with upstart and hence starting and stopping.

Both of these are addressed in 0.10.4. I sure hope these work-arounds
are helpful for you.
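
For work-around #2, here is a small sketch that comments out the "respawn" line in an upstart job file rather than deleting it, so the change is easy to undo. The helper names are hypothetical, and the pattern also comments out any "respawn limit ..." line, which is my own choice since that stanza has no purpose once respawn is off.

```python
import re

# A line whose first word is "respawn", optionally indented.
_RESPAWN = re.compile(r"^([ \t]*)(respawn\b.*)$", re.MULTILINE)


def comment_out_respawn(job_text):
    """Return job_text with every active respawn line commented out."""
    # Already-commented lines start with "#", so they are left untouched.
    return _RESPAWN.sub(r"\1# \2", job_text)


def patch_job_file(path):
    """Rewrite the job file at path in place; returns True if it changed."""
    with open(path) as handle:
        original = handle.read()
    patched = comment_out_respawn(original)
    if patched != original:
        with open(path, "w") as handle:
            handle.write(patched)
    return patched != original
```

Run as root, patch_job_file("/etc/init/salt-master.conf") applies the change to the master job; repeat for the minion job file if you have one, then stop and start the daemons as described above.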

_S.

David Ward

Oct 24, 2012, 1:03:28 AM
to salt-...@googlegroups.com
Thanks Sean.

I read about pub_refresh on Google Groups and made that change
today.

I have just made the second change.

We'll see how it goes.

Thank you for the thorough explanation on the issues at hand.


Regards
David Ward
m: 0410 472 531
skype: DaveQB
twitter: DaveQB14
www: www.dward.us

David Ward

Oct 25, 2012, 9:46:56 PM
to salt-...@googlegroups.com, David Ward
Hi Sean,

So far, so good with those 2 changes. The last two mornings have seen no dropouts in communication with the minions.

I see 0.10.4 is in the PPA, so I will upgrade to that.

Thanks.

Sean Channel

Oct 25, 2012, 10:52:51 PM
to salt-...@googlegroups.com, David Ward
Good news. Fingers crossed! :)

_S.

Thomas S Hatch

Oct 26, 2012, 11:45:40 AM
to salt-...@googlegroups.com
Yes, I am interested, can you shoot it over?

On Tue, Oct 23, 2012 at 5:47 PM, David Ward <dav...@gmail.com> wrote:
I am seeing a similar issue on 0.10.2 and now 0.10.3.

Running on Ubuntu 12.04 master and clients.

This normally appears in the morning when I first log in and run a ping check. Nothing responds. A restart of the salt-master solves this. I have set a cron job to do this nightly now as a workaround.

I have an strace log of the salt-master and salt-minion while in this state if anyone is interested.

Thanks.





David Ward

Oct 28, 2012, 11:36:52 PM
to salt-...@googlegroups.com
Hey Thomas.

I emailed over the strace logs to your email address. Let me know if you didn't get them or want them posted here.

Thanks.

David Ward

Oct 30, 2012, 11:49:44 PM
to salt-...@googlegroups.com
FYI
I am now getting a similar symptom, but it seems to be minion-based (on 0.10.4):
https://github.com/saltstack/salt/issues/2402