Dead salt master killed all minions

Hinnerk

Mar 19, 2013, 12:07:08 PM
to salt-...@googlegroups.com
This morning our salt master died, probably because of faulty hardware.

Overnight, all minions spawned processes until they hit the available memory limit and then began filling the available disk space with the log message cited below (note how closely spaced the timestamps are). After the minions were restarted, they behaved normally. The master has now been rebuilt (old name, new IP address; the DNS TTL is short enough for clients to get the new IP _if_ they ask), but none of the minions have (re)connected to it (no new keys, no connection via test.ping).

Is any of this known behaviour or should I start to bug-bomb the issue tracker?


~hinnerk

2013-03-19 09:15:01,433 [salt.minion      ][CRITICAL] Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 756, in tune_in
    self.schedule.eval()
  File "/usr/lib/pymodules/python2.7/salt/utils/schedule.py", line 135, in eval
    thread_cls(target=self.handle_func, args=(func, data)).start()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 120, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

2013-03-19 09:15:01,434 [salt.minion      ][CRITICAL] Traceback (most recent call last):
  [… same as before…]

2013-03-19 09:15:01,434 [salt.minion      ][CRITICAL] Traceback (most recent call last):
  [… same as before…]

2013-03-19 09:15:01,435 [salt.minion      ][CRITICAL] Traceback (most recent call last):
  [… same as before…]
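
On the reconnect problem described in this post: the thread does not say whether a 0.13.x minion looks the master's name up again after losing its connection. As a generic illustration only (not Salt's code), a client that wants to follow a master to a new IP address can resolve the name again on every reconnect attempt and compare the result with the address it last used. The helper names `resolve_master` and `master_moved` are hypothetical, and port 4506 is used here on the assumption that it is the master's return port.

```python
import socket


def resolve_master(hostname, port=4506):
    """Return the addresses `hostname` resolves to, looked up on each call."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def master_moved(hostname, last_known_ip, port=4506):
    """True if the name no longer resolves to the address we last used."""
    return last_known_ip not in resolve_master(hostname, port)
```

A reconnect loop would call `master_moved("salt", old_ip)` before each attempt and open a fresh connection when it returns `True`. Note that a local resolver cache can still return the old address until the TTL expires.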



Thomas S Hatch

Mar 19, 2013, 12:41:24 PM
to salt-...@googlegroups.com
Good land, this is new! What version are you on? Please open an issue for us to track this! Are you using the scheduler for anything?

Thomas S. Hatch  |  Founder, CTO

Hinnerk

Mar 19, 2013, 12:58:10 PM
to salt-...@googlegroups.com
Sorry, somehow this didn't make it into my first message:

The master and all minions were running 0.13.1 on Ubuntu 12.04.2 LTS when this happened. The new master is now at 0.13.3.

Hinnerk Haardt

Mar 19, 2013, 1:01:03 PM
to salt-...@googlegroups.com
Had to dig for this: we use the scheduler to keep everything up to date. Every minion has this in its pillar data:

schedule:
  highstate:
    function: state.highstate
    minutes: 15
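
The traceback earlier in the thread shows the scheduler starting a new process for each job run, and the near-identical timestamps suggest the failing start was retried without delay. One common way to keep a periodic job from piling up is to skip a tick while the previous run is still alive and to treat a failed start as a skipped tick rather than retrying at once. The sketch below is not Salt's scheduler code; `GuardedJob` and `tick` are hypothetical names, and threads stand in for processes by default (a `multiprocessing.Process` class also offers `start()` and `is_alive()`).

```python
import threading


class GuardedJob:
    """Run `func` at most once at a time; skip ticks while a run is active."""

    def __init__(self, func, worker_cls=threading.Thread):
        self.func = func
        self.worker_cls = worker_cls  # any class with start() and is_alive()
        self.worker = None
        self.skipped = 0

    def tick(self):
        """Called by the scheduler loop; returns True if a run was started."""
        if self.worker is not None and self.worker.is_alive():
            # Previous run has not finished: do not stack another on top.
            self.skipped += 1
            return False
        worker = self.worker_cls(target=self.func)
        try:
            worker.start()
        except (OSError, RuntimeError):
            # Could not create the process/thread (for example, out of
            # memory). Give up on this tick; the caller tries again on
            # the next scheduled interval instead of in a tight loop.
            return False
        self.worker = worker
        return True
```

With a 15-minute interval as in the pillar above, a highstate that hangs for hours would then produce skipped ticks rather than a growing number of processes.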