Memory Leak in salt-master?

Rob Dinoff

Jul 31, 2015, 11:41:21 AM
to Salt-users
I have migrated our salt master from Ubuntu 12 to Ubuntu 14, and now the new system keeps swapping.  I have played around with lowering/raising worker_threads because I sometimes get this error:

"Salt request timed out. The master is not responding. If this error persists after verifying the master is up, worker_threads may need to be increased."

When I get this error it means the salt-master has caused the system to swap.  No matter what thread configuration I use, the system starts swapping after a short time.  I have 25 minions configured.  I also saw this same problem with 2015.5.2.
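For reference, the thread count mentioned above is the worker_threads setting in the master config; the value here is only illustrative:

```yaml
# /etc/salt/master
worker_threads: 10
```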

If I restart the salt-master everything will work perfectly. 

thanks for any help,

Rob

$ salt-master --version
salt-master 2015.5.3 (Lithium)

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.2 LTS
Release: 14.04
Codename: trusty

top - 14:27:50 up 20:37,  1 user,  load average: 0.07, 0.06, 0.07
Tasks: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.2 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   2047528 total,  1956616 used,    90912 free,   167100 buffers
KiB Swap:   262140 total,   262140 used,        0 free.    76240 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                     SWAP
 3412 root      20   0 1145260  39644   1756 S   0.0  1.9   0:17.29 salt-master               104436
 3381 root      20   0 1140652  69112   2564 S   0.0  3.4   0:21.39 salt-master                71332
 3421 root      20   0 1122936  99052   3120 S   0.0  4.8   0:38.67 salt-master                23996
 3441 root      20   0 1252336   7340    392 S   0.0  0.4   0:02.06 salt-master                17916
 3371 root      20   0  220904  12056    940 S   0.0  0.6   0:01.02 salt-master                17704
 3199 root      20   0  138028   3708     12 S   0.0  0.2   0:00.34 salt-master                17672
 3370 root      20   0  219956   5916   1388 S   0.0  0.3   0:00.03 salt-master                17436
 3409 root      20   0 1122160 104980   2196 S   0.0  5.1   0:25.51 salt-master                16368
 3387 root      20   0 1146100 130416   3252 S   0.0  6.4   0:29.04 salt-master                16192
 3372 root      20   0  138028   6540   1244 S   0.0  0.3   0:00.10 salt-master                16124
 3428 root      20   0 1108704  94360   4116 S   0.0  4.6   0:03.90 salt-master                15820
 3431 root      20   0 1145532 130576   3324 S   0.0  6.4   0:41.87 salt-master                15668
 3427 root      20   0 1147060 132636   3608 S   0.0  6.5   0:27.15 salt-master                15364
14495 root      20   0 1096096  81664   3036 S   0.0  4.0   0:15.41 salt-master                13492
14726 root      20   0  974332 110724   2944 S   0.0  5.4   0:11.20 salt-master                 9688
15449 root      20   0 1030872  20840   4052 S   0.0  1.0   0:01.10 salt-master                 9120
14748 root      20   0  995768 133764   3392 S   0.0  6.5   0:13.89 salt-master                 8964
15229 root      20   0 1031036  21948   5024 S   0.0  1.1   0:01.36 salt-master                 8904
15165 root      20   0  883576  21228   4132 S   0.0  1.0   0:01.31 salt-master                 8864
15195 root      20   0 1121792 113240   4756 S   0.0  5.5   0:10.34 salt-master                 8860

Thomas Jackson

Aug 1, 2015, 1:01:58 PM
to Salt-users
If you are using zmq (which I'll assume you are), the publisher process has a very bad memory leak. It's been reported a few times (https://github.com/zeromq/libzmq/issues/954 and https://github.com/zeromq/libzmq/issues/1256) but upstream still hasn't fixed it. If you want to verify that it's the publisher process, you can install python-setproctitle, and all of those process names will then include their actual function.

If that's the case (which I'm fairly certain it is), there isn't much to do :/ You can create some watcher to kill the publisher process when it gets too big -- the process manager within salt will restart it (usually causes ~100ms of interruption). This is one of the main driving factors behind the recent transport overhaul -- the next release of salt will have a TCP option which (hopefully) won't have this issue ;)
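The watcher idea could be sketched roughly like this -- an assumption-laden sketch, not Salt code. It assumes a Linux /proc filesystem and that python-setproctitle is installed, so the publisher shows up with "Publisher" in its process title; the match string and the size limit are illustrative guesses.

```python
import os
import signal
import subprocess
import time

RSS_LIMIT_KB = 500_000  # recycle the publisher past ~500 MB resident (illustrative)

def rss_kb(pid):
    """Read VmRSS (in kB) from /proc/<pid>/status; 0 if unreadable."""
    try:
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return 0

def over_limit(kb, limit_kb=RSS_LIMIT_KB):
    """Decide whether a process has grown enough to be recycled."""
    return kb > limit_kb

def watch(interval=60):
    """Kill an oversized publisher; Salt's process manager restarts it."""
    while True:
        try:
            pids = [int(p) for p in
                    subprocess.check_output(["pgrep", "-f", "Publisher"]).split()]
        except subprocess.CalledProcessError:
            pids = []  # pgrep matched nothing
        for pid in pids:
            if over_limit(rss_kb(pid)):
                os.kill(pid, signal.SIGTERM)
        time.sleep(interval)
```

Running watch() in the background (or an equivalent once-a-minute cron job) would bound the publisher's growth between restarts.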

Rob Dinoff

Aug 1, 2015, 4:07:12 PM
to Salt-users
Yes, I am using the default zmq.

thanks .... Rob

Rob Dinoff

Aug 25, 2015, 1:37:49 PM
to Salt-users
With the new version of Salt, 2015.5.5, have the upstream zmq patches been incorporated?

thanks,

Rob

Thomas Jackson

Aug 26, 2015, 8:04:12 PM
to Salt-users
As of my last check (a month or so ago), there is no patch upstream in zmq :(