salt-stack multi-master setup is unreliable - What am I doing wrong?


Stefan Bogun

Jan 20, 2021, 4:13:25 PM
to Salt-users
I have to manage a cluster of ~600 Ubuntu (16.04-20.04) servers using SaltStack 3002.

I decided to set up a multi-master configuration for load distribution and fault tolerance. salt-syndic did not seem like the right choice for me; instead, my idea was that each salt-minion should pick a master at random from a list when it starts. My config looks as follows (excerpts):

Master config:

auto_accept: True
master_sign_pubkey: True
master_use_pubkey_signature: True

Minion config:

master:
 - saltmaster001
 - saltmaster002
 - saltmaster003
verify_master_pubkey_sign: True
retry_dns: 0
master_type: failover
random_master: True


(three Salt masters, as you can see). I basically followed this tutorial: https://docs.saltstack.com/en/latest/topics/tutorials/multimaster_pki.html. All masters have the same master keypair, so the minions can connect to any of them. And they do: the minions' keys are listed as accepted on all masters.

But it doesn't really work well, for several reasons:

salt 'tnscass*' test.ping
tnscass011.mo-mobile-prod.ams2.cloud: True
tnscass010.mo-mobile-prod.ams2.cloud: True 
tnscass004.mo-mobile-prod.ams2.cloud: True 
tnscass005.mo-mobile-prod.ams2.cloud: Minion did not return. [Not connected] 
tnscass003.mo-mobile-prod.ams2.cloud: Minion did not return. [Not connected] 
tnscass007.mo-mobile-prod.ams2.cloud: Minion did not return. [Not connected]

It seems that each minion is connected to only one master at any given moment, and a salt run only succeeds for that minion on that particular master. I would have expected to be able to run salt on any master and have it reach all targeted minions. Why is this not the case?

On a minion, a salt-call run often looks as follows:

root@minion:~# salt-call state.apply
[WARNING ] Master ip address changed from 10.48.40.93 to 10.48.42.32

[WARNING ] Master ip address changed from 10.48.42.32 to 10.48.42.35

So the minion decides to switch to another master and the salt-call takes ages. The rules that determine when a minion decides to switch to another master are not explained anywhere (at least I couldn't find anything). Is it the load on the master? The number of connected minions?
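For what it's worth, the only knobs I could find that seem related are the following minion options (this is just my assumption about what controls the switching; the values are examples):

master_type: failover
# how often (in seconds) the minion checks that its current master is alive;
# if the check fails, it fails over to the next master in the list
master_alive_interval: 30
# whether to fail back to the first master in the list once it is reachable again
master_failback: False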

Another problem is the Salt mine. I'm using code like this:

salt.saltutil.runner('mine.get', tgt='role:mopsbrokeraggr', fun='network.get_hostname', tgt_type='grain') 

Unfortunately, the mine values differ badly from minion to minion, so the mine is effectively unusable as well.

I should mention that my masters are big machines with 16 cores and 128GB RAM, so this is not a matter of resource shortage.

To me, the scenario described in https://docs.saltstack.com/en/latest/topics/tutorials/multimaster_pki.html simply does not work at all.

  • Could anybody tell me how to create a proper setup with three Salt masters for load distribution?
  • Is salt-syndic actually the better approach?
  • Can salt-syndic be used to assign minions to masters randomly, or based on load?
  • What is the purpose of the tutorial mentioned above, or have I overlooked something?


Phipps, Thomas

Jan 20, 2021, 5:00:50 PM
to salt-...@googlegroups.com
The problem is master_type: failover. With failover, the minion connects to a single master and only reconnects to another master when it loses the connection to the current one. This mode is normally combined with syndic, so that you have a master of masters from which you can control all of the masters at once.
Comment out that setting and the minion will connect to all of the masters listed instead of just one.
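A plain multi-master minion config would then look roughly like this (just a sketch, using the hostnames from your excerpt):

master:
 - saltmaster001
 - saltmaster002
 - saltmaster003
verify_master_pubkey_sign: True
# no master_type: failover and no random_master, so the minion keeps a
# connection open to every master in the list at the same time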

For mines with multi-master you want to use a different data cache. By default Salt uses the localfs cache, so the mines are not shared between masters. See https://docs.saltproject.io/en/latest/ref/cache/all/index.html for a list of other data cache modules.
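For example, with the redis cache module the master config could look something like this (a sketch based on the module docs; the host is a placeholder, and every master has to point at the same Redis instance):

cache: redis
cache.redis.host: redis.example.com
cache.redis.port: 6379
cache.redis.db: '0'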


Simon Lundström

Jan 21, 2021, 4:38:08 AM
to salt-...@googlegroups.com
How would you deal with salt-api and/or zeromq when using multi-master?

Does multi-master really provide load distribution, since all minions are
connected to all masters? I guess jobs might be distributed if they
create high load on the masters?

We thought about this really hard and could not figure it out so we went
with active/passive masters.
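
If it helps, one way to express active/passive on the minion side is roughly this (just a sketch of the idea, not necessarily exactly what we run; hostnames are made up):

master:
 - saltmaster-active
 - saltmaster-passive
master_type: failover
# without random_master the masters are tried in list order, so the minion
# stays on the first (active) master and only moves to the passive one when
# the active master becomes unreachable
master_failback: True  # fail back to the active master once it returns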

BR,
- Simon


Cleber Paiva de Souza

Feb 13, 2021, 12:55:22 PM
to salt-...@googlegroups.com
Hi Simon,

We have done a setup like that, and the connections are distributed among the multi-master nodes on a round-robin basis, not by load. salt-api was behind a load balancer, but remember that the data cache needs to be shared among the multi-master nodes. In our case we chose Redis, but there are other options out there.
Stay away from salt-syndic for this kind of setup.



--
Cleber Paiva de Souza