Extremely high CPU load/utilization on master during highstate run

Amse Master

Nov 8, 2017, 6:37:41 PM
to Salt-users

I'm running a master on Solaris (version details below) with about 370 minions, a mix of Linux and Solaris. I know this isn't a huge number, which is why I'm stumped by the huge CPU load (both % utilized and loadavg) I'm seeing during highstate runs. Other state runs and one-off execution calls run fine.

The master is on a Solaris 11 logical domain, with 192 VCPUs and 64GB RAM.

Here are the master settings I've changed so far:

worker_threads: 190
salt_event_pub_hwm: 140000
event_publisher_pub_hwm: 70000


Highstate setup:

top file:
base:
  '*':
    - cpu_test


cpu_test.sls
/tmp/cpu_test:
  file.exists

Running state.apply (a full highstate) causes the CPU% to spike to 100 and loadavg to peak around 135, and it takes about a minute and a half to complete.
Running state.sls with that same cpu_test SLS barely uses any CPU and completes in about 12 seconds.
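
For anyone wanting to reproduce the comparison above, here is a minimal sketch of how the two runs could be timed from the master. `elapsed_seconds` is a made-up helper, not part of Salt, and it assumes a `date` that supports `+%s`. The `salt` calls only run if the `salt` CLI is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and print
# the wall-clock time it took in whole seconds.
# Usage: elapsed_seconds CMD [ARGS...]
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Compare a full highstate against applying the single SLS directly.
if command -v salt >/dev/null 2>&1; then
    echo "state.apply (highstate): $(elapsed_seconds salt '*' state.apply)s"
    echo "state.sls cpu_test:      $(elapsed_seconds salt '*' state.sls cpu_test)s"
fi
```

Watching loadavg in another terminal while each command runs gives the other half of the picture.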

There's obviously some serious overhead somewhere in the state.apply run.
I know there are sites out there running thousands of minions, so something must be wrong with my setup.
Can anyone point me in a direction to look for the cause?  I'll be happy to provide any info needed.

-----------------------------------------------------------------------------------
Salt Version:
           Salt: 2016.11.6

Dependency Versions:
           cffi: 1.10.0
       cherrypy: 11.0.0
       dateutil: 2.6.1
      docker-py: Not Installed
          gitdb: 2.0.2
      gitpython: 2.1.5
          ioflo: Not Installed
         Jinja2: 2.9.6
        libgit2: Not Installed
        libnacl: 1.5.1
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: 2.18
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.11 (default, Jul 18 2017, 12:56:12)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: 2.0.3
        timelib: 0.2.4
        Tornado: 4.5.1
            ZMQ: 4.1.6

System Versions:
           dist:
        machine: sun4v
        release: 5.11
         system: SunOS
        version: Not Installed

Pim Jeursen

Nov 30, 2017, 5:03:54 AM
to Salt-users
Something you should absolutely never do is increase your worker threads to such a large number!
Basically you're now creating 190 worker threads, which causes the massive delay.
The default for worker_threads is 5, and it should be increased in steps of 1 or 2.
I'd start by increasing it to 7 and see what happens.
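
One way to follow this advice is to check what the master is currently set to before changing it. A rough sketch: `get_worker_threads` is a made-up helper, and `/etc/salt/master` is only the usual default config path, which may differ on a Solaris install.

```shell
#!/bin/sh
# Hypothetical helper: print the worker_threads value set in a master
# config file. Prints nothing if the option is unset or commented out,
# in which case Salt falls back to its default.
get_worker_threads() {
    awk -F': *' '$1 == "worker_threads" { print $2 }' "$1"
}

# Assumed default config location; override MASTER_CONF if yours differs.
MASTER_CONF=${MASTER_CONF:-/etc/salt/master}
if [ -r "$MASTER_CONF" ]; then
    echo "current worker_threads: $(get_worker_threads "$MASTER_CONF")"
fi

# To try the suggested starting point, edit the file so it contains
#   worker_threads: 7
# then restart the master and re-run the highstate before raising it further.
```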

Also, running the master in debug mode gives a lot of information. To do so, stop the master, then run:
salt-master -l debug

Now open a new terminal, kick off the state run, and watch what happens in the first terminal.

Clint Allen

Nov 30, 2017, 8:56:16 AM
to salt-...@googlegroups.com
Well, the system has 192 CPUs, so I was taking advantage of that. 

I have since discovered that for some reason Salt runs far more efficiently on x86 than on SPARC.  The same job took about 5 minutes to run on an x86 system with 24 CPUs.