Multi-threading support in Ansible


Jagadeeshkumar Dittakavi

Jun 4, 2020, 2:18:31 PM
to Ansible Project
I am a newbie to Ansible, and I want to explore how to run tasks in parallel by spawning a thread for each task instead of a process. My requirement is to run the playbook on my localhost only; there is no remote task execution needed.
I would also like to wait for all threads to complete before I move on to a task that has to be serialised.

Can I choose threads vs. processes when it comes to parallel task execution?
If it is possible to spawn threads from Ansible, are they equivalent to Python green threads, pthreads, or something else?

Thank you in advance!


Matt Martz

Jun 4, 2020, 2:31:14 PM
to ansible...@googlegroups.com
The only current process model is forking. There has been some work done to add a threaded process model, but there are some large hurdles to overcome.

In practice, a threaded model is not necessarily more performant, and in many cases it has been less performant, as it causes more CPU contention on a single core that is already resource constrained.
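
For the concurrency requirement in the original question, the usual workaround under the fork-based model is async with poll: 0, followed by an async_status loop before the task that must run serially. Each job still runs in its own forked process rather than a thread, but the jobs do execute concurrently. A rough sketch (the sleep commands, timeouts, and retry counts below are only placeholders):

- name: parallel tasks on localhost (sketch)
  hosts: localhost
  gather_facts: no
  tasks:
    # start two long-running commands without waiting for them
    - name: start first background job
      command: sleep 20
      async: 60      # upper bound on how long the job may run, in seconds
      poll: 0        # fire and forget; do not wait here
      register: job1

    - name: start second background job
      command: sleep 20
      async: 60
      poll: 0
      register: job2

    # block until both background jobs have finished
    - name: wait for the background jobs
      async_status:
        jid: "{{ item.ansible_job_id }}"
      loop: "{{ [job1, job2] }}"
      register: job_result
      until: job_result.finished
      retries: 30
      delay: 2

    - name: task that has to run after everything else
      command: echo done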


--
Matt Martz
@sivel
sivel.net

Jagadeeshkumar Dittakavi

Jun 4, 2020, 2:50:38 PM
to Ansible Project
Thank you for the prompt reply. Just out of curiosity: is the threading work that is underway based on Python threads, pthreads, or some other threading mechanism? You mentioned that the threaded model is not going to be more performant; is the reason Python's GIL?


Matt Martz

Jun 4, 2020, 2:59:30 PM
to ansible...@googlegroups.com
Yes, it would utilize the threading library in Python. The GIL is a primary cause of the CPU restrictions. Our main process, which orchestrates all of the task executions, is already heavily CPU bound, so adding additional threads on the same core can cause a decrease in performance. Assuming we create a process model plugin type, other process models are possible, such as asyncio, concurrent.futures, gevent, etc. But I don't expect this work to be complete any time soon.

So for now, consider forking to be the only process model for the near future.
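
In the meantime, the knob that does exist is forks, which caps how many worker processes the control process runs at the same time (roughly, how many hosts are acted on in parallel); it can be set under [defaults] in ansible.cfg or on the command line. For example (the fork count of 10 here is arbitrary):

Command: ansible-playbook test_playbook.yml --forks=10

Note that forks does not create parallelism between tasks on a single host; for that, the async pattern sketched earlier still applies.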


Jagadeeshkumar Dittakavi

Jun 5, 2020, 12:20:19 AM
to Ansible Project
Thank you, Matt, for the detailed and quick reply. The support from the community is much appreciated.


Jagadeeshkumar Dittakavi

Jun 11, 2020, 8:00:59 AM
to Ansible Project
@Matt, 

I have another question about concurrency support in Ansible.
Is there any way to limit the number of processes that can be spawned on a given host?
My requirement is not to execute the commands/scripts remotely; in my case, the whole play needs to be executed on localhost only.
I tried a simple test playbook and noticed that as many as 6 processes are spawned to execute 'sleep 20' asynchronously.

Could you please advise? Thank you in advance.

Command: ansible-playbook test_playbook.yml --forks=1

Processes:

root      69484  34309  9 04:50 pts/10   00:00:00 /usr/bin/python2 /usr/bin/ansible-playbook test_playbook.yml --forks=1
root      69509      1  0 04:50 ?        00:00:00 /usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/async_wrapper.py 198806654079 50 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/command.py _
root      69510  69509  0 04:50 ?        00:00:00 /usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/async_wrapper.py 198806654079 50 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/command.py _
root      69511  69510  0 04:50 ?        00:00:00 /usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/command.py
root      69512  69511  1 04:50 ?        00:00:00 /usr/bin/python2 /tmp/ansible_f9ckPD/ansible_module_command.py
root      69520  69484  3 04:50 pts/10   00:00:00 /usr/bin/python2 /usr/bin/ansible-playbook test_playbook.yml --forks=1

Code:

[root@oracle-siha file_copy_test]# cat test_playbook.yml
- name: Testing processes
  gather_facts: no
  hosts: localhost
  tasks:
    - name: run sleep command
      async: 50
      poll: 0
      command: sleep 20
      register: res
    - name: wait for the completion
      async_status:
        jid: "{{ res.ansible_job_id }}"
      register: output
      until: output.finished
      delay: 5
      retries: 10

Matt Martz

Jun 11, 2020, 10:24:01 AM
to ansible...@googlegroups.com
There are a number of steps involved here.

1. The primary playbook process spawns a worker
2. The worker executes the async_wrapper for the command module
3. The async_wrapper forks to daemonize
4. The async_wrapper executes the transferred module
5. The actual module is contained within what we call AnsiballZ, which is a compressed archive; it extracts and executes the actual Python code
6. The actual module executes.

`forks` only limits how many workers can be launched by the primary playbook process, not how many processes will be spawned as a result of the worker.
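
Worth noting: steps 2 through 4 above only happen because the task uses async; a synchronous version of the same task skips the async_wrapper daemonization entirely, at the cost of blocking until the command finishes. A minimal variation of the earlier playbook, if reducing the process count is the goal (this is only a sketch):

- name: Testing processes (synchronous variant)
  gather_facts: no
  hosts: localhost
  tasks:
    # no async/poll, so no async_wrapper processes are forked;
    # the task simply blocks until sleep returns
    - name: run sleep command and wait inline
      command: sleep 20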


Jagadeeshkumar Dittakavi

Jun 11, 2020, 10:42:53 AM
to Ansible Project
Thank you Matt! 
In the above example I explicitly passed --forks=1, but two worker processes (PIDs 69484 and 69520) were still spawned. Does that mean a minimum of two workers always gets spawned, and we can't limit that to one? I understand that there is no way to limit the total number of processes spawned by the workers.


Matt Martz

Jun 11, 2020, 11:04:00 AM
to ansible...@googlegroups.com
You have 1 worker process. One ansible-playbook process is the control process, the other is the worker.
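
To see the effect of forks at all, you need more than one host in play, since forks is an upper bound on how many workers run at the same time. A hypothetical example (the inventory aliases are made up): with an inventory containing several aliases that all point at localhost, a play like the one below run with --forks=2 would show at most two worker processes alongside the control process.

- name: forks demo (sketch)
  hosts: all          # assumes an inventory with several aliases, all using ansible_connection=local
  gather_facts: no
  tasks:
    - name: run a short command on every host
      command: sleep 5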
