Not forking parallel tasks?

David Brossard

Nov 7, 2013, 1:04:46 PM
to ansible...@googlegroups.com
I have found that when running Ansible with either playbooks or direct commands, forking does not work as I'd expect. If I add -f 20, -f 3 or --forks #, I still seem to end up with only one remote command running at a time. I check this by looking at the processes on the Ansible server, and I only see one SSH session at a time to my remote machines.
I am running an older version of Ansible (1.2.2) because it is the latest in the Ubuntu LTS repo. Has anyone else seen this? Is there a problem with my syntax?

Thanks

Matt Martz

Nov 7, 2013, 1:06:21 PM
to ansible...@googlegroups.com
Forks are used for multiple servers, not multiple commands per server.

So if you are operating over 30 servers, you could set 30 forks and run a single command across all servers at once.
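To make the forks-per-host point concrete, here is a minimal Python sketch (an illustration, not Ansible's code) of how a fork count caps the number of hosts worked on at once. It assumes every host's task takes the same time, so a pool of `forks` workers moves through the inventory in waves; `fork_batches` and `expected_wall_time` are hypothetical helper names.

```python
import math

def fork_batches(hosts, forks):
    """Split hosts into the waves a pool of `forks` workers would process,
    assuming (simplification) that every host's task takes the same time."""
    if forks < 1:
        raise ValueError("forks must be >= 1")
    return [hosts[i:i + forks] for i in range(0, len(hosts), forks)]

def expected_wall_time(num_hosts, forks, task_seconds):
    """Approximate wall-clock time: one task duration per wave of hosts."""
    return math.ceil(num_hosts / forks) * task_seconds

hosts = [f"host{n:02d}" for n in range(1, 13)]  # 12 hypothetical hosts
waves = fork_batches(hosts, forks=10)
print([len(w) for w in waves])           # [10, 2]
print(expected_wall_time(12, 10, 10))    # 20  (e.g. -f 10, "sleep 10")
print(expected_wall_time(12, 1, 10))     # 120 (fully serial)
```

So with a dozen servers and -f 20, a single command should cost roughly one task duration, not one per server.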
-- 
Matt Martz
ma...@sivel.net
--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-proje...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

David Brossard

Nov 7, 2013, 1:29:32 PM
to ansible...@googlegroups.com
Correct. I am running a single command (such as apt-get update) against a dozen or so servers and want it to happen in parallel. Unfortunately, when using --forks 20 or -f 20 I still only get connections to 1 server at a time. They run consecutively and not concurrently.

Brian Coca

Nov 7, 2013, 2:27:04 PM
to ansible...@googlegroups.com
sounds like you have 'serial: 1' set in the playbook.

David Brossard

Nov 7, 2013, 5:05:19 PM
to ansible...@googlegroups.com
This happens whether I use ansible-playbook or ansible. I do not have serial: 1 (or any serial setting) in my playbooks or my ansible.cfg.

David Brossard

Nov 7, 2013, 5:16:10 PM
to ansible...@googlegroups.com
For example, something as simple as:

ansible MyServers -m shell -a "uptime " -f 20

Only returns the uptime for one server at a time. I have also tried a sleep command, and only a single server returns its results at a time, after each sleep interval. They are not running in parallel for some reason.

Brian Coca

Nov 7, 2013, 5:21:04 PM
to ansible...@googlegroups.com
output to your screen is serialized (otherwise it would be intermixed) but the actual remote actions should be in parallel.
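The distinction between parallel work and serialized printing can be sketched in Python. This is a simplified model: threads from `concurrent.futures` stand in for Ansible's forked worker processes, and `remote_task` is a hypothetical stand-in for a remote `sleep N && date`. Results are printed one per line in submission order, yet the total time stays close to a single task's duration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_task(host, seconds, start):
    """Hypothetical stand-in for a remote `sleep N && date`."""
    time.sleep(seconds)
    return host, time.monotonic() - start

def run(hosts, forks, seconds):
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=forks) as pool:
        # Every task is submitted up front, so up to `forks` sleep at once.
        futures = [pool.submit(remote_task, h, seconds, start) for h in hosts]
        # Results are printed one at a time, in submission order: the screen
        # output is serialized even though the work itself was not.
        for fut in futures:
            host, finished = fut.result()
            print(f"{host} | finished at +{finished:.1f}s")
    return time.monotonic() - start

elapsed = run([f"host{n}" for n in range(5)], forks=5, seconds=0.5)
print(f"total: {elapsed:.1f}s")
```

If the remote actions really are parallel, the lines may appear one after another, but the finish times reported inside them should be close together.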

--
Brian Coca
Stultorum infinitus est numerus
0110000101110010011001010110111000100111011101000010000001111001011011110111010100100000011100110110110101100001011100100111010000100001
Pedo mellon a minno

David Brossard

Nov 7, 2013, 5:30:29 PM
to ansible...@googlegroups.com
They don't appear to be, however; that is my concern. For example, I assume a command like:

ansible MyServers -m shell -a "sleep 20" -f 20

when run against fewer than 20 servers, should return output from all of them about 20 seconds after it starts. It does not. One server replies, 20 seconds later the next server replies, 20 seconds later the next one, and so on.

They are not running in parallel. What am I missing?

Brian Coca

Nov 7, 2013, 5:36:04 PM
to ansible...@googlegroups.com
just tested :

ansible all -m shell -a "sleep 10 && uptime" -f 10

and I get batches of 10 every 10 seconds (as expected)
tested with git checkout and version 1.3.3

David Brossard

Nov 7, 2013, 5:46:46 PM
to ansible...@googlegroups.com
That is the behavior I was also hoping to see, but I do not see it. How can I troubleshoot this?

James Tanner

Nov 7, 2013, 5:47:35 PM
to ansible...@googlegroups.com, David Brossard
What connection method are you using?

What verification steps are you taking to count the number of simultaneous connections?

Brian Coca

Nov 7, 2013, 5:48:42 PM
to ansible...@googlegroups.com
start with -vvvv. Also, I'll check 1.2.2; there might be a locking issue that has been fixed in newer versions.

David Brossard

Nov 7, 2013, 6:06:03 PM
to ansible...@googlegroups.com, David Brossard
I have not manually changed the connection method, so I assume it is still the default, paramiko, with shared SSH keys. Here is a test showing that it does not run in parallel. Notice that each response comes about 10 seconds after the previous one, even though -f 10 is specified.

ansible Dev -m shell -a "sleep 10 && date" -f 10
pdx-cass-d02 | success | rc=0 >>
Thu Nov  7 22:54:39 UTC 2013

pdx-extws-d02 | success | rc=0 >>
Thu Nov  7 22:54:49 UTC 2013

pdx-intws-d01 | success | rc=0 >>
Thu Nov  7 22:55:00 UTC 2013

pdx-extws-d01 | success | rc=0 >>
Thu Nov  7 22:55:10 UTC 2013

pdx-fep-d01 | success | rc=0 >>
Thu Nov  7 22:55:20 UTC 2013

pdx-cass-d01 | success | rc=0 >>
Thu Nov  7 22:55:30 UTC 2013

pdx-lb-d01 | success | rc=0 >>
Thu Nov  7 22:55:40 UTC 2013

pdx-mq-d01 | success | rc=0 >>
Thu Nov  7 22:55:51 UTC 2013

pdx-sql-d01 | success | rc=0 >>
Thu Nov  7 22:57:35 UTC 2013

pdx-job-d01 | success | rc=0 >>
Thu Nov  7 22:56:11 UTC 2013

pdx-listen-d01 | success | rc=0 >>
Thu Nov  7 22:56:21 UTC 2013

pdx-sql-d02 | success | rc=0 >>
Thu Nov  7 22:58:05 UTC 2013
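One way to check whether a run like this was parallel is to look at the timestamps the hosts printed rather than at when the lines appeared. The Python sketch below copies the timestamps from the output above (the variable and function names are illustrative), sorts the completion times and measures the gaps. With 12 hosts, -f 10 and a 10-second sleep, a parallel run would be expected to finish in about two clusters roughly 10 seconds apart; here every gap is 10 seconds or more and the run spans 206 seconds.

```python
from datetime import datetime

# Completion times printed by `date` in the output above, in printed order.
OUTPUT_TIMES = [
    "Thu Nov  7 22:54:39 UTC 2013",  # pdx-cass-d02
    "Thu Nov  7 22:54:49 UTC 2013",  # pdx-extws-d02
    "Thu Nov  7 22:55:00 UTC 2013",  # pdx-intws-d01
    "Thu Nov  7 22:55:10 UTC 2013",  # pdx-extws-d01
    "Thu Nov  7 22:55:20 UTC 2013",  # pdx-fep-d01
    "Thu Nov  7 22:55:30 UTC 2013",  # pdx-cass-d01
    "Thu Nov  7 22:55:40 UTC 2013",  # pdx-lb-d01
    "Thu Nov  7 22:55:51 UTC 2013",  # pdx-mq-d01
    "Thu Nov  7 22:57:35 UTC 2013",  # pdx-sql-d01
    "Thu Nov  7 22:56:11 UTC 2013",  # pdx-job-d01
    "Thu Nov  7 22:56:21 UTC 2013",  # pdx-listen-d01
    "Thu Nov  7 22:58:05 UTC 2013",  # pdx-sql-d02
]

def parse(stamp):
    # Collapse the padding space `date` puts before single-digit days.
    return datetime.strptime(" ".join(stamp.split()), "%a %b %d %H:%M:%S UTC %Y")

times = sorted(parse(s) for s in OUTPUT_TIMES)
gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
span = int((times[-1] - times[0]).total_seconds())
print(gaps)  # [10, 11, 10, 10, 10, 10, 11, 20, 10, 74, 30]
print(span)  # 206
```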

David Brossard

Nov 7, 2013, 6:16:35 PM
to ansible...@googlegroups.com
On Thursday, November 7, 2013 2:48:42 PM UTC-8, Brian Coca wrote:
> start with -vvvv, also I'll check 1.2.2 .. there might be a locking issue that has been fixed in newer versions.

So using -vvvv it appears that ansible first connects to ALL servers to copy the tmp file for execution and the password files etc. It then connects individually to the first server to execute that command. Once a response is returned it then connects to the next server.
The verbose output is quite extensive. I'm not sure what else I should be looking for...


James Tanner

Nov 7, 2013, 7:10:35 PM
to ansible...@googlegroups.com, David Brossard
Are you able to reduce this down to a simple playbook or a module call and reproduce the behavior?

David Brossard

Nov 7, 2013, 7:28:52 PM
to ansible...@googlegroups.com, David Brossard
That is what I was hoping to demonstrate with my simple module call of:

ansible Dev -m shell -a "sleep 10 && date" -f 10

Each result comes back 10 seconds after the previous result.


Brian Coca

Nov 8, 2013, 12:02:21 PM
to ansible...@googlegroups.com
so I tested with 1.2.2 and 1.2.3 and cannot reproduce the issue:

ansible all -m shell -a "sleep 10 && uptime" -f 10 

it returns 10 hosts every 10s, as expected

David Brossard

Nov 8, 2013, 5:16:56 PM
to ansible...@googlegroups.com
I will upgrade to Ansible 1.3.3 today and see if that solves my issues.
Thanks

David Brossard

Nov 25, 2013, 12:19:45 PM
to ansible...@googlegroups.com
FYI: upgrading to 1.3.3 did the trick. I'm not sure why I had an issue that others could not reproduce.
Thanks for your help everyone.

Silvio Tomatis

Aug 6, 2015, 1:22:07 PM
to Ansible Project
Posting on an old thread, since I just had this same issue with ansible 1.9.2.
In the end, I solved it by removing my persisted SSH connections like this:
  rm ~/.ansible/cp/*
I have no idea why exactly this solved the problem, but if you encounter the same issue you can try it.

Silvio Tomatis

Aug 11, 2015, 6:20:41 AM
to Ansible Project
Removing the connections as I mentioned in the last message solved the problem only temporarily (and to be honest I don't know how).
When the issue resurfaced, I dug a little more and ended up on this thread: https://groups.google.com/forum/#!msg/ansible-project/8p3XWlo83ho/KlIqch_UxTEJ

To summarize: I had UserKnownHostsFile /dev/null, so Ansible saw an empty known_hosts file and ran serially so that I could type "yes" after each host connection.
Setting ansible_host_key_checking=0 solved the issue for good for me.
That explains why David's issue was solved by upgrading: Ansible 1.3.3 supports hashed entries in known_hosts, which 1.2.2 did not.
I really hope the solution proposed in the other thread will be accepted: a warning about the cause of the serial behaviour would have been priceless.
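The mechanism described here can be illustrated with a small Python model. It is a simplified sketch, not Ansible's implementation: it assumes that when host key checking is on and a host is missing from known_hosts, the worker takes one shared lock (so only one "yes/no" prompt can be open at a time) and holds it while that host's task runs. `PROMPT_LOCK`, `connect_and_run` and `wall_time` are hypothetical names. With an empty known_hosts, as with `UserKnownHostsFile /dev/null`, every host queues on the lock, which reproduces the one-host-at-a-time pattern reported earlier in the thread.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Simplified model, NOT Ansible's source code. Assumption: a host that is
# missing from known_hosts must be confirmed interactively, so its worker
# holds one shared lock while it prompts, connects and runs the task.
PROMPT_LOCK = threading.Lock()

def connect_and_run(host, known_hosts, host_key_checking, seconds):
    """Hypothetical worker: `seconds` stands in for prompt + connect + task."""
    if host_key_checking and host not in known_hosts:
        with PROMPT_LOCK:          # only one unknown host at a time
            time.sleep(seconds)
    else:
        time.sleep(seconds)        # known host or checking off: no lock

def wall_time(hosts, known_hosts, host_key_checking, forks=10, seconds=0.2):
    """Run every host through a pool of `forks` workers; return elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=forks) as pool:
        futures = [pool.submit(connect_and_run, h, known_hosts,
                               host_key_checking, seconds) for h in hosts]
        for f in futures:
            f.result()             # re-raise any worker exception
    return time.monotonic() - start

hosts = [f"host{n}" for n in range(5)]
serial = wall_time(hosts, known_hosts=set(), host_key_checking=True)
parallel = wall_time(hosts, known_hosts=set(), host_key_checking=False)
known = wall_time(hosts, known_hosts=set(hosts), host_key_checking=True)
print(f"empty known_hosts, checking on: {serial:.1f}s")
print(f"checking off:                   {parallel:.1f}s")
print(f"all hosts known, checking on:   {known:.1f}s")
```

In a real setup, host key checking is switched off with `host_key_checking = False` in the `[defaults]` section of ansible.cfg, or with the `ANSIBLE_HOST_KEY_CHECKING=False` environment variable.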

       Silvio