Parallel execution of a playbook on multiple hosts at a time


anand...@greycampus.com

Aug 4, 2015, 9:49:22 AM
to Ansible Project
While executing the playbook I obtained the following output. The tasks below are executed sequentially. This is not what I need, because when I run this playbook on multiple hosts at a time it takes too long. I have tried several things, such as the forks and serial parameters, but I observe no change. So please give a suggestion for executing one playbook on multiple hosts at a time, because I am running it against hundreds of hosts.



root@system:~# ansible-playbook main.yml --limit test

PLAY [test] ******************************************************************* 

TASK: [mail | service postfix restart] **************************************** 
changed: [srv613]
changed: [srv612]
changed: [dsrv143]

TASK: [mail | service dovecot restart] **************************************** 
changed: [srv613]
changed: [dsrv143]
changed: [srv612]

TASK: [mail | service opendkim restart] *************************************** 
changed: [dsrv143]
changed: [srv613]
changed: [srv612]

TASK: [mail | service apache2 restart] **************************************** 
changed: [srv613]
changed: [srv612]
changed: [dsrv143]

TASK: [mail | service cron restart] ******************************************* 
changed: [srv613]
changed: [dsrv143]
changed: [srv612]

PLAY RECAP ******************************************************************** 
dsrv143                    : ok=5    changed=5    unreachable=0    failed=0   
srv612                     : ok=5    changed=5    unreachable=0    failed=0   
srv613                     : ok=5    changed=5    unreachable=0    failed=0   

Brian Coca

Aug 4, 2015, 10:04:14 AM
to Ansible Project
Playbooks run on a number of hosts in parallel, but the tasks are in
lockstep: all hosts complete task #1 before going to task #2, but each
task is run in parallel on a number of hosts equal to the number of forks
defined (default 5). So with forks = 5, each task will be done in
parallel on 5 hosts at a time until all hosts are done.
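For illustration, the fork count can be set either in ansible.cfg or per run on the command line (a minimal sketch; 20 is an arbitrary value, not a recommendation):

# ansible.cfg
[defaults]
forks = 20

# or per invocation:
ansible-playbook main.yml --limit test -f 20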

serial controls how many hosts go in each batch for the full play, so
if you set serial = 5, 5 hosts will run each task in lockstep until the end
of the play, then the next 5 hosts start the play.
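As a sketch of that (the group name and batch size here are made up for the example):

- hosts: mailservers
  serial: 5
  remote_user: root
  tasks:
    - name: restart postfix
      service: name=postfix state=restarted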

In 2.0 we introduce strategies that control play execution. The
default (linear) behaves as described above; a new one called 'free' allows
each host to run to the end of the play without waiting for the
other hosts to complete the same task.
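In 2.0 syntax that would look roughly like this (a sketch, assuming the strategy keyword described above):

- hosts: all
  strategy: free
  remote_user: root
  tasks:
    - name: restart dovecot
      service: name=dovecot state=restarted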


--
Brian Coca

anand...@greycampus.com

Aug 4, 2015, 10:48:33 AM
to Ansible Project
  



Thanks for the response. My Ansible version is 1.9.2, so should I update to a newer version, or does something need to change in my configuration? The tasks seem to execute on the hosts one after another. I want a particular task to run on all hosts at the same time, but for me the task runs on one host while the remaining hosts wait for that host to finish.
So please tell me what modification needs to be made.

Brian Coca

Aug 4, 2015, 10:52:04 AM
to Ansible Project
none, the tasks are executing on the 3 hosts in parallel already
(unless you set --forks to 1)



--
Brian Coca

anand...@greycampus.com

Aug 4, 2015, 11:05:11 AM
to Ansible Project
Actually it seems like parallel execution, but it happens one after another. If it really were parallel, the outputs of the three hosts for a particular task would be displayed at the same time, but they appear one after another.
timings with forks=1
real 0m45.364s
user 0m0.535s
sys 0m0.246s
timings with forks = 3
real 0m20.228s
user 0m0.646s
sys 0m0.358s


Brian Coca

Aug 4, 2015, 11:10:29 AM
to Ansible Project
display is serialized, execution is not


--
Brian Coca

anand...@greycampus.com

Aug 4, 2015, 11:20:20 AM
to Ansible Project
Thank you. My other issue is that I have one playbook for the complete installation of a server. Executing this playbook for one server takes 40 minutes. With Ansible we should be able to build multiple servers in roughly the same time, but for me, if I have 10 servers, it takes 10*40 = 400 minutes. So how do I optimize this server creation? My main.yml is:
---
- hosts: all
  remote_user: root
  gather_facts: False
  roles:
    - mail

My roles/mail/tasks directory contains a main.yml that includes 10 other YAML files. So how do I solve this issue?
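For context, a roles/mail/tasks/main.yml of that shape would look something like this (the file names are hypothetical, not taken from the thread):

# roles/mail/tasks/main.yml (hypothetical layout)
- include: postfix.yml
- include: dovecot.yml
- include: opendkim.yml
- include: apache2.yml
- include: cron.yml
# ...remaining task files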

Brian Coca

Aug 4, 2015, 11:23:19 AM
to Ansible Project
-f 10 should take slightly over 40 mins, not 400

Brian Coca

Aug 4, 2015, 11:27:49 AM
to Ansible Project
to be more specific, all hosts will complete in the time of your
slowest host, as they wait for it on each task.

In v2 you can get around this using the free strategy which will allow
each host to complete as fast as it can, though the play itself will
still take as long as the slowest host.



--
Brian Coca

anand...@greycampus.com

Aug 4, 2015, 11:45:04 AM
to Ansible Project
Should I specify the forks number when running the playbook itself, or should I modify it in the Ansible config file? When I modify ansible.cfg from the default forks=5 to forks=20 it shows errors. So please give some clarity about this.

Brian Coca

Aug 4, 2015, 11:51:08 AM
to Ansible Project
it should work either way; I normally use the -f option on the command
line, but ansible.cfg should also work. What specific error do you
get?

anand...@greycampus.com

Aug 5, 2015, 1:34:17 AM
to Ansible Project
Thank you very much. I have a small doubt about forks: suppose I have 200 hosts, may I set forks=500? Is there any problem with setting forks to more than the number of hosts?
And my other query is: is there any limit on SSH connections? If there is, how do I increase it? And does SSH depend on ufw in any way?

anand...@greycampus.com

Aug 5, 2015, 5:50:08 AM
to Ansible Project
I have got the following error:
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 85, in _executor_hook
    result_queue.put(return_data)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
Process Process-104:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 81, in _executor_hook
    while not job_queue.empty():
  File "<string>", line 2, in empty
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
32



Can you give me an idea of how to solve this issue?

Florent Dutheil

Aug 5, 2015, 9:51:38 AM
to Ansible Project
> On Tuesday, August 4, 2015 at 8:57:49 PM UTC+5:30, Brian Coca wrote:
>>
>> to be more specific, all hosts will complete in the time of your
>> slowest host, as they wait for it on each task.
>>
--
Brian Coca

I have a similar issue. I've taken the playbooks I usually use out of the picture and simply use the ping module to isolate the issue.
Execution context:
  • my inventory has fewer than 100 hosts, with only their names in it
  • the SSH connection type is the default "smart", with public key authentication
  • the SSH ControlMaster feature is disabled, to get constant connection times (see the sketch just below)
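A minimal sketch of that last point in ansible.cfg (this mirrors the setup described above, not Florent's exact configuration):

# ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=no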

What I expect: with forks > number of machines, as you said, the total execution time should be equal to that of the slowest host, or to the connection timeout (I've intentionally left unreachable hosts in my inventory).

What I have:


# time ansible all -i inventory.yml -m ping --forks 5

real    2m40.552s
user    0m3.989s
sys    0m1.760s


# time ansible all -i inventory.yml -m ping --forks 100

real    3m6.231s
user    0m8.267s
sys    0m4.404s


This is 100% reproducible: forks = 100 is always slower than forks = 5 or less. The control machine (a laptop with an i5 and 8 GB of RAM) has no RAM or CPU issue while the command runs, and it happens in a dedicated Ansible virtual machine too.

With forks set to 100, the output strongly suggests some kind of sequential bottleneck (very slow output, one host at a time), while with forks = 2 I can see results displayed two by two, each step taking at most the connection timeout for unreachable hosts.


If you have any suggestion concerning any detail I could/should dig deeper, please share :)

Brian Coca

Aug 5, 2015, 9:58:23 AM
to Ansible Project
@anandkumar so in 1.9 the number of forks will be automatically
reduced to the number of hosts, so specifying a larger number should
not be an issue. You should just adjust it to the resources of your
'master'.

The number of ssh connections from a client is limited only by the
resources available on that machine; on the servers/targets you should
only be making a single connection at a time, so there is no need to tweak
anything there.

The error you get can be caused by hitting a resource limit on your
'master' (check logs, dmesg). Consider that it is not only ssh
connections but ansible forks, which need to copy over the data,
execute it, receive the results and then display them and update
shared variables while still handling all inventory and vars data
provided.
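A few generic commands for that kind of check on the control machine (illustrative only; they are not quoted from this thread):

dmesg | tail -n 50    # recent kernel messages (OOM kills, segfaults, ...)
free -m               # memory and swap usage
ulimit -u             # max user processes for the current shell
ulimit -n             # max open file descriptors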


@florent, seems like you are hitting resource constraints on your
ansible machine, try lower number of forks.

--
Brian Coca

Florent Dutheil

Aug 5, 2015, 11:53:03 AM
to Ansible Project


On Wednesday, August 5, 2015 at 3:58:23 PM UTC+2, Brian Coca wrote:

@florent, seems like you are hitting resource constraints on your
ansible machine, try lower number of forks.

--
Brian Coca

What type of resource constraint would we be talking about?

Using Wireshark, it seems Ansible is not firing enough DNS requests at the beginning (and only small batches of requests during the rest of the execution) to honor "forks" simultaneous SSH connections.

So I pursued the hypothesis that something is wrong with DNS resolution or with handling unreachable hosts: I added ansible_ssh_host=<IP> for each host, removed the unreachable hosts from the inventory, and ran a ping again:

$ time ansible all -i inventory_test -m ping --forks 5
real    0m21.618s
user    0m3.762s
sys    0m1.391s


$ time ansible all -i inventory_test -m ping --forks 20
real    0m17.872s
user    0m5.063s
sys    0m1.840s

$ time ansible all -i inventory_test -m ping --forks 100
real    0m17.341s
user    0m7.701s
sys    0m2.968s

OK, there could be a very slow DNS resolver on my side, but that wouldn't prevent Ansible from issuing "forks" requests at a time. The good point is that we can see that forks=100 is faster than forks=5 for 73 hosts, which is expected.

Is the handling of DNS requests by Ansible "costly" in terms of resources, such that it would imply reducing "--forks" on the control machine?
Is it something that alters the code path and induces some sort of lock, or degrades parallelism, compared to the same task/module calls when dealing directly with IP addresses in the inventory?
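For reference, the inventory entries with explicit addresses described above would look something like this (hostnames and addresses are made up for illustration):

# hypothetical inventory entries
web01 ansible_ssh_host=192.0.2.11
web02 ansible_ssh_host=192.0.2.12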

Brian Coca

Aug 5, 2015, 12:15:14 PM
to Ansible Project
no, by resources I meant CPU, RAM, bandwidth, etc. Slow DNS resolution
might make some forks, and the overall performance, degrade, but it
should not factor too much into how many ssh connections you can open at
the same time, unless you are using a single proc/thread resolver.


--
Brian Coca

Florent Dutheil

Aug 6, 2015, 4:38:23 AM
to Ansible Project
Thanks for your suggestion Brian.

As I stated, I've looked at the CPU & RAM usage; they were perfectly fine.
I'll see if the network is limiting anything, but I have some doubts: it's an enterprise wired LAN, and the logical behaviour of a bandwidth-limited Ansible would be:
  • phase 1: many DNS requests, some of them slow, some timing out and being re-emitted
  • phase 2: some SSH connections begin, but DNS requests are still struggling
  • phase 3: DNS requests have mostly been completed, and SSH connections now struggle to be established
But that's not what I've seen with Wireshark (see previous message: only small batches of dozens of DNS requests at a time, after an initial phase with very few DNS requests).


The idea of a mono threaded resolver is really interesting. You mean at the control machine OS level?

Brian Coca

Aug 6, 2015, 11:03:30 AM
to Ansible Project
The mono-threaded resolver was hypothetical; I doubt any really
exist, unless someone was debugging the resolver and forgot to revert
the concurrency settings.

What do you mean by 'struggles to be established'?


--
Brian Coca

anand...@greycampus.com

Aug 7, 2015, 1:04:51 AM
to Ansible Project
Thank you for the suggestion. I have a small issue: my master node has 1 GB of RAM, so should I extend the RAM to 2 GB or more?
Because you mentioned previously that my issue is CPU and RAM usage, would expanding to 2 GB be enough, or do I need more than that?
When I check dmesg I obtain the following output:
[2038015.659615] show_signal_msg: 11 callbacks suppressed
[2038015.663931] python[10604]: segfault at 24 ip 0000000000558077 sp 00007ffc1ddaf460 error 6
[2038015.667147] python[10575]: segfault at 24 ip 0000000000558077 sp 00007ffe94bcdbe0 error 6python[10588]: segfault at 24 ip 0000000000557db8 sp 00007ffd66ad0630 error 6
[2038015.706367] python[10594]: segfault at 24 ip 0000000000558077 sp 00007ffcceee5500 error 6python[10586]: segfault at 24 ip 0000000000558077 sp 00007ffe0ba38250 error 6 in python2.7[400000+2bc000]
[2038015.720197]  in python2.7[400000+2bc000]
[2038015.770903] python[10625]: segfault at 24 ip 0000000000537388 sp 00007ffe765d3c80 error 6 in python2.7[400000+2bc000]
[2038015.826069]  in python2.7[400000+2bc000]
[2038015.828951]  in python2.7[400000+2bc000]
[2038015.872324] python[10592]: segfault at 24 ip 0000000000557db8 sp 00007ffd81a04310 error 6 in python2.7[400000+2bc000]
[2038015.920538] Core dump to |/usr/share/apport/apport 10592 11 0 10592 pipe failed
[2038015.933555] Core dump to |/usr/share/apport/apport 10586 11 0 10586 pipe failed
[2038015.950079]  in python2.7[400000+2bc000]
[2038523.352829] python[11774]: segfault at 24 ip 0000000000537388 sp 00007ffd64a8bfc0 error 6 in python2.7[400000+2bc000]
[2038523.360200] ------------[ cut here ]------------
[2038523.361416] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:1838!
[2038523.362618] invalid opcode: 0000 [#2] SMP 
[2038523.363800] Modules linked in: kvm_intel(X) kvm(X) crct10dif_pclmul(X) crc32_pclmul(X) ghash_clmulni_intel(X) aesni_intel(X) aes_x86_64(X) lrw(X) gf128mul(X) glue_helper(X) ablk_helper(X) cryptd(X) cirrus(X) ttm(X) serio_raw(X) drm_kms_helper(X) drm(X) syscopyarea(X) sysfillrect(X) sysimgblt(X) i2c_piix4(X) lp(X) mac_hid(X) parport(X) psmouse floppy pata_acpi
[2038523.364010] CPU: 0 PID: 11812 Comm: kworker/u2:1 Tainted: G      D     X 3.13.0-52-generic #85-Ubuntu
[2038523.364010] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[2038523.364010] task: ffff880027801800 ti: ffff880027810000 task.ti: ffff880027810000
[2038523.364010] RIP: 0010:[<ffffffff8117ab51>]  [<ffffffff8117ab51>] __get_user_pages+0x351/0x5e0
[2038523.364010] RSP: 0018:ffff880027811d20  EFLAGS: 00010246
[2038523.364010] RAX: 0000000000000040 RBX: 0000000000000017 RCX: 0000800000000000
[2038523.364010] RDX: 00007fffffe00000 RSI: 0000000008118173 RDI: ffff88000dac0780
[2038523.364010] RBP: ffff880027811db0 R08: ffffffff81c3f820 R09: 0000000000000001
[2038523.364010] R10: 0000000000000040 R11: ffff880009492640 R12: ffff88000dac0780
[2038523.364010] R13: ffff880027801800 R14: ffff88000ac1fc00 R15: 0000000000000000
[2038523.364010] FS:  0000000000000000(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
[2038523.364010] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2038523.364010] CR2: 0000000000435470 CR3: 0000000001c0e000 CR4: 00000000001407f0
[2038523.364010] Stack:
[2038523.364010]  0000000000000000 0000000000000080 0000000000000000 ffff880027801800
[2038523.364010]  ffff880027811fd8 ffff880027801800 ffff880027811e38 0000000000000000
[2038523.364010]  0000000000000020 000000170936c888 0000000000000001 00007fffffffefdf
[2038523.364010] Call Trace:
[2038523.364010]  [<ffffffff8117ae32>] get_user_pages+0x52/0x60
[2038523.364010]  [<ffffffff811c4046>] copy_strings.isra.17+0x256/0x2e0
[2038523.364010]  [<ffffffff811c4104>] copy_strings_kernel+0x34/0x40
[2038523.364010]  [<ffffffff811c589c>] do_execve_common.isra.23+0x4fc/0x7e0
[2038523.364010]  [<ffffffff811c5b98>] do_execve+0x18/0x20
[2038523.364010]  [<ffffffff810806a8>] ____call_usermodehelper+0x108/0x170
[2038523.364010]  [<ffffffff811ef7c0>] ? generic_block_bmap+0x50/0x50
[2038523.364010]  [<ffffffff81080710>] ? ____call_usermodehelper+0x170/0x170
[2038523.364010]  [<ffffffff8108072e>] call_helper+0x1e/0x20
[2038523.364010]  [<ffffffff8173304c>] ret_from_fork+0x7c/0xb0
[2038523.364010]  [<ffffffff81080710>] ? ____call_usermodehelper+0x170/0x170
[2038523.364010] Code: 45 a8 48 85 c0 0f 85 01 ff ff ff 8b 45 bc 25 00 01 00 00 83 f8 01 48 19 c0 83 e0 77 48 2d 85 00 00 00 e9 e5 fe ff ff a8 02 75 ae <0f> 0b 0f 1f 44 00 00 48 8b 55 c8 48 81 e2 00 f0 ff ff f6 45 bc 
[2038523.364010] RIP  [<ffffffff8117ab51>] __get_user_pages+0x351/0x5e0
[2038523.364010]  RSP <ffff880027811d20>
[2038523.412198] ---[ end trace a315abcee87673a6 ]---
[2038523.433078] python[11666]: segfault at 24 ip 0000000000558077 sp 00007ffff9628860 error 6 in python2.7[400000+2bc000]
[2038523.441081] python[11779]: segfault at 24 ip 0000000000537388 sp 00007ffcacc2e900 error 6 in python2.7[400000+2bc000]
[2038523.478649] python[11368]: segfault at 24 ip 00000000004c4bce sp 00007ffc59fb5f00 error 6 in python2.7[400000+2bc000]
[2038523.486958] python[10938]: segfault at 24 ip 00000000004c4bce sp 00007ffc59fb5bc0 error 6 in python2.7[400000+2bc000]
[2038523.492400] python[11313]: segfault at 24 ip 00000000004c4bce sp 00007ffc59fb5f00 error 6 in python2.7[400000+2bc000]
[2038523.494763] python[11431]: segfault at 24 ip 00000000004c4bce sp 00007ffc59fb5f00 error 6 in python2.7[400000+2bc000]
[2038523.500881] Core dump to |/usr/share/apport/apport 11313 11 0 11313 pipe failed
[2038523.502892] Core dump to |/usr/share/apport/apport 10938 11 0 10938 pipe failed
[2038523.508463] Core dump to |/usr/share/apport/apport 11368 11 0 11368 pipe failed
[2038523.601005] Core dump to |/usr/share/apport/apport 11431 11 0 11431 pipe failed
[2038523.819813] Core dump to |/usr/share/apport/apport 11666 11 0 11666 pipe failed
[2038523.843761] Core dump to |/usr/share/apport/apport 11779 11 0 11779 pipe failed
[2150747.189478] Request for unknown module key 'Magrathea: Glacier signing key: 1981bc916ffc00599231ec5630e666e0256fd6f1' err -11
[2150747.217323] Request for unknown module key 'Magrathea: Glacier signing key: 1981bc916ffc00599231ec5630e666e0256fd6f1' err -11
[2150747.225605] ip_tables: (C) 2000-2006 Netfilter Core Team
[2150747.236293] Request for unknown module key 'Magrathea: Glacier signing key: 1981bc916ffc00599231ec5630e666e0256fd6f1' err -11
[2220566.325680] ------------[ cut here ]------------
[2220566.328031] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:1838!
[2220566.328031] invalid opcode: 0000 [#3] SMP 
[2220566.328031] Modules linked in: iptable_filter(X) ip_tables(X) x_tables(X) kvm_intel(X) kvm(X) crct10dif_pclmul(X) crc32_pclmul(X) ghash_clmulni_intel(X) aesni_intel(X) aes_x86_64(X) lrw(X) gf128mul(X) glue_helper(X) ablk_helper(X) cryptd(X) cirrus(X) ttm(X) serio_raw(X) drm_kms_helper(X) drm(X) syscopyarea(X) sysfillrect(X) sysimgblt(X) i2c_piix4(X) lp(X) mac_hid(X) parport(X) psmouse floppy pata_acpi
[2220566.328031] CPU: 0 PID: 22634 Comm: python Tainted: G      D     X 3.13.0-52-generic #85-Ubuntu
[2220566.328031] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[2220566.328031] task: ffff88000e970000 ti: ffff88000f3e6000 task.ti: ffff88000f3e6000
[2220566.328031] RIP: 0010:[<ffffffff8117ab51>]  [<ffffffff8117ab51>] __get_user_pages+0x351/0x5e0
[2220566.328031] RSP: 0018:ffff88000f3e7d40  EFLAGS: 00010246
[2220566.328031] RAX: 0000000000000040 RBX: 0000000000000017 RCX: 0000800000000000
[2220566.328031] RDX: 00007fffffe00000 RSI: 0000000008118173 RDI: ffff88003b3f8540
[2220566.328031] RBP: ffff88000f3e7dd0 R08: ffffffff81c3f820 R09: 0000000000000001
[2220566.328031] R10: 0000000000000040 R11: ffff880004ac4740 R12: ffff88003b3f8540
[2220566.328031] R13: ffff88000e970000 R14: ffff880004a10a80 R15: 0000000000000000
[2220566.328031] FS:  00007fe4f2285740(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
[2220566.328031] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2220566.328031] CR2: 00000000187a0000 CR3: 0000000004b56000 CR4: 00000000001407f0
[2220566.328031] Stack:
[2220566.328031]  0000000000000000 0000000000000080 0000000000000000 ffff88000e970000
[2220566.328031]  ffff88000f3e7fd8 ffff88000e970000 ffff88000f3e7e58 0000000000000000
[2220566.328031]  0000000000000020 0000001701958b40 0000000000000001 00007fffffffefea
[2220566.328031] Call Trace:
[2220566.328031]  [<ffffffff8117ae32>] get_user_pages+0x52/0x60
[2220566.328031]  [<ffffffff811c4046>] copy_strings.isra.17+0x256/0x2e0
[2220566.328031]  [<ffffffff811c4104>] copy_strings_kernel+0x34/0x40
[2220566.328031]  [<ffffffff811c589c>] do_execve_common.isra.23+0x4fc/0x7e0
[2220566.328031]  [<ffffffff811c5e16>] SyS_execve+0x36/0x50
[2220566.328031]  [<ffffffff817336a9>] stub_execve+0x69/0xa0
[2220566.328031] Code: 45 a8 48 85 c0 0f 85 01 ff ff ff 8b 45 bc 25 00 01 00 00 83 f8 01 48 19 c0 83 e0 77 48 2d 85 00 00 00 e9 e5 fe ff ff a8 02 75 ae <0f> 0b 0f 1f 44 00 00 48 8b 55 c8 48 81 e2 00 f0 ff ff f6 45 bc 
[2220566.328031] RIP  [<ffffffff8117ab51>] __get_user_pages+0x351/0x5e0
[2220566.328031]  RSP <ffff88000f3e7d40>
[2220568.259276] ---[ end trace a315abcee87673a7 ]---
[2220568.453180] python[21669]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.496568] python[21668]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.635851] python[21687]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.647618] python[21653]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.858297] python[21657]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.869836] python[21801]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.890070] python[22088]: segfault at 24 ip 00000000005377c7 sp 00007ffc4ac1ba70 error 6 in python2.7[400000+2bc000]
[2220569.041938] Core dump to |/usr/share/apport/apport 21668 11 0 21668 pipe failed
[2220569.088505] Core dump to |/usr/share/apport/apport 21669 11 0 21669 pipe failed
[2220569.194507] python[21679]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220569.208311] python[21821]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220569.234079] Core dump to |/usr/share/apport/apport 22088 11 0 22088 pipe failed
[2220569.252954] Core dump to |/usr/share/apport/apport 21657 11 0 21657 pipe failed
[2220569.388225] python[21692]: segfault at 24 ip 00000000004c4bce sp 00007ffc4ac1bea0 error 6 in python2.7[400000+2bc000]
[2220569.390181] Core dump to |/usr/share/apport/apport 21801 11 0 21801 pipe failed
[2220569.536051] Core dump to |/usr/share/apport/apport 21653 11 0 21653 pipe failed
[2220569.939338] Core dump to |/usr/share/apport/apport 21679 11 0 21679 pipe failed
[2220570.153310] Core dump to |/usr/share/apport/apport 21692 11 0 21692 pipe failed
What does this error mean?

And how do I upgrade Ansible 1.9.2 to the latest 2.0 version?

Brian Coca

Aug 7, 2015, 1:07:53 AM
to Ansible Project
I don't know if this is a lack of memory; that normally gets a kernel
message mentioning killing off processes. This looks like something
much worse that is causing segfaults all over.

>[2220566.328031] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:1838!
>[2220566.328031] invalid opcode: 0000 [#3] SMP

looks like some nasty kernel bug related to memory allocation.


--
Brian Coca

anand...@greycampus.com

Aug 7, 2015, 1:40:55 AM
to Ansible Project
How do I upgrade to the latest Ansible version? And how do I solve the following forks issue? I am struggling a lot with it.
ansible-playbook ssh.yml --force-handlers --forks=100

PLAY [Transfer and execute a script.] ***************************************** 

TASK: [Transfer the script] *************************************************** 
changed: [dsrv493 -> 127.0.0.1]
changed: [dsrv487 -> 127.0.0.1]
changed: [dsrv486 -> 127.0.0.1]
changed: [dsrv209 -> 127.0.0.1]
changed: [dsrv488 -> 127.0.0.1]
changed: [dsrv531 -> 127.0.0.1]
Process SyncManager-1:

Traceback (most recent call last):
 File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
   self.run()
 File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
   self._target(*self._args, **self._kwargs)
 File "/usr/lib/python2.7/multiprocessing/managers.py", line 558, in _run_server
   server.serve_forever()
 File "/usr/lib/python2.7/multiprocessing/managers.py", line 184, in serve_forever
   t.start()
 File "/usr/lib/python2.7/threading.py", line 745, in start
   _start_new_thread(self.__bootstrap, ())
error: can't start new thread

Traceback (most recent call last):
 File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 85, in _executor_hook
Process Process-85:

Traceback (most recent call last):
 File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
 File "/usr/bin/ansible-playbook", line 324, in <module>
   self.run()
   sys.exit(main(sys.argv[1:]))
 File "/usr/bin/ansible-playbook", line 264, in main

 File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
   self._target(*self._args, **self._kwargs)
 File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 81, in _executor_hook
   result_queue.put(return_data)
   pb.run()
 File "/usr/lib/pymodules/python2.7/ansible/playbook/__init__.py", line 348, in run

 File "<string>", line 2, in put
   if not self._run_play(play):
 File "/usr/lib/pymodules/python2.7/ansible/playbook/__init__.py", line 789, in _run_play

     File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
if not self._run_task(play, task, False):
 File "/usr/lib/pymodules/python2.7/ansible/playbook/__init__.py", line 497, in _run_task
   results = self._run_task_internal(task, include_failed=include_failed)
 File "/usr/lib/pymodules/python2.7/ansible/playbook/__init__.py", line 439, in _run_task_internal
   results = runner.run()
 File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 1485, in run
   Process Process-86:

   while not job_queue.empty():
 File "<string>", line 2, in empty
 File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
   conn.send((self._id, methodname, args, kwds))
results = self._parallel_exec(hosts)
 File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 1393, in _parallel_exec

IOError: [Errno 32] Broken pipe
   prc.start()
 File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
   self._connect()
 File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
   conn = self._Client(self._token.address, authkey=self._authkey)
     File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client

Traceback (most recent call last):
self._popen = Popen(self)
 File "/usr/lib/python2.7/multiprocessing/forking.py", line 121, in __init__

 File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
changed: [dsrv449 -> 127.0.0.1]

Can you please tell me how to solve this issue? It causes a lot of problems when running against these 100 servers.

Brian Coca

Aug 7, 2015, 9:35:26 AM
to Ansible Project
I'm not sure your issues are ansible related, just triggered by
ansible forks. I don't think upgrading to the latest version will
solve anything for you; you need to track down why your kernel is
hitting those segfault issues.

Florent Dutheil

Aug 10, 2015, 7:36:45 AM
to Ansible Project
@Anand: you clearly have issues with your control host (hardware/OS). Test the hardware and get rid of these messages before giving Ansible (or any other software) another try on this host.

@Brian:
TL;DR: Good news: I have no DNS issue anymore. Bad news: It seems the root cause of my previous observations is that ansible has issues when dealing with errors related to hosts (hosts unreachable, etc), and it hurts its parallelism very badly.


1- Not a DNS issue

I got rid of all faulty hosts and ran a new bunch of tests. Execution times are now identical between an inventory only with hostnames and an inventory with explicit IP addresses in ansible_ssh_host variable. It seems that was environmental. I'll keep an eye on it on my side.
[reminder: still with pipelining enabled but ControlMaster disabled]
  • $ time ansible all -i inventory_without_ip -m ping
    real    0m9.145s
    user    0m8.239s
    sys    0m2.787s
  • $ time ansible all -i inventory_with_ip -m ping
    real    0m9.040s
    user    0m7.570s
    sys    0m2.723s

I additionally checked the items you suggested:

  • DNS resolution seems fine (on one host, there is a local DNS cache, on the other host, it does directly interact with DNS servers):
    • time dig @127.0.1.1 -f inventory.yml --> real    0m0.114s
    • time dig @datacenter_dns_server -f inventory.yml --> real    0m0.118s
    • conclusion: so none of the control machines has DNS issues.
  • RAM: no problem there (one control host has 1GB, 0 swap used during tests, the other has 8GB).
  • CPU: one core can briefly be maxed out, but most of the time, CPU usage is <10% of one core. That explains only the slight delta on execution time between my 2 control hosts (first has 1 core, the other one 4).

Anyway, DNS topic closed.



2- Ansible handling hosts errors


What I can easily see, though, is that Ansible is not handling errors gracefully, and that hurts the intended parallelism a lot: there is a time penalty for each host that is in error (unreachable, etc.). To simulate this, I added fake host entries to my inventories (format: '<non_existing_hostname> ansible_ssh_host=<unreachable IP outside of the network>'), and that gives the following results (forks=100, so everything should be executed in parallel):

  • 0 fake hosts: time ansible all -i inventory_test -m ping
    real    0m8.942s
    user    0m7.208s
    sys    0m2.830s
  • 1 fake hosts: time ansible all -i inventory_test -m ping
    real    0m18.951s
    user    0m7.337s
    sys    0m2.733s
    SSH timeout is set to 10 in ansible.cfg. It seems to add 10 secs to the previous execution time. Ok, fair enough.
  • 2 fake hosts: time ansible all -i inventory_test -m ping
    real    0m21.471s
    user    0m7.720s
    sys    0m2.910s
    Now there is something weird.
  • 3 fake hosts: time ansible all -i inventory_test -m ping
    real    0m31.229s
    user    0m7.600s
    sys    0m2.832s
    Ouch!
  • 4 fake hosts: time ansible all -i inventory_test -m ping
    real    0m41.139s
    user    0m7.591s
    sys    0m2.847s
    Ok, there is a pattern now.
  • 5 fake hosts: time ansible all -i inventory_test -m ping
    real    0m51.172s
    user    0m7.563s
    sys    0m2.939s
    It's confirmed.
That is not what I expected: the time of the whole run should not increase past a certain value (the maximum of the slowest host's time and the timeout for a host in error). Each additional faulty host adds roughly the whole timeout, which strongly suggests some sequential algorithm, not something running in parallel threads.


3- Conclusion

That may explain my original observations: faulty hosts increase execution time linearly, showing a "serial/sequential" behaviour instead of treating all hosts at the same time.

Finally consistent facts to work on! :)


Brian, are you able to reproduce this?



Regards,


Florent.

Brian Coca

Aug 10, 2015, 10:15:17 AM
to Ansible Project
When dealing with the first contact with hosts, ansible must handle
it sequentially, as it might need to update the known_hosts file;
otherwise the file would be corrupted. That is probably what you are
seeing with your 'invalid hosts'.
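(For context only: that serialized known_hosts handling is tied to SSH host key checking. The related ansible.cfg setting looks like the sketch below; it is shown to illustrate where the behaviour comes from, not as a recommendation made in this thread.)

# ansible.cfg
[defaults]
host_key_checking = False    # skips the known_hosts verification/update path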


--
Brian Coca

Florent Dutheil

Aug 10, 2015, 11:37:05 AM
to Ansible Project
Thank you Brian for the explanation.