Memory usage and requirements - oom-kill victim


Robert Lupinek

Oct 8, 2015, 8:11:15 PM
to Ansible Project
Hey guys,

    I have a pretty simple question.  What is the expected memory usage for a simple playbook that performs a yum install or an rpm and pushes a config file?

Ansible Host Data:
   Architecture type: VMware
   Memory: 24GB
   CPU: x4 > 2.3 Ghz
   OS RHEL 7.1
Hosts:
  400 RHEL 5 and 6 servers

Ansible Config:
      Settings = Defaults with 10 Forks

Ansible Playbook:
       It could really be anything!  We have tried just a ping!

We see that if we run any playbook, we eventually consume all of the system's memory, and oom-kill kicks in and kills the running ansible-playbook.

Is this normal behavior?  Is there something simple in tuning that I am missing? 
The Tower guide suggests 4 GB per 100 forks.
I read through the scaling guide, and it didn't mention anything for memory tuning.

Any thoughts or criticisms would be GREATLY appreciated.   I love me some ansible and I want to start handing off large jobs to our operations team soon!
Thanks in advance!

Brian Coca

Oct 8, 2015, 8:15:07 PM
to Ansible Project
Are you using ansible directly or tower? I don't think in either case
that usage is normal, seems like a memory leak somewhere.

if it is a tower issue please email sup...@ansible.com or go to
support.ansible.com



--
Brian Coca

Robert Lupinek

Oct 8, 2015, 9:43:13 PM
to Ansible Project
I am sadly not a Tower user, yet.  It feels like a memory leak, and I have installed ansible both on RHEL6.6 via rpm and 7.1 via pip with the same results.  This is interesting only because I am using default Python versions 2.6 and 2.7 respectively.

When I get back to work tomorrow I will report my Python and ansible versions more specifically.

I suppose I can play with this too:
https://us.pycon.org/2014/schedule/presentation/165/

I have played with the ansible python module seeing the same results, so I started presenting the inventory in chunks of 10 servers at a time.  Knowing that I can break it in Python code is good news.  I just have to learn how to trace memory allocation in Python.  Learning anything in Python is always fun work for me.
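A minimal sketch of that chunking approach, assuming the host names are already in a plain Python list. The `chunked` helper and the host names are hypothetical, and the actual call into Ansible for each batch is left out:

```python
def chunked(hosts, size):
    """Yield successive batches of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


# Hypothetical host names standing in for a real inventory.
hosts = ["server%03d" % n for n in range(1, 26)]

batches = list(chunked(hosts, 10))
for batch in batches:
    # Each batch would be handed to a separate run here.
    print("%d hosts: %s .. %s" % (len(batch), batch[0], batch[-1]))
```

Running each batch separately keeps only one batch's worth of results in memory at a time, at the cost of losing cross-batch parallelism.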

Thanks for responding so dang quick!

Robert Lupinek

Oct 9, 2015, 10:48:41 AM
to Ansible Project
Ok,  

    On the RHEL 7 host my versions are as follows:

1. ansible = ansible 1.9.3
2. Python = Python 2.7.5

  On the RHEL 6 host my versions are as follows:

1. ansible = ansible 1.9.2
2. Python = Python 2.6.6 

I will report back once I can start monitoring malloc.

Brian Coca

Oct 9, 2015, 11:21:08 AM
to Ansible Project
can you show the play that is producing this issue? Plenty of people
use the yum module with those versions and do not report such an
issue.



--
Brian Coca

Serge van Ginderachter

Oct 9, 2015, 11:23:57 AM
to ansible...@googlegroups.com
In my experience, OOM's happen when the inventory is large, especially when having lots of variables in the inventory, combined with lots (several hundreds) of hosts.

Not very scientific, but to get a rough idea on the size of your inventory, could you show the output of


$ time (ansible all -m debug -a var=hostvars |wc -l ; ansible all --list-hosts |wc -l)
19788298
1046

real    2m3.222s
user    6m54.500s
sys    0m7.772s


Mine would yield 19.788.298 hostvars lines for 1046 hosts.

That used to be a lot more, before I made some optimisations on a dynamic inventory script that calculated a bunch of variables:

$ ansible all -m debug -a var=hostvars |wc -l ; ansible all --list-hosts |wc -l
95924259
1037
(that last job took 17 minutes to run, on an i7 quadcore laptop )
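
To compare the two runs above on equal footing, the line counts can be normalized by the number of hosts. This is only a rough sketch of that arithmetic using the figures quoted in this message; the variable names are made up for illustration:

```python
# (hostvars output lines, host count) for each run quoted above.
runs = {
    "after optimisation": (19788298, 1046),
    "before optimisation": (95924259, 1037),
}

lines_per_host = {}
for label, (lines, host_count) in runs.items():
    # Integer division is enough for a rough comparison.
    lines_per_host[label] = lines // host_count
    print("%s: about %d output lines per host" % (label, lines_per_host[label]))
```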





James Cammarata

Oct 9, 2015, 2:32:28 PM
to ansible...@googlegroups.com
Yes, this appears to be similar to the bug SVG pointed out, which I've tracked down to the way Python queues use pickle to serialize dictionary data (the resulting data can be 200x larger than it was in the dictionary before). I'm currently working on a solution to this, and hope to include it in the next beta round.
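
For anyone who wants to look at serialized sizes on their own data, here is a rough sketch using only the standard library. The `hostvars` dictionary is a made-up stand-in for real inventory data, and the sketch does not reproduce the figure above or Ansible's own code path; it only shows how to measure what pickle produces for a nested dictionary:

```python
import pickle
import sys

# Hypothetical inventory: 50 hosts, each with 20 small variables.
hostvars = {
    "host%03d" % h: {"var%02d" % v: "value-%d-%d" % (h, v) for v in range(20)}
    for h in range(50)
}

# Protocol 0 is the text protocol that Python 2 uses by default.
pickled = pickle.dumps(hostvars, protocol=0)

# sys.getsizeof is shallow: it counts only the top-level dict object,
# not the nested dictionaries and strings it refers to.
print("top-level dict object: %d bytes" % sys.getsizeof(hostvars))
print("pickled (protocol 0):  %d bytes" % len(pickled))

# The round trip gives back an equal dictionary.
restored = pickle.loads(pickled)
```

Since every object put on a multiprocessing queue is pickled, the pickled size is what crosses the process boundary on each put.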

James Cammarata
Director, Ansible Core Engineering
github: jimi-c

Robert Lupinek

Oct 9, 2015, 2:34:30 PM
to Ansible Project


---
- hosts: all
  remote_user: root
  serial: 20
  tasks:

  - name: Install nscd
    yum: pkg=nscd state=latest disable_gpg_check=no

  - name: Transfer the conf file
    copy: src=files/etc/nscd.conf dest=/etc/nscd.conf mode=0644 backup=yes

  - name: Configure service...
    service: name=nscd state=restarted enabled=yes

  - name: Ensure nscd is started at boot time
    action: command /sbin/chkconfig nscd on

  - name: deploy standard resolv.conf
    copy: src=files/etc/resolv.conf dest=/etc/resolv.conf mode=0644 backup=yes




My files are pretty tiny:

mst_nscd]# ls -la files/etc/
total 16
drwxr-xr-x 2 root root 4096 Oct  2 18:58 .
drwxr-xr-x 3 root root 4096 Aug 31 18:21 ..
-rw-r--r-- 1 root root  833 Sep  1 20:40 nscd.conf
-rw-r--r-- 1 root root   96 Oct  2 15:07 resolv.con


I am running /sbin/chkconfig nscd on because enabled=yes is not working; I assume I am running into this: https://github.com/ansible/ansible-modules-core/issues/237.

Thank you all for the kind responses!