Intelligent loop for yum package installation


Steven Truong

Sep 6, 2013, 8:36:04 PM
to ansible...@googlegroups.com
Hi all,

I am running into this performance issue with installing a bunch of packages through yum.

- name: install additional requirements
  yum: name=$item state=present
  with_items:
  - vim-enhanced
  - readline
  - readline-devel
  - ncurses-devel
  - gdbm-devel
  - glibc-devel
  - tcl-devel
  - openssl-devel
  - curl-devel
  - expat-devel
  - db4-devel
  - byacc
  - sqlite-devel
  - gcc-c++
  - libyaml
...

It appears to me that doing this means the packages are installed one by one.

In my case, I have fewer than 30 packages to install, and I sat there for a good 10-20 minutes waiting for this to finish. I saw /usr/bin/repoquery run twice for each package, and using "rpm -qa --last" I could confirm with certainty that the packages were installed one by one.

Are there better ways to do this?  There must be ways to install these packages in one fell swoop in Ansible.

Please share your thoughts on this. Or is this something that could be improved with a new or existing module?

Thank you very much,
Steven.

Timothy Gerla

Sep 6, 2013, 8:37:15 PM
to ansible-project
Hi Steven,

The yum and apt modules are smart enough to collapse with_items lists
into single transactions. But I've seen yum behave very slowly anyway:
a common problem is the "fastestmirror" plugin which ironically tends
to make things slower. Try disabling that plugin and see if that
speeds things up.
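For reference, a minimal Python sketch of Tim's suggestion. It assumes the plugin is configured in the usual place, /etc/yum/pluginconf.d/fastestmirror.conf, with an "enabled" key in a [main] section; disable_plugin is a hypothetical helper that returns the edited text rather than writing the file. For a quick one-off comparison, yum's --disableplugin=fastestmirror option does the same without touching the file.

```python
import configparser
import io


def disable_plugin(conf_text: str) -> str:
    """Return a copy of a yum plugin config with enabled=0 in [main].

    Assumes the usual yum plugin layout: an INI file with a [main] section.
    Note: configparser drops comment lines when it re-renders the file.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("main"):
        parser.add_section("main")
    parser.set("main", "enabled", "0")
    out = io.StringIO()
    # Write "key=value" without spaces, matching yum's own config style.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[main]\nenabled=1\nverbose=0\n"
    print(disable_plugin(sample))
```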

-Tim
> --
> You received this message because you are subscribed to the Google Groups
> "Ansible Project" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to ansible-proje...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.



--
Tim Gerla
t...@gerla.net

Michael DeHaan

Sep 7, 2013, 8:46:34 AM
to ansible...@googlegroups.com
I don't think fastestmirror should take 10-20 minutes in any case.

It seems most likely that you were waiting for a yum lock to clear; perhaps you had PackageKit installed? (I always remove PackageKit.)
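One way to test the lock theory is to look at yum's lock file while the play appears stuck. A rough Python sketch, assuming the EL5/EL6-era default lock path /var/run/yum.pid and a Linux /proc filesystem; yum_lock_holder is a hypothetical helper.

```python
import os
from typing import Optional

# Assumption: yum on EL5/EL6 records the PID of the lock holder here.
DEFAULT_LOCKFILE = "/var/run/yum.pid"


def yum_lock_holder(lockfile: str = DEFAULT_LOCKFILE) -> Optional[int]:
    """Return the PID holding the yum lock, or None if it looks free."""
    try:
        with open(lockfile) as fh:
            content = fh.read().strip()
    except FileNotFoundError:
        return None  # no lock file: nothing is holding yum
    if not content.isdigit():
        return None  # empty or garbled lock file; treat it as stale
    pid = int(content)
    # On Linux, /proc/<pid> exists only while that process is alive.
    return pid if os.path.exists(f"/proc/{pid}") else None
```

If it returns a PID while a play hangs, "ps -p <pid>" shows whether the holder is PackageKit, another yum run, or something else.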

This is unrelated, but I should point out you are still using old style variables.

Do this as follows:

  yum: name={{ item }} state=present

(Old style variables will be removed in a future ansible, the date is not set yet).
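Converting old-style $item / ${item} references to the {{ item }} form can be done mechanically. Here is a rough Python sketch that rewrites a playbook line with a regular expression; to_jinja2 is a hypothetical helper, not an Ansible tool, and it will also rewrite any literal $word text (for example inside shell commands), so review its output before committing it.

```python
import re

# Matches ${name} or $name, where name is a simple identifier.
# Note: \w also matches digits, so "$1" would be rewritten too.
_OLD_STYLE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def to_jinja2(line: str) -> str:
    """Rewrite old-style $var / ${var} references as {{ var }}."""
    return _OLD_STYLE.sub(lambda m: "{{ %s }}" % (m.group(1) or m.group(2)), line)


print(to_jinja2("yum: name=$item state=present"))
# yum: name={{ item }} state=present
```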

I should point out too that if you are still on Ansible 1.2.X or before, there are some performance speedups to yum operations in 1.3. These are significant, but again 10-20 minutes would be unexpected unless you hit a very slow mirror.

For people doing datacenter updates, I always recommend considering a local mirror created with reposync (which is also good for not being surprised by upstream content changes and helps your machines be more consistent), or at least running a cache with something like apt-cacher-ng.
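A rough Python sketch of the local-mirror route: it builds the reposync/createrepo commands for one repository and renders a .repo stanza pointing clients at the copy. It assumes yum-utils' reposync with its -r (repo id) and -p (download path) options, that reposync writes into <path>/<repoid>, and a hypothetical mirror URL; the helper names are made up for illustration.

```python
import configparser
import io
from typing import List


def mirror_commands(repoid: str, dest: str) -> List[List[str]]:
    """Commands to mirror one repo and generate its metadata.

    Assumes reposync places the packages under <dest>/<repoid>.
    """
    return [
        ["reposync", "-r", repoid, "-p", dest],
        ["createrepo", f"{dest}/{repoid}"],
    ]


def local_repo_file(repoid: str, base_url: str) -> str:
    """Render a .repo stanza that points clients at the local copy."""
    cfg = configparser.ConfigParser()
    cfg[f"local-{repoid}"] = {
        "name": f"Local mirror of {repoid}",
        "baseurl": f"{base_url}/{repoid}",
        "enabled": "1",
        "gpgcheck": "1",
    }
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    for cmd in mirror_commands("base", "/srv/mirror"):
        print(" ".join(cmd))
    # Hypothetical host serving /srv/mirror over HTTP.
    print(local_repo_file("base", "http://mirror.example.com/repos"))
```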





Michael DeHaan <mic...@ansibleworks.com>
CTO, AnsibleWorks, Inc.
http://www.ansibleworks.com/

Byron Schlemmer

Sep 13, 2013, 6:22:55 AM
to ansible...@googlegroups.com
I'm seeing the same behaviour. Waiting for yum tasks to complete takes ages, especially when they are given a list.

I haven't looked at the code, so I'm not sure why this is happening, but output from "ps auwwx" sampled every second shows the following when installing expect, for example:

root     32466   11:08   0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --disablerepo=* --pkgnarrow=installed --qf %{name}-%{version}-%{release}.%{arch} expect
root     32471   11:08   0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} --whatprovides expect
root     32471   11:08   0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} --whatprovides expect
root     32471   11:08   0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} --whatprovides expect
root     32481   11:08   0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} expect
root     32481   11:08   0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} expect
root     32481   11:08   0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} expect
root     32491   11:08   0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.i386
root     32491   11:08   0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.i386
root     32491   11:08   0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.i386
root     32501   11:08   0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.x86_64
root     32501   11:08   0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.x86_64
root     32501   11:08   0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.x86_64
root     32501   11:08   0:03 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.x86_64
root     32514   11:08   0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.i386
root     32514   11:08   0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.i386
root     32514   11:08   0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.i386
root     32514   11:08   0:03 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.i386
root     32527   11:09   0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.x86_64
root     32527   11:09   0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.x86_64
root     32527   11:09   0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.x86_64
root     32527   11:09   0:03 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.x86_64

So that's 22 seconds for one package. If you have 20 packages or so, even assuming there are only a few versions of each in your repo, you are looking at several minutes of package checking.

While the initial repo queries make sense, I'm not sure I understand the iteration through each version.
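Byron's arithmetic can be made explicit. This rough Python sketch counts distinct repoquery processes in repeated ps samples (de-duplicating on the PID column) and scales a per-package cost up to a whole list; both helper names are hypothetical. Applied to the listing above it finds seven separate repoquery runs for expect, and 22 seconds times 20 packages is 440 seconds, a little over 7 minutes.

```python
from typing import Iterable


def count_repoquery_runs(ps_lines: Iterable[str]) -> int:
    """Count distinct repoquery processes across repeated ps samples.

    A long-running process shows up once per sample, so lines are
    de-duplicated on the PID column (the second whitespace-separated field).
    """
    pids = set()
    for line in ps_lines:
        fields = line.split()
        if "repoquery" in line and len(fields) > 1:
            pids.add(fields[1])
    return len(pids)


def estimate_minutes(seconds_per_package: float, packages: int) -> float:
    """Serial checking cost: per-package seconds times package count, in minutes."""
    return seconds_per_package * packages / 60


print(round(estimate_minutes(22, 20), 1))  # 7.3
```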

Michael DeHaan

Sep 13, 2013, 9:30:02 AM
to ansible...@googlegroups.com
Please upgrade to Ansible 1.3 if you haven't already, it invokes repoquery a lot less.





Byron Schlemmer

Sep 13, 2013, 11:12:25 AM
to ansible...@googlegroups.com
Michael,

That was with 1.3. The common case for us is that yum with a local cache would be fine, but this bug:


seems to be forcing our hand into using repoquery (which would probably be the better long-term solution, barring this performance issue).

Michael DeHaan

Sep 13, 2013, 7:01:39 PM
to ansible...@googlegroups.com
Ok, can you just install yum-utils?
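Michael's suggestion hinges on repoquery, which ships in the yum-utils package. A trivial Python check that a managed host has it on its PATH; have_repoquery is a hypothetical helper, and how the module behaves when repoquery is missing is not covered in this thread.

```python
import shutil


def have_repoquery() -> bool:
    """True if the repoquery tool (shipped in the yum-utils package) is on PATH."""
    return shutil.which("repoquery") is not None


if __name__ == "__main__":
    if not have_repoquery():
        print("repoquery not found; try: yum -y install yum-utils")
```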


Dan C

Sep 19, 2013, 12:01:54 PM
to ansible...@googlegroups.com
Is there any news about this issue?

I just checked with my Ansible 1.3.1, and it seems to me that it is still installing every package individually, one by one, when doing:

- name: Install basic packages
  yum: pkg={{ item }} state=latest
  with_items:
    - vim-enhanced
    - curl
    - git
    - java-1.7.0-openjdk
    - make
    - diffutils
    - man
    - policycoreutils
    - htmldoc

Doing so, if I check the processes in "top" I can see that yum is installing every package individually.
If instead I use "shell" like follows:

- name: Install basic packages
  shell: yum install -y vim-enhanced curl git java-1.7.0-openjdk make diffutils man policycoreutils htmldoc

I can clearly see in "top" a single yum command installing all the packages together.

Anyway, I timed how long both ways take. I did it in a really small virtual machine with Vagrant.
The "yum" module way lasted 10min 17s, the "shell" way 8min 18s . Depending on how many packages to be installed this can be a big deal! For example in an autoscale environment... 
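Dan's timings work out as follows. This tiny Python sketch (to_seconds is a hypothetical helper) only does the arithmetic: on that one small VM, the shell run saved 119 seconds, about 19% of the module run.

```python
import re


def to_seconds(duration: str) -> int:
    """Parse durations written like '10min 17s' into seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)min)?\s*(?:(\d+)s)?\s*", duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


module_way = to_seconds("10min 17s")  # 617
shell_way = to_seconds("8min 18s")    # 498
saved = module_way - shell_way        # 119
print(f"saved {saved}s ({saved / module_way:.0%} of the module run)")
# saved 119s (19% of the module run)
```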

Michael DeHaan

Sep 19, 2013, 9:18:37 PM
to ansible...@googlegroups.com
Sounds like you don't have yum-utils installed maybe?


Dan C

Sep 20, 2013, 3:57:58 AM
to ansible...@googlegroups.com
Yes, I have it on both guest and host. The host runs Scientific Linux and the guest runs Debian, but I have yum-utils installed on both.

Jesse Keating

Sep 20, 2013, 6:13:06 PM
to ansible...@googlegroups.com

Reading the 1.3.1 code, it appears that state=latest calls into the latest() function.

This function will do a loop /per package/ to determine if the package is installed, and if it needs updates or to be installed, and then will execute that action (install or update). Again, the code does this /per package/.

The intelligence that Ansible uses to collapse lists of packages into single actions is a matter of a single /yum module/ action, not necessarily a single yum action within that module. If Ansible didn't collapse things, you'd get one set of ssh to host, execute yum module, send results back, per package. Now you get one ssh out, one yum module execution (with perhaps many yum executions within), and one set of results.
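To make Jesse's distinction concrete, here is a hypothetical Python model of the collapsing step (not Ansible's actual code): for the package modules, a with_items loop becomes one module invocation whose name argument is a comma-separated list. That saves SSH round trips, but says nothing about how many yum commands the module then runs.

```python
from typing import Dict, List

# Hypothetical: modules whose loop items are merged into one call.
SQUASHABLE = {"yum", "apt"}


def plan_module_calls(module: str, args: Dict[str, str], items: List[str]) -> List[Dict[str, str]]:
    """Return the module invocations a with_items loop would turn into.

    For squashable modules the items are joined into a single comma-separated
    name, so one SSH round trip and one module run handle the whole list.
    Inside that one run, the module may still call yum once per package.
    """
    if module in SQUASHABLE:
        return [dict(args, name=",".join(items))]
    return [dict(args, name=item) for item in items]


print(plan_module_calls("yum", {"state": "latest"}, ["vim-enhanced", "curl", "git"]))
```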

Some work could be done within the latest() function and install() function and others like that within the module to build up an install set of packages and run the command with the full set, rather than running the command once per-package.
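A minimal sketch of Jesse's suggestion, assuming the installed/updatable information has already been gathered: the sets passed in stand in for the repoquery calls, and plan_latest and batched_commands are hypothetical names, not functions in the module. The point is that the yum invocations scale with the number of actions, not the number of packages.

```python
from typing import List, Set, Tuple


def plan_latest(
    packages: List[str],
    installed: Set[str],
    updatable: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split requested packages into (to_install, to_update) lists.

    `installed` and `updatable` stand in for whatever queries the module runs;
    they are plain sets here so the planning logic can be tested on its own.
    """
    to_install = [p for p in packages if p not in installed]
    to_update = [p for p in packages if p in installed and p in updatable]
    return to_install, to_update


def batched_commands(to_install: List[str], to_update: List[str]) -> List[List[str]]:
    """One yum command per action for the whole set, instead of one per package."""
    commands = []
    if to_install:
        commands.append(["yum", "-y", "install", *to_install])
    if to_update:
        commands.append(["yum", "-y", "update", *to_update])
    return commands


to_install, to_update = plan_latest(["git", "make", "curl"], {"make", "curl"}, {"curl"})
print(batched_commands(to_install, to_update))
# [['yum', '-y', 'install', 'git'], ['yum', '-y', 'update', 'curl']]
```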

It also appears that the way things are now, a yum module execution with a list of packages can have a status of /both/ failed and changed. Not sure if that is noteworthy or something that happens in other modules too.
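A small, hypothetical model of how per-package outcomes might roll up into one task result shows how both flags can end up set at once; it is an illustration, not the module's code.

```python
from typing import Dict, List


def aggregate(results: List[Dict[str, bool]]) -> Dict[str, bool]:
    """Combine per-package outcomes into a single task result.

    If one package changed and another failed, the combined result
    carries both flags, which is the situation Jesse describes.
    """
    return {
        "changed": any(r.get("changed", False) for r in results),
        "failed": any(r.get("failed", False) for r in results),
    }


print(aggregate([{"changed": True}, {"failed": True}]))
# {'changed': True, 'failed': True}
```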

-jlk

Michael DeHaan

Sep 21, 2013, 9:22:40 AM
to ansible...@googlegroups.com
Right, I'm talking about the individual SSH steps being batched.   Sorry for confusion.

Attempts to improve this would be welcome.



Dan C

Sep 23, 2013, 9:25:54 AM
to ansible...@googlegroups.com
Yeah, I think what I observed fits perfectly with what you described.
As a suggestion for this particular problem: maybe it isn't really necessary to check whether a package is installed, since yum itself does nothing if it is already installed. I suppose it is a matter of time; I don't know which takes longer, checking whether a package is already installed, or just trying to install it and letting yum do nothing if it is. As far as I know, yum exits with "1" when it fails to find a package, but with "0" when asked to install an already installed package. The problem is that with a list of packages, if some already exist (or get newly installed) and some fail (don't exist), yum exits with "0".

Anyway, thanks guys!

Michael DeHaan

Sep 23, 2013, 1:28:51 PM
to ansible...@googlegroups.com
I think it's important for resources to be idempotent and be able to report change detection appropriately, and not attempt operations unless they need to be attempted.




Dan C

Sep 24, 2013, 9:01:53 AM
to ansible...@googlegroups.com
I understand your point. Change detection is useful. Still, I can think of some scenarios in which I would rather have installation speed than the change report. In the end, from my sysadmin point of view, all I really care about is that certain packages are installed; knowing whether they were just installed by Ansible or were already on the system is secondary for me.
I haven't really read the code, so maybe I am wrong, but I don't think yum itself tries to install a package that already exists. I suppose yum checks first and attempts the install afterwards. If that is the case (again, I don't really know), wouldn't it be redundant for Ansible to check first?

I hope to find time to read the code and help in some way if I can; package installation is so common that speed can be crucial.

Michael DeHaan

Sep 24, 2013, 8:34:49 PM
to ansible...@googlegroups.com
While it is of course true that yum won't need to reinstall things, we are pretty well set on only attempting the underlying system operations for change that need to occur as a general principle of idempotency, and this allows us to get finer grained data out of things.

I'm all for considering improvements, but the yum module was arrived at through a LOT of work from folks like Seth over a long period, and since yum_rhn_plugin has frequently been a total <censored>, I feel it's probably best not to tempt fate.

That being said, more than happy to entertain patches that come with very extensive testing on EL 5 and 6.





Steven Truong

Oct 4, 2013, 5:42:14 PM
to ansible...@googlegroups.com
Given what has been said on this topic, I believe this will only encourage people to avoid using the yum module for operations on lists of rpm packages. From now on, I might just do this instead of using the yum module:

- name: install a bunch of rpms
  shell: yum -y install abc.rpm abdddd.rpm .... 

Instead of sitting around waiting longer than needed for the task.

Happy Friday,
Steven.

Michael DeHaan

Oct 4, 2013, 7:35:07 PM
to ansible...@googlegroups.com
I don't believe it discourages anyone; there are lots of users of the yum module.

That all being said, quite open to improvements and pull requests.





Alex Rodenberg

Oct 18, 2013, 8:24:07 AM
to ansible...@googlegroups.com
I just ran into this issue, and found this topic :(

I have decided to use state=installed instead of state=latest.

And then run yum update separately in a shell task.

It took me almost 15 minutes to do state=latest for 20 packages. With state=installed, it took 1-2 minutes.

This gets even worse if you include EPEL or RPMforge into the repo check.
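Alex's numbers fit the repoquery pattern in Byron's ps listing earlier in the thread. The sketch below is a model inferred from that listing, not read from the module source: for a package that is already installed, state=installed needs only the local rpm-database query (--disablerepo=*), while state=latest also consults the repositories and, judging by the listing, runs one --pkgnarrow=updates query per available build. With the four expect builds in that listing, that is one query versus seven. queries_for is a hypothetical helper.

```python
from typing import List

# Flags copied from the repoquery command lines in Byron's ps listing.
BASE = ["repoquery", "--show-duplicates", "--plugins", "--quiet", "-q",
        "--qf", "%{name}-%{version}-%{release}.%{arch}"]


def queries_for(state: str, pkg: str, available_builds: List[str]) -> List[List[str]]:
    """Rough model of the repoquery calls one already-installed package may trigger.

    state=installed: a single local query against the rpm database
    (--disablerepo=*), so no repository metadata is needed.
    state=latest: the local query, two repository lookups, and one
    --pkgnarrow=updates query per available build.
    """
    local = BASE + ["--disablerepo=*", "--pkgnarrow=installed", pkg]
    if state == "installed":
        return [local]
    remote = [BASE + ["--whatprovides", pkg], BASE + [pkg]]
    updates = [BASE + ["--pkgnarrow=updates", build] for build in available_builds]
    return [local] + remote + updates
```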

Michael DeHaan

Oct 18, 2013, 9:31:35 AM
to ansible...@googlegroups.com
Very curious that it would take a minute per package though.

Sounds like something is wrong upstream.


Dylan Martin

Oct 18, 2013, 3:46:20 PM
to ansible...@googlegroups.com
Is yum on the client configured to keep cache & metadata?  If it's throwing all that out, it can take a really long time for each run.  Check for keepcache, metadata_expire and mirrorlist_expire and possibly others in your /etc/yum.conf as well as each repo's config on the client.
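A quick Python sketch for checking the settings Dylan lists. Note that they do different jobs: keepcache controls whether downloaded packages are kept after installation, while metadata_expire controls how long repository metadata is reused before yum refreshes it. Some of these can also be set per repository, which is why the repo files are worth checking too. cache_settings is a hypothetical helper that reads the text of one config file.

```python
import configparser
from typing import Dict, Optional

# Settings Dylan mentions; yum reads them from [main] in /etc/yum.conf,
# and some can be overridden per repository in /etc/yum.repos.d/*.repo.
CACHE_KEYS = ("keepcache", "metadata_expire", "mirrorlist_expire")


def cache_settings(conf_text: str, section: str = "main") -> Dict[str, Optional[str]]:
    """Report the cache-related values set in one section (None = not set)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        return {key: None for key in CACHE_KEYS}
    return {key: parser.get(section, key, fallback=None) for key in CACHE_KEYS}


if __name__ == "__main__":
    with open("/etc/yum.conf") as fh:
        print(cache_settings(fh.read()))
```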

Frederick Yankowski

May 9, 2014, 12:10:54 PM
to ansible...@googlegroups.com
Thank you, Alex!  This works great for me in RHEL 6.5. It had been taking several minutes to run "yum state=latest" for just four packages (already installed). Now it finishes in just a few seconds.

I was getting close to giving up on Ansible because of the huge delays in every "yum" step. All the repoquery calls on the managed server were taking forever, even with a local RHEL mirror and yum caching enabled.

Michael DeHaan

May 9, 2014, 1:33:58 PM
to ansible...@googlegroups.com
If you are seeing long times with state=latest, I'd suspect RHN may be to blame. I always recommend a local mirror if so. If RHN isn't involved, perhaps you are still using a slow mirror.

 

Frederick Yankowski

May 12, 2014, 9:20:41 AM
to ansible...@googlegroups.com
We set up a local RHEL mirror just this week, thinking that that would speed up ansible. It didn't.

Based on snooping with "ps", running "yum: state=latest" runs repoquery numerous times for each package involved. A typical repoquery call takes about two seconds on my managed server.

Michael DeHaan

May 12, 2014, 1:10:53 PM
to ansible...@googlegroups.com
Ok, thanks for info.

Probably something that should be addressed on yum's end as we do need the latest information to decide whether to run certain commands or not.

While it is possible to just run the command and report back what it changed (or didn't), that's not Ansible's way when it can avoid it, and also prevents some check mode capabilities.
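Michael's check-mode point follows from deciding before acting: if the module knows up front which packages are missing, it can report the change without running yum at all. A minimal Python sketch of that ordering; ensure_installed and the injected run callable are hypothetical, not part of Ansible.

```python
from typing import Callable, Dict, List, Sequence, Set


def ensure_installed(
    packages: List[str],
    installed: Set[str],
    run: Callable[[Sequence[str]], int],
    check_mode: bool = False,
) -> Dict[str, object]:
    """Decide first, act second, so check mode can report without changing anything.

    `run` is a hypothetical command runner (for example, wrapping
    subprocess.call); it is injected so the sketch works without yum.
    """
    missing = [p for p in packages if p not in installed]
    if not missing:
        return {"changed": False, "packages": []}
    if check_mode:
        # Report what would change, but never touch the system.
        return {"changed": True, "packages": missing}
    rc = run(["yum", "-y", "install", *missing])
    return {"changed": rc == 0, "failed": rc != 0, "packages": missing}
```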





Dmitry Makovey

May 13, 2014, 12:47:02 AM
to ansible...@googlegroups.com
I realize that this will probably qualify for non-Ansible way, but here's how I had to resolve similar issue:

1. Split the playbook into "install" and "setup", where "install" is run only once in a while, when you need to change the list of installed packages etc. This resolves the issues with RHN etc. "Setup" can be run anytime, at any frequency, with no ill side effects.
2. For the "setup" I've slapped together https://github.com/droopy4096/ansible-libs/blob/master/library/rpm_query which I use in place of "yum" actions. This just adds a bit of "insurance" to the "setup" playbook, confirming that packages indeed are there. 

While not perfect, it gets around all those slow bits of yum. In our environment, packages get to machines sideways (no direct access to the internet), so the split above allows for multiple implementations of "install" while keeping the same "setup".
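Dmitry's rpm_query module is linked rather than shown. As an independent sketch of the same idea (not his code), a check that consults only the local rpm database could look like this; it relies on rpm printing "package NAME is not installed" for each miss, and the function names are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_cmd(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(list(cmd), stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def missing_packages(packages: List[str], run: Runner = run_cmd) -> List[str]:
    """Return the requested packages that rpm reports as not installed.

    Only the local rpm database is queried (`rpm -q`), so no repository
    metadata is downloaded or refreshed.
    """
    if not packages:
        return []
    rc, out = run(["rpm", "-q", *packages])
    if rc == 0:
        return []
    prefix, suffix = "package ", " is not installed"
    return [line[len(prefix):-len(suffix)]
            for line in out.splitlines()
            if line.startswith(prefix) and line.endswith(suffix)]
```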

Balamurugan Ramasamy

Mar 26, 2015, 12:35:47 PM
to ansible...@googlegroups.com

Hi Frederick/Dylan

I have the same problem. A repoquery is being made for every single build of a package, which makes the yum install take several minutes.
I did set keepcache to 1 in /etc/yum.conf, but that did not fix the issue. Could you share exactly what those settings are?


Thanks
Bala