Re: [ansible-project] subset/limit option in play definition (for applying a play to an intersection of host groups)


Michael DeHaan

Dec 7, 2012, 11:07:29 AM
to ansible...@googlegroups.com
On Fri, Dec 7, 2012 at 8:50 AM, Michael Liddle <lidd...@gmail.com> wrote:
> Hi there,
>
> I'm wondering if it is planned (or already possible) to allow host
> limits/subsets to be specified in individual plays.
>
> E.g. if I have a webservers group, a debianservers group and a redhatservers
> group, it would be useful to have two plays, one to install apache (say)
> using an apt task and one to install it via yum. In the following example I
> use the "limit" parameter as that matches the command line switch in
> ansible-playbook. In the code the keyword "subset" seems to be preferred:

I have no problem with patches to allow limit groups to be set in
play. I could see this being particularly useful when used
with the group_by module.

If the CLI --limit flag is used it should only set the limit for those
plays that do not have a limit set.


>
> ---
> - hosts: webservers
> - limit: debian
> - tasks:
> - apt: ...
>

FYI -- Putting a dash in front of everything above is incorrect, as a
dash indicates the start of a new list entry.


> - hosts: webservers
> - limit: redhat
> - tasks:
> - yum: ...
>
>
> Please note that this is just an example, and that the real question is how
> to have a play run on an intersection of host groups. At the moment union
> (with group1:group2) and difference (with group1:!group2) are possible, and
> it's only intersection that is lacking.

Agreed


Michael Liddle

Dec 7, 2012, 11:33:16 AM
to ansible...@googlegroups.com


On Friday, December 7, 2012 5:07:29 PM UTC+1, Michael DeHaan wrote: 
> I have no problem with patches to allow limit groups to be set in
> play. I could see this being particularly useful when used
> with the group_by module.

OK, I'll look into implementing something. Should I stick with the "limit" keyword?
 
> If the CLI --limit flag is used it should only set the limit for those
> plays that do not have a limit set.

Should it not be that the two instances of limit are combined (intersected with each other)?

I.e. if I have both debian and redhat webservers in London and New York, and I run the above plays with CLI --limit london, that shouldn't be ignored should it? Rather only debian and redhat webservers in london should be touched...
 
> FYI -- Putting a dash in front of everything above is incorrect, as a
> dash indicates the start of a new list entry.

Ah, typo(s) :)
 

Serge van Ginderachter

Dec 7, 2012, 11:36:31 AM
to ansible...@googlegroups.com
On 7 December 2012 17:33, Michael Liddle <lidd...@gmail.com> wrote:
>> If the CLI --limit flag is used it should only set the limit for those
>> plays that do not have a limit set.
>
>
> Should it not be that the two instances of limit are combined (intersected
> with each other)?
>
> I.e. if I have both debian and redhat webservers in London and New York, and
> I run the above plays with CLI --limit london, that shouldn't be ignored
> should it? Rather only debian and redhat webservers in london should be
> touched...

That sounds more logical to me, too.

Michael DeHaan

Dec 7, 2012, 11:38:33 AM
to ansible...@googlegroups.com
Yeah, keeping with the CLI keywords seems best.

On Fri, Dec 7, 2012 at 11:33 AM, Michael Liddle <lidd...@gmail.com> wrote:
>
>
> On Friday, December 7, 2012 5:07:29 PM UTC+1, Michael DeHaan wrote:
>>
>> I have no problem with patches to allow limit groups to be set in
>> play. I could see this being particularly useful when used
>> with the group_by module.
>
>
> OK, I'll look into implementing something. Should I stick with the "limit"
> keyword?
>
>>
>> If the CLI --limit flag is used it should only set the limit for those
>> plays that do not have a limit set.
>
>
> Should it not be that the two instances of limit are combined (intersected
> with each other)?

It should work EXACTLY like the command line, in fact, all this would
be doing would be sourcing the data from the play object rather than
the CLI value, if one was set.


>
> I.e. if I have both debian and redhat webservers in London and New York, and
> I run the above plays with CLI --limit london, that shouldn't be ignored
> should it? Rather only debian and redhat webservers in london should be
> touched...

that is pretty much what it does, right? :)

Dag Wieers

Dec 7, 2012, 11:39:07 AM
to ansible...@googlegroups.com
On Fri, 7 Dec 2012, Michael Liddle wrote:

> I'm wondering if it is planned (or already possible) to allow host
> limits/subsets to be specified in individual plays.
>
> E.g. if I have a webservers group, a debianservers group and a
> redhatservers group, it would be useful to have two plays, one to install
> apache (say) using an apt task and one to install it via yum. In the
> following example I use the "limit" parameter as that matches the command
> line switch in ansible-playbook. In the code the keyword "subset" seems to
> be preferred:
>
> ---
> - hosts: webservers
> - limit: debian
> - tasks:
> - apt: ...
>
> - hosts: webservers
> - limit: redhat
> - tasks:
> - yum: ...
>
> At the moment it seems that I would have to do this either by specifying
> debianwebservers and redhatwebservers subgroups, and using them in the
> hosts param. At some point this would become somewhat laborious.

First and foremost our inventory script creates *a lot of* groups based on
CMDB information, including

Then we have:

- appl (application code, points to a team in the company)
- environment (dev, test, qa, prod)
- securityclass (dmz, fta)
- location (dc1, dc2)
- hardwaretype (vmware, kvm, blade, standalone)
- status (to-be-provisioned, provisioned, accepted, production, maintenance)

And we have some combined groups:

- location-environment
- securityclass-environment

This is mostly based on the need to have variables specific to any of
these combinations, or the specific use we have to limit on the command
line.

What you can do then is something like:

hosts: webservers:!redhat:!fedora

This means all the webservers except the redhat and fedora servers, which
in set theory is the complement (set difference). You can make unions too:

hosts: debian:fedora

But what is missing is an intersection option, like (made-up syntax):

hosts: webservers#debian

My preference is to implement union, intersection and complement from set
theory and create a syntax for priority rules, etc...
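
The three operations discussed above map directly onto ordinary set operations. A minimal sketch in Python, assuming a made-up inventory (the host names and group memberships are hypothetical, and this is not how Ansible itself evaluates patterns):

```python
# Hypothetical inventory: group name -> set of host names.
inventory = {
    "webservers": {"web1", "web2", "web3"},
    "debian": {"web1", "db1"},
    "redhat": {"web2"},
    "fedora": {"web3", "db2"},
}

# hosts: webservers:!redhat:!fedora  (webservers minus redhat and fedora)
difference = inventory["webservers"] - inventory["redhat"] - inventory["fedora"]

# hosts: debian:fedora  (union of the two groups)
union = inventory["debian"] | inventory["fedora"]

# hosts: webservers#debian  (the made-up intersection syntax from this message)
intersection = inventory["webservers"] & inventory["debian"]

print(sorted(difference))    # ['web1']
print(sorted(union))         # ['db1', 'db2', 'web1', 'web3']
print(sorted(intersection))  # ['web1']
```

The point of the sketch is only that intersection is no harder to compute than the two operations that already exist; the open question in the thread is the syntax.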

--
-- dag wieers, d...@wieers.com, http://dag.wieers.com/
-- dagit linux solutions, in...@dagit.net, http://dagit.net/

[Any errors in spelling, tact or fact are transmission errors]

Michael Liddle

Dec 7, 2012, 11:50:17 AM
to ansible...@googlegroups.com
On Friday, December 7, 2012 5:38:33 PM UTC+1, Michael DeHaan wrote:
> It should work EXACTLY like the command line, in fact, all this would
> be doing would be sourcing the data from the play object rather than
> the CLI value, if one was set.
>
>> I.e. if I have both debian and redhat webservers in London and New York, and
>> I run the above plays with CLI --limit london, that shouldn't be ignored
>> should it? Rather only debian and redhat webservers in london should be
>> touched...
>
> that is pretty much what it does, right? :)
 
But if the CLI --limit was only applied to play objects without their own limit (which is what I thought you meant with "If the CLI --limit flag is used it should only set the limit for those plays that do not have a limit set"), then in my example above the CLI --limit london would be ignored and the play object limits "debian" and "redhat" used instead (and servers in NY would be affected too). No?
 

Michael Liddle

Dec 7, 2012, 12:01:02 PM
to ansible...@googlegroups.com
On Friday, December 7, 2012 5:39:07 PM UTC+1, Dag Wieers wrote:
> But what is missing is an intersection option, like (made-up syntax):
>
>    hosts: webservers#debian
>
> My preference is to implement union, intersection and complement from set
> theory and create a syntax for priority rules, etc...

E.g.:

hosts: webservers#(debian:london)

hosts: (webservers#debian):london

?


Dylan Martin

Dec 7, 2012, 12:27:39 PM
to ansible...@googlegroups.com
Speaking personally, I'd like to see as much consistency and predictability as possible. 

This might be crazy or stupid, so please be gentle when telling me off, but I'd like it if you could do everything from both the command line and a play file. I'm not sure how you would represent all the levels of a YAML file on a command line, but I keep getting confused by things that can be set in one place but not another. E.g., --connection works on the command line but not in a playbook, and hosts: works in a playbook but you use --limit on the command line.

Maybe something like

ansible-playbook --hosts webservers --connection ssh ---tasks ----name wheeee ----command "echo this is nuts" an_almost_empty_playbook.yaml

It wouldn't be all that useful in itself, but it would give us a consistency that would be really powerful (and I suspect easier to code in the long run).

Okay, you may now throw tomatoes at me.

-Dylan

Michael DeHaan

Dec 7, 2012, 1:00:18 PM
to ansible...@googlegroups.com
This syntax runs counter to my sensibilities and will not happen.

Michael DeHaan

Dec 7, 2012, 1:01:44 PM
to ansible...@googlegroups.com
On Fri, Dec 7, 2012 at 12:27 PM, Dylan Martin
<dma...@seattlecentral.edu> wrote:
> Speaking personally, I'd like to see as much consistency and predictability
> as possible.
>
> This might be crazy or stupid, so please be gentle when telling me off, but
> I'd like it if you could do everything from the command line and a play
> file. Not sure how you would represent all the levels of a yaml file on a
> cmd line, but I keep getting confused by things that can be set one place
> but not another. EG --connection works on cmd line but not playbook and
> hosts: works in playbook but you use --limit on cmd line.
>
> Maybe something like
>
> ansible-playbook --hosts webservers --connection ssh ---tasks ----name
> wheeee ----command "echo this is nuts" an_almost_empty_playbook.yaml

You can generally do most of these things through feeding variables in
through --extra-vars.

This prevents bloating the command line options.

I'm generally NOT in favor of supporting this as it discourages reuse
and recording what you want to do, and doesn't make any sense when
describing multi-tier operations.

Michael DeHaan

Dec 7, 2012, 1:03:14 PM
to ansible...@googlegroups.com
Oh, that.

The limit system is presently only one level of limiting: a host must
appear somewhere in the limit group.

I'd accept patches to make it more additive.

Dag Wieers

Dec 7, 2012, 6:57:40 PM
to ansible...@googlegroups.com
Don't dismiss it based on syntax alone; being able to do the above is very
powerful and avoids having to make groups just to be able to set a limit.
The fact we don't have the set theory symbols makes it harder to come up
with an acceptable syntax that people understand out of the box.

The current syntax Ansible is using today:

':' means union
':!' means intersection (with complement)
'!' means complement

whereas:

- there are no priority rules (left-to-right only ?)
- union+complement symbols means intersection of complement
(contrary to what one would expect)
- one cannot do everything that's useful

There are other symbols to be used.

'&' meaning union
'|' meaning intersection
'^' meaning complement

This would avoid the old convention (so we could have backward
compatibility) and it might look more readable:

hosts: (webservers|production|dmz) -> all webservers in production and in dmz
hosts: (webservers&proxyservers)|^production -> all webservers and proxyservers that are not in production
hosts: webservers|(debian&ubuntu) -> all webservers running debian and ubuntu
hosts: webservers|^dbserver -> all webservers not running a database
hosts: (blade&standalone)|(rhel&fedora) -> all physical boxes running RHEL and Fedora

Would that be more acceptable ?

Brian Coca

Dec 7, 2012, 10:12:21 PM
to ansible...@googlegroups.com

This would rock, I currently use extravars to set host for most of my playbooks, this is much more elegant.

Brian Coca

Ahmad Khayyat

Dec 8, 2012, 12:43:49 AM
to ansible...@googlegroups.com
On Friday, December 7, 2012 6:57:40 PM UTC-5, Dag Wieers wrote:
 
> There are other symbols to be used.
>
>      '&'  meaning union
>      '|'  meaning intersection
>      '^'  meaning complement

I vote in favor, if anyone's vote counts.
However, I'd like to note that '&' is usually used to express 'and', which is closer to intersection (present in both operands), while '|' is usually used to express 'or', which is closer to union (present in either operand).

So, I propose a modified version:

    '&'    intersection (present in both)
    '|'    union       (present in either)
    '^'    complement  (not present)

I also second the call for making the feature available both in playbooks and the command line. Perhaps by accepting this syntax in the --limit argument. This leaves variables. Would there be a way to set variables based on such constructs?

Note also that Dag's proposal involves another operator: grouping using '( )' to control precedence.

That Would be Real Powerful, and would solve a LOT of problems trivially.

Michael DeHaan

Dec 8, 2012, 10:48:57 AM
to ansible...@googlegroups.com
While I agree that set theory operations sound powerful, we are
required to add things in ways that don't break existing usage, and
don't seem redundant or different. Thus, if --limit has to continue
to work the way it works now, and so does hosts, I don't want another
syntax that feels completely different, where you have to be
able to read both. That's a mess.

When I created Ansible, I desired it to not be a programming language,
and to be maximally auditable. As such, I don't feel that having
stuff like:

hosts: (webservers|dbservers)&production

is particularly readable or something I want to encourage, especially
when existing systems of host specs are intentionally simpler.

The way this is best handled, in my opinion, is maintaining a separate
inventory file for your environments, such that inventory for
production and inventory for stage/development is kept separate, and
then it's not a function of --limit at all.

I'd prefer if we had this conversation first in terms of concrete real
world use cases, and discussed how they could be modelled, rather than
first saying "here's a language feature I want" and adding it.

Ashley Penney

Dec 8, 2012, 1:12:01 PM
to ansible...@googlegroups.com
Maybe I misunderstand how things work, but this seems to assume you're primarily using static inventory files. I'm trying to use ec2.py right now as my inventory script and I want to be able to do the following:

tag_Group_webservers AND tag_environment_production
tag_Group_webservers AND NOT tag_environment_production
tag_Group_webservers AND tag_environment_production AND tag_variant_test

Those are some examples where I feel it's impractical to generate dozens of inventory files to pick and choose from. Maybe I've misunderstood exactly how things work today, but this seems difficult to do as things stand. With auto scaling groups it's not very practical to do anything but real-time inventory discovery.

I just wanted to get some real use cases in to make sure I understand what we're talking about here.

Michael DeHaan

Dec 8, 2012, 6:13:27 PM
to ansible...@googlegroups.com
So if you're using external sources, you could have scripts that keep
your environments separate by using separate config files -- you
wouldn't have to follow exactly what the included EC2 example does.

That all being said, my point is that the syntax we have established for
hosts and limit needs to continue to work. This means not
introducing additional newness into it without a way to signal that you are
using the newness, and I'd like to first understand a use case where
the newness is required, so we design appropriately.

This could mean a new "hosts_set:" directive incompatible with the
latter, but I don't like it if it doesn't also address how to use it with
the existing CLI options.

This might mean "set(...)" as some way of designating the newness,
etc., but I'd like to craft ideas first and foremost around use cases that
*can't* be solved today -- far before we fit an implementation to it
-- and only then, if we need it.


On Sat, Dec 8, 2012 at 1:12 PM, Ashley Penney <ape...@gmail.com> wrote:
> Maybe I misunderstand how things work but this seems to assume you're
> primarily using static inventory files. I'm trying to use ec2.py right now
> as my inventory script and I want to be able to do the following:
>
> tag_Group_webservers AND tag_environment_production

hosts: webservers
limit: production

> tag_Group_webservers AND NOT tag_environment_production

hosts: webservers
limit: !production

> tag_Group_webservers AND tag_environment_production AND tag_variant_test

I think this is the sticky one.

Currently "hosts" says, "be in one of these groups, and explicitly NOT
in any negated groups"

limit says "in addition to what hosts says, it must also be matched by
something in the limit"

It seems in this case, the proper way to do it in existing ansible
would be to have a production_test flag that selected those systems.
Maybe.

>
> There's some examples where I feel it's impractical to generate dozens of
> inventory files to pick and choose from. Maybe I've misunderstood exactly
> how things work today but this seems difficult to do as things stand. With
> auto scaling groups it's not very practical to do anything but real time
> inventory discovery.
>
> I just wanted to get some real use cases in to make sure I understand what
> we're talking about here.
>
>
> On Sat, Dec 8, 2012 at 10:48 AM, Michael DeHaan <michael...@gmail.com>
> wrote:
>>
>> The way this is best handled, in my opinion, is maintaining a separate
>> inventory file for your environments, such that inventory for
>> production and inventory for stage/development is kept separate, and
>> then it's not a function of --limit at all.
>
>

Michael Liddle

Dec 10, 2012, 8:52:54 AM
to ansible...@googlegroups.com
On Saturday, December 8, 2012 4:48:57 PM UTC+1, Michael DeHaan wrote:
> While I agree that set theory operations sound powerful, we are
> required to add things in ways that don't break existing usage, and
> don't seem redundant or different. Thus if --limit has to continue
> to work the way it works now, and does hosts, I don't want another
> syntax that feels completely different in ways where you have to be
> able to read both. That's a mess.

If one wanted to go in the direction of adding intersection to the pattern syntax, without making it too complicated, an alternative that retains the current operators and (basic) semantics could be as follows:

':' = 'add hosts in the following group to the current set' (union, as current)
':!' = 'remove hosts in the following group from the current set' (complement, as current)
':!!' = 'remove hosts not in the following group from the current set' (intersection, new)

These patterns could have left-to-right precedence and be read as a simple list of "set building instructions" (i.e. not a complex set-theoretic equation). E.g.

webservers:!!debian

Can be read as:

1. take all hosts in webservers
2. remove hosts not in debian

I'm not sure that every possible combination can be built this way (although maybe they can?), but it certainly adds a few more options and is simpler than having a full-blown expression language.

Whether that pattern looks like it should do that is a different question, I guess!
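
The "set building instructions" reading above can be sketched as a short left-to-right evaluator. This is only an illustration of the proposal, assuming a made-up inventory; the function name `build_host_set` and the group contents are hypothetical, and the ':!!' operator exists only in this message:

```python
def build_host_set(pattern, inventory):
    """Evaluate a pattern strictly left to right as set-building steps."""
    hosts = set()
    for term in pattern.split(":"):
        if term.startswith("!!"):
            # ':!!' removes hosts that are NOT in the named group (intersection).
            hosts &= inventory[term[2:]]
        elif term.startswith("!"):
            # ':!' removes hosts that are in the named group.
            hosts -= inventory[term[1:]]
        else:
            # ':' (or the first term) adds the named group's hosts (union).
            hosts |= inventory[term]
    return hosts


# Hypothetical inventory: group name -> set of host names.
inventory = {
    "webservers": {"web1", "web2", "web3"},
    "debian": {"web1", "db1"},
    "london": {"web1", "web2", "db1"},
}

print(sorted(build_host_set("webservers:!!debian", inventory)))
# ['web1']
print(sorted(build_host_set("webservers:!debian", inventory)))
# ['web2', 'web3']
print(sorted(build_host_set("webservers:london:!!debian", inventory)))
# ['db1', 'web1']
```

Because each step acts on the running set, the order of terms matters: "webservers:london:!!debian" and "webservers:!!debian:london" select different hosts.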

Michael Liddle

Dec 10, 2012, 8:58:49 AM
to ansible...@googlegroups.com


On Sunday, December 9, 2012 12:13:27 AM UTC+1, Michael DeHaan wrote:
> On Sat, Dec 8, 2012 at 1:12 PM, Ashley Penney <ape...@gmail.com> wrote:
>> tag_Group_webservers AND tag_environment_production AND tag_variant_test
>
> I think this is the sticky one.

To add my two-cents to this one too... This could work:

hosts: webservers
limit:
- production
- test

I.e. in play objects limit params can be lists, which would be combined using intersection rules.

From the implementation side this is not a lot different to having to combine a CLI --limit and single play object limit param.
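
A rough sketch of that combination rule, assuming a play-level limit list and an optional CLI limit that are all intersected with the play's hosts. The function `apply_limits`, the play-level `limit` list, and the inventory contents are hypothetical and only illustrate the idea:

```python
def apply_limits(play_hosts, play_limits, cli_limit, inventory):
    """Intersect the play's hosts with each play-level limit and the CLI limit."""
    selected = set(inventory[play_hosts])
    limits = list(play_limits)
    if cli_limit is not None:
        # The CLI --limit is treated as just one more limit to intersect with.
        limits.append(cli_limit)
    for group in limits:
        selected &= inventory[group]
    return selected


# Hypothetical inventory: group name -> set of host names.
inventory = {
    "webservers": {"web1", "web2", "web3", "web4"},
    "production": {"web1", "web2", "db1"},
    "test": {"web2", "web3"},
    "london": {"web1", "web4"},
}

# hosts: webservers, limit: [production, test], no CLI limit
print(sorted(apply_limits("webservers", ["production", "test"], None, inventory)))
# ['web2']

# hosts: webservers, limit: [production], run with --limit london
print(sorted(apply_limits("webservers", ["production"], "london", inventory)))
# ['web1']
```

With this rule a CLI --limit is never ignored; it can only narrow what the play's own limits already selected.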

Daniel Hokka Zakrisson

Dec 10, 2012, 9:28:22 AM
to ansible...@googlegroups.com
Michael DeHaan wrote:
> So if you're using external sources, you could have scripts that keep
> your environments separate by using separate config files -- you
> wouldn't have to follow exactly what the included EC2 example does.
>
> That all being said my point is the syntax we have established for
> hosts and limit needs to continue to work, and this means not
> introducing additional newness into it without a way that you are
> using the newness, and I'd like to first understand a use case where
> the newness is required, so we design appropriately.
>
> This could mean a new "hosts_set:" directive incompatible with the
> latter, but I don't like if it doesn't also answer ways to use it with
> the existing CLI options.
>
> This might mean "set(...)" as some way of designating the newness,
> etc, but I'd like to craft ideas first most around use cases that
> *can't* be solved today -- far before we fit an implementation to it
> -- and only then, if we need it.

So when we talked about this before, we discussed simply extending
the hosts: (and ansible target) pattern to accept &group to perform the
intersection. E.g. webservers:!debian:&datacenter1 to limit it to webservers
not running Debian in datacenter1. This would be rather easy to implement,
looks like what the hosts declaration already does, and allows expressing most of
the common scenarios.

Adding a limit on the play seems odd and doesn't quite fit with everything
else.
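
To make the &group idea concrete, here is a minimal sketch that sorts the terms of a pattern into plain groups, "&" groups, and "!" groups before combining them. The evaluation order (union first, then intersections, then exclusions) is an assumption made for this illustration, as are the function name `match_hosts` and the inventory contents:

```python
def match_hosts(pattern, inventory):
    """Resolve a ':'-separated pattern with plain, '&', and '!' terms."""
    include, require, exclude = [], [], []
    for term in pattern.split(":"):
        if term.startswith("!"):
            exclude.append(term[1:])
        elif term.startswith("&"):
            require.append(term[1:])
        else:
            include.append(term)

    hosts = set()
    for group in include:   # union of all plain groups
        hosts |= inventory[group]
    for group in require:   # keep only hosts that are also in every '&' group
        hosts &= inventory[group]
    for group in exclude:   # drop hosts that are in any '!' group
        hosts -= inventory[group]
    return hosts


# Hypothetical inventory: group name -> set of host names.
inventory = {
    "webservers": {"web1", "web2", "web3"},
    "debian": {"web1"},
    "datacenter1": {"web1", "web2", "db1"},
}

print(sorted(match_hosts("webservers:!debian:&datacenter1", inventory)))
# ['web2']
```

Sorting the terms by kind first means the result does not depend on the order in which they are written, which keeps the pattern readable as a flat list.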

Daniel

Daniel Hokka Zakrisson

Dec 10, 2012, 9:58:39 AM
to ansible...@googlegroups.com

Michael DeHaan

Dec 10, 2012, 6:34:30 PM
to ansible...@googlegroups.com
> So when we had talked about this before, we had discussed simply extending
> the hosts: (and ansible target) to accept &group to limit to perform the
> intersection. E.g. webservers:!debian:&datacenter1 to limit it to webservers
> not running Debian in datacenter1. This would be rather easy to implement,
> looks like the hosts declaration already does, and allows expressing most of
> the common scenarios.

I like this syntax very much.

Great job at simplifying requirements.

--Michael

Daniel Hokka Zakrisson

Dec 10, 2012, 6:43:33 PM
to ansible...@googlegroups.com
Michael DeHaan wrote:
>> So when we had talked about this before, we had discussed simply
>> extending
>> the hosts: (and ansible target) to accept &group to limit to perform the
>> intersection. E.g. webservers:!debian:&datacenter1 to limit it to
>> webservers
>> not running Debian in datacenter1. This would be rather easy to
>> implement,
>> looks like the hosts declaration already does, and allows expressing
>> most of
>> the common scenarios.
>
> I like this syntax very much.

I figured you might, it's your original suggestion ;-)

Daniel

> Great job at simplifying requirements.
>
> --Michael

Michael DeHaan

Dec 10, 2012, 6:52:35 PM
to ansible...@googlegroups.com
> I figured you might, it's your original suggestion ;-)
>
> Daniel

Was it?

You should not tell me these things, as I forget them quickly.

--Michael