Structure for Ansible and handling of config

Daniel Schroeder

Jul 30, 2014, 4:21:25 AM
to ansible...@googlegroups.com
Heya,

I'm currently evaluating Ansible for our company, but I'm having trouble understanding how we would set up our global structure. We have about 10,000 hosts and hundreds of different projects with different teams and needs. Still, we need to use a common code base (roles, plugins) and not re-invent the wheel in every project. Of course I understand roles and playbooks. My problem is how settings are handled, and that leads to further problems.

1) User settings vs. project settings

According to the docs, Ansible looks in different locations for the cfg file and uses the first one found, ignoring all others. So it is not possible to have settings in different locations, which makes it hard to allow user-specific settings (e.g. private_key_file) and at the same time define company-wide or project-specific settings, which should be stored in git.

Ideally Ansible would respect all cfg files, merge the content and handle the settings with precedence in the order described in the docs.
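
A minimal sketch of what such merging could look like, in plain Python. This is not how Ansible loads its config; the helper name `load_merged_config` is made up for illustration. It relies only on the standard `configparser` behaviour that files read later override values from files read earlier, and that missing files are skipped.

```python
import configparser


def load_merged_config(paths):
    """Read every existing cfg file in `paths`, lowest precedence first.

    Hypothetical helper: values from later files override values from
    earlier files, section by section and key by key.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read() skips files that do not exist and returns
    # the list of files it actually parsed.
    used = parser.read(paths)
    return parser, used
```

Called with a company-wide file first and a user file last, a user could override a single key such as the timeout while inheriting everything else from the file kept in git.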

2) Settings are global, not per playbook

But different playbooks might have different needs, e.g. a playbook could require hash_behaviour=merge. To achieve this, one would need to create a folder specific to the playbook, place the cfg with the settings inside, cd to that folder and run ansible-playbook from there.

Another issue is callbacks. They fire as soon as they exist in the folder defined in the settings. So every playbook that needs a specific callback would require its own callback directory plus settings. Following this pattern, you'd also need to create the roles within this directory, since they must be located on the same level from which you run ansible-playbook.

This all makes it very hard to re-use roles and plugins. I worked on a structure that might work but is a symlink mayhem:

group_vars
  all
  group-1
  group-N
host_vars
  host-1
  host-N
inventory
  production
  staging
  uat
library
  modules
    mod-1.py
    mod-N.py
  plugins
    callback
      callback-1.py
      callback-N.py
    filter
      filter-1.py
      filter-N.py
    lookup
      lookup-1.py
      lookup-N.py
playbooks
  project-1
    ansible.cfg
    group_vars -> ../../group_vars
    host_vars -> ../../host_vars
    library
      modules -> ../../../library/modules
      plugins
        callback
          callback-1.py -> ../../../../../library/plugins/callback/callback-1.py
          callback-N.py -> ../../../../../library/plugins/callback/callback-N.py
        filter -> ../../../../library/plugins/filter
        lookup -> ../../../../library/plugins/lookup
    production -> ../../inventory/production
    playbook-1.yml
    playbook-N.yml
    roles -> ../../roles
    staging -> ../../inventory/staging
    uat -> ../../inventory/uat
  project-N
    ...
roles
  role-1
  role-N

Roles, inventory, host_vars, group_vars, modules and plugins are defined on the root level. The playbooks directory holds all projects. Every project can then define its own cfg with specific settings. The inventory files, roles, *_vars, modules, filter plugins and lookup plugins are symlinked into the project folder. Callbacks are trickier, as they fire as soon as they exist. So every callback that is required for a project needs to be symlinked explicitly.

This enables us to globally handle all re-usable components while every project can define its own settings. A project would/could be a submodule in the main git repo.

So this seems to work, except that user-specific settings are still not possible. Does this setup make sense from the PoV of more experienced Ansible users? IMHO this looks quite complex, and I wonder what I'm missing here, because things shouldn't be that complex.

Thanks in advance,
Daniel

Michael DeHaan

Jul 30, 2014, 7:59:10 AM
to ansible...@googlegroups.com
Hi Daniel, replies are inline.


On Wed, Jul 30, 2014 at 4:21 AM, Daniel Schroeder <deem...@googlemail.com> wrote:
Heya,

I'm currently evaluating Ansible for our company, but I'm having trouble understanding how we would set up our global structure. We have about 10,000 hosts and hundreds of different projects with different teams and needs. Still, we need to use a common code base (roles, plugins) and not re-invent the wheel in every project. Of course I understand roles and playbooks. My problem is how settings are handled, and that leads to further problems.

1) User settings vs. project settings

According to the docs, Ansible looks in different locations for the cfg file and uses the first one found, ignoring all others. So it is not possible to have settings in different locations, which makes it hard to allow user-specific settings (e.g. private_key_file) and at the same time define company-wide or project-specific settings, which should be stored in git.

With an infrastructure of this size, you should really consider Ansible Tower. Tower allows you to upload a private key into a concept called a *credential*, and this credential is securely provided to those who have access to it (shared with specific teams) without them getting to see the credential itself.

Each job template can also be associated with specific credentials for launching.

Alternatively, you could just store your ansible.cfg in git and set the ANSIBLE_CONFIG environment variable to the ansible.cfg path.
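
As a rough sketch of that second option, a small wrapper could point the ANSIBLE_CONFIG environment variable (the variable Ansible reads for its config path) at a project's cfg before launching a playbook. The function names `build_env` and `run_playbook` are invented for this example, and the sketch assumes `ansible-playbook` is on the PATH.

```python
import os
import subprocess


def build_env(cfg_path, base_env=None):
    """Return a copy of the environment with ANSIBLE_CONFIG set to cfg_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANSIBLE_CONFIG"] = os.path.abspath(cfg_path)
    return env


def run_playbook(playbook, cfg_path, extra_args=()):
    """Run ansible-playbook with a project-specific config file (illustrative only)."""
    cmd = ["ansible-playbook", playbook, *extra_args]
    return subprocess.call(cmd, env=build_env(cfg_path))
```

With something like this, the cfg can live anywhere in the git checkout, and nobody has to cd into a per-playbook folder to pick it up.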



2) Settings are global, not per playbook

But different playbooks might have different needs, e.g. a playbook could require hash_behaviour=merge. To achieve this, one would need to create a folder specific to the playbook, place the cfg with the settings inside, cd to that folder and run ansible-playbook from there.

It's recommended, to be consistent with the majority of the Ansible community, that people don't adopt hash_behaviour=merge. However, there are some who really feel like they should use it.

In this case, people can use it and it will continue to work; it may just be a little confusing. I recommend you set a policy on what you use so that everyone can easily read playbooks and know what might be going on.

Most people don't need the complexity of hash_behaviour=merge, and I recommend people try to avoid it: it gets hard to figure out where something was added to a variable, and you lose the ability to override hash variables completely.
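
To illustrate the difference, here is a small Python sketch of the two behaviours. It is an illustration of the idea only, not Ansible's implementation; the function names and the `forwarders` variable are made up.

```python
import copy


def combine_replace(low, high):
    """Replace behaviour: a variable in `high` replaces the one in `low` wholesale."""
    result = copy.deepcopy(low)
    result.update(copy.deepcopy(high))
    return result


def combine_merge(low, high):
    """Merge behaviour: nested dicts are merged key by key, recursively."""
    result = copy.deepcopy(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = combine_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


group_a = {"forwarders": {"file-A": "sourcetype-A"}}
group_b = {"forwarders": {"file-B": "sourcetype-B"}}

# Replace keeps only the higher-precedence hash.
print(combine_replace(group_a, group_b))
# Merge keeps the keys from both hashes.
print(combine_merge(group_a, group_b))
```

Note that with the merge variant, setting a hash to `{}` at a higher level leaves the lower-level keys in place, which is the "can no longer override completely" problem.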
 

Another issue is callbacks. They fire as soon as they exist in the folder defined in the settings. So every playbook that needs a specific callback would require its own callback directory plus settings. Following this pattern, you'd also need to create the roles within this directory, since they must be located on the same level from which you run ansible-playbook.

I'm not sure how you have callbacks and roles interlocking, as they are not related concepts.

However, if your custom callback requires configuration, the common mechanism is for it to read an environment variable.  This environment variable could even reference the path to a configuration file.
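
A sketch of that mechanism, kept independent of the callback plugin API itself: the callback would call a helper like the one below at start-up. The environment variable name `NOTIFY_CALLBACK_CONFIG` and the `[notify]` section are hypothetical, chosen only for this example.

```python
import configparser
import os

# Hypothetical variable name, chosen for this example only.
ENV_VAR = "NOTIFY_CALLBACK_CONFIG"


def load_callback_settings(environ=None):
    """Return the [notify] section of the INI file named by ENV_VAR as a dict.

    Returns None when the variable is unset, the file cannot be read, or the
    section is missing; a callback can treat that as "disabled".
    """
    environ = os.environ if environ is None else environ
    path = environ.get(ENV_VAR)
    if not path:
        return None
    parser = configparser.ConfigParser()
    if not parser.read(path) or not parser.has_section("notify"):
        return None
    return dict(parser.items("notify"))
```

This way a single variable selects a whole file of settings, so a user does not have to export one variable per option.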

Not too many people write custom callbacks, but if you're looking for something like logging to a database or making things API accessible, again, Tower provides this out of the box. For questions about that, feel free to ask sup...@ansible.com.

I agree this looks horrible, but I don't think this is Ansible's fault. I think you could use some best-practices organization advice here, but I'd first like to step back and ask what your callbacks *do* to understand the problem, and also ask why callbacks are project specific. We can then work out the best way to organize this as well as write your callbacks.

 
This enables us to globally handle all re-usable components while every project can define its own settings. A project would/could be a submodule in the main git repo.

So this seems to work, except that user-specific settings are still not possible. Does this setup make sense from the PoV of more experienced Ansible users? IMHO this looks quite complex, and I wonder what I'm missing here, because things shouldn't be that complex.

It looks like it's about an order of magnitude too complex. But see above: let's talk through the callback question, and I think making your callbacks configurable may solve the problem. However, I want to understand a bit more about what they do first, and I think we can help completely eliminate all the symlink fun.



Thanks in advance,
Daniel

Daniel Schroeder

Jul 31, 2014, 4:39:10 AM
to ansible...@googlegroups.com
Thanks Michael,

Tower allows you to upload a private key

Well, the key was only an example, one that I just made up because someone might need to change it. Another example would be the ssh timeout or, more generally, the ssh_args. Probably this could all be set in the ssh config. I just want to make sure there won't be a showstopper in the future when a user requires a config tweak and can't set it because the cfg is in source control.

The storage of ssh keys in Tower for sure is a very nice feature which might get interesting for us in future for security reasons.

It's recommended, to be consistent with the majority of the Ansible community, that people don't adopt hash_behaviour=merge. However, there are some who really feel like they should use it.

I don't see a way around this setting in our case. That's because our systems will be furnished with many different services which are provided by different teams. And which config a service uses might depend on other services. A good example for this would be Splunk. Splunk is a tool for collecting and indexing logfiles. Which logs will be collected depends then on which other services will run on a host. My idea was to organize this in groups.

group_vars for group-A:
---
splunk-forwarder:
  file-A: sourcetype-A
...

group_vars for group-B:
---
splunk-forwarder:
  file-B: sourcetype-B
...

When the host belongs to group A and B the content will be merged:
---
splunk-forwarder:
  file-A: sourcetype-A
  file-B: sourcetype-B
...

A playbook might look like this:
---
- name: Some playbook
  hosts: [some-host-which-may-belong-to-A-and/or-B]
  roles:
    - { role: role-A, when: "'group-A' in group_names" }
    - { role: role-B, when: "'group-B' in group_names" }
    - { role: splunk-forwarder, when: "splunk-forwarder is defined and splunk-forwarder | length > 0" }
...

So along with some other roles the splunk-forwarder role is applied which then uses the config of the other groups.

Another use case, for once one that is not made up and that I really have in my evaluation experiment, is a Redis proxy (twemproxy by Twitter) which should forward connections to different Redis clusters. Each cluster is defined in a separate group. To get the relevant configuration of all clusters, I include the group_vars of all clusters in a loop. The proxy config only holds references to the clusters, like so:

---
redisproxy:
  - host: some-host
    pools:
      - cluster: A
        pool: A
      - cluster: B
        pool: K
...

So this group_vars holds the config for all proxies on all hosts. Here we have 2 pools each defined in a different cluster. In a loop in the proxy role I then include all the group_vars of the clusters (redis-proxy-A, redis-proxy-B) which again will be merged hashes.

I recommend you set a policy on what you use so that everyone can easily read playbooks and know what might be going on.

In the case of hash_behaviour you are right. Since roles are services provided by different teams, we need consistent behaviour which developers can rely on. For callback plugins, though, this is still a problem.

I'm not sure how you have callbacks and roles interlocking, as they are not related concepts.

It's just that you need to define the roles on the same level as the ansible.cfg or the playbook won't find them.
The simple requirement "playbook specific callback" -> requires a specific ansible.cfg -> requires a root folder for every playbook where the cfg can be placed in along with the playbook -> requires the roles on the same level inside this specific folder.

However, if your custom callback requires configuration, the common mechanism is for it to read an environment variable.  This environment variable could even reference the path to a configuration file.

It's not that I need config (well, I do, but that's another topic ;-)) but that I need to enable or disable a callback per playbook. Some team might want to log to a database like you say. Another team might want to send notifications to their Hipchat channel. Who knows. That's up to them; I'm just trying to find a solution that gives them the chance to do whatever they want. Environment variables might be an option, but that's anything but convenient when a user needs to manually set 15 variables before running the playbook and then change them when running another one.

From the other mail thread I have seen how to access vars inside the callback plugin and that might be a handy option. Then it would be possible to enable/disable a callback per group_vars.

but I'd first like to step back and ask what your callbacks *do*

Nothing specific. I don't really have callbacks other than the Hipchat plugin I'm playing with. I just want to find the best possible setup to give our teams the most freedom in future. But as written before, roles come from different teams and each team might want to get notified on failure, log changes to a database or whatever comes to their mind. So I need a flexible framework where things can be configured per playbook and role.

With the settings available in the callback, I believe I can make this work.

In the group_vars for the all group we can then define notification settings:
---
notifications:
  playbook-name:
    role-name:
      task-name:
        fail:
          - type: Hipchat
            room: 12345
          - type: Email
            to: me@example.com
        ok:
          - type: Hipchat
            room: 12345
...

Each element (playbook-name, role-name, task-name and the actual callback type name) could be a wildcard to match any value, so a role provider could get notified of failures independent of the playbook name.
Each callback would then run through those definitions and either get active or not. Then I can use a simple structure as we do not require a custom cfg to define a separate set of callbacks.
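
A minimal sketch of how a callback could walk such definitions, assuming shell-style wildcards such as `*` for the playbook, role and task levels. The function name `matching_notifications` is made up, and filtering by callback type is left out for brevity; each callback would pick its own type from the returned targets.

```python
from fnmatch import fnmatchcase


def matching_notifications(rules, playbook, role, task, status):
    """Collect notification targets whose patterns match the given event.

    `rules` mirrors the nested layout above:
    playbook pattern -> role pattern -> task pattern -> status -> list of targets.
    """
    targets = []
    for pb_pattern, roles in rules.items():
        if not fnmatchcase(playbook, pb_pattern):
            continue
        for role_pattern, tasks in roles.items():
            if not fnmatchcase(role, role_pattern):
                continue
            for task_pattern, statuses in tasks.items():
                if fnmatchcase(task, task_pattern):
                    targets.extend(statuses.get(status, []))
    return targets
```

A role provider could then register `"*"` at the playbook level and their role name at the role level to hear about failures no matter which playbook used the role.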

As for user-specific settings vs. company-wide settings: is there a reason why multiple cfgs are not merged? Would you accept a PR for such a feature?

Cheers!
Daniel

Michael DeHaan

Jul 31, 2014, 4:58:23 PM
to ansible...@googlegroups.com
On Thu, Jul 31, 2014 at 4:39 AM, Daniel Schroeder <deem...@googlemail.com> wrote:
Thanks Michael,

Tower allows you to upload a private key

Well, the key was only an example, one that I just made up because someone might need to change it. Another example would be the ssh timeout or, more generally, the ssh_args. Probably this could all be set in the ssh config. I just want to make sure there won't be a showstopper in the future when a user requires a config tweak and can't set it because the cfg is in source control.

How realistic is it that this would need to change on a per-project basis versus a per-installation basis? In many cases it seems these should be able to have very good defaults that work for everyone, IMHO.
 

The storage of ssh keys in Tower for sure is a very nice feature which might get interesting for us in future for security reasons.

It's recommended, to be consistent with the majority of the Ansible community, that people don't adopt hash_behaviour=merge. However, there are some who really feel like they should use it.

I don't see a way around this setting in our case. That's because our systems will be furnished with many different services which are provided by different teams.

So far this describes almost all Ansible users :)
 
And which config a service uses might depend on other services. A good example for this would be Splunk. Splunk is a tool for collecting and indexing logfiles. Which logs will be collected depends then on which other services will run on a host. My idea was to organize this in groups.

Cool, we've got a lot of big users using Ansible for Splunk configuration...
 

group_vars for group-A:
---
splunk-forwarder:
  file-A: sourcetype-A
...

group_vars for group-B:
---
splunk-forwarder:
  file-B: sourcetype-B
...

When the host belongs to group A and B the content will be merged:
---
splunk-forwarder:
  file-A: sourcetype-A
  file-B: sourcetype-B
...


So, as I understand it, you want to configure Splunk on a per-role basis to possibly forward to different locations.

In such a case, I think this could easily be solved with a template that decides which forwarders to add based on something like group_names.

Ignoring splunk and generalizing it to foo.conf at the moment:

{% if 'xyz' in group_names %}
   text code to enable forwarder A
{% endif %}
{% if 'jkl' in group_names %}
   text code to enable forwarder B
{% endif %}
 
A playbook might look like this:
---
- name: Some playbook
  hosts: [some-host-which-may-belong-to-A-and/or-B]
  roles:
    - { role: role-A, when: "'group-A' in group_names" }
    - { role: role-B, when: "'group-B' in group_names" }
    - { role: splunk-forwarder, when: "splunk-forwarder is defined and splunk-forwarder | length > 0" }
...

So along with some other roles the splunk-forwarder role is applied which then uses the config of the other groups.

Another use case, for once one that is not made up and that I really have in my evaluation experiment, is a Redis proxy (twemproxy by Twitter) which should forward connections to different Redis clusters. Each cluster is defined in a separate group. To get the relevant configuration of all clusters, I include the group_vars of all clusters in a loop. The proxy config only holds references to the clusters, like so:

 

---
redisproxy:
  - host: some-host
    pools:
      - cluster: A
        pool: A
      - cluster: B
        pool: K
...

So this group_vars holds the config for all proxies on all hosts. Here we have 2 pools each defined in a different cluster. In a loop in the proxy role I then include all the group_vars of the clusters (redis-proxy-A, redis-proxy-B) which again will be merged hashes.


I would probably approach this by templating the config file if possible too, though there could be other approaches.
 

I recommend you set a policy on what you use so that everyone can easily read playbooks and know what might be going on.

In the case of hash_behaviour you are right. Since roles are services provided by different teams, we need consistent behaviour which developers can rely on. For callback plugins, though, this is still a problem.

I'm not sure how you have callbacks and roles interlocking, as they are not related concepts.

It's just that you need to define the roles on the same level as the ansible.cfg or the playbook won't find them.
The simple requirement "playbook specific callback" -> requires a specific ansible.cfg -> requires a root folder for every playbook where the cfg can be placed in along with the playbook -> requires the roles on the same level inside this specific folder.


I'm still confused a bit why you would have 100 different chat channels.  That seems pretty interesting, but also almost like you'd want a better way of recording than chat channels.  Not to say this isn't novel.   

You could also write a callback that paid attention to the name of the play or something, though this may require some tweaking.
 

However, if your custom callback requires configuration, the common mechanism is for it to read an environment variable.  This environment variable could even reference the path to a configuration file.

It's not that I need config (well, I do, but that's another topic ;-)) but that I need to enable or disable a callback per playbook. Some team might want to log to a database like you say. Another team might want to send notifications to their Hipchat channel. Who knows. That's up to them; I'm just trying to find a solution that gives them the chance to do whatever they want. Environment variables might be an option, but that's anything but convenient when a user needs to manually set 15 variables before running the playbook and then change them when running another one.

Forgot about this.

Unless I misremember implementing it, a playbook can have a "./callback_plugins" directory relative to it, and that will work.

Whether this is a symlink or not, that callback plugin directory could contain an INI file or something similar that includes a room ID.
 

From the other mail thread I have seen how to access vars inside the callback plugin and that might be a handy option. Then it would be possible to enable/disable a callback per group_vars.

but I'd first like to step back and ask what your callbacks *do*

Nothing specific. I don't really have callbacks other than the Hipchat plugin I'm playing with. I just want to find the best possible setup to give our teams the most freedom in future. But as written before, roles come from different teams and each team might want to get notified on failure, log changes to a database or whatever comes to their mind. So I need a flexible framework where things can be configured per playbook and role.

With the settings available in the callback, I believe I can make this work.

In the group_vars for the all group we can then define notification settings:
---
notifications:
  playbook-name:
    role-name:
      task-name:
        fail:
          - type: Hipchat
            room: 12345
          - type: Email
            to: me@example.com
        ok:
          - type: Hipchat
            room: 12345
...

Each element (playbook-name, role-name, task-name and the actual callback type name) could be a wildcard to match any value, so a role provider could get notified of failures independent of the playbook name.
Each callback would then run through those definitions and either get active or not. Then I can use a simple structure as we do not require a custom cfg to define a separate set of callbacks.

As for user-specific settings vs. company-wide settings: is there a reason why multiple cfgs are not merged? Would you accept a PR for such a feature?

We thought it would be confusing.   Various folks agree and disagree.