gathering smart and ansible_env behaviour


Darragh Bailey

Oct 5, 2015, 10:11:15 AM
to Ansible Project
Hi,


I encountered somewhat surprising behaviour when smart fact gathering is enabled, some plays use 'become: yes' at the top level, and tasks reference ansible_env['USER'] and ansible_env['HOME'].

I say surprising, because although it's clear what happens once you understand what's going on, it's not really "behaviour of least surprise" when creating playbooks and roles for maximum reuse.

The Example

Taking the following play:
--------------
- hosts: machine1
  tasks:
    - debug: var=ansible_user_id
    - debug: var=ansible_user_dir
    - debug: var=ansible_ssh_user
    - debug: var=ansible_env['USER']

- hosts: machine1
  become: yes
  tasks:
    - debug: var=ansible_user_id
    - debug: var=ansible_user_dir
    - debug: var=ansible_ssh_user
    - debug: var=ansible_env['USER']

- hosts: machine1
  tasks:
  - template:
      src: dump_variables.j2
      dest: /tmp/ansible_variables
  - fetch:
      src: /tmp/ansible_variables
      dest: "ansible_variables"

- hosts: machine1
  become: yes
  tasks:
  - template:
      src: dump_variables.j2
      dest: /tmp/ansible_variables
  - fetch:
      src: /tmp/ansible_variables
      dest: "become_ansible_variables"
--------------

dump_variables.j2 is:
--------------
HOSTVARS (ANSIBLE GATHERED, group_vars, host_vars) :

{{ hostvars[inventory_hostname] | to_yaml }}

PLAYBOOK VARS:

{{ vars | to_yaml }}
--------------

taken from https://groups.google.com/forum/#!msg/ansible-project/IVUwp9195Ek/HZ3QXvf6s1sJ.


When run with 'gathering = smart' in ~/.ansible.cfg, you get the same results for each of the two sets of plays: all the debug variables print the same values and all of the dumped variables are identical. But when you run with the default of 'implicit' (comment out the gathering config setting), you see the following output (condensed for convenience):

1: initial debug vars:
"ansible_user_id": "stack"
"ansible_user_dir": "/home/stack"
"ansible_ssh_user": "stack"
"ansible_env['USER']": "stack"

2: same debug vars with 'become: yes':
"ansible_user_id": "root"
"ansible_user_dir": "/root"
"ansible_ssh_user": "stack"
"ansible_env['USER']": "root"


--------------
--- ../../ansible/ansible_variables/deployer/tmp/ansible_variables    2015-10-05 12:21:28.062786276 +0100
+++ ../../ansible/become_ansible_variables/deployer/tmp/ansible_variables    2015-10-05 12:21:29.138790179 +0100
@@ -18,9 +18,9 @@
 ansible_bios_version: Bochs
 ansible_cmdline: {BOOT_IMAGE: /boot/vmlinuz-3.14.51-1-amd64-hlinux, console: 'ttyS0,115200',
   nofb: true, nomodeset: true, ro: true, root: /dev/mapper/hlm--vg-root, vga: normal}
-ansible_date_time: {date: '2015-10-05', day: '05', epoch: '1444044087', hour: '11',
-  iso8601: '2015-10-05T11:21:27Z', iso8601_micro: '2015-10-05T11:21:27.257819Z', minute: '21',
-  month: '10', second: '27', time: '11:21:27', tz: UTC, tz_offset: '+0000', weekday: Monday,
+ansible_date_time: {date: '2015-10-05', day: '05', epoch: '1444044088', hour: '11',
+  iso8601: '2015-10-05T11:21:28Z', iso8601_micro: '2015-10-05T11:21:28.178812Z', minute: '21',
+  month: '10', second: '28', time: '11:21:28', tz: UTC, tz_offset: '+0000', weekday: Monday,
   year: '2015'}
 ansible_default_ipv4: {address: 192.168.121.40, alias: eth0, gateway: 192.168.121.1,
   interface: eth0, macaddress: '52:54:00:3b:45:c4', mtu: 1500, netmask: 255.255.255.0,
@@ -59,13 +59,16 @@
 ansible_distribution_release: cattleprod
 ansible_distribution_version: '8'
 ansible_domain: ''
-ansible_env: {HOME: /home/stack, LANG: C, LC_ADDRESS: en_IE.UTF-8, LC_COLLATE: en_US.UTF-8,
+ansible_env: {HOME: /root, LANG: C, LC_ADDRESS: en_IE.UTF-8, LC_COLLATE: en_US.UTF-8,
   LC_CTYPE: C, LC_IDENTIFICATION: en_IE.UTF-8, LC_MEASUREMENT: en_IE.UTF-8, LC_MESSAGES: en_US.UTF-8,
   LC_MONETARY: en_IE.UTF-8, LC_NAME: en_IE.UTF-8, LC_NUMERIC: en_IE.UTF-8, LC_PAPER: en_IE.UTF-8,
-  LC_TELEPHONE: en_IE.UTF-8, LC_TIME: en_IE.UTF-8, LOGNAME: stack, MAIL: /var/mail/stack,
-  PATH: '/usr/local/bin:/usr/bin:/bin:/usr/games', PWD: /home/stack, SHELL: /bin/bash,
-  SHLVL: '1', SSH_CLIENT: 192.168.121.1 37076 22, SSH_CONNECTION: 192.168.121.1 37076
-    192.168.121.40 22, SSH_TTY: /dev/pts/0, TERM: xterm, USER: stack, _: /bin/sh}
+  LC_TELEPHONE: en_IE.UTF-8, LC_TIME: en_IE.UTF-8, LOGNAME: root, MAIL: /var/mail/root,
+  PATH: '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', PWD: /home/stack,
+  SHELL: /bin/bash, SUDO_COMMAND: /bin/sh -c echo BECOME-SUCCESS-rezdjrduclnbktzmvdjusxutxpbrncak;
+    LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/stack/.ansible/tmp/ansible-tmp-1444044088.08-205236105695791/setup;
+    rm -rf /home/stack/.ansible/tmp/ansible-tmp-1444044088.08-205236105695791/ >/dev/null
+    2>&1, SUDO_GID: '1001', SUDO_UID: '1001', SUDO_USER: stack, TERM: xterm, USER: root,
+  USERNAME: root}
 ansible_eth0:
   active: true
   device: eth0
@@ -154,12 +157,12 @@
 ansible_swaptotal_mb: 0
 ansible_system: Linux
 ansible_system_vendor: QEMU
-ansible_user_dir: /home/stack
-ansible_user_gecos: ''
-ansible_user_gid: 1001
-ansible_user_id: stack
+ansible_user_dir: /root
+ansible_user_gecos: root
+ansible_user_gid: 0
+ansible_user_id: root
 ansible_user_shell: /bin/bash
-ansible_user_uid: 1001
+ansible_user_uid: 0
 ansible_userspace_architecture: x86_64
 ansible_userspace_bits: '64'
 ansible_virtualization_role: guest
--------------


The Problem

Where this starts to cause issues is in writing out config files that need to contain absolute paths for the target user the task is executed for. If we use 'gathering = smart', we end up either getting the wrong information or needing to add extra tasks that extract the correct user name and home directory by echoing variables on the remote and inspecting the results.


Working around this by setting 'gather_facts: True' on the play, or by adding a 'setup' task at the start, resets the variables for all subsequent plays in site.yml. This means you end up adding a follow-up play with 'gather_facts: True' and without become, so that subsequent plays in site.yml that expect ansible_env to reflect the remote user rather than the sudo user get no surprises. This starts to negate some of the benefits of smart fact gathering, though.
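
A minimal sketch of that workaround, assuming 'gathering = smart' is set and that an explicit 'gather_facts: yes' forces a fresh gather as described above. The host name is the one from the example; the debug tasks are only there for illustration:
--------------
# Play that needs root's view of ansible_env: force a re-gather under become.
- hosts: machine1
  become: yes
  gather_facts: yes
  tasks:
    - debug: var=ansible_env['HOME']   # expected to show root's home here

# Follow-up play without become: re-gather so that later plays in site.yml
# see the remote user's environment again.
- hosts: machine1
  gather_facts: yes
  tasks:
    - debug: var=ansible_env['HOME']   # expected to show the remote user's home
--------------

The cost is two extra fact-gathering runs per host, which is exactly the overhead smart gathering is meant to avoid.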


There simply don't appear to be any variables that correctly reflect the user a task is being executed under, so even if we move the "become: yes" onto the role or onto each task (more likely onto the include), there is no variable that is guaranteed to contain the username and home directory of the user executing the task.

As mentioned, for many modules we can use '~' and it will be correctly expanded, but that doesn't work so well for configuration files where we need to insert the correct value and '~' may not be expanded by the application reading the file.

The problem appears a little less severe for the username, since it looks as though we could use the following hack to get the remote user in cases where the play may be run as root because most tasks require root, but we want to set ownership of files/directories to the remote user without hardcoding the username:

  {{ ansible_env['SUDO_USER'] | default(ansible_env['USER']) }}

Which should also work for writing such a value to a config file.
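
As a hedged illustration of how that expression might be used for ownership, here is a sketch of a play run under become. The path /opt/example/data is hypothetical and only stands in for whatever directory the remote user should own:
--------------
- hosts: machine1
  become: yes
  tasks:
    # Create a directory as root but hand ownership to the remote user.
    # SUDO_USER is only present when facts were gathered under become;
    # otherwise USER already holds the remote user, so the default() applies.
    - file:
        path: /opt/example/data        # hypothetical path
        state: directory
        owner: "{{ ansible_env['SUDO_USER'] | default(ansible_env['USER']) }}"
--------------

With the facts from the diff above, the expression would resolve to 'stack' in both cases: via SUDO_USER when the facts were gathered under become, and via USER when they were not.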

Picking up the remote user's home directory when we need to perform a task as a sudo user seems more troublesome, though we can use the following:

  {{ ansible_env['PWD'] }}


Both of these feel like hacks, though, and the alternative of using a shell command and register also feels somewhat clunky.
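
For comparison, a sketch of the shell-and-register alternative, assuming task-level 'become: no' overrides the play-level setting. The variable name remote_home is made up for this example:
--------------
- hosts: machine1
  become: yes
  tasks:
    # Run this one task as the remote (SSH) user, so $HOME is that
    # user's home rather than root's.
    - shell: echo "$HOME"
      become: no
      register: remote_home            # hypothetical variable name
      changed_when: false              # read-only command, never a change

    - debug: msg="remote user home is {{ remote_home.stdout }}"
--------------

This avoids depending on which user the cached facts were gathered under, at the cost of one extra task and connection per play that needs the value.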


It'd be nice..


It really seems to me that it would be easier to make roles and plays that use become more reusable if there were a standard pattern that would give consistent results no matter where "become: yes" is placed (task, include, role or play) when running with smart fact gathering.


Maybe an 'ansible_task_env' that always reflects the environment of the user a task is executed under. That may not work with filters, though; I'm not sure when ansible resolves the final value of variables.


Alternatively, a set of user variables:

ansible_become_user_dir:
ansible_become_user_gecos:
ansible_become_user_gid:
ansible_become_user_id:
ansible_become_user_shell:
ansible_become_user_uid:

ansible_remote_user_dir:
ansible_remote_user_gecos:
ansible_remote_user_gid:
ansible_remote_user_id:
ansible_remote_user_shell:
ansible_remote_user_uid:


I surmise this could be implemented via a custom module, but it feels like something that should be part of the default fact gathering to remove some ambiguity around remote and become user for smart fact gathering.


Unless of course this has already changed in v2, or there is a standard pattern that I should be following and recommending to other local ansible users, other than "don't use smart gathering" or "don't use 'become: yes' on top level plays".

--
Darragh Bailey
"Nothing is foolproof to a sufficiently talented fool"

Brian Coca

Oct 5, 2015, 10:20:32 AM
to Ansible Project
What you are proposing would require that every task gather facts
before running. That would be very slow and require at least double
the number of connections.
--
Brian Coca

Darragh Bailey

Oct 7, 2015, 10:37:45 AM
to Ansible Project


I don't think it would need to query such information on every task execution.

Ansible already gathers env facts for one user, so why not have it gather the env facts for both the become user and the remote user at the same time, with an added option to list additional users to include in case someone wants to change the become user for later tasks? There could then be a way of making that information available to tasks that need access to the environment settings, without overwriting the view presented to subsequent tasks when using fact caching.

Sure it might be useful to automatically gather the env information for the become_user if it's not already retrieved as part of the initial fact gathering, but that would only result in every task gathering additional facts if you used a different become_user for each task. If this was added, making it optional and default to off would make sense. I would consider it a bit like the idea of only gathering host facts if not already gathered, but just applied to the user level as well.


Also provide some variables that identify the remote and become users separately and consistently instead of the current situation where with smart fact gathering the value of ansible_user_* can change depending on whether become was set to yes or no with the last fact gathering on that host during the same run. That makes ansible_env and ansible_user_* trickier to use for re-usability without causing surprises.


Right now, when that situation occurs, to prevent odd behaviour in following plays that expect the default ansible_env to reflect the remote user, you would need to force gather_facts for the play with become set to yes, followed by a second forced gather_facts with become set to no.



I think it should be possible to provide consistent behaviour, even if that may mean that you may need to explicitly gather the env info for a particular user.

If ansible exposed both the become_user and remote_user and also had a variable that reflected what user a task was going to be run using, then I might be able to put together some custom fact gathering for local use that would ensure consistency around ensuring that the env variables being accessed matched the task user provided we used a particular pattern to access.
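
A rough sketch of what such a local pattern might look like with what is available today, assuming the 'setup' module can be run as an ordinary task with task-level become, and that set_fact keeps the copied dictionary intact. The names remote_user_env and become_user_env are hypothetical, not existing Ansible variables:
--------------
- hosts: machine1
  gather_facts: no
  tasks:
    # Snapshot the environment as seen by the remote (SSH) user.
    - setup:
      become: no
    - set_fact:
        remote_user_env: "{{ ansible_env }}"    # hypothetical name

    # Snapshot the environment as seen by the become user.
    - setup:
      become: yes
    - set_fact:
        become_user_env: "{{ ansible_env }}"    # hypothetical name

    # Leave ansible_env reflecting the remote user for later plays.
    - setup:
      become: no
--------------

Later tasks could then reference remote_user_env['HOME'] or become_user_env['HOME'] explicitly, regardless of which user the most recent fact gathering ran as. It still costs three setup runs per host up front.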

--
Darragh Bailey
