ansible find.py seems to fail on NFS drives with certain mount options


Larry Kyrala

Aug 14, 2019, 3:12:08 PM
to Ansible Project
Hi, 

I opened an issue some time ago, which was dismissed as a user issue, and I was directed here instead:

I've discovered more information since reporting that issue.  Similar issues have also been observed on Mac and Windows, which makes me suspect NFS mount options, since the filesystem permissions in all other cases are the same between working and non-working accounts.

If we look at specific logs, it appears that once again find.py is implicated.  I'm not sure what find.py is doing that is failing, but maybe someone here can help?


# our deploy script works for some users and fails for others.  When it fails, it looks like this:


user2 ~/ansible-rubyvm> ansible-playbook deploy.yml --ask-become-pass -vvv
ansible-playbook 2.8.3
  config file = /network/home/user2/ansible-rubyvm/ansible.cfg
  configured module search path = [u'/home/user2/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 2.7.13 (default, Sep 26 2018, 18:42:22) [GCC 6.3.0 20170516]
Using /network/home/user2/ansible-rubyvm/ansible.cfg as config file
BECOME password:
...
host_list declined parsing /network/home/user2/ansible-rubyvm/hosts as it did not pass it's verify_file() method
script declined parsing /network/home/user2/ansible-rubyvm/hosts as it did not pass it's verify_file() method
auto declined parsing /network/home/user2/ansible-rubyvm/hosts as it did not pass it's verify_file() method
...
TASK [deploy : find deployments] *****************************************************************************************************************************************************************************************************************************************************************************
task path: /network/home/user2/ansible-rubyvm/roles/deploy/tasks/main.yml:2
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: user2
<localhost> EXEC /bin/sh -c '
( umask 77 && mkdir -p "` echo /tmp/${USER}/ansible/ansible-tmp-1565734953.55-54346346260526 `" && echo ansible-tmp-1565734953.55-54346346260526="` echo /tmp/${USER}/ansible/ansible-tmp-1565734953.55-54346346260526 `" ) && sleep 0'
Using module file /usr/lib/python2.7/dist-packages/ansible/modules/files/find.py
<localhost> PUT /home/user2/.ansible/tmp/ansible-local-1636121zxeQM/tmpd33rm8 TO /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/AnsiballZ_find.py
<localhost> EXEC /bin/sh -c '
chmod u+x /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/ /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/AnsiballZ_find.py && sleep 0'
<localhost> EXEC /bin/sh -c '
sudo -H -S -n  -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-nydnzrdhipfrrspdeywfggmpcnzadjmp ; /usr/bin/python /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/AnsiballZ_find.py'"'"' && sleep 0'
<localhost> EXEC /bin/sh -c '
rm -f -r /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/ > /dev/null 2>&1 && sleep 0'
ok: [localhost] => {
   "changed": false,
    "examined": 0,
    "files": [],
    "invocation": {
        "module_args": {
            "age": null,
            "age_stamp": "mtime",
            "contains": null,
            "depth": null,
            "excludes": [
                "sample_deployment.json"
            ],
            "file_type": "file",
            "follow": false,
            "get_checksum": false,
            "hidden": false,
            "paths": [
                "deployments/"
            ],
            "patterns": [
                "*.json"
            ],
            "recurse": false,
            "size": null,
            "use_regex": false
        }
    },
    "matched": 0,
    "msg": "deployments/ was skipped as it does not seem to be a valid directory or it cannot be accessed\n"
}




# task that is failing - roles/deploy/tasks/main.yml:


- name: find deployments
  find:
    paths: deployments/
    patterns: "*.json"
    excludes: "sample_deployment.json"
  register: files_matched




# directory permissions from within the VM:


user2 ~/ansible-rubyvm> ls -la
total 37
drwxr-xr-x   8 user2  users    20 Jun  6 14:29 ./
drwxr-xr-x  15 user2  users    18 Aug  1 15:22 ../
drwxr-xr-x   8 user2  users    13 Aug 14 14:36 .git/
-rw-r--r--   1 user2  users   113 Jun  6 14:14 .gitignore
-rw-r--r--   1 user2  users     0 Jun  6 14:14 .placeholder
-rw-r--r--   1 user2  users  8946 Jun  6 14:14 README.md
-rw-r--r--   1 user2  users   639 Jun  6 14:29 aliases.sh
-rw-r--r--   1 user2  users    67 Jun  6 14:14 ansible.cfg
-rw-r--r--   1 user2  users    88 Jun  6 14:14 deploy.yml
drwxr-xr-x   2 user2  users     3 Jun  6 14:14 deployments/
drwxr-xr-x   2 user2  users     6 Jun  6 14:14 doc/
drwxr-xr-x   2 user2  users     3 Jun  6 14:21 files/
-rw-r--r--   1 user2  users   151 Jun  6 14:14 goodies.yml
drwxr-xr-x   2 user2  users     3 Jun  6 14:14 group_vars/
-rw-r--r--   1 user2  users    47 Jun  6 14:14 hosts
-rwxr-xr-x   1 user2  users   114 Jun  6 14:14 install.sh*
-rw-r--r--   1 user2  users    57 Jun  6 14:14 requirements.yml
drwxr-xr-x  11 user2  users    11 Jun  6 14:14 roles/
-rw-r--r--   1 user2  users   342 Jun  6 14:14 rubyvm.yml
-rwxr-xr-x   1 user2  users   479 Jun  6 14:14 setup.sh*


user2 ~/ansible-rubyvm> ls -la deployments
total 5
drwxr-xr-x 2 user2 users   4 Aug 14 14:41 .
drwxr-xr-x 8 user2 users  20 Jun  6 14:29 ..
-rw-r--r-- 1 user2 users   0 Aug 14 14:41 app1.json
-rw-r--r-- 1 user2 users 212 Jun  6 14:14 sample_deployment.json


If I do a "mount | grep home" to find the current mount options I find the following differences between the account that works and the one that doesn't:

working-server-nfs:/vmgr/home05/user1 on /network/home/user1 (nfs, nodev, automounted, nobrowse)

broken-server-nfs:/vmgr/home06/user2 on /network/home/user2 type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,noacl,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=x.x.x.x,mountvers=3,mountport=yyyyy,mountproto=udp,local_lock=none,addr=x.x.x.x)



The only thing that stands out to me is perhaps the "noacl" option which disables use of a possible NFSACL sideband protocol (if available).  In general, the one that doesn't work seems to have a lot of "performance optimizations" (such as local_lock=none).  It's also possible that POSIX assumptions in python and/or find.py rely on certain features that these options remove.
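
Comparing two long option lists by eye is error-prone, so here is a minimal sketch of doing it mechanically. The helper names `parse_mount_options` and `diff_mount_options` are made up for illustration, and the sketch assumes both strings were copied from `mount` output on the same operating system, since different systems print different option sets:

```python
def parse_mount_options(options):
    """Split a mount option string like '(rw,vers=3)' into a dict.

    Flags without a value (e.g. 'noacl') map to None.
    """
    parsed = {}
    for item in options.strip("() ").split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else None
    return parsed


def diff_mount_options(left, right):
    """Return (only_left, only_right, changed) between two option strings."""
    a, b = parse_mount_options(left), parse_mount_options(right)
    only_left = sorted(set(a) - set(b))
    only_right = sorted(set(b) - set(a))
    # Options present on both sides but with different values.
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_left, only_right, changed


if __name__ == "__main__":
    # Shortened, illustrative option strings.
    working = "(nfs, nodev, automounted, nobrowse)"
    broken = "(rw,relatime,vers=3,noacl)"
    print(diff_mount_options(working, broken))
```

The three lists separate "flag present on only one side" from "same option, different value", which is usually the first thing worth knowing when two mounts behave differently.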

Is there anything obvious here?

Thanks!


Larry Kyrala

Aug 14, 2019, 3:34:28 PM
to Ansible Project
Correction: it turns out I was comparing mount options within the Debian VM to those on the Mac.  The mount options in both environments are identical, yet one works and the other doesn't.

Trying to gather more information because this doesn't make any sense.

Sorry!

Larry Kyrala

Aug 14, 2019, 3:51:21 PM
to Ansible Project
Ok, I just ran the deploy script on my machine (which works, with exactly the same files).  Here are the logs from the working machine:


$ ansible-playbook deploy.yml --ask-become-pass -vvv
ansible-playbook 2.8.0
  config file = /network/home/user1/ansible-rubyvm/ansible.cfg
  configured module search path = [u'/home/user1/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 2.7.13 (default, Sep 26 2018, 18:42:22) [GCC 6.3.0 20170516]
Using /network/home/user1/ansible-rubyvm/ansible.cfg as config file
BECOME password:
host_list declined parsing /network/home/user1/ansible-rubyvm/hosts as it did not pass it's verify_file() method
script declined parsing /network/home/user1/ansible-rubyvm/hosts as it did not pass it's verify_file() method
auto declined parsing /network/home/user1/ansible-rubyvm/hosts as it did not pass it's verify_file() method
Parsed /network/home/user1/ansible-rubyvm/hosts inventory source with ini plugin

...
TASK [deploy : find deployments] ****************************************************************************************************************************************************************************************************************************
task path: /network/home/user1/ansible-rubyvm/roles/deploy/tasks/main.yml:2
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: user1
<localhost> EXEC /bin/sh -c '
( umask 77 && mkdir -p "` echo /tmp/${USER}/ansible/ansible-tmp-1565811528.74-58741451285526 `" && echo ansible-tmp-1565811528.74-58741451285526="` echo /tmp/${USER}/ansible/ansible-tmp-1565811528.74-58741451285526 `" ) && sleep 0'
Using module file /usr/lib/python2.7/dist-packages/ansible/modules/files/find.py
<localhost> PUT /home/user1/.ansible/tmp/ansible-local-9685YDWEj4/tmpUMTB0r TO /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/AnsiballZ_find.py
<localhost> EXEC /bin/sh -c '
chmod u+x /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/ /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/AnsiballZ_find.py && sleep 0'
<localhost> EXEC /bin/sh -c '
sudo -H -S  -p "[sudo via ansible, key=fnjawawcivjycahoaasqjgttvmdjzobb] password:" -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-fnjawawcivjycahoaasqjgttvmdjzobb ; /usr/bin/python /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/AnsiballZ_find.py'"'"' && sleep 0'
<localhost> EXEC /bin/sh -c '
rm -f -r /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/ > /dev/null 2>&1 && sleep 0'

ok: [localhost] => {
    "changed": false,
    "examined": 2,
    "files": [
        {
            "atime": 1565811338.5080934,
            "ctime": 1565811463.5763762,
            "dev": 54,
            "gid": 101,
            "gr_name": "users",
            "inode": 88137,
            "isblk": false,
            "ischr": false,
            "isdir": false,
            "isfifo": false,
            "isgid": false,
            "islnk": false,
            "isreg": true,
            "issock": false,
            "isuid": false,
            "mode": "0644",
            "mtime": 1565811437.9758408,
            "nlink": 1,
            "path": "deployments/app1.json",
            "pw_name": "user1",
            "rgrp": true,
            "roth": true,
            "rusr": true,
            "size": 281,
            "uid": 5954,
            "wgrp": false,
            "woth": false,
            "wusr": true,
            "xgrp": false,
            "xoth": false,
            "xusr": false

        }
    ],
    "invocation": {
        "module_args": {
            "age": null,
            "age_stamp": "mtime",
            "contains": null,
            "depth": null,
            "excludes": [
                "sample_deployment.json"
            ],
            "file_type": "file",
            "follow": false,
            "get_checksum": false,
            "hidden": false,
            "paths": [
                "deployments/"
            ],
            "patterns": [
                "*.json"
            ],
            "recurse": false,
            "size": null,
            "use_regex": false
        }
    },
    "matched": 1,
    "msg": ""
}



Maybe this explains why one works and the other fails, but I need help understanding where to look.

Thanks!

Kai Stian Olstad

Aug 14, 2019, 4:16:53 PM
to ansible...@googlegroups.com
On 14.08.2019 21:12, Larry Kyrala wrote:
> "matched": 0,
> "msg": "deployments/ was skipped as it does not seem to be a valid
> directory or it cannot be accessed\n"
> }
>
>
>
>
> # script that is failing - deploy/main.yml:
>
>
> - name: find deployments
> find:
> paths: deployments/
> patterns: "*.json"
> excludes: "sample_deployment.json"
> register: files_matched
>
>
>
>
> # directory permissions from within the VM:

There are a few possibilities. Since you are using a relative path, it
might not be searching in the correct directory.
The easiest way to see what happens might be to run strace on the
failing machine:

strace -f ansible-playbook deploy.yml --ask-become-pass 2>&1 | grep deployments

This will show every system call that mentions deployments.


The find module stops once its os.path.isdir test on deployments/ fails.
You can run the same test with the code below; it should print True.
Since you are using become, run it with sudo.

sudo -H python -c 'import os; print os.path.isdir("deployments/")'
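
Note that os.path.isdir returns False both when the path does not exist and when the underlying stat call is refused, so a bare False does not say which one happened. A minimal sketch that calls os.stat directly and reports the errno instead (the helper name `explain_isdir` is hypothetical and not part of Ansible; it should run on Python 2.7 and 3):

```python
from __future__ import print_function

import errno
import os
import stat


def explain_isdir(path):
    """Return a short string saying why os.path.isdir(path) is True or False."""
    try:
        st = os.stat(path)
    except OSError as exc:
        # os.path.isdir() swallows this error and simply returns False.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return "stat failed: %s" % name
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    return "exists but is not a directory"


if __name__ == "__main__":
    print(explain_isdir("deployments/"))
```

As with the one-liner, run it under the same account the task uses (here via sudo -H), because the answer depends on who is asking.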


--
Kai Stian Olstad

Larry Kyrala

Aug 15, 2019, 2:08:21 PM
to Ansible Project
Thanks Kai, I tried that and narrowed it down to the following:

# strace on isdir


$ diff nfs_{bad,good}.log
1,2c1,2
< user2 ~> sudo -H strace -f python -c 'import os; print os.path.isdir("nfsdir")'
< execve("/usr/bin/python", ["python", "-c", "import os; print os.path.isdir(\""...], [/* 17 vars */]) = 0
---
> user1 ~$ sudo -H strace -f python -c 'import os; print os.path.isdir("nfsdir")'
> execve("/usr/bin/python", ["python", "-c", "import os; print os.path.isdir(\""...], [/* 14 vars */]) = 0
762,765c762,765
< stat("nfsdir", 0xADDRESS)         = -1 EACCES (Permission denied)
< fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
< write(1, "False\n", 6False
< )                  = 6
---
> stat("nfsdir", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
> fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 7), ...}) = 0
> write(1, "True\n", 5True
> )                   = 5


So isdir is failing on the bad system; specifically, the stat call fails with EACCES.

And running stat directly confirms this:

# stats on dirs


$ diff {bad,good}_stat.log
1,2c1,9
< user2 ~> sudo stat nfsdir
< stat: cannot stat '/home/user2/nfsdir': Permission denied
---
> user1 ~$ sudo stat nfsdir
>   File: nfsdir
>   Size: 4           Blocks: 3          IO Block: 8192   directory
> Device: 36h/54d Inode: 384033      Links: 2
> Access: (0755/drwxr-xr-x)  Uid: ( 5954/ user1)   Gid: (  101/   users)
> Access: 2019-06-06 14:14:23.025878969 -0400
> Modify: 2019-08-14 15:37:43.576408706 -0400
> Change: 2019-08-14 15:37:43.576408706 -0400
>  Birth: -


So the "bad" machine is set up so that sudo cannot see the NFS mount for the network user "user2".

Larry Kyrala

Aug 15, 2019, 4:15:31 PM
to Ansible Project
The other user's home directory didn't have execute privileges. 

Thanks for your help and patience!
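
For anyone who hits the same symptom: one way this can happen, assuming the NFS export squashes root (mapping uid 0 to an unprivileged anonymous user, which is a common server default), is that commands run through sudo are checked against the "other" permission bits of every directory along the path. A rough sketch for spotting the offending directory follows. The function names are invented for illustration, and it should be run as the directory's owner, since root itself may be unable to stat the path:

```python
from __future__ import print_function

import os
import stat


def ancestors(path):
    """Return every component from the filesystem root down to path, in order."""
    path = os.path.abspath(path)
    parts = []
    while True:
        parts.append(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the root
            break
        path = parent
    return list(reversed(parts))


def blocked_components(path):
    """Return components of path that lack o+x or cannot be stat'ed."""
    blocked = []
    for component in ancestors(path):
        try:
            mode = os.stat(component).st_mode
        except OSError:
            # Cannot stat this component at all, so report it and stop.
            blocked.append(component)
            break
        if stat.S_ISDIR(mode) and not mode & stat.S_IXOTH:
            blocked.append(component)
    return blocked


if __name__ == "__main__":
    for component in blocked_components("nfsdir"):
        print("check permissions on", component)
```

A directory needs the execute (search) bit for a caller to traverse it, so a single home directory without o+x is enough to make everything beneath it look inaccessible to a squashed root.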