lineinfile transiently results in empty file


Ben Watson

Feb 14, 2017, 2:20:43 PM
to Ansible Project
Greetings,

I'm using the devel branch from the Ansible GitHub repository (i.e. version 2.3) and have run into a curious problem over the past few weeks with lineinfile, which was seemingly bulletproof in prior versions (e.g. 2.0, 2.1, 2.2).  We've moved to 2.3 for needed enhancements.

I say that the problem is transient because it doesn't *always* manifest.  That is, I cannot reliably reproduce the problem.  However, when it does manifest, it results in an empty file.  I'll attempt to describe my scenario here; however, the actual system we use is Internet-disconnected, so I cannot simply copy/paste results.

I'm using lineinfile to ensure entries exist in /etc/hosts

My play looks like this:

- name: Ensure all machines have /etc/hosts entries
  become: true
  hosts: [machines]
  tasks:

    - name: ensure /etc/hosts has entries
      lineinfile:
        dest: /etc/hosts
        owner: root
        group: root
        mode: u=rw,g=r,o=r
        line: "{{ hostvars[item]['vm_ip'] }}    {{ hostvars[item]['vm_hostname'] }}    {{ hostvars[item]['vm_alias'] }}"
        create: yes
        state: present
        backup: yes
      with_items: "{{ groups['all'] }}"

In other words, I'm looping over my inventory and adding known facts about the inventory as lines in /etc/hosts on [machines].  The known facts are all stored in aptly named files under the host_vars folder.  This seemed pretty trivial and worked well in the past.  Now, however, some subset of [machines] ends up with a 0-byte /etc/hosts file.  Moreover, it is not consistent from run to run.  On the *majority* of runs of the playbook, everything is as expected.  But on *some* runs, certain hosts will have this empty file.

I added the "backup: yes" as a means to troubleshoot and when I do encounter one of these machines with an empty /etc/hosts file, I can see via a directory listing all of the instances of the backups of the file corresponding to each time lineinfile modified the file.  I can even see the file size growing over these instances until at some point, the size goes to 0 bytes and stays there.  For example:

ls -la /etc/hosts*

-rw-r--r--. 1 root root   0 Feb 14 18:02 /etc/hosts
-rw-r--r--. 1 root root 225 Sep 16 18:21 /etc/hosts.2753.2017-02-14@18:02:06
-rw-r--r--. 1 root root 278 Feb 14 18:02 /etc/hosts.2774.2017-02-14@18:02:06
-rw-r--r--. 1 root root 331 Feb 14 18:02 /etc/hosts.2795.2017-02-14@18:02:06
-rw-r--r--. 1 root root   0 Feb 14 18:02 /etc/hosts.2816.2017-02-14@18:02:07
-rw-r--r--. 1 root root   0 Feb 14 18:02 /etc/hosts.2837.2017-02-14@18:02:07
-rw-r--r--. 1 root root   0 Feb 14 18:02 /etc/hosts.2858.2017-02-14@18:02:07
-rw-r--r--. 1 root root   0 Feb 14 18:02 /etc/hosts.2879.2017-02-14@18:02:07

When actually executing the ansible-playbook, I see yellow "changed" output, which suggests that everything is getting updated and is OK in the end, yet I end up in the above situation.  Is this a known bug in Ansible 2.3?
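
Since the play reports "changed" even on runs where the file ends up empty, a follow-up check could make the failure visible during the run itself. The following is only a sketch, assuming the two tasks are appended after the lineinfile task in the play above; the register name hosts_file is a hypothetical choice, not something from the original play:

```yaml
    # Gather size and existence information for the file just edited.
    - name: check the resulting /etc/hosts
      stat:
        path: /etc/hosts
      register: hosts_file   # hypothetical variable name

    # Stop the play on any host whose /etc/hosts is missing or 0 bytes.
    - name: fail if /etc/hosts is missing or empty
      assert:
        that:
          - hosts_file.stat.exists
          - hosts_file.stat.size > 0
```

This does not prevent the empty file; it only flags the affected hosts at the point of failure, which may help correlate the problem with timestamps in the system logs.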

Ben Watson

Feb 28, 2017, 10:22:34 AM
to ansible...@googlegroups.com
Bump.  I'm still having this random problem.  Anyone else seen anything like this or have any recommendations?  I tried a more recent pull of Ansible 'devel', but it broke something else (filesystem module).

--
You received this message because you are subscribed to a topic in the Google Groups "Ansible Project" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ansible-project/SP8KJnwSxWc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to ansible-project+unsubscribe@googlegroups.com.
To post to this group, send email to ansible-project@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/569abc84-0b36-4777-8cbb-002a11896566%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brian Coca

Feb 28, 2017, 11:14:39 AM
to Ansible Project
It sounds like corruption after writing. After you get the first
0-length file, it makes sense that it stays that way if it cannot match
the line to change.
What is your filesystem? Does anything show up in the logs? Device
errors? Write errors?
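
The log check asked about above could be run from the same playbook. This is a rough sketch, assuming the kernel ring buffer still holds the relevant messages and that filesystem or device errors would contain "xfs" or "i/o error"; the grep pattern and the register name kernel_errors are illustrative guesses, not a definitive filter, and the pattern will also match harmless XFS mount messages:

```yaml
    # Search the kernel log for XFS or I/O related messages (read-only check).
    - name: look for XFS or I/O errors in the kernel log
      shell: dmesg | grep -i -E 'xfs|i/o error'
      register: kernel_errors              # hypothetical variable name
      changed_when: false
      # grep exits 1 when nothing matches; only treat rc > 1 as a failure.
      failed_when: kernel_errors.rc > 1

    # Print whatever matched so it appears in the playbook output.
    - name: show any matching kernel messages
      debug:
        var: kernel_errors.stdout_lines
```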


----------
Brian Coca

Ben Watson

Mar 1, 2017, 8:06:16 AM
to ansible...@googlegroups.com
Haven't checked device logs just yet.  These are RHEL7 systems with XFS filesystems.

I've internally chalked this up to a "race condition", where multiple Ansible process forks are trying to update the same file at nearly the same time.  As an experiment, I added the "serial" directive to the play and set it to "1":

Example:

- name: Ensure all machines have /etc/hosts entries
  become: true
  hosts: [machines]
  serial: 1
  tasks:
    - <blah>

I see the difference in Ansible's console output, where it now makes the lineinfile call on one host at a time rather than on all hosts at once.  I've only run it a few times since this change, but have yet to see the empty file problem.  I'll continue running tests to confirm that this is a good fix.

v/r

Ben


Brian Coca

Mar 3, 2017, 2:46:54 PM
to Ansible Project
Ansible updates the file atomically, so 'concurrent' updates will
clobber each other, but should not create a corrupt file. IIRC, XFS
by default zeroes out files when there is an I/O error.


----------
Brian Coca