Small issue in libpromise/lock.c

26 views
Skip to first unread message

bruno vidal

unread,
Nov 23, 2016, 12:17:36 PM11/23/16
to help-cfengine
  Hi,

Few weeks ago we have migrate to cfengine 3.7.4. Everything is running fine except that sometimes a random client is reporting the following error:

warning: Could not move lock database backup into place. (rename: 'No such file or directory')


I looked in the source code where this error is coming from and I really suspect a small bug in libpromise/lock.c

This error is coming from the functionCopyLockDatabaseAtomically() near the end :


    // Make sure changes are persistent on disk, so database cannot get corrupted at system crash.
    if (fsync(to_fd) != 0)
    {
        Log(LOG_LEVEL_WARNING, "Could not sync %s file to disk. (fsync: '%s')", to_pretty_name, GetErrorStr());
        goto cleanup_4;
    }

    close(to_fd);
    if (rename(tmp_file_name, to) != 0)
    {
        Log(LOG_LEVEL_WARNING, "Could not move %s into place. (rename: '%s')", to_pretty_name, GetErrorStr());
        goto cleanup_3;
    }

    // Finished.
    goto cleanup_3;   -> PROBLEM IS HERE, it should be cleanup_2 not cleanup_3

cleanup_4:
    close(to_fd);
cleanup_3:
    unlink(tmp_file_name);
cleanup_2:
    close(from_fd);
cleanup_1:
    free(tmp_file_name);
}

The problem is just the "goto cleanup_3", it should be "goto cleanup_2". After the rename is successfull, then tmp_file_name does not exist anymore, so the unlink is always failling.
But because there is no return code test, it is not seen (you can see it by using strace).
In some case if multiple thread are using this function, then it happen that one thread is removing the file used by the other resulting in the rename failing.
So in conclusion the real finish state must be cleanup_2. It appears that this issue still exist even in 3.9.X.

Cheers.


Dimitrios Apostolou

unread,
Nov 23, 2016, 1:24:31 PM11/23/16
to bruno vidal, help-cfengine
Thanks! I can see that this code is almost completely changed in master branch. But the unlink(tmp_file_name) call is still present in all cleanups.

On Wed, Nov 23, 2016 at 6:17 PM, bruno vidal <bruno....@gmail.com> wrote:

In some case if multiple thread are using this function, then it happen that one thread is removing the file used by the other resulting in the rename failing.
So in conclusion the real finish state must be cleanup_2. It appears that this issue still exist even in 3.9.X.

cf-agent is never multi-treaded. It is however multi-process, so this might indeed happen. Do you think you had multiple agent processes running?

The fact that multiple processes might write to the temp file calls for a unique tmp filename here, I believe (most probably using mkstemp()). And then the redundant unlink won't be dangerous.

What do you think? Can you supply a pull request?


Dimitris

bruno vidal

unread,
Nov 30, 2016, 12:55:43 PM11/30/16
to help-cfengine
  Hi,

I have recompil the cfengine package with this small correction (goto cleanup_2), and we do not have any more this warning message. But yes the use of mkstemp() can also be a good solution too.
Sorry, but where can I request for a correction ?

Cheers.



Aleksey Tsalolikhin

unread,
Nov 30, 2016, 1:09:15 PM11/30/16
to bruno vidal, help-cfengine
Yay!  Bruno, if you want to submit your fix as a pull request, you should be able to do so via https://github.com/cfengine/core/blob/master/libpromises/locks.c  -- there is an edit icon (pencil) you can select to start the editing process, and then once you are done, follow the prompts to send the pull request.

Thanks!
Aleksey

-- 
Need training on CFEngine, Git or Time Management?  Email trai...@verticalsysadmin.com.

--
You received this message because you are subscribed to the Google Groups "help-cfengine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to help-cfengine+unsubscribe@googlegroups.com.
To post to this group, send email to help-c...@googlegroups.com.
Visit this group at https://groups.google.com/group/help-cfengine.
For more options, visit https://groups.google.com/d/optout.

Dimitrios Apostolou

unread,
Dec 1, 2016, 9:56:19 AM12/1/16
to bruno vidal, help-cfengine, Aleksey Tsalolikhin
You can additionally open a ticket describing the issue and attaching the patch, in our bugtracker:

https://tracker.mender.io/


Dimitris

Reply all
Reply to author
Forward
0 new messages