Hi,
A few weeks ago we migrated to CFEngine 3.7.4. Everything is running fine, except that every so often a random client reports the following error:
warning: Could not move lock database backup into place.
(rename: 'No such file or directory')
I looked through the source code to find where this error comes from, and I strongly suspect a small bug in libpromise/lock.c.
The error comes from the function CopyLockDatabaseAtomically(), near the end:
    // Make sure changes are persistent on disk, so database cannot get corrupted at system crash.
    if (fsync(to_fd) != 0)
    {
        Log(LOG_LEVEL_WARNING, "Could not sync %s file to disk. (fsync: '%s')", to_pretty_name, GetErrorStr());
        goto cleanup_4;
    }
    close(to_fd);
    if (rename(tmp_file_name, to) != 0)
    {
        Log(LOG_LEVEL_WARNING, "Could not move %s into place. (rename: '%s')", to_pretty_name, GetErrorStr());
        goto cleanup_3;
    }
    // Finished.
    goto cleanup_3;   // <- PROBLEM IS HERE: it should be cleanup_2, not cleanup_3
cleanup_4:
    close(to_fd);
cleanup_3:
    unlink(tmp_file_name);
cleanup_2:
    close(from_fd);
cleanup_1:
    free(tmp_file_name);
}
The problem is just the "goto cleanup_3" on the success path; it should be "goto cleanup_2". After a successful rename, tmp_file_name no longer exists, so the unlink always fails.
Because the return code of unlink is never checked, the failure goes unnoticed (you can see it with strace).
In some cases, when multiple threads run this function at the same time, one thread's unlink removes the temporary file that another thread has just created, and that other thread's rename then fails with 'No such file or directory'.
So, in conclusion, the successful exit path must go to cleanup_2. It appears that this issue still exists even in 3.9.X.
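To illustrate the proposed control flow, here is a minimal, self-contained sketch of the same cleanup-label pattern with the success path jumping to cleanup_2. It is not the CFEngine code: the function name copy_file_atomically, the ".tmp" suffix, the return value and the copy loop are all assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not the CFEngine function): copy `from` to `to`
 * through a temporary file plus rename(). Returns 0 on success, -1 on error. */
int copy_file_atomically(const char *from, const char *to)
{
    assert(from != NULL && to != NULL);

    int ret = -1;
    int from_fd = -1;
    int to_fd = -1;
    char buf[4096];
    ssize_t n;

    /* Assumed naming scheme: "<to>.tmp" (sizeof includes the trailing NUL). */
    size_t len = strlen(to) + sizeof(".tmp");
    char *tmp_file_name = malloc(len);
    if (tmp_file_name == NULL)
    {
        return -1;
    }
    snprintf(tmp_file_name, len, "%s.tmp", to);

    from_fd = open(from, O_RDONLY);
    if (from_fd < 0)
    {
        goto cleanup_1;
    }

    to_fd = open(tmp_file_name, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (to_fd < 0)
    {
        goto cleanup_2;
    }

    while ((n = read(from_fd, buf, sizeof(buf))) > 0)
    {
        ssize_t off = 0;
        while (off < n)
        {
            ssize_t w = write(to_fd, buf + off, (size_t) (n - off));
            if (w < 0)
            {
                goto cleanup_4;
            }
            off += w;
        }
    }
    if (n < 0)
    {
        goto cleanup_4;
    }

    /* Flush to disk before the rename, as in the original function. */
    if (fsync(to_fd) != 0)
    {
        goto cleanup_4;
    }
    close(to_fd);

    if (rename(tmp_file_name, to) != 0)
    {
        goto cleanup_3;   /* rename failed: the temp file still exists, remove it */
    }

    /* Finished. The temp file has been renamed away, so skip the unlink. */
    ret = 0;
    goto cleanup_2;

cleanup_4:
    close(to_fd);
cleanup_3:
    unlink(tmp_file_name);
cleanup_2:
    close(from_fd);
cleanup_1:
    free(tmp_file_name);
    return ret;
}
```

With this ordering, unlink only runs on the error paths where the temporary file may still be present. Note that the sketch still uses one fixed temporary name, so it only removes the stray unlink on success; it does not by itself make concurrent callers safe.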
Cheers.