We encountered this issue a few years ago, on various rundeck 3 versions. Current testing is being performed on 4.14.2. We are on RHEL7 and use mariadb as our database. RPM rundeck installation.
We were seeing this issue intermittently on jobs that used Node Step Inline Script steps, that were set to run on the local server node. These are used to allow us to run a step on the rundeck server per node with access to the node parameters.
After a lot of back and forth with rundeck support we eventually confirmed that setting enable-sync=true on the localhost node resolved the issue.
After a recent hardware upgrade and a mandated security software upgrade on our servers, we have noticed that the sync is causing massive performance issues. For example, a job that runs across around 20 nodes takes around 6 minutes with enable-sync=false, but around an hour with enable-sync=true.
After looking at the source code:
we have been testing with enable-sync=false and
file-busy-err-retry=true in the hope that this would prevent the text file busy issue while also disabling the sync command. This did not work.
The error message that file-busy-err-retry is looking for is:
public static final String MESSAGE_ERROR_FILE_BUSY_PATTERN = "Cannot run program.+: error=26.*";
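To make clear what that pattern can and cannot catch, here is a minimal standalone sketch. It assumes the pattern is applied to the whole message with String.matches (I have not confirmed how rundeck applies it), and both sample messages are illustrative, not copied from our logs. The class and method names are made up for the example.

```java
// Standalone check of the file-busy pattern quoted above.
// Assumption: the pattern must match the entire error message.
class FileBusyPatternDemo {
    static final String MESSAGE_ERROR_FILE_BUSY_PATTERN = "Cannot run program.+: error=26.*";

    // Hypothetical helper: true if the message looks like Java's own
    // "Cannot run program ...: error=26" failure.
    static boolean matchesFileBusy(String message) {
        return message != null && message.matches(MESSAGE_ERROR_FILE_BUSY_PATTERN);
    }

    public static void main(String[] args) {
        // Illustrative message in the shape Java produces when it cannot exec a file.
        String fromJava = "Cannot run program \"/tmp/script.sh\": error=26, Text file busy";
        // Illustrative message in the shape a shell might print instead.
        String fromShell = "bash: /tmp/script.sh: Text file busy";

        System.out.println(matchesFileBusy(fromJava));
        System.out.println(matchesFileBusy(fromShell));
    }
}
```

If the failure in our case is reported in some other shape than the Java "Cannot run program" message, the retry would never be triggered, which might explain why file-busy-err-retry=true made no difference for us.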
I also tried setting
file-copy-destination-dir="/var/lib/rundeck/rundeckTmpfs"
and mounting a tmpfs filesystem there. I would have thought that setting this along with enable-sync=false could work around our problem, as the tmpfs location would be in memory and the sync shouldn't be needed or have any effect. This also didn't fix the issue. We still see intermittent text file busy failures.
We believe the performance issues we are seeing with enable-sync=true are caused by calling sync without a filename, which flushes pending writes for every mounted filesystem to disk rather than just the script file.
I have downloaded the source code for 4.14.2 and modified the sync call so that it is called with a filename:
Map<String, String> nodeAttribute = node.getAttributes();
if (BooleanUtils.toBoolean(nodeAttribute.get(NODE_ATTR_ENABLE_SYNC_COMMAND))) {
    // Perform sync on the copied file only, to prevent it from being busy when running
    final NodeExecutorResult nodeExecutorSyncResult = framework.getExecutionService().executeCommand(
            context, ExecArgList.fromStrings(featureQuotingBackwardCompatible, false, "sync", filepath), node);
    if (!nodeExecutorSyncResult.isSuccess()) {
        return nodeExecutorSyncResult;
    }
}
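For comparison, this is roughly what "flush just this one file" looks like when done in-process in plain Java rather than by running the sync command. It is only a sketch of the idea and not rundeck code: the class name, the flushFile helper and the temp file are all made up for the example, and it only applies when the script lives on the same machine as the JVM.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: flush a single file instead of every filesystem.
class SingleFileFlush {
    // Hypothetical helper: forces this file's data and metadata to storage.
    // The channel is closed again before the method returns.
    static void flushFile(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path script = Files.createTempFile("demo-script", ".sh");
        try {
            Files.write(script, "#!/bin/sh\necho hello\n".getBytes(StandardCharsets.UTF_8));
            flushFile(script); // touches only this file
            System.out.println("flushed " + script);
        } finally {
            Files.deleteIfExists(script);
        }
    }
}
```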
We are testing this custom build now and the performance is on par with enable-sync=false in our environment.
We have no way of reproducing the text file busy issue, but are currently monitoring for any occurrences.
I have seen a few discussions of the text file busy issue occurring when chmod u+x is performed just before execution. Most suggest calling sync before executing, but some also suggest a sleep. So I am worried that even calling sync with a filename might not fix this for us: if sync is only hiding a timing issue, sync with a filename might return too quickly compared to sync without a filename.
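If it really is a timing issue, the "sleep" suggestion amounts to retrying the execution after a short pause when the busy error shows up. A minimal sketch of that idea is below. It is not how rundeck implements file-busy-err-retry (I have not checked that); the class name, the retryOnFileBusy helper, and the attempt count and delay values are all assumptions for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch only: retry an action when it fails with the "text file busy" error.
class BusyRetry {
    static final String FILE_BUSY = "Cannot run program.+: error=26.*";

    // Hypothetical helper: runs the action up to maxAttempts times, sleeping
    // between attempts, but only when the failure matches the busy pattern.
    static <T> T retryOnFileBusy(Callable<T> action, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                String msg = e.getMessage();
                if (msg == null || !msg.matches(FILE_BUSY)) {
                    throw e; // some other failure: do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last; // still busy after all attempts
    }

    public static void main(String[] args) throws Exception {
        // Assumes /bin/sh exists; the command is just a stand-in for the script.
        Process p = retryOnFileBusy(
                () -> new ProcessBuilder("/bin/sh", "-c", "true").start(), 3, 200L);
        System.out.println("exit code: " + p.waitFor());
    }
}
```

The downside is the same one I am worried about with sync: it only narrows the window, so a delay that is too short would still fail intermittently.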
Is there any reason sync is called without a filename in the rundeck code? It seems like this could be a good performance improvement in general if it can be called specifically for the file being worked on.
Any help is greatly appreciated, we have been playing whack-a-mole with this bug for years :)