150812-06:42:34,549 workflow ERROR:
['Node trim_fslroi failed to run on host YYY.']
150812-06:42:34,559 workflow INFO:
Saving crash info to XXX
150812-06:42:34,559 workflow INFO:
Traceback (most recent call last):
File "XXX/pipeline/plugins/linear.py", line 38, in run
node.run(updatehash=updatehash)
File "XXX/pipeline/engine.py", line 1386, in run
self._run_interface()
File "XXX/pipeline/engine.py", line 2252, in _run_interface
updatehash=updatehash))
File "Xfailed: [YYY] => (item={"rc": 1, "start": "2015-08-12 06:01:52.770879", "warnings": [], ...}
[later in stdout]
Running: ZZZ/a_different_command
150812-06:54:16,889 workflow INFO:
Executing node a_different_command in dir: ZZZ/a_different_coXX/pipeline/engine.py", line 2153, in _collate_results
if node.result.outputs:
AttributeError: 'NoneType' object has no attribute 'outputs'
...ignoring
I discovered that Ansible internally uses a display() helper function to establish a lock before printing, as opposed to the code sample I used that just naively prints. OK, makes sense - we're doing an 8+ hour build that contains several 1+ hour tasks producing a LOT of output, so it makes sense that we'd see this race condition more often than everyone else using it.
However, even establishing a lock with the current runner (obtained via stack inspection, since the callback plugin API doesn't pass it) doesn't seem to prevent this from happening. I haven't had time to debug this further, but AFAICT from reading the code base, there's no support for more advanced control of printing than that.