I reduced the problem in size, and the pypeline.log dump below shows the failure (sorry, I can't upload the files from work; I get a 340 error due to firewall restrictions).
Key to the failure is executing the MapNode with 2 or more files while using the Condor plugin (I hadn't spotted this dependency yesterday).
The following configurations give the results noted (they correspond to the information in the log file); a minimal sketch of the workflow follows the list:
- MapNode gzip, 1 file, Condor plugin: succeeds
- MapNode gzip, 2 files, Condor plugin: fails
- MapNode gzip, 2 files, no plugin: succeeds
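For reference, here is a minimal sketch of the kind of workflow I am running. The Gzip interface is a hypothetical stand-in for my actual interface (which I can't post), but the MapNode/Workflow names match the log:

import nipype.pipeline.engine as pe
from nipype.interfaces.base import (CommandLine, CommandLineInputSpec,
                                    TraitedSpec, File)

class GzipInputSpec(CommandLineInputSpec):
    input_file = File(exists=True, mandatory=True, argstr='%s', position=-1)

class GzipOutputSpec(TraitedSpec):
    output_file = File(exists=True)

class Gzip(CommandLine):
    _cmd = 'gzip -k'          # -k keeps the originals so reruns can find them
    input_spec = GzipInputSpec
    output_spec = GzipOutputSpec

    def _list_outputs(self):
        outputs = self.output_spec().get()
        outputs['output_file'] = self.inputs.input_file + '.gz'
        return outputs

zipnode = pe.MapNode(Gzip(), name='zipTestFiles', iterfield=['input_file'])
# one file succeeds; two or more trigger the failure under Condor
zipnode.inputs.input_file = ['/home/bvanlew/tmp/EtaOin0.txt',
                             '/home/bvanlew/tmp/EtaOin1.txt']

wf = pe.Workflow(name='TestZip', base_dir='/tmp')
wf.add_nodes([zipnode])
wf.run(plugin='Condor')       # fails with 2 files; wf.run() alone succeeds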
The crucial part of the log does indeed indicate that the hash is playing a role: after the subnodes (_zipTestFiles0 and _zipTestFiles1) finish, an attempt is made to rerun them, and a change in the hash is involved:
121212-13:49:24,989 workflow INFO:
[Job finished] jobname: _zipTestFiles0 jobid: 2
121212-13:49:25,26 workflow INFO:
[Job finished] jobname: _zipTestFiles1 jobid: 3
121212-13:49:25,28 workflow INFO:
Submitting 1 jobs
121212-13:49:25,28 workflow INFO:
Executing: zipTestFiles ID: 1
121212-13:49:44,457 workflow DEBUG:
networkx 1.4 dev or higher detected
121212-13:49:44,460 workflow INFO:
Executing node zipTestFiles in dir: /tmp/tmpEJPykN/TestZip/zipTestFiles
121212-13:49:44,461 workflow DEBUG:
setting hashinput input_file-> ['/home/bvanlew/tmp/EtaOin0.txt', '/home/bvanlew/tmp/EtaOin1.txt']
121212-13:49:44,461 workflow DEBUG:
Node hash: c59d2151caedebee82b7a10d7c931023
121212-13:49:44,461 workflow DEBUG:
/tmp/tmpEJPykN/TestZip/zipTestFiles/_0xc59d2151caedebee82b7a10d7c931023_unfinished.json found and can_resume is True or Node is a MapNode - resuming execution
121212-13:49:44,462 workflow DEBUG:
writing pre-exec report to /tmp/tmpEJPykN/TestZip/zipTestFiles/_report/report.rst
121212-13:49:44,463 workflow DEBUG:
setting input 0 input_file /home/bvanlew/tmp/EtaOin0.txt
121212-13:49:44,464 workflow DEBUG:
Setting node inputs
121212-13:49:44,464 workflow INFO:
Executing node _zipTestFiles0 in dir: /tmp/tmpEJPykN/TestZip/zipTestFiles/mapflow/_zipTestFiles0
121212-13:49:44,464 workflow DEBUG:
Node hash: 084859a03d2d2df961346df37dc8e50c
121212-13:49:44,465 workflow DEBUG:
Rerunning node
121212-13:49:44,465 workflow DEBUG:
updatehash = False, self.overwrite = None, self._interface.always_run = False, os.path.exists(/tmp/tmpEJPykN/TestZip/zipTestFiles/mapflow/_zipTestFiles0/_0x084859a03d2d2df961346df37dc8e50c.json) = False, hash_method = timestamp
121212-13:49:44,465 workflow DEBUG:
Previous node hash = a754c7dec9a6c06f2f1f53df81206c50
121212-13:49:44,466 workflow DEBUG:
values differ in fields: input_file: '/home/bvanlew/tmp/EtaOin0.txt' != [u'/home/bvanlew/tmp/EtaOin0.txt', u'2452a4cf598b13d7c663fa6c7375c763']
121212-13:49:45,403 workflow ERROR:
['Node zipTestFiles failed to run on host lkeb-gisela01.']
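The last DEBUG line before the error is what convinces me the hash is the culprit: on the rerun, input_file is recorded as a bare path string in one hash dump and as a [path, md5] pair in the other, so the comparison can never match and the subnode is forced to rerun. A rough illustration of that comparison (my own sketch, not Nipype's actual hashing code; I'm assuming the second value is an md5 of the file contents, which matches its shape in the log):

import hashlib

def hash_infile(path):
    # build the [path, content-md5] pair seen on the right-hand side of the diff
    with open(path, 'rb') as f:
        return [path, hashlib.md5(f.read()).hexdigest()]

first_run = {'input_file': '/home/bvanlew/tmp/EtaOin0.txt'}                # plain string
second_run = {'input_file': hash_infile('/home/bvanlew/tmp/EtaOin0.txt')}  # [path, md5]

changed = [key for key in first_run if first_run[key] != second_run[key]]
print('values differ in fields:', changed)   # -> ['input_file']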
Cheers
Baldur