Hi,
I am working on a ruffus bioinformatics pipeline that runs several branching tasks and handles relatively large amounts of data (10-100 GB). It runs correctly on several data sets, but it has large overhead delays that make debugging difficult. The pipeline seems to spend about 100 seconds in <method 'acquire' of '_thread.lock' object>, which is about 95% of the total run time for a small test data set.
I can provide more details about the pipeline structure if that would help. Briefly, it takes a few raw data files, splits them, processes them, merges the output, and runs a few more downstream functions. If anyone has come across similar issues, I would appreciate any leads or tools for debugging and reducing this overhead, because I am not sure how to approach the problem as it stands.
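For reference, here is a minimal sketch of how a profile dominated by the lock's acquire method can arise. It is not my pipeline and does not use ruffus; the names worker, run_and_wait and profile_report are made up for illustration. cProfile only instruments the thread it is enabled in, so when the main thread blocks while waiting for a worker, the wait is charged to acquire and the work done in the worker is not shown at all. Whether this is what happens in my pipeline is an assumption I have not verified.

```python
import cProfile
import io
import pstats
import threading
import time


def worker(done):
    # Hypothetical stand-in for a pipeline job: it only sleeps, then signals.
    time.sleep(0.3)
    done.set()


def run_and_wait():
    # The main thread starts one worker and blocks until it signals completion.
    done = threading.Event()
    t = threading.Thread(target=worker, args=(done,))
    t.start()
    done.wait()  # blocks inside a lock acquire until the worker calls set()
    t.join()


def profile_report(func, limit=5):
    # Profile func in the calling thread only and return the text report,
    # sorted by time spent inside each function itself.
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats(limit)
    return buf.getvalue()


report = profile_report(run_and_wait)
print(report)
```

In this sketch nearly all of the reported time lands on the acquire entry, even though the main thread is only idle. If the same applies here, the number measures how long the main process waited for jobs, and the jobs themselves would need to be profiled or timed separately.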
Thanks,
Itai