Hello, Neva,
Thank you for developing Juicer; it has been an awesome tool for processing Hi-C data!
While using it for my project, I ran into a few minor issues. They have not prevented me from generating results, but I would like your input on them:
1. The Juicer.sh pipeline seems to generate large SAM files while aligning the reads. This caused 'out of disk space' errors for the samples I was working on, which yielded 400M reads. The SAM files grew to 400-500 GB, and the total size of the work directory exceeded 1 TB. Has the Juicer team considered piping the series of commands together, or using binary files, to ease the I/O burden?
2. The current Juicer pipeline spreads its data processing functions across several sh/awk/perl/Java jar files, usually found in the 'common' folder. Is there a packaged environment available, such as a Conda environment or a Docker or Singularity image? Since reproducibility is a major concern nowadays, having version control over the environment would help confirm that the results are reproducible.
3. The current JuicerTools works with CUDA 8.0 only. I have CUDA 11.1 on my machine, and JuicerTools failed to detect the GPU. How can I use the GPU instead of falling back to HiCCUPS CPU? I admit my Java programming sucks, and I found it a little tricky to recompile the jar files with the newest version of JCuda.jar. Any pointers on how this can be done?
I really appreciate your commitment to maintaining this software. Thank you so much in advance!
Kind regards,
Johnson