Dear Marc,
apologies for the late reply.
Looking at your code, it seems that the python script is being run in the wrong way. The script is probably not terminating immediately as it should, which sounds like a bug I hadn't thought of. I will look into it.
I usually execute sequenza-utils with pypy instead of python. It gives a considerable speed boost (4x to 6x faster in my experience), especially for really big files.
I would run something like this:
pypy /data/home/mpx155/R/packages/3.1.0/sequenza/exec/sequenza-utils.py bam2seqz \
--fasta /data/BCI-EvoCa/william/referenceHG19/ucsc.hg19.fasta \
-n /data/BCI-EvoCa/marc/gastric_cancers/bamfiles/chr1/original_align/normal.bam \
-t /data/BCI-EvoCa/marc/gastric_cancers/bamfiles/chr1/original_align/tumour.bam \
-gc /data/BCI-EvoCa/marc/refs/hg19.gc5Base.txt.gz \
--chromosome chr1 | gzip > out_chr1.seqz.gz
You can run multiple chromosomes in parallel, using GNU parallel for example as suggested in the wiki:
https://bitbucket.org/ffavero/sequenza/wiki/Sequenza_Utils#markdown-header-parallelize-the-execution
Once you have the seqz.gz files, you can use the binning function.
Alternatively, you could add the binning step to the pipeline, but you will lose the "bigger" un-binned files:
pypy /data/home/mpx155/R/packages/3.1.0/sequenza/exec/sequenza-utils.py bam2seqz \
--fasta /data/BCI-EvoCa/william/referenceHG19/ucsc.hg19.fasta \
-n /data/BCI-EvoCa/marc/gastric_cancers/bamfiles/chr1/original_align/normal.bam \
-t /data/BCI-EvoCa/marc/gastric_cancers/bamfiles/chr1/original_align/tumour.bam \
-gc /data/BCI-EvoCa/marc/refs/hg19.gc5Base.txt.gz \
--chromosome chr1 | \
pypy /data/home/mpx155/R/packages/3.1.0/sequenza/exec/sequenza-utils.py binning -w 50 | \
gzip > out_chr1_bin50.seqz.gz
I usually process all the chromosomes in parallel without binning, and then merge the chromosome files while binning, obtaining a single file for the whole genome.
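As a rough sketch of that merge step, something like the following could work. It is only an illustration under a few assumptions: merge_seqz is a hypothetical helper name (not part of sequenza-utils), and it assumes every per-chromosome file starts with a header line whose first field is "chromosome", so that repeated headers can be dropped when the files are concatenated.

```shell
# Hypothetical helper (not part of sequenza-utils): concatenate several
# per-chromosome seqz.gz files into one uncompressed stream on stdout.
# Assumption: each file begins with a header line whose first field is
# "chromosome"; only the very first header line is kept.
merge_seqz() {
    gzip -dc "$@" | awk 'NR == 1 || $1 != "chromosome"'
}

# Usage sketch (script path shortened, adjust to your setup):
#   merge_seqz out_chr1.seqz.gz out_chr2.seqz.gz | \
#     pypy sequenza-utils.py binning -w 50 | \
#     gzip > out_all_bin50.seqz.gz
```

It is probably safer to list the files explicitly in chromosome order than to rely on a shell glob such as out_chr*.seqz.gz, because the glob sorts chr10 before chr2.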
I'll try to update the wiki if something is not clear.
Thanks for trying sequenza :)
Best