Hi, Francesco
First of all, I appreciate your quick and helpful reply.
Indexing the pileup files with tabix, using the command you kindly suggested, did generate the *.tbi indexes, and I was then able to generate the *seqz.gz files with the command below. Thankfully, sequenza-utils handled the "massive parallelism" you were concerned about.
sequenza-utils \
bam2seqz \
--pileup \
-n ${normal}.pileup \
-t ${tumor}.pileup \
--fasta genome.fa \
-C 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT \
--samtools samtools-exec \
--tabix tabix-exec \
-gc genome_primary.gc50Base.wig.gz \
-o ${sample}.seqz.gz
I noticed that the command above generated a separate seqz.gz file for each chromosome (${sample}_1.seqz.gz, ${sample}_2.seqz.gz, ${sample}_3.seqz.gz, ..., ${sample}_MT.seqz.gz) and tabix-indexed each of them separately as well. So I first binned the original seqz files with the seqz_binning command:
for chromosome in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT
do
${sequenza} \
seqz_binning \
--seqz ${sample}_${chromosome}.seqz.gz \
-w 50 \
-o ${sample}_${chromosome}.small.seqz.gz
done
Then I merged the binned files iteratively with seqz_merge (a far better solution might exist, but this was all I could come up with):
sequenza-utils \
seqz_merge \
-o seq1.small.seqz.gz \
-1 ${sample}_1.small.seqz.gz \
-2 ${sample}_2.small.seqz.gz
sequenza-utils \
seqz_merge \
-o seq2.small.seqz.gz \
-1 seq1.small.seqz.gz \
-2 ${sample}_3.small.seqz.gz
.
.
.
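For anyone repeating this, the chained seqz_merge calls above could probably be written as a loop. This is only a sketch, not something I ran in this form: the function name merge_seqz_chain and the intermediate file names (${sample}.partial_*.seqz.gz) are made up for illustration, it only uses the -o, -1 and -2 options shown above, and it assumes the binned files are named ${sample}_${chromosome}.small.seqz.gz as in the binning step.

```shell
# Sketch: fold the per-chromosome binned files into one seqz file,
# merging two inputs at a time, in the order the chromosomes are given.
# Usage: merge_seqz_chain SAMPLE OUTPUT CHR CHR [CHR ...]
merge_seqz_chain() {
    sample=$1
    output=$2
    shift 2
    if [ "$#" -lt 2 ]; then
        echo "merge_seqz_chain: need at least two chromosomes" >&2
        return 1
    fi
    # The first chromosome's file seeds the chain.
    current=${sample}_$1.small.seqz.gz
    shift
    remaining=$#
    for chromosome in "$@"; do
        remaining=$((remaining - 1))
        if [ "$remaining" -eq 0 ]; then
            # The last merge writes straight to the requested output.
            next=$output
        else
            # Hypothetical intermediate name; these files are left on disk.
            next=${sample}.partial_${chromosome}.seqz.gz
        fi
        # ${sequenza} is the executable, as in the binning loop;
        # it falls back to sequenza-utils when the variable is unset.
        "${sequenza:-sequenza-utils}" seqz_merge \
            -o "$next" \
            -1 "$current" \
            -2 "${sample}_${chromosome}.small.seqz.gz" || return 1
        current=$next
    done
}
```

It would be called as, for example, merge_seqz_chain ${sample} ${sample}.small.seqz.gz 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT, and the intermediate ${sample}.partial_* files can be deleted afterwards.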
With that, I could finally generate the output.small.seqz file to use as input for the sequenza analysis in R.
Thank you again for your help, Francesco!
Best regards,
Ryan