Combining Results of bam2seqz (parallel version)


Erich Peterson

Feb 2, 2016, 10:34:27 AM
to Sequenza User Group
Hi,
I am using the script suggested here to parallelize the execution of bam2seqz. As a result, I have a seqz.gz file for each chromosome. Is it possible to combine them into one seqz file before binning and using R to extract and visualize the results on a genome-wide basis?

Thanks,
Erich

Erich Peterson

Feb 2, 2016, 10:44:12 AM
to Sequenza User Group
PS - Or I could combine them after binning (so I could still use parallel methods to bin each seqz file first).

Francesco Favero

Feb 4, 2016, 9:01:51 PM
to Sequenza User Group
Hi Erich,

Well, it boils down to bash scripting at this point. I'm considering writing a tool to better manage the parallel runs, as I use this approach all the time (it does speed up the process several-fold).

I usually keep the individual per-chromosome seqz files un-binned, then concatenate them with zcat and gawk (to remove the extra headers and keep only the first one), pipe the result into the seqz binning tool, and gzip the output again.

If you don't care about the chromosome order it is very straightforward; otherwise it's just a tiny bit longer.
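
A minimal sketch of the concatenation step described above, not a definitive recipe. The function name `merge_seqz` and the `sample_chr*.seqz.gz` file names are hypothetical, and plain `awk` stands in for gawk. It assumes every per-chromosome file starts with the same header line, so any later line identical to the first one is treated as a repeated header and dropped. The binning command is left out on purpose because its options depend on the installed Sequenza version.

```shell
#!/usr/bin/env bash
# Merge per-chromosome seqz.gz files into one stream with a single header.
# Files are read in the order given, so the argument order sets the
# chromosome order of the output.
merge_seqz() {
    # gzip -dc (same as zcat) decompresses all files back to back;
    # awk prints the first line (the header) once and skips every later
    # line that is identical to it.
    gzip -dc "$@" | awk 'NR == 1 { header = $0; print; next } $0 != header'
}

# Example usage (hypothetical file names), order not important:
#   merge_seqz sample_chr*.seqz.gz | gzip > sample.seqz.gz
#
# Example usage with an explicit chromosome order:
#   files=()
#   for chr in $(seq 1 22) X Y; do
#       files+=("sample_chr${chr}.seqz.gz")
#   done
#   merge_seqz "${files[@]}" | gzip > sample.seqz.gz
```

With a shell glob the files arrive in lexicographic order (chr1, chr10, chr11, ...), which is the "don't care about the order" case; the explicit loop is the slightly longer variant. The binning tool would sit in the pipe between `merge_seqz` and the final `gzip`.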

If you need help I can give you more details.

Best

Francesco 

Erich Peterson

Feb 4, 2016, 11:59:42 PM
to Sequenza User Group
Thanks. Yeah, I was able to do something like what you described.