Can STITCH run on multiple chromosomes at a time?


Teresa Pegan

Feb 27, 2020, 6:04:14 PM
to STITCH imputation
Hello,
I am hoping to run STITCH on data aligned to a reference genome that is assembled at the scaffold level, not the chromosome level (it's a non-model organism). This means that I have thousands of "chromosomes", so running STITCH on each of them separately would be quite complicated (although maybe more manageable if I only do it on the few hundred "chromosomes" that contain most of the data). Is there any way to give STITCH a list of chromosomes to run, instead of a single chromosome name?

Thanks for fielding all my questions so quickly!
-Teresa

Robbie Davies

Feb 28, 2020, 1:02:14 PM
to Teresa Pegan, STITCH imputation
Hi Teresa, 

Sorry, STITCH doesn't currently have that functionality, and it's unlikely I'll add it anytime soon: it can be handled externally without much extra code, and the external option preserves your opportunity to parallelize as you see fit. Hopefully it shouldn't be too hard to run either in a cluster-type environment or on a single Linux machine.

So for example, something like the below, using a file with a list of chromosomes, would allow you to loop easily, or to parallelize using something like GNU parallel. You could use the same technique on something like Grid Engine or Slurm, with a task array whose length is the number of chromosomes, and define which chromosome to run within each job. After running something like the below, you could combine the per-chromosome VCFs using something like GATK CombineVariants, or bcftools, or even the pseudo-code below.

Hope that helps,
Best wishes,
Robbie

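list_of_chr_file="something"
## read the whitespace-separated chromosome names into an array
input=$(cat "${list_of_chr_file}")
chrs=($input)

## loop over the chromosome names themselves, not their indices
for chr in "${chrs[@]}"
do
    STITCH.R --chr="${chr}" etc   ## "etc" stands for the rest of your STITCH arguments
done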

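## then use any of a variety of methods like GATK CombineVariants or bcftools concat,
## or something hacky like the below (assumes outputdir is set and chrs is defined as above)
giant_output_file="something"
first_chr_output_file=${outputdir}/"stitch.${chrs[0]}.vcf.gz"
## take the header (every line starting with #) from the first chromosome only
gunzip -c "${first_chr_output_file}" | grep '^#' > "${giant_output_file}"
for chr in "${chrs[@]}"
do
  per_chr_output_file=${outputdir}/"stitch.${chr}.vcf.gz"
  ## append the variant records (every line that is not header)
  gunzip -c "${per_chr_output_file}" | grep -v '^#' >> "${giant_output_file}"
done
bgzip "${giant_output_file}"             ## writes ${giant_output_file}.gz
tabix -p vcf "${giant_output_file}.gz"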
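
For the task-array route mentioned above, the only extra piece is mapping the array index to a chromosome name. The following is a rough sketch, not something STITCH provides: chr_for_task is a hypothetical helper, the scaffold names are made up, and the STITCH command is only printed (with "etc" again standing for your other arguments). It assumes a scheduler that exposes a 1-based task index, such as SLURM_ARRAY_TASK_ID under Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STITCH): print the chromosome name on
# line number task_id (1-based) of a file with one name per line.
chr_for_task() {
    local list_file="$1" task_id="$2"
    sed -n "${task_id}p" "${list_file}"
}

# Throwaway list of made-up scaffold names, for demonstration only
demo_list=$(mktemp)
printf '%s\n' scaffold_1 scaffold_2 scaffold_3 > "${demo_list}"

# Under Slurm the index comes from SLURM_ARRAY_TASK_ID (e.g. after
# submitting with sbatch --array=1-3); default to 1 elsewhere so the
# sketch still runs outside a scheduler.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
chr=$(chr_for_task "${demo_list}" "${task_id}")

# Dry run: print the command rather than executing it
echo "STITCH.R --chr=${chr} etc"

rm -f "${demo_list}"
```

In a real job script you would point chr_for_task at your own list file and replace the echo with the actual STITCH call; each array task then handles exactly one scaffold, and the per-scaffold VCFs can be merged afterwards as described above.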

