Split by chromosome - downstream effects?

101 views
Skip to first unread message

John Lin

unread,
Sep 3, 2014, 4:33:35 PM9/3/14
to bissn...@googlegroups.com
Hi Yaping,

We are working with a set of public whole genome bisulfite sequencing data, which we have aligned using Bismark. The data set came in separate runs, so instead of merging all these together, we aligned separately and split each alignment by chromosome. We then merged the splitted alignments by chromosome using the merge function from samtools. We are processing by chromosome in order to run things in parallel and decrease processing time.

1) When using BISSNP sortByRefAndCor - we've noticed that the time for processing is exceptionally long when using the whole genome as the reference and seems to occur while "Loading the Tribble Index" with an estimated time remaining of 4-5 days. Because of this, we have replaced the genome for use with sortByRefAndCor (as well as in the step following this for filter - see code below) with just the genome of the chromosome of interest. This greatly reduces the time to about 4 hours. Is this a legitimate way to do this or are there other things we need to consider? I am asking because of steps such as ReorderSam from Picard tools and other steps in the BISSNP pipeline use the whole genome - not chromosome 1.

2) If this is not a valid way, can you suggest an alternative way to run through the pipeline by chromosome? Previously I replaced every instance of the genome used with just the genome of chromosome 1 and ran into some errors with contigs - (I believe since the header of the bam file contained contigs that the reference genome - in this case that of chromosome 1 - did not)

Apologies if that was not clear, but I have put the code corresponding to the first method below. Please let me know if you need any more information.

Thanks,
John

-- -- -- -- --
#Script for chromosome 1
#Sort by coordinate
~/jre1.7.0_25/bin/java -Xmx8g -jar ~/picard-tools-1.97/picard-tools-1.97/SortSam.jar INPUT='chr1_sorted.bam' OUTPUT='chr1_sorted.bam_psorted' SORT_ORDER=coordinate TMP_DIR=~/tmp

#Remove duplicates
~/jre1.7.0_25/bin/java -Xmx8g -jar ~/picard-tools-1.97/picard-tools-1.97/MarkDuplicates.jar INPUT='chr1_sorted.bam_psorted' OUTPUT='chr1_sorted.bam_psorted_dedup' REMOVE_DUPLICATES=TRUE METRICS_FILE=~/'chr1_sorted.bam_psorted_dedup_metrics.txt' TMP_DIR=~/temp

#Reorders according to contigs in reference file
~/jre1.7.0_25/bin/java -Djava.io.tmpdir=~/temp -jar -Xmx8g ~/picard-tools-1.97/picard-tools-1.97/ReorderSam.jar INPUT='chr1_sorted.bam_psorted_dedup' OUTPUT='chr1_sorted.bam_psorted_dedup_R.bam' REFERENCE=~/'genome_ref/human_g1k_v37.fasta'

~/jre1.7.0_25/bin/java -Xmx8g  -jar ~/picard-tools-1.97/picard-tools-1.97/BuildBamIndex.jar INPUT='chr1_sorted.bam_psorted_dedup_R.bam'  TMP_DIR=~/temp

#Tagging origin of reads
~/jre1.7.0_25/bin/java -Xmx8g -Djava.io.tmpdir=~/temp -jar ~/picard-tools-1.97/picard-tools-1.97/AddOrReplaceReadGroups.jar I='chr1_sorted.bam_psorted_dedup_R.bam' O='chr1_sorted.bam_psorted_dedup_RR.bam' ID='glia_chr'$CHR LB='glia_chr'$CHR PL=illumina PU=run SM='glia_chr'$CHR CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT SORT_ORDER=coordinate

#BISSNP
~/jre1.7.0_25/bin/java -Djava.io.tmpdir=~/temp -jar -Xmx10g ~/BisSNP/BisSNP-0.82.2.jar  -R  ~/'genome_ref/human_g1k_v37.fasta'   -I 'chr1_sorted.bam_psorted_dedup_RR.bam' -T BisulfiteCountCovariates -knownSites ~/BisSNP/dbsnp_137.b37.vcf -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate  -recalFile 'chr1_sorted.bam_psorted_dedup_RR_recal.csv' -nt 12

~/jre1.7.0_25/bin/java -Djava.io.tmpdir=~/temp -jar -Xmx10g ~/BisSNP/BisSNP-0.82.2.jar  -R  ~/'genome_ref/human_g1k_v37.fasta'   -I 'chr1_sorted.bam_psorted_dedup_RR.bam'  -o 'chr1_sorted.bam_psorted_dedup_RR_recal.bam' -T BisulfiteTableRecalibration -recalFile 'chr1_sorted.bam_psorted_dedup_RR_recal.csv' -maxQ 40

~/jre1.7.0_25/bin/java -Xmx10g -Djava.io.tmpdir=~/temp -jar ~/BisSNP/BisSNP-0.82.2.jar -T BisulfiteGenotyper -R ~/'genome_ref/Homo_sapiens.GRCh37.74.dna.chromosome.'$CHR'.fa' -D ~/BisSNP/dbsnp_137.b37.vcf  -I 'chr1_sorted.bam_psorted_dedup_RR_recal.bam' -vfn1 'chr1_sorted.bam_psorted_dedup_RR_raw.vcf' -mmq 30 -mbq 0 -stand_call_conf 30 -nt 12 -out_modes EMIT_HET_SNPS_ONLY

#SortByRefAndCor
perl ~/BisSNP/sortByRefAndCor.pl --k 1 --c 2 --tmp ~/temp  'chr1_sorted.bam_psorted_dedup_RR_raw.vcf'  ~/'genome_ref/Homo_sapiens.GRCh37.74.dna.chromosome.1.fa.fai' > 'chr1_sorted.bam_psorted_dedup_RR_raw_sort.vcf'

~/jre1.7.0_25/bin/java  -Xmx10g -Djava.io.tmpdir=~/temp -jar ~/BisSNP/BisSNP-0.82.2.jar -R ~/'genome_ref/Homo_sapiens.GRCh37.74.dna.chromosome.1.fa'  -T VCFpostprocess -oldVcf 'chr1_sorted.bam_psorted_dedup_RR_raw_sort.vcf'   -newVcf 'chr1_sorted.bam_psorted_dedup_RR_filtered_sort.vcf'  -snpVcf 'chr1_sorted.bam_psorted_dedup_RR_raw_sort.vcf'   -o 'chr1_sorted.bam_psorted_dedup_RR_snp_summary.txt'

ping

unread,
Sep 4, 2014, 6:43:20 PM9/4/14
to bissn...@googlegroups.com
Hi John,
Thanks for you interest in our software. i noticed that you are using Java 1.7 (java 7), which will lead the problem for Bis-SNP java program: it will get stuck in "Loading the Tribble Index" ...
Another potential problem: dbsnp_137.b37.vcf need to sorted as chromosome order. 

To run it chromosome by chromosome, you just need to add one more option:  -L chr1
this will enable the program just work in the chromsome you would like to focus.

Let me know if you have further question

yaping
 



--
You received this message because you are subscribed to the Google Groups "bissnp-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bissnp-help...@googlegroups.com.
To post to this group, send email to bissn...@googlegroups.com.
Visit this group at http://groups.google.com/group/bissnp-help.
For more options, visit https://groups.google.com/d/optout.



--
Yaping Liu Ph.D.

Postdoctoral Associate
Manolis Kellis Lab
Computer Science and Artificial Intelligence Lab (CSAIL)
Massachusetts Institute of Technology
Broad Institute of MIT and Harvard


John Lin

unread,
Sep 4, 2014, 8:21:49 PM9/4/14
to bissn...@googlegroups.com
Hi Yaping,

Thanks very much for the advice!

-John
Reply all
Reply to author
Forward
0 new messages