Hi JC,
I ran the 2-pass scheme on your data and could not replicate the behavior you observed. I get a very small decrease in the unique mapping rate in the 2nd pass, all of it due to an increase in the multi-mapping rate:
1st pass | Uniquely mapped reads % | 93.22%
2nd pass | Uniquely mapped reads % | 93.13%
1st pass | % of reads mapped to multiple loci | 4.49%
2nd pass | % of reads mapped to multiple loci | 4.61%
1st pass | % of reads unmapped: too short | 2.23%
2nd pass | % of reads unmapped: too short | 2.20%
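If you want to make the same comparison on your side, the three percentages can be pulled out of each run's Log.final.out. Below is a minimal sketch, not part of STAR itself: the function name compare_passes and the file paths in the usage line are assumptions for illustration, and it relies on the usual "label | value" layout of Log.final.out.

```shell
#!/usr/bin/env bash
# compare_passes LOG1 LOG2   (hypothetical helper, not a STAR tool)
# Prints, tab-separated: metric, value in LOG1, value in LOG2, LOG2 - LOG1.
compare_passes() {
    awk -F'|' '
        # Keep only the three metrics quoted above.
        /Uniquely mapped reads %|% of reads mapped to multiple loci|% of reads unmapped: too short/ {
            name = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the label
            gsub(/[ \t%]/, "", val)             # strip whitespace and the % sign
            if (FNR == NR) { first[name] = val; order[++n] = name }   # 1st file
            else           { second[name] = val }                     # 2nd file
        }
        END {
            for (i = 1; i <= n; i++) {
                k = order[i]
                printf "%s\t%.2f\t%.2f\t%+.2f\n", k, first[k], second[k], second[k] - first[k]
            }
        }
    ' "$1" "$2"
}
```

Usage, assuming the two runs were written to directories named pass1 and pass2: compare_passes pass1/Log.final.out pass2/Log.final.out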
How did you set up your 2-pass runs?
Here is what I did:
1-pass run:
genome files:
STAR --runMode genomeGenerate --runThreadN 6 --genomeDir Genome1 --genomeFastaFiles hg19_fasta/chr{[0-9],[0-9][0-9],X,Y,M,[0-9]_gl*,[0-9][0-9]_gl*,Un_gl*}.fa --sjdbGTFfile human_refGene.UCSC2013-03-06-11-23-03.hg19.gtf --sjdbOverhang 100
mapping:
STAR --genomeDir Genome1/ --genomeLoad LoadAndKeep --readFilesIn SampleTest_R1_trimmed.1M.fastq.gz SampleTest_R2_trimmed.1M.fastq.gz --readFilesCommand zcat
2-pass run:
prepare splice junctions:
awk 'BEGIN {OFS="\t"; strChar[0]="."; strChar[1]="+"; strChar[2]="-";} {if($5>0){print $1,$2,$3,strChar[$4]}}' ../SampleTest.trimQ20.STAR-2pass.sjbd99.SJ.out.tab > SJin.tab
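To make the filter concrete: in SJ.out.tab, column 4 is the strand code (0 undefined, 1 plus, 2 minus) and column 5 is the intron motif code (0 means non-canonical), so the command keeps only canonical junctions and writes the strand as a character. Here is a minimal sketch on three made-up junction rows; the coordinates, counts, and file names are invented purely for illustration.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so nothing in the run directory is overwritten.
tmpdir=$(mktemp -d)

# Three made-up SJ.out.tab rows: chr, intron start, intron end, strand code,
# motif code, annotated flag, unique reads, multi-mapping reads, max overhang.
printf 'chr1\t1000\t2000\t1\t1\t0\t12\t0\t40\n'  > "$tmpdir/SJ.example.tab"
printf 'chr1\t3000\t4000\t2\t0\t0\t3\t0\t25\n'  >> "$tmpdir/SJ.example.tab"   # motif 0: dropped
printf 'chr2\t5000\t6000\t2\t2\t1\t7\t1\t38\n'  >> "$tmpdir/SJ.example.tab"

# Same filter as above: keep rows with motif code > 0, map strand 0/1/2 to ./+/-.
awk 'BEGIN {OFS="\t"; strChar[0]="."; strChar[1]="+"; strChar[2]="-";}
     {if($5>0){print $1,$2,$3,strChar[$4]}}' \
    "$tmpdir/SJ.example.tab" > "$tmpdir/SJin.example.tab"

# Show the resulting 4-column junction list (chr, start, end, strand).
cat "$tmpdir/SJin.example.tab"
```

The second row is dropped because its motif code is 0; the other two come out as chr, start, end, and a + or - strand.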
genome files:
STAR --runMode genomeGenerate --runThreadN 6 --genomeDir Genome2 --genomeFastaFiles hg19_fasta/chr{[0-9],[0-9][0-9],X,Y,M,[0-9]_gl*,[0-9][0-9]_gl*,Un_gl*}.fa --sjdbGTFfile human_refGene.UCSC2013-03-06-11-23-03.hg19.gtf --sjdbOverhang 100 --sjdbFileChrStartEnd SJin.tab
mapping:
STAR --genomeDir Genome2/ --genomeLoad LoadAndKeep --readFilesIn SampleTest_R1_trimmed.1M.fastq.gz SampleTest_R2_trimmed.1M.fastq.gz --readFilesCommand zcat
Cheers
Alex