Why did reads map to introns when I set RemoveNoncanonicalUnannotated?


Sirian

Feb 24, 2015, 3:49:02 PM
to rna-...@googlegroups.com
I'm trying to use STAR to map human RNA-seq reads with a UCSC GTF annotation, for GATK variant calling analysis.
Here are my command lines:

Generate genome:
${STARdir}/STAR --runMode genomeGenerate --genomeDir ${genomeDir} --genomeFastaFiles ${hg19fasta} --sjdbGTFfile ${ucscgtf} --sjdbOverhang 99 --runThreadN 5 --outStd Log

Align:
${STARdir}/STAR --genomeDir ${genomeDir} --readFilesCommand zcat --outReadsUnmapped Fastx --readFilesIn ${rawfastaDir}/Sample_${seq}_R1.fastq.gz ${rawfastaDir}/Sample_${seq}_R2.fastq.gz --runThreadN 5 --outFilterIntronMotifs RemoveNoncanonicalUnannotated --outSAMstrandField intronMotif --outFileNamePrefix ${seq}_STARmapped_ --outStd Log

I thought "RemoveNoncanonicalUnannotated" would remove alignments in intronic regions based on the GTF file. However, I still have a lot of alignments in introns. Did I miss a filter?

By the way, the purpose of this alignment is variant calling with GATK. Are there any other recommended parameter settings besides what I have above? For example, STAR by default allows up to 10 mismatches. Should I change this? My data are 100 bp paired-end reads.

Thanks.

Alexander Dobin

Feb 26, 2015, 3:11:40 PM
to rna-...@googlegroups.com
Hi Sirian,

--outFilterIntronMotifs RemoveNoncanonicalUnannotated filters out alignments that contain unannotated splice junctions with non-canonical intron motifs. It does not check whether reads map within introns; that type of filtering has to be done post-mapping. There are a few parameters that you can use to simplify the GATK pipeline, here is the discussion.
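As a rough illustration of what post-mapping filtering could look like, here is a minimal sketch that is not part of STAR. It assumes bedtools is installed and that the alignments have already been converted to BAM (for example with samtools view -b). The function names and file names are hypothetical. The first function turns the exon records of a GTF into BED intervals; the second keeps only alignments with at least one aligned block overlapping an annotated exon, so reads lying entirely inside introns are dropped.

```shell
# Convert GTF exon records (read from stdin) to 0-based BED intervals.
# GTF coordinates are 1-based inclusive, BED is 0-based half-open,
# so only the start coordinate is shifted by one.
gtf_exons_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } $3 == "exon" { print $1, $4 - 1, $5 }' \
        | sort -k1,1 -k2,2n -u
}

# Keep alignments that overlap at least one exon (hypothetical helper).
#   $1 = input BAM, $2 = exon BED, $3 = output BAM
# -split treats each aligned block of a spliced read separately, so the
#        spliced-out intron itself does not count as overlap;
# -u     reports each alignment once if any block overlaps an exon.
# Use -v instead of -u to inspect the purely intronic alignments.
drop_intronic_reads() {
    bedtools intersect -a "$1" -b "$2" -split -u > "$3"
}

# Example usage (file names are placeholders):
#   gtf_exons_to_bed < "${ucscgtf}" > exons.bed
#   drop_intronic_reads sample.bam exons.bed sample.exonic.bam
```

Whether intronic alignments should be removed at all depends on the analysis; some of them may come from pre-mRNA or unannotated exons rather than mapping errors.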
A few other parameters to think about:

outFilterType          BySJout    // reduces the number of "spurious" junctions
alignSJoverhangMin     8          // min overhang for unannotated junctions
alignSJDBoverhangMin   1          // min overhang for annotated junctions
alignIntronMin         20         // min intron
alignIntronMax         1000000    // max intron
alignMatesGapMax       1000000    // max genomic distance between mates
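For reference, the parameters listed above could be appended to the align command from the original question roughly as follows. This is only a sketch: the shell variables (${STARdir}, ${genomeDir}, ${rawfastaDir}, ${seq}) come from the question and are assumed to be set, EXTRA_OPTS is a name made up for this example, and the command is printed rather than executed.

```shell
# Suggested parameters written as STAR command-line options.
EXTRA_OPTS="--outFilterType BySJout \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000"

# Dry run: print the full command. Remove the leading 'echo' to run STAR.
# EXTRA_OPTS is left unquoted on purpose so it splits into separate options.
echo "${STARdir}/STAR" --genomeDir "${genomeDir}" \
    --readFilesCommand zcat --outReadsUnmapped Fastx \
    --readFilesIn "${rawfastaDir}/Sample_${seq}_R1.fastq.gz" \
                  "${rawfastaDir}/Sample_${seq}_R2.fastq.gz" \
    --runThreadN 5 \
    --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
    --outSAMstrandField intronMotif \
    --outFileNamePrefix "${seq}_STARmapped_" --outStd Log \
    $EXTRA_OPTS
```

The values are the ones given in the list; they are starting points to think about, not settings tuned for a particular data set.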


Cheers
Alex