Why did reads map to introns when I set RemoveNoncanonicalUnannotated?

Sirian

Feb 24, 2015, 3:49:02 PM

to rna-...@googlegroups.com

I'm trying to use STAR to map human RNA-seq reads with a UCSC GTF annotation, for variant calling with GATK.

Here are my command lines:

Generate genome:

${STARdir}/STAR --runMode genomeGenerate --genomeDir ${genomeDir} --genomeFastaFiles ${hg19fasta} --sjdbGTFfile ${ucscgtf} --sjdbOverhang 99 --runThreadN 5 --outStd Log

Align:

${STARdir}/STAR --genomeDir ${genomeDir} --readFilesCommand zcat --outReadsUnmapped Fastx --readFilesIn ${rawfastaDir}/Sample_${seq}_R1.fastq.gz ${rawfastaDir}/Sample_${seq}_R2.fastq.gz --runThreadN 5 --outFilterIntronMotifs RemoveNoncanonicalUnannotated --outSAMstrandField intronMotif --outFileNamePrefix ${seq}_STARmapped_ --outStd Log

I thought "RemoveNoncanonicalUnannotated" would remove alignments in intronic regions based on the GTF file. However, I still have a lot of alignments in introns. Did I miss a filter?

By the way, the purpose of this alignment is variant calling with GATK. Are there any other recommended parameter settings beyond what I have above? For example, STAR by default allows up to 10 mismatches. Should I change this? My data are 100 bp paired-end reads.

Thanks.

Alexander Dobin

Feb 26, 2015, 3:11:40 PM

to rna-...@googlegroups.com

Hi Sirian,

--outFilterIntronMotifs RemoveNoncanonicalUnannotated filters out alignments that contain unannotated splice junctions with non-canonical intron motifs. It does not check whether reads map to introns; that type of filtering has to be done post-mapping. There are a few parameters that you can use to simplify the GATK pipeline; here is the discussion.
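
One possible way to do the post-mapping filtering mentioned above is to keep only alignments that overlap annotated exons. The sketch below is an assumption, not part of STAR: it relies on samtools being installed, on a BED file of exon intervals derived from the same GTF, and on the hypothetical function name keep_exon_overlapping. Reads that straddle an exon-intron boundary still overlap an exon and would be kept, so this is a coarse filter only.

```shell
# Sketch: drop alignments that lie entirely outside annotated exons.
# Assumes samtools is on PATH; all file names are placeholders.
# Arguments: $1 = input BAM, $2 = exon BED, $3 = output BAM
keep_exon_overlapping() {
    # -L keeps only alignments overlapping an interval in the BED file
    # -b writes BAM output, -o names the output file
    samtools view -b -L "$2" -o "$3" "$1"
}

# Example call (hypothetical file names):
# keep_exon_overlapping sample.sorted.bam exons.bed sample.exonic.bam
```

Whether removing intronic alignments is desirable before variant calling depends on the analysis, so it may be worth comparing results with and without this step.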
A few other parameters to think about:

outFilterType BySJout //reduces the number of "spurious" junctions

alignSJoverhangMin 8 //min overhang for unannotated junctions

alignSJDBoverhangMin 1 //min overhang for annotated junctions

alignIntronMin 20 //min intron

alignIntronMax 1000000 //max intron

alignMatesGapMax 1000000 //max genomic distance between mates
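
For reference, the parameters listed above could be passed on the command line as in the following sketch. The wrapper name star_with_filters and the STAR_BIN variable are hypothetical conveniences, not part of STAR; the option names and values are the ones suggested in this reply.

```shell
# Sketch: append the suggested junction and intron options to any STAR call.
# STAR_BIN is a hypothetical variable pointing at the STAR executable;
# it falls back to "STAR" on PATH when unset.
star_with_filters() {
    # Caller-supplied options come first, then the suggested settings:
    # BySJout junction filtering, overhang minimums for unannotated (8)
    # and annotated (1) junctions, intron size limits, and the mate gap limit.
    "${STAR_BIN:-STAR}" "$@" \
        --outFilterType BySJout \
        --alignSJoverhangMin 8 \
        --alignSJDBoverhangMin 1 \
        --alignIntronMin 20 \
        --alignIntronMax 1000000 \
        --alignMatesGapMax 1000000
}

# Example call, reusing the variables from the original post:
# STAR_BIN="${STARdir}/STAR" star_with_filters --genomeDir "${genomeDir}" \
#     --readFilesCommand zcat \
#     --readFilesIn "${rawfastaDir}/Sample_${seq}_R1.fastq.gz" "${rawfastaDir}/Sample_${seq}_R2.fastq.gz" \
#     --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
#     --outFileNamePrefix "${seq}_STARmapped_"
```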

Cheers

Alex
