I'm trying to use STAR to map human RNA-seq reads with a UCSC GTF annotation, for GATK variant calling analysis.
Here are my command lines:
Generate genome:
${STARdir}/STAR --runMode genomeGenerate --genomeDir ${genomeDir} --genomeFastaFiles ${hg19fasta} --sjdbGTFfile ${ucscgtf} --sjdbOverhang 99 --runThreadN 5 --outStd Log
Align:
${STARdir}/STAR --genomeDir ${genomeDir} --readFilesCommand zcat --outReadsUnmapped Fastx --readFilesIn ${rawfastaDir}/Sample_${seq}_R1.fastq.gz ${rawfastaDir}/Sample_${seq}_R2.fastq.gz --runThreadN 5 --outFilterIntronMotifs RemoveNoncanonicalUnannotated --outSAMstrandField intronMotif --outFileNamePrefix ${seq}_STARmapped_ --outStd Log
I thought "RemoveNoncanonicalUnannotated" would remove the alignments in intron regions based on the GTF file. However, I still have a lot of alignments in introns. Did I miss a filter?
Since the purpose of this alignment is variant calling with GATK, are there any other recommended parameter settings beyond what I have above? For example, STAR by default allows up to 10 mismatches (--outFilterMismatchNmax). Should I change this? My data are 100 bp paired-end reads.
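To make the mismatch question concrete, here is a rough sketch of the align command with an explicit cap via --outFilterMismatchNmax. The value 4 is only a placeholder I picked for illustration, not something I know to be right for GATK, and the fallback paths are dummies. The snippet only assembles and prints the command; it does not run STAR.

```shell
#!/bin/sh
# Sketch only: assemble the align command with a stricter mismatch cap.
# MAX_MISMATCH=4 is a hypothetical value for illustration, not a recommendation.
MAX_MISMATCH=4

# Dummy fallbacks so the snippet runs even when the variables are unset.
STARdir="${STARdir:-/path/to/STAR}"
genomeDir="${genomeDir:-/path/to/genomeDir}"
rawfastaDir="${rawfastaDir:-/path/to/fastq}"
seq="${seq:-SAMPLE}"

# Store the command as positional parameters so each argument stays one word.
set -- "${STARdir}/STAR" \
  --genomeDir "${genomeDir}" \
  --readFilesCommand zcat \
  --outReadsUnmapped Fastx \
  --readFilesIn "${rawfastaDir}/Sample_${seq}_R1.fastq.gz" "${rawfastaDir}/Sample_${seq}_R2.fastq.gz" \
  --runThreadN 5 \
  --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
  --outSAMstrandField intronMotif \
  --outFilterMismatchNmax "${MAX_MISMATCH}" \
  --outFileNamePrefix "${seq}_STARmapped_" \
  --outStd Log

# Print the assembled command for inspection.
printf '%s ' "$@"
printf '\n'
```

Once the paths point at real files, replacing the two printf lines with "$@" would run the command itself. Everything except the added --outFilterMismatchNmax option is the same as my align command above.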
Thanks.