The GATK group recommends using STAR for alignment prior to running the GATK variant caller on RNAseq data.
GATK article: "Calling variants in RNAseq"
This is all experimental stuff. I have a few suggestions, based on my experience so far with this pipeline, that would streamline the process. None of these are show-stoppers.
1. It is necessary to add read groups (the @RG header line and the RG tag on each record) with Picard tools. This step could be avoided if STAR had an option to include read groups in the original SAM (soon to be BAM, I hope) file.
2. It is then necessary to run GATK's SplitNCigarReads to split reads into exon segments and convert mapping quality scores from 255 to 60. To quote the GATK article:
At this step we also add one important tweak: we need to reassign mapping qualities, because STAR assigns good alignments a MAPQ of 255 (which technically means “unknown” and is therefore meaningless to GATK). So we use the GATK’s ReassignMappingQuality read filter to reassign all MAPQs to the default value of 60. This is not ideal, and we hope that in the future RNAseq mappers will emit meaningful quality scores, but in the meantime this is the best we can do. In practice we do this by adding the ReassignMappingQuality read filter to the splitter command.
3. The SAM files produced by STAR fail strict validation with Picard's ValidateSamFile, generating this warning for each record:
NM tag (nucleotide differences) is missing
It's possible to ignore this warning, but it would be nice if STAR could produce the value. I see that the STAR SAMs have an nM tag. Is this intended to be NM, or something else? In the SAM spec, tags containing lowercase letters are reserved for local use.
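
To make the three points concrete, here is a rough Python sketch of what they amount to for a single SAM record. It is illustration only, not how Picard or GATK implement these steps. The function names, the read group ID "sample1", and the example record are all made up, and the 255 to 60 mapping is simply the one described in the GATK quote above.

```python
# Illustrative only: the real pipeline uses Picard and GATK for these steps.
# All function names and the example record below are hypothetical.

MAPQ_COLUMN = 4          # 0-based index of MAPQ among the 11 mandatory SAM columns
FIRST_TAG_COLUMN = 11    # optional TAG:TYPE:VALUE fields start here


def parse_tags(record: str) -> dict:
    """Return the optional fields of a SAM record as {tag: (type, value)}."""
    fields = record.rstrip("\n").split("\t")
    tags = {}
    for field in fields[FIRST_TAG_COLUMN:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = (typ, value)
    return tags


def read_group_header(rg_id: str, sample: str) -> str:
    """Build a minimal @RG header line (point 1)."""
    return f"@RG\tID:{rg_id}\tSM:{sample}"


def add_read_group(record: str, rg_id: str) -> str:
    """Append an RG:Z tag unless the record already carries one (point 1)."""
    if "RG" in parse_tags(record):
        return record.rstrip("\n")
    return record.rstrip("\n") + f"\tRG:Z:{rg_id}"


def reassign_mapq(record: str, old: int = 255, new: int = 60) -> str:
    """Replace a MAPQ of `old` with `new`, leaving other values alone (point 2)."""
    fields = record.rstrip("\n").split("\t")
    if int(fields[MAPQ_COLUMN]) == old:
        fields[MAPQ_COLUMN] = str(new)
    return "\t".join(fields)


def is_missing_nm(record: str) -> bool:
    """True if the record has no NM tag; tag names are case-sensitive (point 3)."""
    return "NM" not in parse_tags(record)


# A made-up record shaped like STAR output: MAPQ 255, an nM tag, no NM, no RG.
example = "\t".join([
    "read1", "0", "chr1", "100", "255", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII", "NH:i:1", "HI:i:1", "AS:i:9", "nM:i:0",
])

fixed = reassign_mapq(add_read_group(example, "sample1"))
print(read_group_header("sample1", "sample1"))
print(fixed)
print("NM missing:", is_missing_nm(fixed))
```

Because tag names are case-sensitive, the nM tag does not count as NM here, which matches the warning from ValidateSamFile.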