--outSAMmapqUnique

matt...@gmail.com

Feb 23, 2016, 5:09:09 PM
to rna-...@googlegroups.com

The STAR aligner has an option (--outSAMmapqUnique) that, by default, assigns 255 as the MQ for unique mappers.


${STAR} --runMode alignReads --twopassMode Basic --runThreadN 24 --outSAMtype BAM SortedByCoordinate --outSAMattributes All --outFileNamePrefix \
"${file1%_1.fastq}_tsta" --outSAMmapqUnique 255 --sjdbGTFfile "${sjdb}" --genomeDir "${STAR_index}" --readFilesIn "${file7}" "${file8}"


I had overlooked this, and my guess is that it is causing the majority of my reads to be filtered out by GATK MuTect2:


java -jar -Xmx32g ${GATK} MuTect2 -R "${reference}" -I:tumor "${inpT}" -I:normal "${inpN}" --dbsnp "${dbSNP_1}" --cosmic "${COSMIC_1}" \
-L "${interval}" --filter_reads_with_N_cigar --out "${varCall}"

Result:


MicroScheduler - 18076 reads were filtered out during the traversal out of approximately 18167 total reads (99.50%)
MicroScheduler --> 0 reads (0.00% of total) failing BadCigarFilter
MicroScheduler --> 1601 reads (8.81% of total) failing DuplicateReadFilter
MicroScheduler --> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
MicroScheduler --> 11 reads (0.06% of total) failing MalformedReadFilter
MicroScheduler --> 16384 reads (90.19% of total) failing MappingQualityUnavailableFilter
MicroScheduler --> 80 reads (0.44% of total) failing NotPrimaryAlignmentFilter
MicroScheduler --> 0 reads (0.00% of total) failing UnmappedReadFilter


Conceptually, I am wondering: a unique MQ is assigned for each read base, correct? If so, is there a way to preserve the MQ for each of them? Is it as simple as removing the --outSAMmapqUnique option? I'd like to preserve MQ so that I can filter variants on it during annotation.


As always, thank you for your time and help.

Alexander Dobin

Feb 24, 2016, 10:56:26 AM
to rna-star
Hi Matt,

The GATK RNA-seq best practices recommend converting the 255 mapping quality into 60, so you could simply run STAR with --outSAMmapqUnique 60.
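
For alignments that already exist, the same 255 to 60 remapping could in principle be applied to SAM text records instead of re-running STAR. The following is only a minimal sketch, assuming tab-separated SAM lines (for example, piped out of a BAM as text); `remap_mapq` is a hypothetical helper, not part of STAR or GATK.

```python
def remap_mapq(sam_line: str, old: int = 255, new: int = 60) -> str:
    """Hypothetical helper: replace MAPQ `old` with `new` in one SAM text line."""
    if sam_line.startswith("@"):
        # Header lines carry no MAPQ field; pass them through unchanged.
        return sam_line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    # A SAM alignment record has 11 mandatory columns; MAPQ is the 5th.
    if len(fields) >= 11 and fields[4] == str(old):
        fields[4] = str(new)
    return "\t".join(fields) + newline


if __name__ == "__main__":
    record = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n"
    print(remap_mapq(record), end="")
```

Only records whose MAPQ equals `old` are touched, so multi-mapper MAPQ values stay as STAR wrote them.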
The per-base quality score (QS) defines the probability that the base is wrong: Pbase=10^(-QS/10).
The mapping quality (MQ) defines the probability that the alignment is wrong: Palign=10^(-MQ/10).
There is no simple relationship between QSs and MQ for a read.
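
To make the two formulas concrete, here is a small sketch of the Phred conversion in both directions. Each base in a read has its own QS, while the read's alignment record carries a single MQ. The function names are my own, chosen for illustration.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """P = 10^(-Q/10); applies to both QS (per base) and MQ (per alignment)."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p: float) -> float:
    """Inverse of the above: Q = -10 * log10(P), defined only for 0 < P <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10.0 * math.log10(p)


if __name__ == "__main__":
    for q in (3, 30, 60):
        print(q, phred_to_error_prob(q))
```

Note that an error probability of exactly 0 has no finite Phred value, which is why `error_prob_to_phred` rejects it.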

STAR at the moment uses a very simple scheme: Palign=1-1/Nmult, where Nmult is the number of loci the read maps to.
For unique mappers, Palign=0, which has no finite MQ, so MQ defaults to 255.
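
The scheme above can be sketched as follows. This is an illustration under stated assumptions rather than STAR's actual code: `mapq_from_nmult` is a hypothetical name, and truncating the Phred value to an integer is my assumption about the rounding.

```python
import math

UNIQUE_MAPQ = 255  # default for unique mappers; configurable via --outSAMmapqUnique


def mapq_from_nmult(nmult: int, unique_mapq: int = UNIQUE_MAPQ) -> int:
    """Hypothetical helper: MQ implied by Palign = 1 - 1/Nmult."""
    if nmult < 1:
        raise ValueError("a mapped read must have at least one locus")
    if nmult == 1:
        # Palign = 0 has no finite Phred value, so a fixed value is used instead.
        return unique_mapq
    p_align = 1.0 - 1.0 / nmult
    # Assumption: the Phred-scaled value is truncated to an integer.
    return int(-10.0 * math.log10(p_align))


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, mapq_from_nmult(n))
```

Under these assumptions the sketch gives 255 for unique mappers, 3 for two loci, 1 for three or four loci, and 0 for five loci, which lines up with the MAPQ values I recall from the STAR manual.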

Cheers
Alex