Dear all,
I have a question regarding STAR's behaviour with discordant alignments. Essentially, I would like to know whether there is any way to prevent STAR from outputting discordant alignments where each read of a pair aligns to a different chromosome. What I would like to do is to fetch all alignments for my reads, mark all top-scoring alignments as primary (in particular all best scoring multimappers) so that I can retain them later by sorting out secondaries. My STAR command is:
STAR --runMode alignReads --runThreadN $cores --genomeDir $genome_dir --readFilesIn $temp_d/$id.all_clean_1P.fastq.gz $temp_d/$id.all_clean_2P.fastq.gz --readFilesCommand zcat --outFilterType BySJout --outFilterMultimapNmax 100 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --outFileNamePrefix $temp_d/ --outMultimapperOrder Random --outSAMprimaryFlag AllBestScore --outSAMstrandField intronMotif --runRNGseed 23 --outSAMtype BAM Unsorted --quantMode GeneCounts --twopassMode Basic
And the flags of the resulting bam file (after removing secondary aligments) are:
16934940 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
16934940 + 0 mapped (100.00% : N/A)
16934940 + 0 paired in sequencing
8467470 + 0 read1
8467470 + 0 read2
12050972 + 0 properly paired (71.16% : N/A)
16934940 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
4145760 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Is there any way to tell STAR not to output these 4145760 reads that align to different chromosomes? Interestingly this only happens with the multimapping reads, as after filtering them out (samtools view -q 255), these discordant reads go away completely:
11700850 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
11700850 + 0 mapped (100.00% : N/A)
11700800 + 0 paired in sequencing
5850400 + 0 read1
5850400 + 0 read2
11700800 + 0 properly paired (100.00% : N/A)
11700800 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
I was under the impression that chimeric reads were only output by star when told to do so with the respective parameters? I'm using star version 2.6.1d and I haven't tested in the most recent version since there is no mention in the release notes regarding any change in this behaviour.