Extracting uniquely mapped reads

6,631 views
Skip to first unread message

ERA

unread,
Apr 10, 2017, 11:11:58 AM4/10/17
to rna-star

Hi Alex,

Would you suggest me a way to extract the uniquely mapped reads from the sample_Aligned.sortedByCoord.out.bam? There is a difference between the number (Uniquely mapped reads number 15988433) I found in the sample_Log.final.out file and the number (15916113) I obtained from the use of egrep or fgrep (samtools view -h sample_ Aligned.sortedByCoord.out.bam | fgrep -w NH:i:1 | samtools etc).

Thanks,

ERA 

Alexander Dobin

unread,
Apr 10, 2017, 3:26:03 PM4/10/17
to rna-star
Hi ERA,

I think you are picking NH:i:10 (and 11,12,...if you allowed more than 10 multimappers).
You would need to ensure "tab" after 1 to pick only unique mappers.

An easier way to select unique mappers is by MAPQ, since unique mappers all have 255:
$ samtools view -q 255 Aligned.sortedByCoord.out.bam

Cheers
Alex

ERA

unread,
Apr 10, 2017, 4:05:44 PM4/10/17
to rna-...@googlegroups.com

Thanks Alex. I do not think so because the number of reads (15916113) I got from the egrep or fgrep command is 1- lower than the number of the uniquely mapped reads (15988433) in the Log.final.out file; and then 2- the same as I got using samtools –q 255 (15916113). It tested that in two samples. 


$samtools view -h -q  255 Aligned.sortedByCoord.out.bam | samtools view | wc -l

31832226, after dividing this number by 2 (pair-end samples), I got 15916113

Alexander Dobin

unread,
Apr 10, 2017, 4:27:03 PM4/10/17
to rna-star
Hi ERA,

I thought that you have single-end reads. For paired-end reads, the likely explanations is as follows.
Some of the reads are mapped as single-end alignments - this could happen if
(i) you changed the parameters to allow for SE alignments
(ii) you trimmed reads before mapping - if the mates have very unbalanced lengths after trimming, even the default parameterswill allow SE alignments.

When you count the NH:i:1 lines, the SE alignment will contribute 1, so when you divide them by 2, you will count them as 1/2 reads.

You can count separately the SE and PE alignments:
SE: $ samtools view -c -q 255 -F 0x2 Aligned.out.bam
PE: $ samtools view -c -q 255 -f 0x2 Aligned.out.bam

Then SE+PE/2 should be equal to the number of unique mappers in the Log.final.out.

Cheers
Alex

ERA

unread,
Apr 10, 2017, 5:00:21 PM4/10/17
to rna-...@googlegroups.com

My math is presently correct. Yes, STAR allowed SE alignment because some mates have different lengths after trimming.  So according to you, I can use $samtools view -q 255 Aligned.out.bam to get all uniquely mapped reads. Please confirm if I am right.

Thanks

Alexander Dobin

unread,
Apr 10, 2017, 5:07:59 PM4/10/17
to rna-star
Yes, that's correct samtools view -q 255 Aligned.out.bam will extract all unique mappers, SE and PE.

ERA

unread,
Apr 10, 2017, 5:13:52 PM4/10/17
to rna-star
Thank you very much, this ticket can be closed.
Reply all
Reply to author
Forward
0 new messages