Extracting only the sequence covered by M in the CIGAR string from STAR local alignments


Varun Gupta

Apr 1, 2016, 3:28:43 PM
to rna-star
Hi Alex,
Hope you are doing well. I mapped a data set against my custom genome and want to extract from the BAM files only the part of each sequence that corresponds to the M operations of the CIGAR string. So I don't want soft-clipped bases, insertions, or deletions. I set --alignIntronMax to 1 so there won't be any reads with N, but I still have S, I, and D. How can I extract just the M parts?

Regards
Varun

Alexander Dobin

Apr 1, 2016, 4:18:21 PM
to rna-star
Hi Varun,

Do you mean that you want to set the STAR parameters so that it only produces alignments whose CIGAR contains nothing but M operations?
You can achieve this by using:
--alignEndsType EndToEnd --alignIntronMax 1 --alignIntronMin 2 --scoreDelOpen -10000 --scoreInsOpen -10000
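
To confirm after mapping that the output really contains only M operations, each CIGAR string can be checked with a few lines of Python. This is a minimal sketch, not part of STAR: the function name is_match_only is hypothetical, and the operation letters are the ones defined in the SAM specification.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_match_only(cigar: str) -> bool:
    """Return True if every operation in the CIGAR string is M.

    Hypothetical helper for illustration; it is not part of STAR.
    """
    ops = CIGAR_OP.findall(cigar)
    # Reject empty or malformed strings (e.g. "*" for unmapped reads).
    if not ops or "".join(n + op for n, op in ops) != cigar:
        return False
    return all(op == "M" for _, op in ops)
```

With this helper, "57M" passes, while "10S45M2S" and the unmapped placeholder "*" do not.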

Cheers
Alex

Varun Gupta

Apr 1, 2016, 4:42:23 PM
to rna-star

Hi Alex
In that case I will lose all the reads that have an alignment like this:

10S45M2S

I want only the 45M part of the sequence, but in the BAM file all 57 bases are present.

And then there can be complications from insertions and deletions, something like 10S40M2I20M1I30M2S.

I tried to write something, but it gets complex.

Any ideas?
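
For reference, the extraction described above can be sketched by walking the CIGAR string and tracking the current offset in the read. This is only an illustration under stated assumptions: the function name match_segments is hypothetical, it works on the plain SEQ and CIGAR strings of one SAM record, and it returns one piece of sequence per M operation rather than joining them, because insertions and deletions interrupt the aligned blocks.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read (SEQ), per the SAM spec.
CONSUMES_QUERY = set("MIS=X")


def match_segments(cigar: str, seq: str) -> list:
    """Return the pieces of seq covered by M operations, in read order.

    Hypothetical helper for illustration only.
    """
    segments = []
    offset = 0  # current position within seq
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "M":
            segments.append(seq[offset:offset + n])
        if op in CONSUMES_QUERY:
            offset += n  # S and I bases are skipped, D and N consume nothing
    return segments
```

For 10S45M2S this yields a single 45-base piece (read bases 11 to 55). For 10S40M2I20M1I30M2S it yields three pieces of 40, 20, and 30 bases, with the soft-clipped and inserted bases dropped.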

Alexander Dobin

Apr 1, 2016, 4:51:16 PM
to rna-star
Hi Varun,

Do you want to allow STAR to make alignments with N, I, D, and S, but then filter them out?
This would be easy to do, though you would have to fix the NH and HI multi-mapping attributes.
Or do you want to extract the M parts of the alignments?
This would be hard, since you would have to change practically all fields in the SAM record.
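
To give a sense of why the second option touches so many fields, here is a rough sketch that splits one alignment into its M blocks and recomputes POS, CIGAR, SEQ, and QUAL for each block. The function name split_into_match_blocks is hypothetical, and the sketch deliberately leaves out everything else that would also need updating: FLAG, the mate fields (RNEXT, PNEXT, TLEN), and tags such as NH, HI, NM, and MD.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

CONSUMES_QUERY = set("MIS=X")      # operations that advance along the read
CONSUMES_REFERENCE = set("MDN=X")  # operations that advance along the reference


def split_into_match_blocks(pos: int, cigar: str, seq: str, qual: str) -> list:
    """Return (pos, cigar, seq, qual) tuples, one per M operation.

    pos is the 1-based leftmost aligned position, as in the SAM POS field.
    Hypothetical helper for illustration; other SAM fields are not handled.
    """
    blocks = []
    q = 0    # offset within the read
    r = pos  # current reference position
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "M":
            blocks.append((r, f"{n}M", seq[q:q + n], qual[q:q + n]))
        if op in CONSUMES_QUERY:
            q += n
        if op in CONSUMES_REFERENCE:
            r += n
    return blocks
```

Soft clips do not shift POS, since POS already points at the first aligned base, but every insertion or deletion starts a new block with its own position, which is where most of the bookkeeping comes from.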

Cheers
Alex