sequence that should map, but doesn't

86 views
Skip to first unread message

Eric Londin

unread,
May 1, 2020, 10:59:56 AM5/1/20
to rna-star
Hi,
I am trying to map some sequence data and running into a problem. We do want everything to be mapped allowing no more than 1 mis-match and multi-mappings. As an example, we have 1 sequence: 

'GGCGTCTACGGCCATACCACCCTGAACGCGCCCGATCTCGTCTGATCTCGGAAGCTAAGCAGGGTCGGGCCTGGTT'

By using BLAT, we know that this sequence is exactly in the genome (multiple times), but it doesn't show up in our mapping. My mapping commands are:

$STAR \

--runThreadN 60 \

--genomeDir $STARGENOMEDIR \

--readFilesIn $INPUTFILE \

--readFilesCommand zcat \

--outFileNamePrefix Mapped.out. \

--outSAMmultNmax -1 \

--outSAMtype BAM SortedByCoordinate \

--quantMode TranscriptomeSAM \

--outFilterMismatchNmax 1 \

-–sjdbGTFfile $GENEANNOTATION \

--outFilterMultimapNmax 1000 \

--outSAMattributes Standard \

--chimSegmentMin 20 \

--twopassMode Basic \

--twopass1readsN -1 \

--outTmpKeep none \

--scoreDelOpen 0 \

--scoreDelBase 0 \

--scoreInsOpen 0 \

--scoreInsBase 0 \

--outReadsUnmapped Fastx


These are single-end 76bps reads. I'm not sure if there is an issue with the specific commands, or the way that STAR handles highly repetitive regions of the genome. Thank you for any help or insight.
Best,
Eric

Alexander Dobin

unread,
May 1, 2020, 2:08:19 PM5/1/20
to rna-star
Hi Eric,

I think this sequence maps to more than 50 loci, and thus you also have to increase --winAnchorMultimapNmax .
This parameter defines the max number of loci and anchor seed is allowed to map to. You need to set it to (at least) the max number of loci for multimappers that you want to consider. However, too large values will slow down mapping significantly.

Cheers
Alex

Eric Londin

unread,
May 1, 2020, 3:46:35 PM5/1/20
to rna-star
Hi Alex,
Thank you for the reply. I'm getting some odd results. If I try to map that read by itself (i.e. no other reads in the file) as a test, it maps as expected in the genome, using both my original settings and with increasing the --winAnchorMultimapNmax. But, when I go back and try to remap the read with it as part of a bigger set of reads, it fails to map and ends up in the unmapped fastq output. This seems really odd to me. Would you have any insights into this?
Thank you for your help,
Eric

Alexander Dobin

unread,
May 4, 2020, 12:55:50 PM5/4/20
to rna-star
Hi Eric,

if you use the 2-pass mapping, for some reads, the alignment depends on other reads, so mapping the read by itself can be different.
When mapping all reads, have you tried to further increase --outFilterMultimapNmax to see if this read can be mapped?

Cheers
Alex

Eric Londin

unread,
May 4, 2020, 1:51:22 PM5/4/20
to rna-star
Hi Alex,
my --outFilterMultimapNmax is set to 1000, which should be good, but I'll try increasing it some more. Out of curiosity, why would the alignment of some reads be dependent on other reads? I was always under the impression that each read is considered as an individual event, or am I misunderstanding?
Best,
Eric

Alexander Dobin

unread,
May 6, 2020, 1:42:49 PM5/6/20
to rna-star
Hi Eric,

sorry, I meant increased --winAnchorMultimapNmax for the 2-pass run.

The reads are indeed mapped independently for the standard 1-pass mapping.
However, with the 2-pass scheme, the junctions detected from all reads in the 1st pass are "inserted" in the genome.
Then the mapping in the 2nd pass can be dependent on which junctions sequences were inserted.
Even if a read is not spliced, the seeds can map to more loci in the 2nd pass, which is why increasing --winAnchorMultimapNmax may be needed to capture multimappers.

Cheers
Alex 

Eric Londin

unread,
May 8, 2020, 9:22:21 AM5/8/20
to rna-star
Thanks, that makes sense to me. The read does in fact map as expected in the first pass, I'm playing around with the parameters to try to see how I can get to map during the second pass.
Thanks for your help.
Eric
Reply all
Reply to author
Forward
0 new messages