Paired-end mapping issue

1,070 views
Skip to first unread message

Carsten Kuenne

unread,
Oct 26, 2016, 12:02:38 PM10/26/16
to rna-star
Hi all,

I have been successfully using STAR for dozens of projects including RNA-Seq (paired-end, single-end) and ChIpSeq (single-end) data. Now I have tried for the first time to deal with paired-end ATACSeq data and found a very unexpected behaviour for my dataset. If I map the data with default parameters in paired-end mode, about 60% of the reads will be marked as mapping to multiple loci (which I usually remove later on). If I map R1 and R2 separately, only about 10% of reads will be multimappers in each set. This is the opposite of what should happen in my understanding, as two mates should be able to anchor the alignment more "uniquely" than one mate, since the total alignment length is longer.

I am using STAR 2.5.2b and the index was built including human Ensembl GTF version GRCh37.75 for hg19. The reads were produced by an Illumina NextSeq machine and average at ~50 bp each. The reads were pretrimmed, but ensuring that no mate was completely removed. So it's the same number and order of reads in R1 and R2 which have the same id per pair.

For the screenshot below, I only show uniquely mapping reads (first lane = paired, 2. = R1, 3. = R2). STAR maps both of these paired reads uniquely in single-end mode, why not as paired-end? Can it have something to do with the fragment length? As you can see, the R1/R2 reads of each pair frequently overlap to a large degree.

I'm kind of stumped at this point. Any ideas?

Best,
Carsten

Carsten Kuenne

unread,
Oct 27, 2016, 4:17:51 AM10/27/16
to rna-star
I just compared the paired-end mapping with only unique alignments (lane 1) to the paired-end mapping including multiple hits (lane 2). The screenshot shows 1) paired-end unique, 2) paired-end multi, 3) single-end unique R1, 4) single-end unique R2. So the separate mates of a pair are mapped uniquely in single-end mode (NH=1), but with multiple matches in paired-end mode (NH=2). In multi-mapping paired-end mode the two mates were not recognized as mates as you can see in the picture (Mate is mapped = no in IGV). BUT mate1 was set to "First in pair" and mate2 to "Second in pair". Really weird...



Carsten Kuenne

unread,
Oct 27, 2016, 8:04:21 AM10/27/16
to rna-star
After some more digging, I found the issue I guess. The reads were trimmed from both sides, which in combination with the low fragment size seems to be a situation not handled correctly by STAR at the moment. If I use the raw reads or just disable the headcrop, the mates are mapped correctly as unique in paired-end mode. I would really recommend to change this behavior, as it results in misleading bam files.

Alexander Dobin

unread,
Oct 28, 2016, 12:05:25 PM10/28/16
to rna-star
Hi Carsten,

thanks for figuring out what the problem was. I think it's the same as the one discussed in this thread:

And I think I have already coded the solution:
--alignEndsProtrude       5    ConcordantPair
    int, string:        allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate
                        1st word: int: maximum number of protrusion bases allowed
                        2nd word: string:
                                            ConcordantPair ... report alignments with non-zero protrusion as concordant pairs
                                            DiscordantPair ... report alignments with non-zero protrusion as discordant pairs

This options defines the number of nucleotides that are allowed to be "protruded" from the ends of overlapping mates.
It is zero by default, you need to set it to ~ max number of bases you expect to be trimmed from the 5' .
However, I have not tested it thoroughly on relevant examples. If you are willing to give it a try, I will be happy to help.

Cheers
Alex

Carsten Kuenne

unread,
Oct 31, 2016, 4:51:09 AM10/31/16
to rna-star
Hi Alex,

thanks. I will try that.

Best,
Carsten
Reply all
Reply to author
Forward
0 new messages