Hi all,
I have been using STAR successfully for dozens of projects, including RNA-seq (paired-end and single-end) and ChIP-seq (single-end) data. Now I have tried paired-end ATAC-seq data for the first time and have run into very unexpected behaviour with my dataset. If I map the data in paired-end mode with default parameters, about 60% of the reads are reported as mapped to multiple loci (which I usually remove later on). If I map R1 and R2 separately, only about 10% of the reads in each set are multimappers. As I understand it, this is the opposite of what should happen: two mates should anchor an alignment more "uniquely" than a single mate, because the total aligned length is longer.
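To compare the runs side by side, the multimapper rate can be pulled out of each run's Log.final.out. A minimal Python sketch, assuming the usual Log.final.out layout where each statistic is written as `<label> |<tab><value>`; the file paths passed on the command line are hypothetical.

```python
import sys


def multimapper_pct(log_text: str) -> float:
    """Return the '% of reads mapped to multiple loci' value from Log.final.out text.

    Assumes each statistic line looks like '<label> |<tab><value>'.
    """
    for line in log_text.splitlines():
        label, sep, value = line.partition("|")
        # Exact label match, so '% of reads mapped to too many loci' is not picked up
        if sep and label.strip() == "% of reads mapped to multiple loci":
            return float(value.strip().rstrip("%"))
    raise ValueError("multimapper line not found")


if __name__ == "__main__":
    # Hypothetical usage: python multi.py paired_Log.final.out R1_Log.final.out R2_Log.final.out
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, multimapper_pct(fh.read()))
```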
I am using STAR 2.5.2b, and the index was built with the human Ensembl GRCh37.75 GTF annotation for hg19. The reads were produced on an Illumina NextSeq and are ~50 bp long on average. They were trimmed beforehand, but in a way that ensured no mate was removed completely, so R1 and R2 contain the same number of reads, in the same order, with the same ID for each pair.
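The pairing claim (same number, order and IDs in R1 and R2) can be double-checked directly. A minimal Python sketch, assuming plain 4-line FASTQ records (no wrapped sequences) and that the read ID is the first whitespace-separated token of the header, optionally ending in /1 or /2; the file names are hypothetical.

```python
import gzip
import sys
from itertools import zip_longest


def read_ids(lines):
    """Yield read IDs: first header token without '@' and without a /1 or /2 suffix."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # assumes 4-line records; line 0 of each record is the header
            rid = line.split()[0][1:]
            if rid.endswith(("/1", "/2")):
                rid = rid[:-2]
            yield rid


def check_pairing(r1_lines, r2_lines):
    """Return (pairs_checked, first_mismatch); first_mismatch is None if all IDs agree."""
    n = 0
    for a, b in zip_longest(read_ids(r1_lines), read_ids(r2_lines)):
        if a != b:  # also catches one file running out early (None)
            return n, (a, b)
        n += 1
    return n, None


def open_fastq(path):
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python pairs.py sample_R1.fastq.gz sample_R2.fastq.gz
    with open_fastq(sys.argv[1]) as f1, open_fastq(sys.argv[2]) as f2:
        print(check_pairing(f1, f2))
```

A mismatch result reports how many pairs agreed before the first disagreement, which makes an off-by-one from trimming easy to spot.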
In the screenshot below, I only show uniquely mapping reads (lane 1 = paired-end, lane 2 = R1 only, lane 3 = R2 only). STAR maps both mates of these pairs uniquely in single-end mode, so why not in paired-end mode? Could this have something to do with the fragment length? As you can see, the R1 and R2 reads of each pair frequently overlap to a large degree.
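How much the mates overlap follows directly from the fragment length. A minimal Python sketch, assuming a standard inward-facing Illumina pair where R1 starts at one end of the fragment and R2 at the other; the function name is hypothetical. With ~50 bp mates, any fragment shorter than 100 bp gives overlapping reads, and a read longer than its fragment covers the whole fragment.

```python
def mate_overlap(fragment_len: int, r1_len: int, r2_len: int) -> int:
    """Number of fragment bases covered by both R1 and R2.

    Assumes R1 covers [0, r1_len) and R2 covers [fragment_len - r2_len, fragment_len),
    each clipped to the fragment.
    """
    r1_end = min(r1_len, fragment_len)
    r2_start = max(fragment_len - r2_len, 0)
    return max(0, r1_end - r2_start)


for frag in (40, 80, 150):
    print(frag, mate_overlap(frag, 50, 50))  # → 40, 20, 0
```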
I'm kind of stumped at this point. Any ideas?
Best,
Carsten
