Mapping long paired-end reads


Babak A

May 12, 2016, 5:10:01 PM
to rna-star
Hi Alex,

Quick question: I was wondering if you could suggest parameters for mapping long paired-end reads, e.g., the 262 bp x 2 reads in SRR1293901. With the suggested parameters for short reads, I get ~35% of reads unmapped as "too short". There are lots of Ns in the reads, but I am not sure whether the Ns alone account for the 35%.

Thanks,
Babak

Alexander Dobin

May 16, 2016, 5:14:00 PM
to rna-star
Hi Babak,

Off the bat, I would recommend increasing the number of allowed mismatches with --outFilterMismatchNmax 20 or even more (even though Ns do not count towards mismatches).
Please post (or email me) the entire Log.final.out file so that I can give you more informed advice.
What is the proportion of Ns in the reads, and how are they distributed along the read length?
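To answer the question about the proportion of Ns and their distribution along the read, a minimal Python sketch like the one below could be run on each mate file. It assumes plain or gzip-compressed FASTQ with standard four-line records; the script and function names are hypothetical and not part of STAR.

```python
import gzip
import sys
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def n_profile(lines):
    """Return (overall N fraction, list of per-position N fractions).

    Assumes standard 4-line FASTQ records; the sequence is the 2nd line.
    """
    n_at = Counter()   # Ns seen at each 0-based position
    depth = Counter()  # reads long enough to cover each position
    total_bases = total_n = 0
    for i, line in enumerate(lines):
        if i % 4 != 1:
            continue
        seq = line.strip().upper()
        total_bases += len(seq)
        for pos, base in enumerate(seq):
            depth[pos] += 1
            if base == "N":
                n_at[pos] += 1
                total_n += 1
    overall = total_n / total_bases if total_bases else 0.0
    per_pos = [n_at[p] / depth[p] for p in range(len(depth))]
    return overall, per_pos


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_fastq(sys.argv[1]) as fh:
        overall, per_pos = n_profile(fh)
    print(f"overall N fraction: {overall:.4%}")
    for pos, frac in enumerate(per_pos, start=1):  # 1-based positions
        print(f"{pos}\t{frac:.4f}")
```

For example, `python n_profile.py reads_1.fastq.gz` (hypothetical file name) prints the overall N fraction followed by one tab-separated line per read position, which makes it easy to see whether Ns cluster at the read ends.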

Cheers
Alex

Babak A

May 17, 2016, 1:14:53 PM
to rna-star
Hi Alex,

Thanks for your suggestions. Here is the Log.final.out:

                                 Started job on |       May 12 13:20:19
                             Started mapping on |       May 12 13:22:22
                                    Finished on |       May 12 13:50:43
       Mapping speed, Million of reads per hour |       20.16

                          Number of input reads |       9524186
                      Average input read length |       524
                                    UNIQUE READS:
                   Uniquely mapped reads number |       5990265
                        Uniquely mapped reads % |       62.90%
                          Average mapped length |       482.10
                       Number of splices: Total |       11571163
            Number of splices: Annotated (sjdb) |       11418597
                       Number of splices: GT/AG |       11473450
                       Number of splices: GC/AG |       87181
                       Number of splices: AT/AC |       10532
               Number of splices: Non-canonical |       0
                      Mismatch rate per base, % |       0.80%
                         Deletion rate per base |       0.01%
                        Deletion average length |       2.15
                        Insertion rate per base |       0.01%
                       Insertion average length |       1.86
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       187180
             % of reads mapped to multiple loci |       1.97%
        Number of reads mapped to too many loci |       471
             % of reads mapped to too many loci |       0.00%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |       0.00%
                 % of reads unmapped: too short |       34.93%
                     % of reads unmapped: other |       0.20%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%
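Comparing several runs is easier if Log.final.out is turned into numbers. The following is a minimal sketch, assuming only the "label | value" layout visible in the log above; the function and constant names are hypothetical and not part of STAR.

```python
import re

# Per-read outcome categories as they appear in the log above (assumed labels).
OUTCOME_KEYS = (
    "Uniquely mapped reads %",
    "% of reads mapped to multiple loci",
    "% of reads mapped to too many loci",
    "% of reads unmapped: too many mismatches",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
)


def parse_star_final_log(text):
    """Parse "label | value" lines of Log.final.out into a dict.

    Section headers such as "UNIQUE READS:" contain no "|" and are skipped.
    Values ending in "%" become floats in percent units, plain integers
    become int, decimals become float, and anything else (timestamps) str.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        label, value = line.split("|", 1)
        label, value = label.strip(), value.strip()
        if value.endswith("%"):
            stats[label] = float(value[:-1])
        elif re.fullmatch(r"\d+", value):
            stats[label] = int(value)
        elif re.fullmatch(r"\d+\.\d*", value):
            stats[label] = float(value)
        else:
            stats[label] = value
    return stats


def outcome_total(stats):
    """Sum the per-read outcome percentages that are present."""
    return sum(stats.get(key, 0.0) for key in OUTCOME_KEYS)
```

For the log above, the six outcome percentages add up to 100.00% (62.90 + 1.97 + 0.00 + 0.00 + 34.93 + 0.20), so "too short" accounts for almost all of the unmapped reads.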


Here are the base composition percentages in the left-mate files:

A: 25.3
C: 25.9
G: 24.0
T: 24.5
N: 0.3

Total: 2.5e9 bases

Also, I am using --outFilterMismatchNmax 999 so that the proportional mismatch limit takes effect instead of the absolute one.
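To see what --outFilterMismatchNmax 999 does, the sketch below combines the three mismatch limits as the STAR manual describes them: an absolute cap (--outFilterMismatchNmax, default 10), a cap relative to the total read length (--outFilterMismatchNoverReadLmax, default 1.0) and a cap relative to the mapped length (--outFilterMismatchNoverLmax, default 0.3). It is a simplified model for illustration, not STAR's code, and its rounding may differ from STAR's internal arithmetic; the function name is hypothetical.

```python
import math


def max_mismatches(mate_lengths, mapped_length,
                   mismatch_nmax=10,
                   nover_read_lmax=1.0,
                   nover_lmax=0.3):
    """Largest mismatch count an alignment could have and still pass.

    mate_lengths: lengths of the mates (one value for single-end reads).
    mapped_length: number of read bases covered by the alignment.
    The keyword defaults mirror STAR's documented defaults for
    --outFilterMismatchNmax, --outFilterMismatchNoverReadLmax and
    --outFilterMismatchNoverLmax.
    """
    read_length = sum(mate_lengths)
    return min(
        mismatch_nmax,                              # absolute cap
        math.floor(nover_read_lmax * read_length),  # relative to read length
        math.floor(nover_lmax * mapped_length),     # relative to mapped length
    )


if __name__ == "__main__":
    # A 262 + 262 nt pair with the thread's average mapped length of 482
    print(max_mismatches((262, 262), 482))                     # 10
    print(max_mismatches((262, 262), 482, mismatch_nmax=20))   # 20
    print(max_mismatches((262, 262), 482, mismatch_nmax=999))  # 144
```

Under this model, 999 effectively removes the absolute cap and leaves the 0.3 x mapped-length limit in charge (144 mismatches for a 482-base alignment), which is consistent with the log showing 0.00% of reads unmapped for too many mismatches.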

Thanks,
Babak

Alexander Dobin

May 20, 2016, 2:10:21 PM
to rna-star
Hi Babak,

The average mapped length of 482 is significantly shorter than the read length of 524.
This may be caused by:
1. Poor-quality tails: did you check the sequencing quality vs. base position?
If this is the case, quality-dependent trimming would help.
2. Short insert size: in this case the adapter sequence will appear at the read ends,
and you can trim it before mapping.
3. Worse sequencing quality in one mate than in the other: you can check this by mapping the mates separately.
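For point 1, quality-dependent trimming of 3' tails can be sketched with the running-sum algorithm introduced in BWA and documented by cutadapt for its -q option. This is a swapped-in illustration of a common trimming technique, not something STAR does itself; the cutoff of 20, the Phred+33 encoding and the function names are assumptions.

```python
def quality_trim_index(qual, cutoff=20, offset=33):
    """Return the index at which to cut the 3' end of a read.

    Walks from the 3' end adding (cutoff - q) for each base, stops once the
    running sum turns negative, and cuts where the sum was largest.
    """
    running = 0
    best = 0
    cut = len(qual)  # default: keep the whole read
    for i in range(len(qual) - 1, -1, -1):
        q = ord(qual[i]) - offset  # Phred score of base i
        running += cutoff - q
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def trim_read(seq, qual, cutoff=20, offset=33):
    """Trim low-quality 3' bases from a read and its quality string."""
    cut = quality_trim_index(qual, cutoff, offset)
    return seq[:cut], qual[:cut]


if __name__ == "__main__":
    # 'I' is Phred 40 and '#' is Phred 2 in Phred+33 encoding
    print(trim_read("ACGTACGT", "IIIII###"))  # ('ACGTA', 'IIIII')
```

Cutting where the running sum peaks, rather than at the first base below the cutoff, lets an isolated good base inside a bad tail be trimmed along with it.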

In all cases, you can allow shorter mapped lengths by reducing --outFilterScoreMinOverLread and --outFilterMatchNminOverLread.
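The effect of these two options can be checked with a little arithmetic. According to the STAR manual, both default to 0.66 and, for paired-end reads, are normalized to the sum of the mates' lengths. The sketch below applies them to a 262 + 262 nt pair; it compares against the real-valued threshold, ignores STAR's integer rounding, and uses a hypothetical function name. The alignment score is lowered by mismatches and gaps, so it is checked separately from the matched-base count.

```python
def passes_length_filters(mate_lengths, score, n_matched,
                          score_min_over_lread=0.66,
                          match_nmin_over_lread=0.66):
    """Check an alignment against the *OverLread length filters.

    mate_lengths: lengths of the mates (summed for paired-end reads).
    score: alignment score; n_matched: number of matched bases.
    """
    total = sum(mate_lengths)
    return (score >= score_min_over_lread * total
            and n_matched >= match_nmin_over_lread * total)


if __name__ == "__main__":
    mates = (262, 262)
    print(0.66 * sum(mates))  # threshold, about 345.84
    # Only one mate aligned end to end: 262 bases is below the threshold
    print(passes_length_filters(mates, 262, 262))            # False
    # Same alignment with both fractions lowered to 0.3 (threshold ~157)
    print(passes_length_filters(mates, 262, 262, 0.3, 0.3))  # True
```

With the defaults, a pair in which only one mate aligns cannot pass, which ties in with point 3: if one mate is of poor quality, lowering the fractions (0.3 here is an illustrative value, not a recommendation from the thread) lets single-mate alignments through.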

Cheers
Alex