Disagreement between FastQC read length and STAR's average input read length


Koen K

Jun 16, 2016, 12:23:57 PM
to rna-star
Hi all,

Our RNA sequencing run (75 bp PE reads) has finished and we're now running STAR for the alignment. Everything runs without problems and the output looks quite good, but one thing seems strange to us: the reported average input read length exceeds the actual read length. Instead of 150 bp (the combined length of the PE reads), the input read length is reported as 160. FastQC does show a 75 bp read length. Has anyone seen this before? Apart from this the mapping looks quite good, so we're also wondering whether it would be safe to just continue with the rest of the pipeline.

Thanks in advance for your input!

Koen

                                 Started job on | Jun 16 01:04:18
                             Started mapping on | Jun 16 01:08:04
                                    Finished on | Jun 16 01:14:36
       Mapping speed, Million of reads per hour | 528.45

                          Number of input reads | 57542076
                      Average input read length | 160
                                    UNIQUE READS:
                   Uniquely mapped reads number | 52306851
                        Uniquely mapped reads % | 90.90%
                          Average mapped length | 159.00
                       Number of splices: Total | 23621368
            Number of splices: Annotated (sjdb) | 23071617
                       Number of splices: GT/AG | 23358580
                       Number of splices: GC/AG | 184406
                       Number of splices: AT/AC | 21517
               Number of splices: Non-canonical | 56865
                      Mismatch rate per base, % | 0.28%
                         Deletion rate per base | 0.01%
                        Deletion average length | 1.47
                        Insertion rate per base | 0.00%
                       Insertion average length | 1.27
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci | 2250766
             % of reads mapped to multiple loci | 3.91%
        Number of reads mapped to too many loci | 90473
             % of reads mapped to too many loci | 0.16%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches | 0.00%
                 % of reads unmapped: too short | 4.94%
                     % of reads unmapped: other | 0.09%
                                  CHIMERIC READS:
                       Number of chimeric reads | 0
                            % of chimeric reads | 0.00%

Alexander Dobin

Jun 16, 2016, 12:56:49 PM
to rna-star
Hi Koen,

This is very strange indeed. The mapped length is also ~160, so somehow STAR sees longer mappable sequences.
Could you please map a small subset (~1,000 reads) and, if it still shows the same problem, send it to me?
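
One way to cut such a subset, as a minimal sketch: keep the first 1,000 four-line FASTQ records from each mate file, so that the pairs stay in the same order. The function name and the file names are placeholders, and the sketch assumes gzipped, standard four-line FASTQ input.

```shell
# Sketch only: subset_fastq and the file names below are hypothetical.
# Assumes gzipped FASTQ with exactly four lines per record.
subset_fastq() {
    # $1 = input .fq.gz, $2 = output .fq, $3 = number of reads to keep
    gzip -dc "$1" | head -n "$(( $3 * 4 ))" > "$2"
}

# Example usage (run once per mate so both files keep the same reads):
# subset_fastq read1.fq.gz sub_read1.fq 1000
# subset_fastq read2.fq.gz sub_read2.fq 1000
```

Taking the first records from both files, rather than sampling each file independently, keeps read 1 and read 2 paired.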

Cheers
Alex

archana bhardwaj

Dec 16, 2017, 9:55:16 AM
to rna-star
Hello everyone,

I had the same issue with my samples. My read length is 75 bp, but the reported value looks strange.

                          Number of input reads | 19153084
                      Average input read length | 148
                                    UNIQUE READS:
                   Uniquely mapped reads number | 16099484
                        Uniquely mapped reads % | 84.06%
                          Average mapped length | 147.09
                       Number of splices: Total | 6179163
            Number of splices: Annotated (sjdb) | 0
                       Number of splices: GT/AG | 6086749
                       Number of splices: GC/AG | 33639
                       Number of splices: AT/AC | 2689
               Number of splices: Non-canonical | 56086
                      Mismatch rate per base, % | 0.46%
                         Deletion rate per base | 0.07%
                        Deletion average length | 2.06
                        Insertion rate per base | 0.00%
                       Insertion average length | 1.19
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci | 2688325
             % of reads mapped to multiple loci | 14.04%
        Number of reads mapped to too many loci | 169029
             % of reads mapped to too many loci | 0.88%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches | 0.00%
                 % of reads unmapped: too short | 0.91%
                     % of reads unmapped: other | 0.12%
                                  CHIMERIC READS:
                       Number of chimeric reads | 0
                            % of chimeric reads | 0.00%

Please let me know whether there is a problem or whether I can proceed with this.

Waiting for your reply.

Alexander Dobin

Dec 16, 2017, 3:27:35 PM
to rna-star
Hi Archana,

STAR reports the average length of the pair, so if your reads are 2x75 and untrimmed, it should give
Average input read length | 150
However, if you trim your reads before mapping, it may give a smaller number.

You can check the read length distribution for your input files:
$ zcat read1.fq.gz | awk 'NR%4==2 {n[length($1)]++} END {for (ii=0;ii<=150;ii++) print ii,n[ii]}'
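
To compare against the STAR number directly, the mean lengths of the two mate files can be added up. This is a rough sketch, not STAR's own calculation: the function names and file names are placeholders, and it assumes gzipped four-line FASTQ with the same number of reads in both files.

```shell
# Sketch only: mean_read_length and mean_pair_length are hypothetical helpers.
# Assumes gzipped FASTQ with exactly four lines per record.
mean_read_length() {
    # Prints the mean sequence length of one gzipped FASTQ file.
    gzip -dc "$1" | awk 'NR % 4 == 2 { total += length($0); n++ }
                         END { if (n) printf "%.2f\n", total / n }'
}

mean_pair_length() {
    # Sum of the per-mate means; equals the mean pair length
    # when both files contain the same number of reads.
    m1=$(mean_read_length "$1")
    m2=$(mean_read_length "$2")
    awk -v a="$m1" -v b="$m2" 'BEGIN { printf "%.2f\n", a + b }'
}

# Example usage:
# mean_pair_length read1.fq.gz read2.fq.gz
```

For untrimmed 2x75 reads this should print 150.00; a lower value would point to trimming, and a value that disagrees with the STAR log would suggest that STAR was given different input files.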

Cheers
Alex