length mapped reads - length reference genome

rnaFan

Apr 6, 2016, 8:53:07 AM
to rna-star
Dear all,
I tried to map some miRNA samples to mirnaOME and I have a strange output:

                                 Started job on |    Apr 06 11:47:29
                             Started mapping on |    Apr 06 11:47:30
                                    Finished on |    Apr 06 11:48:57
       Mapping speed, Million of reads per hour |    376.87

                          Number of input reads |    9107723
                      Average input read length |    41
                                    UNIQUE READS:
                   Uniquely mapped reads number |    4047164
                        Uniquely mapped reads % |    44.44%
                          Average mapped length |    21.62
                       Number of splices: Total |    0
            Number of splices: Annotated (sjdb) |    0
                       Number of splices: GT/AG |    0
                       Number of splices: GC/AG |    0
                       Number of splices: AT/AC |    0
               Number of splices: Non-canonical |    0
                      Mismatch rate per base, % |    0.29%
                         Deletion rate per base |    0.00%
                        Deletion average length |    1.00
                        Insertion rate per base |    0.00%
                       Insertion average length |    1.04
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |    161711
             % of reads mapped to multiple loci |    1.78%
        Number of reads mapped to too many loci |    0
             % of reads mapped to too many loci |    0.00%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |    0.00%
                 % of reads unmapped: too short |    53.79%
                     % of reads unmapped: other |    0.00%
                                 Started job on |    Apr 06 11:50:36
                             Started mapping on |    Apr 06 11:50:42
                                    Finished on |    Apr 06 11:52:12
       Mapping speed, Million of reads per hour |    195.95

The "Average input read length" is 41, but the "Average mapped length" is 21.62. So if only about 50% of a read aligns, covering 100% of a reference sequence, the read is still reported as "mapped" to that sequence. This makes sense for mRNA, but here I am studying miRNAs, and I relaxed the length filters and switched off splicing with the options "--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --alignIntronMax 1".
How can I fix it? With an "EndToEnd" alignment?

The command is:
STAR_2.4.1d   --runThreadN 8   --genomeDir star   --readFilesIn input.fastq      --limitBAMsortRAM 10369680498   --outFileNamePrefix test   --outStd BAM_SortedByCoordinate   --outSAMtype BAM   SortedByCoordinate      --outSAMunmapped Within   --outFilterScoreMinOverLread 0   --outFilterMatchNmin 15   --outFilterMatchNminOverLread 0   --outFilterMismatchNoverLmax 0.05   --alignIntronMax 1

Thanks a lot in advance.

Alexander Dobin

Apr 6, 2016, 5:16:07 PM
to rna-star
Hi @rnaFan,

Since miRNAs are ~22 nt long, the average mapped length of 21.6 makes sense.
The read sequences are longer because they contain adapter sequence at their 3' ends.
It's best to trim those adapters before mapping, but since you did not, STAR is trimming (soft-clipping) all the bases it cannot match to the genome.
If you use the EndToEnd option, most reads will be unmappable, because the adapter bases cannot align.
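
As a rough check of this explanation, the two averages from the log can be compared directly. This is only a sketch: the variable names are made up for illustration, and the mapped length is averaged over uniquely mapped reads while the input length is averaged over all reads, so the result is an estimate.

```shell
# Values copied from the Log.final.out shown in the question.
INPUT_LEN=41        # Average input read length
MAPPED_LEN=21.62    # Average mapped length (uniquely mapped reads only)

# Bases per read left unaligned, and their share of the read length.
UNALIGNED=$(awk -v i="$INPUT_LEN" -v m="$MAPPED_LEN" 'BEGIN { printf "%.2f", i - m }')
SHARE=$(awk -v i="$INPUT_LEN" -v m="$MAPPED_LEN" 'BEGIN { printf "%.0f", 100 * (i - m) / i }')
echo "about $UNALIGNED nt per read (~$SHARE%) are not aligned"
```

Roughly 19 unaligned bases per read is consistent with a ~22 nt miRNA followed by 3' adapter sequence in a 41 nt read.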

To trim adapters before mapping, you can use an external trimmer, or STAR's internal trimmer with
--clip3pAdapterSeq <AdapterSequence> --clip3pAdapterMMp <MismatchPercentageInAdapter>
AdapterSequence should be the first bases of the adapter (I typically use ~10); MismatchPercentageInAdapter should be at most ~0.1.
After trimming, you can use the EndToEnd option.
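
Put together with the command from the question, the suggestion might look like the sketch below. Several things here are assumptions rather than part of the advice above: the adapter value is only an example (the first 10 bases of a commonly used Illumina small-RNA 3' adapter) and must be replaced with the adapter from your own library prep, the variable names and the output prefix are hypothetical, and --alignEndsType EndToEnd is taken to be the "EndToEnd option" meant here. The snippet only assembles and prints the command so it can be reviewed before running.

```shell
# ASSUMPTION: example adapter prefix only; replace with the first ~10 bases
# of the 3' adapter actually used in your library preparation.
ADAPTER="TGGAATTCTC"

# Same paths and filters as the original command, plus 3' adapter clipping
# and end-to-end alignment. The output prefix is a hypothetical name.
STAR_CMD="STAR --runThreadN 8 --genomeDir star --readFilesIn input.fastq \
  --clip3pAdapterSeq $ADAPTER --clip3pAdapterMMp 0.1 \
  --alignEndsType EndToEnd \
  --outFilterMatchNmin 15 --outFilterMismatchNoverLmax 0.05 \
  --alignIntronMax 1 \
  --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within \
  --outFileNamePrefix test_trimmed_"

# Print the command for review; run it with: eval "$STAR_CMD"
echo "$STAR_CMD"
```

After such a run, the "Average input read length" in Log.final.out still reflects the untrimmed reads, so the mapped length and the mapping percentages are the numbers to compare against the first run.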

Cheers
Alex

Vincenzo Capece

Apr 7, 2016, 3:50:42 AM
to Alexander Dobin, rna-star
Dear Alex,
Now it makes more sense.
Thanks a lot.
Have a nice day

--
Regards,
            Capece Vincenzo