Number of input reads | 40059196
Average input read length | 225
UNIQUE READS:
Uniquely mapped reads number | 27299013
Uniquely mapped reads % | 68.15%
Average mapped length | 228.08
Number of splices: Total | 14823607
Number of splices: Annotated (sjdb) | 13537993
Number of splices: GT/AG | 14519438
Number of splices: GC/AG | 164417
Number of splices: AT/AC | 18779
Number of splices: Non-canonical | 120973
Mismatch rate per base, % | 0.50%
Deletion rate per base | 0.04%
Deletion average length | 2.08
Insertion rate per base | 0.04%
Insertion average length | 1.48
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 3062416
% of reads mapped to multiple loci | 7.64%
Number of reads mapped to too many loci | 156976
% of reads mapped to too many loci | 0.39%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 23.73%
% of reads unmapped: other | 0.09%
I think the reason that STAR reports fewer paired reads aligned is in the definition of the "proper" pair. STAR requires that the start of the mate on the positive strand is smaller than the start of the other mate. This requirement comes from a simple view of the sequencing process that starts on the opposite ends of an insert, and should be true even if the insert size is smaller than the read length.
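The pairing rule described above can be sketched in a few lines of Python. This is only an illustration, not STAR's actual code: the `Mate` class, its fields, and `is_consistent_pair` are hypothetical names, `start` is assumed to be the leftmost mapped coordinate of each mate, and treating equal starts as acceptable is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Mate:
    start: int   # leftmost mapped coordinate of the mate (assumed definition of "start")
    strand: str  # "+" or "-"


def is_consistent_pair(a: Mate, b: Mate) -> bool:
    """Return True if the plus-strand mate starts at or before the other mate."""
    if a.strand == b.strand:
        # Assumption: the two mates of a pair must map to opposite strands.
        return False
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # Assumption: equal starts are allowed (fully overlapping mates).
    return plus.start <= minus.start


if __name__ == "__main__":
    # Plus-strand mate starts first: consistent with the rule.
    print(is_consistent_pair(Mate(100, "+"), Mate(120, "-")))
    # Plus-strand mate starts after the minus-strand mate: rejected.
    print(is_consistent_pair(Mate(130, "+"), Mate(120, "-")))
```

If trimming leaves the minus-strand mate starting to the left of the plus-strand mate, a check like this would reject the pair even though each mate aligns well on its own.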
Started job on | Jul 08 09:39:08
Started mapping on | Jul 08 09:39:29
Finished on | Jul 08 09:47:45
Mapping speed, Million of reads per hour | 146.23
Number of input reads | 20147145
Average input read length | 243
UNIQUE READS:
Uniquely mapped reads number | 14363238
Uniquely mapped reads % | 71.29%
Average mapped length | 242.82
Number of splices: Total | 6402912
Number of splices: Annotated (sjdb) | 6334140
Number of splices: GT/AG | 6365155
Number of splices: GC/AG | 32462
Number of splices: AT/AC | 3234
Number of splices: Non-canonical | 2061
Mismatch rate per base, % | 0.14%
Deletion rate per base | 0.01%
Deletion average length | 1.79
Insertion rate per base | 0.01%
Insertion average length | 1.57
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 736002
% of reads mapped to multiple loci | 3.65%
Number of reads mapped to too many loci | 9943
% of reads mapped to too many loci | 0.05%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 24.77%
% of reads unmapped: other | 0.23%
Started job on | Jan 08 12:18:38
Started mapping on | Jan 08 12:24:07
Finished on | Jan 08 12:40:14
Mapping speed, Million of reads per hour | 186.75
Number of input reads | 50162442
Average input read length | 50
UNIQUE READS:
Uniquely mapped reads number | 45580199
Uniquely mapped reads % | 90.87%
Average mapped length | 49.86
Number of splices: Total | 2135536
Number of splices: Annotated (sjdb) | 2057033
Number of splices: GT/AG | 2111366
Number of splices: GC/AG | 16160
Number of splices: AT/AC | 1618
Number of splices: Non-canonical | 6392
Mismatch rate per base, % | 0.31%
Deletion rate per base | 0.01%
Deletion average length | 1.41
Insertion rate per base | 0.01%
Insertion average length | 1.25
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 3364739
% of reads mapped to multiple loci | 6.71%
Number of reads mapped to too many loci | 231406
% of reads mapped to too many loci | 0.46%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 1.40%
% of reads unmapped: other | 0.57%
Started job on | Oct 16 00:34:49
Started mapping on | Oct 16 00:35:49
Finished on | Oct 16 00:53:06
Mapping speed, Million of reads per hour | 132.64
Number of input reads | 38208727
Average input read length | 102
UNIQUE READS:
Uniquely mapped reads number | 25108416
Uniquely mapped reads % | 65.71%
Average mapped length | 81.28
Number of splices: Total | 5025010
Number of splices: Annotated (sjdb) | 4895898
Number of splices: GT/AG | 4960736
Number of splices: GC/AG | 38904
Number of splices: AT/AC | 3710
Number of splices: Non-canonical | 21660
Mismatch rate per base, % | 0.72%
Deletion rate per base | 0.01%
Deletion average length | 1.12
Insertion rate per base | 0.00%
Insertion average length | 1.18
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 12930774
% of reads mapped to multiple loci | 33.84%
Number of reads mapped to too many loci | 165424
% of reads mapped to too many loci | 0.43%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 0.00%
% of reads unmapped: other | 0.01%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Thanks in advance!
Hi everyone,
I am getting a high percentage of "reads unmapped: too short" (25–35%) when mapping some paired-end Illumina reads.
The reads are 2x125 bp so ~110 bp each after trimming etc.
Here is how I used STAR:
STAR --runThreadN 4 --genomeDir $index_dir \
    --readFilesIn $R1 $R2 --outFileNamePrefix $path_to_out/$library_name. \
    --outSAMtype BAM Unsorted
I have also (separately) tried the parameters --alignSplicedMateMapLmin 50 and --alignSplicedMateMapLminOverLmate 0.2, but this doesn't make a difference.
My mean insert size is only about 130 bp as determined by bwa-mem (and in agreement with the Bioanalyzer traces) so I expect a large amount of overlap between the read pairs. I wonder if this could be part of my problem, since STAR is reporting the read length as 225 bp?
The genome was created with:
STAR --runThreadN 4 --runMode genomeGenerate --genomeDir $path_to_out/star/index \
    --genomeFastaFiles $fasta --sjdbGTFfile $gff \
    --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 109
I am using STAR 2.4.1d from GitHub:
STAR_2.4.1d_modified
I would be grateful for any suggestions. FWIW I get a high percentage (~95%) of concordant mapping with tophat2 for the same libraries, but I don't want to use tophat2.
Thanks for reading,
Tom Harrop
IRD, Montpellier, France.
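On the 225 bp question in the post above: for paired-end input, STAR reports the read length as the sum of both mates, and the default `--outFilterMatchNminOverLread` of 0.66 is applied to that combined length. The arithmetic can be sketched roughly in Python as below. The function names are hypothetical, the mate lengths are illustrative, and the sketch ignores the companion score filter (`--outFilterScoreMinOverLread`), so it is a simplification rather than STAR's actual logic.

```python
# STAR's documented default for --outFilterMatchNminOverLread.
DEFAULT_MIN_FRACTION = 0.66


def min_matched_bases(mate1_len: int, mate2_len: int,
                      fraction: float = DEFAULT_MIN_FRACTION) -> float:
    """Minimum matched bases required, relative to the summed mate lengths."""
    return fraction * (mate1_len + mate2_len)


def passes_length_filter(matched_bases: int, mate1_len: int, mate2_len: int,
                         fraction: float = DEFAULT_MIN_FRACTION) -> bool:
    """True if the alignment covers enough of the combined pair length."""
    return matched_bases >= min_matched_bases(mate1_len, mate2_len, fraction)


if __name__ == "__main__":
    # Illustrative mate lengths summing to the 225 bp average in the log.
    print(min_matched_bases(112, 113))
    # Only one ~110 bp mate aligned out of a 220 bp pair.
    print(passes_length_filter(110, 110, 110))  # False
    # Both mates mostly aligned.
    print(passes_length_filter(200, 110, 110))  # True
```

Under these assumptions, a pair in which only one ~110 bp mate aligns covers half of the combined length, falls below the threshold, and would be counted under "unmapped: too short".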