Hello everyone,
I’m getting a high percentage of reads (~35% - 42%) left unmapped as “too short” when mapping paired-end Illumina reads. After reading some of the threads from users with similar problems, I tried setting --outFilterScoreMinOverLread, --outFilterMatchNminOverLread, and --alignSplicedMateMapLminOverLmate to 0.50, which reduced the proportion of “too short” reads by only about 4-5 percentage points (from 39.79% to 35.05% in the sample shown below). Any suggestions for other ways to reduce the number of unmapped reads?
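For context, here is a rough sketch of what these ratio thresholds mean in bases. It assumes STAR's documented default of 0.66 for --outFilterScoreMinOverLread and --outFilterMatchNminOverLread, and that for paired-end reads the "read length" is the sum of both mates' lengths (consistent with the average input read length of 195 in the logs below). The function name is a hypothetical helper, and this only illustrates the arithmetic; STAR's internal check may differ slightly.

```python
import math


def min_required_bases(read_length: int, ratio: float) -> int:
    """Smallest whole number n that satisfies n >= ratio * read_length."""
    return math.ceil(ratio * read_length)


# 195 is the average input read length from Log.final.out (both mates combined).
PAIR_LENGTH = 195

# 0.66 is the assumed STAR default; 0.50 is the relaxed value tried in this post.
for ratio in (0.66, 0.50):
    needed = min_required_bases(PAIR_LENGTH, ratio)
    print(f"ratio {ratio:.2f}: roughly {needed} of {PAIR_LENGTH} bases must align")
```

Under these assumptions, relaxing the ratio only lowers the bar by a few dozen bases per pair, so pairs where one mate does not align at all could still fall below it.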
Here is the command I used with STAR 2.4.2a:
STAR --runMode alignReads --runThreadN 8 --genomeDir Tsirt_genome --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSAM GeneCounts --twopassMode Basic --outFilterType BySJout --outSAMattributes All --outSAMheaderPG Sickle --chimOutType SeparateSAMold --outSAMattrRGline ID:SRR805147 DS:Lung --outFileNamePrefix ./map/SRR805147_Lung --readFilesIn sickle_trimmed/SRR805147_Lung_1.sickle.fastq sickle_trimmed/SRR805147_Lung_2.sickle.fastq --bamRemoveDuplicatesType UniqueIdentical
Below is the Log.final.out file from the run with the above flags at their default values:
Number of input reads | 44635143
Average input read length | 195
UNIQUE READS:
Uniquely mapped reads number | 26148473
Uniquely mapped reads % | 58.58%
Average mapped length | 189.97
Number of splices: Total | 12516076
Number of splices: Annotated (sjdb) | 12514958
Number of splices: GT/AG | 12333007
Number of splices: GC/AG | 126045
Number of splices: AT/AC | 8109
Number of splices: Non-canonical | 48915
Mismatch rate per base, % | 0.43%
Deletion rate per base | 0.02%
Deletion average length | 1.95
Insertion rate per base | 0.01%
Insertion average length | 1.53
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 647350
% of reads mapped to multiple loci | 1.45%
Number of reads mapped to too many loci | 7705
% of reads mapped to too many loci | 0.02%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 39.79%
% of reads unmapped: other | 0.16%
And here is the Log.final.out after changing the flags to 0.50:
Number of input reads | 44635143
Average input read length | 195
UNIQUE READS:
Uniquely mapped reads number | 28049700
Uniquely mapped reads % | 62.84%
Average mapped length | 184.83
Number of splices: Total | 13307673
Number of splices: Annotated (sjdb) | 13306377
Number of splices: GT/AG | 13090522
Number of splices: GC/AG | 135649
Number of splices: AT/AC | 8716
Number of splices: Non-canonical | 72786
Mismatch rate per base, % | 0.50%
Deletion rate per base | 0.02%
Deletion average length | 1.90
Insertion rate per base | 0.01%
Insertion average length | 1.53
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 859077
% of reads mapped to multiple loci | 1.92%
Number of reads mapped to too many loci | 9478
% of reads mapped to too many loci | 0.02%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 35.05%
% of reads unmapped: other | 0.16%
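To make the comparison between the two runs concrete, here is a minimal sketch that parses the two summaries and reports the change. It assumes each statistic sits on one line as "name | value", as in the excerpts above; parse_log_final and percent are hypothetical helper names, not part of STAR, and only a few lines from each log are embedded.

```python
def parse_log_final(text: str) -> dict:
    """Parse 'name | value' lines from Log.final.out text into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats: dict, key: str) -> float:
    """Return a percentage field such as '39.79%' as a float."""
    return float(stats[key].rstrip("%"))


# Excerpts copied from the two Log.final.out files shown above.
DEFAULT_RUN = """\
Number of input reads | 44635143
Uniquely mapped reads number | 26148473
% of reads unmapped: too short | 39.79%
"""
RELAXED_RUN = """\
Number of input reads | 44635143
Uniquely mapped reads number | 28049700
% of reads unmapped: too short | 35.05%
"""

default_stats = parse_log_final(DEFAULT_RUN)
relaxed_stats = parse_log_final(RELAXED_RUN)

too_short_key = "% of reads unmapped: too short"
unique_key = "Uniquely mapped reads number"

# Drop in the "too short" share, in percentage points.
drop = percent(default_stats, too_short_key) - percent(relaxed_stats, too_short_key)
# Additional uniquely mapped reads gained by relaxing the thresholds.
gained = int(relaxed_stats[unique_key]) - int(default_stats[unique_key])

print(f"'too short' dropped by {drop:.2f} percentage points")
print(f"uniquely mapped reads gained: {gained}")
```

In practice the text would come from the actual files, for example via pathlib.Path(...).read_text(), rather than from embedded strings.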
Please let me know if I can provide any other helpful information.
Thank you very much for your time; I really appreciate any feedback or suggestions.
Blair Perry