how to reduce large % of reads unmapped: too short

Ramesh Ramasamy

Feb 7, 2024, 8:00:25 AM
to rna-star
Hi Alex,

I am mapping PE-150 adapter-trimmed bulk RNA-seq reads from mouse F1 hybrids using STAR version 2.7.11a. I anticipate around 5 million heterozygous SNVs, but I am not sure how many of these SNVs are in transcribed regions. I have a median of 25M reads per sample. For almost all of the samples, over 40% of the reads are categorized as "% of reads unmapped: too short."

Here is the command I used:

`STAR --runThreadN 30 --genomeDir mouse_genome_1 --readFilesIn ./adapt_trimmed/${FILENAME}_R1_001.fastq.gz ./adapt_trimmed/${FILENAME}_R2_001.fastq.gz --readFilesCommand gunzip -c --outFileNamePrefix ./star_quant/${FILENAME} --outSAMtype BAM Unsorted --outSAMunmapped Within --outSAMattributes Standard --outReadsUnmapped Fastx --quantMode GeneCounts`

When I added `--outFilterScoreMinOverLread 0.4 --outFilterMatchNminOverLread 0.4`, the same percentage of reads was labeled as multi-mapping instead. When I compared the read IDs of the multi-mappers (at 0.4) with those of the unmapped reads (at the default 0.66), over 95% of them overlapped.
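The read-ID comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used. It assumes the unmapped reads come from the FASTQ that `--outReadsUnmapped Fastx` writes (for example `Unmapped.out.mate1` under the output prefix) and that the alignments from the 0.4 run are available as SAM text lines (for example piped from `samtools view`). The function names `fastq_read_ids`, `multimapper_ids`, and `overlap_fraction` are hypothetical helpers introduced for this example.

```python
def fastq_read_ids(fastq_lines):
    """Yield read IDs from FASTQ lines; the header is every 4th line."""
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue
        fields = line[1:].split()  # drop the leading "@", keep the first token
        if not fields:
            continue
        name = fields[0]
        # Strip a trailing /1 or /2 mate suffix if present (assumption about naming).
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name


def multimapper_ids(sam_lines):
    """Return IDs of reads whose NH attribute (number of loci) is greater than 1."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        for tag in fields[11:]:  # optional attributes start at column 12
            if tag.startswith("NH:i:") and int(tag[5:]) > 1:
                ids.add(fields[0])
                break
    return ids


def overlap_fraction(unmapped, multimapped):
    """Fraction of the unmapped read IDs that also appear among the multi-mappers."""
    unmapped, multimapped = set(unmapped), set(multimapped)
    if not unmapped:
        return 0.0
    return len(unmapped & multimapped) / len(unmapped)
```

In practice the inputs would be open file handles or a pipe rather than in-memory lists. The `NH` attribute is part of `--outSAMattributes Standard`, so the command shown earlier already writes it.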

The read length of the mapped reads is greater than that of the unmapped/multi-mapping reads. See the attached plots.
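A small sketch of how the read lengths in the two groups could be summarized, and why shorter reads are more exposed to the "too short" filter. For paired-end data, STAR normalizes `--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread` to the sum of the mates' lengths, so an untrimmed 2×150 pair needs roughly 0.66 × 300 = 198 aligned bases at the default setting. The helper names `read_lengths` and `min_aligned_bases` are hypothetical, and the threshold calculation is an approximation of the filter, not STAR's exact internal arithmetic.

```python
from statistics import median


def read_lengths(fastq_lines):
    """Return the sequence length of each FASTQ record (2nd line of every 4)."""
    return [len(line.strip()) for i, line in enumerate(fastq_lines) if i % 4 == 1]


def min_aligned_bases(len_mate1, len_mate2, fraction=0.66):
    """Approximate number of aligned bases a pair needs to pass the filter.

    Assumes the threshold is fraction * (sum of mate lengths), truncated
    to an integer.
    """
    return int(fraction * (len_mate1 + len_mate2))


def summarize(fastq_lines):
    """Return (number of reads, median length) for a FASTQ given as lines."""
    lengths = read_lengths(fastq_lines)
    if not lengths:
        return 0, None
    return len(lengths), median(lengths)
```

Running `summarize` separately on the mapped and unmapped FASTQ files gives numbers comparable to the attached plots. Because the threshold scales with the trimmed length of each pair, short pairs are not penalized by the filter itself, so a length difference between the groups more likely reflects what the short fragments are than how they were filtered.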

Thank you in advance for your assistance.

Thanks,

Ramesh



mapped_reads.PNG
unmapped_reads.PNG

Alexander Dobin

Feb 23, 2024, 2:45:59 PM
to rna-star
Hi Ramesh,

You would need to investigate why the reads do not map, or why they map as multimappers when a shorter mapped length is allowed.
You can BLAST the unmapped reads to check for contamination.
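One way to prepare a BLAST query is to draw a small random sample of the unmapped reads and convert it to FASTA. The sketch below assumes the unmapped reads are in an uncompressed FASTQ file such as the one written by `--outReadsUnmapped Fastx`. The function name `sample_fastq_to_fasta` and the sample size are illustrative choices, not part of STAR or BLAST.

```python
import random


def sample_fastq_to_fasta(fastq_lines, n, seed=0):
    """Reservoir-sample up to n reads from FASTQ lines and return FASTA text."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    name = None
    seen = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            if not line.strip():  # trailing blank line: stop reading
                break
            name = line[1:].split()[0]  # drop "@", keep the first token
        elif i % 4 == 1:
            record = (name, line.strip())
            seen += 1
            if len(sample) < n:
                sample.append(record)
            else:
                # Keep each read with probability n / seen.
                j = rng.randrange(seen)
                if j < n:
                    sample[j] = record
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)
```

A few hundred reads are usually enough to see whether the hits are dominated by one unexpected organism or sequence class (for example rRNA or a bacterial contaminant). The resulting FASTA text can be saved to a file and submitted to a BLAST search.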