too many short reads failed to map

Assa Yeroslaviz

Jan 13, 2023, 4:04:17 AM

to rna-star

I have a Ribo-Seq dataset I am trying to map using STAR 2.7.10a with the following parameters:

--limitBAMsortRAM 7062663806
--winAnchorMultimapNmax 20
--seedSearchStartLmax 15
--outFilterMultimapNmax 8
--outFilterMismatchNmax 4

Because I'm mapping against the transcriptome, I am not using a GTF file, either for generating the genome index or for the mapping itself.

Before mapping, I run cutadapt to remove the adapter (also discarding untrimmed reads) and fastx_trimmer to remove the first base after trimming. I then map against rRNA to remove those reads, and map the remaining reads against my transcriptome.
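Since the problem hinges on how long the reads are after adapter and first-base trimming, a length histogram of the FASTQ that actually goes into STAR is a quick sanity check. A minimal Python sketch, assuming standard 4-line FASTQ records (plain or gzip-compressed); `open_fastq`, `length_histogram` and `summarize` are hypothetical helpers, not part of cutadapt or the FASTX-Toolkit:

```python
import gzip
from collections import Counter
from typing import Iterable, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, handling .gz compression transparently."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def length_histogram(lines: Iterable[str]) -> Counter:
    """Count read lengths, assuming strict 4-line FASTQ records."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the 2nd line of each record is the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def summarize(counts: Counter) -> None:
    """Print length, read count and percentage of all reads, shortest first."""
    total = sum(counts.values())
    if total == 0:
        return
    for length in sorted(counts):
        share = 100 * counts[length] / total
        print(f"{length:>4} nt {counts[length]:>12,} {share:6.2f}%")
```

Usage, with a hypothetical file name: `with open_fastq("reads.trimmed.fastq.gz") as fh: summarize(length_histogram(fh))`. Keeping the counting separate from file opening makes it easy to run on the input to the rRNA step and on the reads passed to the transcriptome step alike.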

But for some reason, over 80% of my reads fail to align because they are classified as too short.

less GSM3152879/GSM3152879.Log.final.out

Started job on | Jan 12 16:10:05
Started mapping on | Jan 12 16:10:10
Finished on | Jan 12 16:59:26
Mapping speed, Million of reads per hour | 414.20

Number of input reads | 340105409
Average input read length | 36
UNIQUE READS:
Uniquely mapped reads number | 4841867
Uniquely mapped reads % | 1.42%
Average mapped length | 17.95
Number of splices: Total | 14760
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 14723
Number of splices: GC/AG | 27
Number of splices: AT/AC | 0
Number of splices: Non-canonical | 10
Mismatch rate per base, % | 4.84%
Deletion rate per base | 0.00%
Deletion average length | 1.29
Insertion rate per base | 0.00%
Insertion average length | 1.09
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 34426996
% of reads mapped to multiple loci | 10.12%
Number of reads mapped to too many loci | 1648189
% of reads mapped to too many loci | 0.48%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 299183031
% of reads unmapped: too short | 87.97%
Number of reads unmapped: other | 5326
% of reads unmapped: other | 0.00%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
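To compare runs (for example before and after changing trimming), it helps to pull these numbers out of Log.final.out programmatically. A minimal Python sketch, assuming the `label | value` layout shown above; the function names and the choice of six outcome categories are assumptions for illustration. For the log above those six categories sum to 99.99% of input reads.

```python
from typing import Dict

# Per-read outcome categories as labelled in Log.final.out (assumed list).
OUTCOME_KEYS = (
    "Uniquely mapped reads %",
    "% of reads mapped to multiple loci",
    "% of reads mapped to too many loci",
    "% of reads unmapped: too many mismatches",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
)


def parse_star_final_log(text: str) -> Dict[str, str]:
    """Parse 'label | value' lines of Log.final.out into a dict.

    Section headers such as 'UNIQUE READS:' contain no '|' and are skipped.
    """
    stats: Dict[str, str] = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def pct(stats: Dict[str, str], label: str) -> float:
    """Convert a percentage field such as '87.97%' to a float."""
    return float(stats[label].rstrip("%"))


def outcome_breakdown(stats: Dict[str, str]) -> Dict[str, float]:
    """Percentage of input reads in each outcome category present in the log."""
    return {label: pct(stats, label) for label in OUTCOME_KEYS if label in stats}
```

Usage: `with open("GSM3152879/GSM3152879.Log.final.out") as fh: breakdown = outcome_breakdown(parse_star_final_log(fh.read()))`.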

What is the reason for this very high number of "too short" reads?

Is it possible to map these reads as well? Does it make sense to try to map them?

thanks

Assa

Alexander Dobin

Feb 3, 2023, 11:35:14 AM

to rna-star

Hi Assa,

The reads are short to begin with, the mapped portion of the reads that did map is only ~18 b long, and the mismatch rate is quite high (~5%).

Two possible reasons are (i) poor-quality sequencing and (ii) a divergent genome. Both are hard to fix with such short reads.
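The "too short" label comes from STAR's relative length filters: by default `--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread` are both 0.66, so an alignment has to cover roughly two thirds of the read. A rough Python sketch of that arithmetic for the numbers in this thread (36-nt reads, ~18-nt alignments). The scoring model (ungapped, unspliced, +1 per match, -1 per mismatch) and the threshold formula are simplifying assumptions, not STAR's exact implementation:

```python
import math

# STAR defaults for the two relative "too short" filters.
SCORE_MIN_OVER_LREAD = 0.66   # --outFilterScoreMinOverLread
MATCH_NMIN_OVER_LREAD = 0.66  # --outFilterMatchNminOverLread


def approx_passes_length_filters(read_len: int, mapped_len: int,
                                 mismatches: int = 0,
                                 score_frac: float = SCORE_MIN_OVER_LREAD,
                                 match_frac: float = MATCH_NMIN_OVER_LREAD) -> bool:
    """Rough check of the 'too short' filters for a single-end read.

    Assumes an ungapped, unspliced alignment scored +1 per matching base
    and -1 per mismatch, and thresholds of fraction * read length.
    """
    matched = mapped_len - mismatches
    score = matched - mismatches
    return score >= score_frac * read_len and matched >= match_frac * read_len


def min_matched_bases(read_len: int, frac: float = MATCH_NMIN_OVER_LREAD) -> int:
    """Smallest whole number of matched bases reaching frac * read_len."""
    return math.ceil(frac * read_len)
```

With 36-nt reads the threshold is about 24 matched bases, so typical ~18-nt alignments fail even without mismatches. STAR does allow lowering these filters (for example setting both fractions to 0 and relying on an absolute `--outFilterMatchNmin`), but with ~5% mismatches over such short alignments that mostly trades "too short" reads for less reliable ones, which is the concern raised in the reply.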
