struggling with high % of reads unmapped: too short

rz...@ucdavis.edu

Oct 11, 2016, 11:40:30 AM

to rna-star

Hi everyone,

I have been struggling with this "high % of reads unmapped: too short" problem for almost a month. I am working with Brassica napus, mapping raw Illumina paired-end 100 bp reads to its reference genome.

The command I have been using is "STAR --genomeDir path-to-genome --readFilesIn ../1_1.fq ../1_2.fq --outSAMtype BAM SortedByCoordinate --sjdbGTFfile genome.gff3 --quantMode TranscriptomeSAM GeneCounts --twopassMode Basic --alignIntronMax 15000 --outFilterIntronMotifs RemoveNoncanonical --runThreadN 6 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon CDS --outReadsUnmapped Fastx"

The percentage of unmapped reads ranged from 28% to 55% across my 24 libraries (such a low mapping rate! I was expecting at least 80% of reads to map).
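To compare this number across many libraries, the value can be pulled out of each run's Log.final.out. The following is only a minimal sketch: the function name and the sample excerpt are made up for illustration, and it assumes the summary line has the usual "label | value%" layout.

```python
import re


def unmapped_too_short_pct(log_text: str) -> float:
    """Return the '% of reads unmapped: too short' value from Log.final.out text."""
    # Assumes summary lines look like "<label> |<tab><value>%".
    match = re.search(r"% of reads unmapped: too short\s*\|\s*([\d.]+)%", log_text)
    if match is None:
        raise ValueError("no '% of reads unmapped: too short' line found")
    return float(match.group(1))


# Illustrative excerpt only; these numbers are not from the libraries in this thread.
sample = (
    "                        Uniquely mapped reads % |\t52.10%\n"
    "                 % of reads unmapped: too short |\t41.37%\n"
    "                     % of reads unmapped: other |\t0.25%\n"
)
print(unmapped_too_short_pct(sample))
```

For a real run, the text would come from reading the Log.final.out file in that library's output directory.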

I have tried different ways to look for the reason:

1) Mapped my reads with BWA and TopHat; both also gave a low mapping rate, so this is not a problem specific to STAR.

2) Mapped my reads to the combined B. rapa and B. oleracea genomes (the two ancestral genomes of Brassica napus), which gave almost the same low mapping rate, so it is not a problem with the B. napus genome.

3) Included mitochondrial, chloroplast, and rRNA sequences in my reference and still got a similarly low mapping rate, so mitochondrial, chloroplast, and rRNA contamination is not the problem.

Since none of these worked, the last thing I tried was to de novo assemble my unmapped reads into transcripts using Trinity and BLAST them against NCBI GenBank to see what they hit. Guess what: out of the 43749 assembled transcripts, only 1945 did not hit Brassica napus genes, and rRNA, chloroplast, and mitochondrial contamination doesn't seem to be a problem either.

Advanced users or developers, should I change the advanced options for my mapping? Or should I trim my data?

confused....

I appreciate any input and suggestions.

Thank you very much!

Ruijuan Li

rz...@ucdavis.edu

Oct 12, 2016, 6:29:28 PM

to rna-star

Finally we figured it out: this was due to heavy adapter contamination. So I learned that quality and adapter trimming should always be performed before mapping.
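A quick way to check for this kind of contamination before mapping is to count how many reads contain the start of the adapter. This is only a rough sketch: the function name is made up, and the adapter prefix is an assumption (the first 13 bases of the common Illumina TruSeq adapter), so substitute whatever adapter your library prep actually used.

```python
from typing import Iterable

# Assumption: first 13 bases of the common Illumina TruSeq adapter.
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(fastq_lines: Iterable[str],
                     adapter: str = ILLUMINA_ADAPTER_PREFIX) -> float:
    """Return the fraction of reads whose sequence contains `adapter` exactly."""
    total = 0
    with_adapter = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each 4-line FASTQ record is the sequence
            total += 1
            if adapter in line:
                with_adapter += 1
    return with_adapter / total if total else 0.0


# Tiny in-memory FASTQ for illustration (two made-up reads, one with adapter).
demo = [
    "@read1", "ACGTACGTAGATCGGAAGAGCACACG", "+", "IIIIIIIIIIIIIIIIIIIIIIIIII",
    "@read2", "ACGTACGTACGTACGTACGTACGTAC", "+", "IIIIIIIIIIIIIIIIIIIIIIIIII",
]
print(adapter_fraction(demo))
```

On real data you would pass an open file handle (for example one opened on ../1_1.fq) instead of the demo list. It only finds exact matches, so it will undercount adapters with sequencing errors; a dedicated trimmer is still the right tool for the actual trimming.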

Darya Vanichkina

Oct 16, 2016, 10:28:56 PM

to rna-star

Curious: did you see these high adapter levels in FastQC?

Alexander Dobin

Oct 18, 2016, 4:38:14 PM

to rna-star

Hi Ruijuan,

thanks for sharing this experience. I think "adapter contamination" is becoming a common problem, so I will start strongly recommending pre-mapping trimming.

STAR has a basic adapter trimmer: --clip3pAdapterSeq <adapterSeq> --clip3pAdapterMMp 0.1 (the maximum proportion of mismatches allowed in the adapter sequence).
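As a rough sketch of how these two clipping options could be added to a mapping command when it is built from a script: the helper name is hypothetical, the adapter sequence is an assumption (a common Illumina adapter prefix, not something confirmed in this thread), and the paths are the placeholders from the original post. The code only prints the command, it does not run STAR.

```python
import shlex
from typing import List


def star_clip_args(adapter: str, mismatch_prop: float = 0.1) -> List[str]:
    """Build the 3' adapter clipping arguments for a STAR command line."""
    return ["--clip3pAdapterSeq", adapter, "--clip3pAdapterMMp", str(mismatch_prop)]


# Placeholder paths taken from the command earlier in the thread.
base = [
    "STAR", "--genomeDir", "path-to-genome",
    "--readFilesIn", "../1_1.fq", "../1_2.fq",
    "--runThreadN", "6",
]
# Assumed adapter sequence; replace with the one used in your library prep.
cmd = base + star_clip_args("AGATCGGAAGAGC")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the clipping arguments in one small helper makes it easy to apply the same setting to all libraries, or to drop it once reads are trimmed with a dedicated tool before mapping.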