STAR Poorly Mapping Paired Input Reads, but works fine when input separately

Caleb Bostwick

Jun 16, 2016, 2:32:38 PM

to rna-star

Hello fellow rna-star users. I am using STAR to map paired-end 101 bp Illumina reads. I first process the paired reads (read1 and read2) with Trimmomatic for quality control. I then pass the trimmed read1 and trimmed read2 files to STAR's --readFilesIn argument, but I get a very low uniquely mapped read percentage (1-2%) and a very high "unmapped: too short" percentage (>90%). However, when I run STAR on each trimmed file separately (not paired), it reports a high uniquely mapped percentage (>88%) and a low "unmapped: too short" percentage (4-5%). Does anyone have an idea why this is occurring and/or what I can do to map the reads as paired input and get a high unique mapping percentage? Thank you very much.

Best,
Caleb

Alexander Dobin

Jun 16, 2016, 6:58:20 PM

to rna-star

Hi Caleb,

this points to a problem with the ordering of the reads in the two files.

STAR expects exactly the same read ordering in the two files. It's possible that Trimmomatic dropped a read from one of the files but did not drop its mate from the other file, which would screw up the ordering. There should be an option in Trimmomatic to make it drop reads in pairs.

You can check this by mapping the reads without any trimming. If this results in a good mapping rate, then trimming is to blame; otherwise there could be a problem with the ordering in your original fastq files.
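To compare the trimmed and untrimmed runs, the two percentages discussed in this thread can be pulled out of each run's Log.final.out. The following is only a minimal sketch: it assumes the usual "label | value" layout of that file and the two label strings shown, and the numbers in the example text are made up for illustration.

```python
def mapping_summary(log_text):
    """Extract two percentages from the text of a STAR Log.final.out file.

    Assumes each statistic is written as '<label> | <value>' on one line.
    """
    wanted = ("Uniquely mapped reads %", "% of reads unmapped: too short")
    summary = {}
    for line in log_text.splitlines():
        if "|" not in line:
            continue  # section headers and blank lines carry no value
        key, value = (part.strip() for part in line.split("|", 1))
        if key in wanted:
            summary[key] = float(value.rstrip("%"))
    return summary


# Hypothetical excerpt, not taken from a real run.
example = """
                   Uniquely mapped reads % |\t1.52%
            % of reads unmapped: too short |\t92.10%
"""
print(mapping_summary(example))
```

Running this on the log of the untrimmed run and of the trimmed run gives two small dictionaries that can be compared side by side.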

Cheers

Alex

Caleb Bostwick

Jun 16, 2016, 7:17:24 PM

to rna-star

Hello Alex. Thank you very much for the prompt response and excellent program. I will try to map the reads without trimming. One thing I have tried in the meantime was to sort the trimmed fastq files (end1 and end2) using fastq-sort from fastq-tools (http://homes.cs.washington.edu/~dcjones/fastq-tools/fastq-sort.html). I sorted the reads alphabetically by read identifier (--id option) and got the same results as before sorting (low unique mapping %). Might this result indicate that the problem lies in the original fastq files and not the trimming? Thanks again for your valuable time and assistance.

Best,
Caleb

Alexander Dobin

Jun 16, 2016, 7:51:56 PM

to rna-star

Hi Caleb,

if there were "unpaired" reads in the files, simple sorting would not fix the problem, since these unpaired reads would shift the ordering of the reads.
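The shift can be illustrated with a toy example. This is only a sketch with made-up read IDs: reads are paired strictly by position, as a paired-end mapper does, and one mate is missing from the second list.

```python
def pair_by_position(ids1, ids2):
    """Pair the i-th read of file 1 with the i-th read of file 2."""
    return list(zip(ids1, ids2))


def count_mismatched(pairs):
    """Count positional pairs whose read IDs disagree."""
    return sum(1 for id1, id2 in pairs if id1 != id2)


# Hypothetical read IDs: read "r2" has lost its mate in the second file.
mate1 = ["r1", "r2", "r3", "r4", "r5"]
mate2 = ["r1", "r3", "r4", "r5"]

# Every pair after the orphan is shifted by one position.
unsorted_bad = count_mismatched(pair_by_position(mate1, mate2))

# Sorting both lists by ID leaves the orphan in place, so the shift remains.
sorted_bad = count_mismatched(pair_by_position(sorted(mate1), sorted(mate2)))

# Keeping only the IDs present in both lists restores the pairing.
common = set(mate1) & set(mate2)
kept1 = [read_id for read_id in mate1 if read_id in common]
kept2 = [read_id for read_id in mate2 if read_id in common]
filtered_bad = count_mismatched(pair_by_position(kept1, kept2))

print(unsorted_bad, sorted_bad, filtered_bad)  # prints: 3 3 0
```

A single orphan early in a file therefore mispairs nearly every read that follows it, which would be consistent with a very low paired mapping rate alongside good single-end rates.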

I have looked at the Trimmomatic documentation, and it outputs the unpaired reads into separate files, so it should not create any problems.

So it indeed seems like a problem with the original files, and it's not simple ordering, since this would have been fixed by sorting.

The first thing you could check is whether the read IDs in the two files are consistent.

If that is true, the problem may be deeper, and I would need a small example subset.
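One way to run the read ID check is sketched below. It is not part of STAR. It assumes plain 4-line FASTQ records (optionally gzipped), takes the ID to be the first whitespace-delimited token of the header line, and strips a trailing /1 or /2 if present; the function names are made up for this example.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of each record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0 or not line.strip():
                continue  # only header lines carry the ID
            # Drop the leading '@' and anything after the first whitespace.
            name = line[1:].split()[0]
            # Drop an old-style mate suffix so both mates compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_inconsistency(path1, path2):
    """Return (record_index, id1, id2) for the first mismatch, else None.

    A missing record at the end of one file shows up as an ID of None.
    """
    ids = zip_longest(read_ids(path1), read_ids(path2))
    for index, (id1, id2) in enumerate(ids):
        if id1 != id2:
            return index, id1, id2
    return None
```

Calling first_inconsistency on the read1 and read2 files returns None when the IDs agree record by record, and otherwise the zero-based index of the first record where they diverge.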

Cheers

Alex

Caleb Bostwick

Jun 17, 2016, 12:12:29 PM

to rna-star

Hi Alex. I downloaded the files again directly from NCBI's SRA using the SRA Toolkit and mapped them both before and after trimming with Trimmomatic. Our suspicion that the problem was with the original files was correct: with the fresh download, the untrimmed files had >81% unique mapping, and the trimmed files had >86% unique mapping. I apologize for not checking the integrity of the data before asking questions about the software. Thank you again for your help and fine program.

Best,
Caleb

Alexander Dobin

Jun 17, 2016, 1:10:30 PM

to rna-star

Hi Caleb,

it's great you resolved it, thanks for letting me know.

I plan to add more automatic checks of the input files to allow quick detection of these problems.