Running STAR for PE with an unequal number of reads

Lea Shallev

Mar 2, 2015, 4:22:49 PM

to rna-...@googlegroups.com

Hi Alex,

I am running STAR to map paired-end (PE) RNA-seq reads to the human genome.
In some of my samples the number of reads is not equal in the two fastq files (the data come from an external DB and I don't know why).
Is it a problem to run STAR in PE mode when the numbers of reads are unequal?
There were no errors in the log file and the alignments look OK in IGV, but the SAM file is significantly smaller than the SAM files of samples that have the same number of reads in both fastq files.

In addition, I think I found a slightly inaccurate definition in the STAR tutorial:
"outFilterMultimapNmax 10
int: read alignments will be output only if the read maps fewer than
this value, otherwise no alignments will be output"

In practice, read alignments are output if the read maps a number of times fewer than or equal to this value.

Thanks a lot for your help!
Lea

Alexander Dobin

Mar 5, 2015, 10:22:01 AM

to rna-...@googlegroups.com

Hi Lea,

the number of reads in the two PE fastq files must be equal, and, moreover, the order of the reads must be exactly the same. STAR does not check or fix the read order; it assumes the order is correct. However, it's likely to produce an error if the number of reads is different in the two files.

Can you trace the reason for the inconsistent fastq files? It must be some kind of processing, like trimming.

Cheers

Alex
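
The requirement described above (equal read counts and identical read order in both files) can be checked before running STAR. The following is a minimal Python sketch, not part of STAR. It assumes plain 4-line FASTQ records (optionally gzipped) and that mate names differ at most by a trailing /1 or /2. The function names read_names and check_pairs are hypothetical, introduced only for this illustration.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line FASTQ record, without a /1 or /2 suffix."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only every fourth line is a header line
            fields = line.split()
            if not fields:
                continue  # tolerate a stray blank line
            name = fields[0]  # drop any description after the first whitespace
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def check_pairs(path1, path2):
    """Return the 1-based index of the first record where the two files disagree
    (different names, or one file ending early), or None if they are consistent."""
    pairs = zip_longest(read_names(path1), read_names(path2))
    for index, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return index
    return None
```

For example, check_pairs("Read1.fastq.gz", "Read2.fastq.gz") (file names are placeholders) returns None when the files are consistent, and otherwise the position of the first problematic record.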

Lea Shallev

Mar 9, 2015, 7:38:28 AM

to rna-...@googlegroups.com

Hi Alex,

I can't trace the reason for the inconsistent fastq files. I tried to contact the publishers of the DB to get more information, but they didn't answer.

Can you recommend software that can remove the reads that do not have a pair and sort the remaining reads for the STAR alignment?

Thanks a lot,

Lea

On Thursday, March 5, 2015 at 17:22:01 UTC+2, Alexander Dobin wrote:

Alexander Dobin

Mar 12, 2015, 12:47:46 PM

to rna-...@googlegroups.com

Hi Lea,

I am not aware of any software that can do that. You can easily script this yourself, e.g.:

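gawk '
# File 1: store sequence and quality by read name (needs gawk for ARGIND)
ARGIND==1 {sub(/\/1$/,"",$1); r=$1; getline; S[r]=$1; getline; getline; Q[r]=$1; next}
# File 2: read the full record, then print both mates only if the name was seen in file 1
{sub(/\/2$/,"",$1); r=$1; getline s; getline; getline q
 if (r in S) {print r "\n" S[r] "\n+\n" Q[r] > "Read1"; print r "\n" s "\n+\n" q}}
' <(zcat Read1.in.gz) <(zcat Read2.in.gz) > Read2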

It will output the correctly ordered reads into Read1 and Read2, though I did not test this script thoroughly.

Cheers

Alex
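
The same idea as the awk script above can be sketched in Python for readers who prefer it. This is an illustration under stated assumptions, not a tested production tool: it assumes 4-line FASTQ records (optionally gzipped input), mate names that differ at most by a trailing /1 or /2, and enough memory to hold all reads of the first file. The names read_fastq and repair_pairs are hypothetical. As in the awk version, output follows the read order of the second file and names are written without the /1 or /2 suffix.

```python
import gzip


def read_fastq(path):
    """Yield (name, sequence, quality) for each 4-line FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate a stray blank line
            seq = handle.readline().rstrip("\n")
            handle.readline()  # the '+' separator line is not needed
            qual = handle.readline().rstrip("\n")
            name = header.split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, seq, qual


def repair_pairs(in1, in2, out1, out2):
    """Write only reads present in both inputs, in the order of in2.

    Returns the number of pairs written. Outputs are uncompressed FASTQ.
    """
    # Keep every read of the first file in memory, keyed by read name.
    mates = {name: (seq, qual) for name, seq, qual in read_fastq(in1)}
    written = 0
    with open(out1, "w") as o1, open(out2, "w") as o2:
        for name, seq, qual in read_fastq(in2):
            if name in mates:
                seq1, qual1 = mates[name]
                o1.write(f"{name}\n{seq1}\n+\n{qual1}\n")
                o2.write(f"{name}\n{seq}\n+\n{qual}\n")
                written += 1
    return written
```

A call such as repair_pairs("Read1.in.gz", "Read2.in.gz", "Read1", "Read2") would mirror the file names used in the awk command. Reads without a mate in the other file are dropped.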

Dietmar Rieder

Mar 12, 2015, 5:06:19 PM

to rna-...@googlegroups.com

Hi Lea,

Besides the awk script that Alex posted, you might wish to try Pairfq, a Perl tool from Evan Staton. I use it routinely and it works very well.

https://github.com/sestaton/Pairfq

Dietmar

Lea Shallev

Mar 18, 2015, 8:22:33 AM

to rna-...@googlegroups.com

O.K.

Thank you very much!

On Thursday, March 12, 2015 at 23:06:19 UTC+2, Dietmar Rieder wrote:
