Reordering / shuffling paired-end FASTQ files


Thomas Sandmann

Oct 15, 2016, 5:44:30 PM
to Sailfish Users Group
Dear Salmon users,

I saw that the documentation highlights the need for reads to be in random order, e.g. not sorted by position or target. If they are not, it recommends that you "randomize / shuffle them before performing quantification with Salmon."

When I retrieve data from the SRA repository and extract the FASTQ files with fastq-dump, I am not sure whether this condition is met.
Is there an efficient way to randomize the order of reads in (paired-end) FASTQ files, potentially even as part of the input stream?

Thanks a lot for any pointers,

Thomas

Vasisht Tadigotla

Oct 15, 2016, 10:01:59 PM
to Thomas Sandmann, Sailfish Users Group
Hi Thomas,

It should be fine for SRA files that contain only the reads and no alignments. Even if the file contains alignments to a reference genome, I think it should be OK, since the alignment order is not likely to match the ordering of the transcripts (I'm not completely certain here).

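You could split each FASTQ file into several chunks and pass the chunks as inputs to Salmon, or concatenate them, in a random order. For paired-end data, use the same chunk order for both mate files so the pairs stay in sync.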

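e.g. salmon quant -1 <(gzcat 1.fq.gz 4.fq.gz 3.fq.gz 2.fq.gz) -2 <(gzcat 1.fq.gz 4.fq.gz 3.fq.gz 2.fq.gz)

(The first list stands for the mate 1 chunks and the second for the matching mate 2 chunks, in the same order. The index, library type and output options are omitted here.)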

I have some code based on kseq.h that can split FASTQ files into smaller chunks of a specified number of reads (https://github.com/vasisht/fastq_splitter), if that helps.
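
The chunk-and-reorder idea above can also be sketched in a few lines of Python. This is only an illustration under some assumptions, not part of Salmon or of fastq_splitter: the function names are made up for this example, records are assumed to be plain 4-line FASTQ entries, both mate files are assumed to list reads in the same order, and everything is held in memory, so it only suits files that fit in RAM.

```python
import gzip
import random
from itertools import islice


def read_records(lines):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped 4-line records)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def shuffle_paired_chunks(lines1, lines2, chunk_size, seed=None):
    """Reorder chunks of `chunk_size` read pairs at random.

    Both mates are shuffled together as pairs, so they stay in sync.
    With chunk_size=1 this becomes a full read-level shuffle.
    """
    recs1 = list(read_records(lines1))
    recs2 = list(read_records(lines2))
    if len(recs1) != len(recs2):
        raise ValueError("mate files have different numbers of reads")
    pairs = list(zip(recs1, recs2))
    chunks = [pairs[i:i + chunk_size] for i in range(0, len(pairs), chunk_size)]
    random.Random(seed).shuffle(chunks)  # one permutation, applied to both mates
    out1 = [line for chunk in chunks for r1, _ in chunk for line in r1]
    out2 = [line for chunk in chunks for _, r2 in chunk for line in r2]
    return out1, out2


def shuffle_paired_files(in1, in2, out1, out2, chunk_size=100000, seed=None):
    """Hypothetical file wrapper: read two gzipped FASTQ files, write shuffled copies."""
    with gzip.open(in1, "rt") as f1, gzip.open(in2, "rt") as f2:
        s1, s2 = shuffle_paired_chunks(f1, f2, chunk_size, seed)
    with gzip.open(out1, "wt") as o1, gzip.open(out2, "wt") as o2:
        o1.writelines(s1)
        o2.writelines(s2)
```

The shuffled outputs could then be passed to Salmon as the -1 and -2 inputs. For files too large for memory, splitting into chunk files on disk and concatenating them in a random order, as described above, avoids loading everything at once.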

Cheers,
Vasisht
--
Sailfish is available at https://github.com/kingsfordgroup/sailfish
Citation:
Patro, Rob, Stephen M. Mount, and Carl Kingsford. "Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms." Nature biotechnology 32.5 (2014): 462-464.

Thomas Sandmann

Oct 17, 2016, 7:15:05 PM
to Sailfish Users Group, toms...@gmail.com
Great, thanks a lot for your quick response, Vasisht!