combining unpaired and paired reads

John Stanton-Geddes

May 6, 2014, 1:46:31 PM

to sailfis...@googlegroups.com

Now that I've finally upgraded to version 0.6.3, I see that a 'library type' has to be specified as either SE or PE reads. However, as part of a standard quality control process, it is possible to end up with orphaned reads from paired-end data. I can easily be persuaded that these reads are not useful and should be tossed, but they are often retained (e.g. as done when using khmer).

So, would it be possible to include unpaired reads *in addition* to paired-end reads when running `sailfish quant`? Currently, it doesn't complain when I include both, but only the paired-end reads are processed.

Previously, I just included all my reads (orphaned and still-paired) as separate files. Two possible workarounds:

1) Run all data as single end (SE).

2) Quantify expression separately for the orphaned (SE) and paired reads...then sum TPM?

Rob

May 12, 2014, 10:46:00 PM

to sailfis...@googlegroups.com

Hi John,

I'm glad to hear you've upgraded to 0.6.3 --- 0.6.4 is well under development and hopefully not too far off, with a number of improvements and new features. The short answer to your question is that you should be able to provide the -l flag multiple times on the command line. For example, something like

sailfish quant [other relevant parameters] -l "T=PE:O=><:S=U" -1 mates1.fastq -2 mates2.fastq -l "T=SE:S=U" -r unpaired.fastq

should work. Sailfish should process the libraries in the order they are given, and each -l description applies to the reads passed in until another -l is encountered. So, the command line above says to process mates1 and mates2 using "T=PE:O=><:S=U" and to process unpaired.fastq using "T=SE:S=U". Let us know if this works for you.
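
As a rough illustration of the grouping rule described above (this is not sailfish's actual argument parser), the following Python sketch walks an argument list and attaches each read file to the most recent -l description. The function name `group_libraries` and the dictionary layout are hypothetical, chosen only for this example.

```python
def group_libraries(args):
    """Attach each read file to the most recent -l library description.

    Illustrative only: mimics the rule described in the post,
    not sailfish's own command-line parser.
    """
    libraries = []
    i = 0
    while i < len(args):
        flag = args[i]
        if flag in ("-l", "-1", "-2", "-r"):
            if i + 1 >= len(args):
                raise ValueError(f"{flag} is missing its value")
            value = args[i + 1]
            if flag == "-l":
                # A new -l starts a new library; later read files belong to it.
                libraries.append({"type": value, "reads": {}})
            else:
                if not libraries:
                    raise ValueError(f"{flag} given before any -l library type")
                libraries[-1]["reads"].setdefault(flag, []).append(value)
            i += 2
        else:
            i += 1  # ignore parameters unrelated to read libraries
    return libraries


args = ["-l", "T=PE:O=><:S=U", "-1", "mates1.fastq", "-2", "mates2.fastq",
        "-l", "T=SE:S=U", "-r", "unpaired.fastq"]
for lib in group_libraries(args):
    print(lib["type"], lib["reads"])
```

The paired-end description ends up owning mates1.fastq and mates2.fastq, and the single-end description owns unpaired.fastq, which matches the reading of the command line given above.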

Thanks,
Rob

Martin Alexander Smith

Feb 2, 2015, 9:30:05 AM

to sailfis...@googlegroups.com

This doesn't seem to be the case for Salmon. Any plans to implement this lovely feature?

Rob

Feb 4, 2015, 3:37:57 PM

to sailfis...@googlegroups.com

Hi Martin,

You're right. Currently, this feature isn't implemented in salmon (at least not in this manner). In alignment-based salmon, orphaned reads are already handled by the model. What I mean by this is that if you have a paired-end read library (say, library type `IU`: unstranded reads facing toward each other) and some reads are orphaned, either during the QC step or during alignment, then as long as the remaining read has a valid SAM/BAM record in the output, these reads will be appropriately considered during quantification.

For "read-based" salmon, the situation is different. Currently, if the library is paired-end, both reads of every pair are expected to be present in the input. Further, it is assumed that both reads will map to the same contig in order to generate a valid "alignment". There are two ways to relax this requirement. The first is to allow the orphaning of reads during the mapping phase in read-based salmon. This would still require two reads to be present for every input fragment, but would allow orphaned mappings when no concordant mapping exists for the pair. The second is to do something like what is mentioned above, where the paired-end and orphaned reads are separated into different input files prior to running salmon. Then each can be quantified according to its own expected library type, but both libraries are considered during quantification. I'd like to implement (at least) one of these, but I'm not sure which would be most useful.
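
For the second option, the file-separation step could look something like the Python sketch below. It only covers splitting QC'd reads into still-paired and orphaned sets; it says nothing about how salmon would consume them. It assumes mates share a read name once a trailing `/1` or `/2` is dropped, that each FASTQ record is exactly four lines, and that the second mate file fits in memory. `read_fastq` and `split_pairs` are hypothetical helper names.

```python
def read_fastq(lines):
    """Yield (name, record) pairs from FASTQ lines, four lines per record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it), next(it), next(it)
        name = header.strip()[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # assumed mate-suffix convention
        yield name, (header, seq, plus, qual)


def split_pairs(mates1_lines, mates2_lines):
    """Return (paired1, paired2, orphans) as lists of FASTQ records."""
    reads2 = dict(read_fastq(mates2_lines))  # whole file held in memory
    paired1, paired2, orphans = [], [], []
    for name, record in read_fastq(mates1_lines):
        mate = reads2.pop(name, None)
        if mate is None:
            orphans.append(record)  # mate was dropped during QC
        else:
            paired1.append(record)
            paired2.append(mate)
    orphans.extend(reads2.values())  # second-mate reads with no partner
    return paired1, paired2, orphans


m1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGCC", "+", "IIII"]
m2 = ["@r2/2", "TTAA", "+", "IIII", "@r3/2", "CCGG", "+", "IIII"]
p1, p2, orphans = split_pairs(m1, m2)
print(len(p1), len(p2), len(orphans))
```

The two paired lists stay in the same order, so they can be written out as matching mate files, with the orphans going to a separate single-end file.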