Aligning both paired and unpaired reads for a sample

snt...@gmail.com

Dec 25, 2013, 12:29:13 AM

to sub...@googlegroups.com

I used subjunc from Subread 1.4.3-p1 to align RNA-seq reads to the UCSC hg19 human reference genome. The alignment appears to have completed successfully.

However, I am not sure the result is sound, because I supplied both paired and unpaired reads for the same sample. The original data are from an Illumina paired-end 101-base run, but I ran the Trimmomatic read cleaning/filtering software on them first. That produced both paired and unpaired read files, all of which I then passed to subjunc.

Example usage:

subjunc -T 16 --gzFASTQinput -i hg19_index_for_Subread -r trimmed_paired_1.fastq.gz trimmed_unpaired_1.fastq.gz -R trimmed_paired_2.fastq.gz trimmed_unpaired_2.fastq.gz -o out -u -H --BAMoutput

Can someone confirm whether subjunc uses both the paired and the unpaired read data when both are provided, or whether it discards the unpaired reads? I cannot find information on this in the Subread manual or in online forums.

Further, the summary that Subread prints after alignment uses the term 'fragment' (as in 'mapped fragments'), and I cannot match the number of fragments to the number of reads. For example, for one sample the input had 9598056 read pairs (2 x 9598056 reads) and 3402944 unpaired reads (as per TopHat), but Subread's message stated that the input had 7781687 fragments (of which 95.7% could be mapped). In this context, with both paired and unpaired data in the input, what does 'fragment' mean?

Thanks.

Wei Shi

Jan 3, 2014, 8:55:09 PM

to sub...@googlegroups.com

Firstly, you do not have to trim your reads before aligning them, because the subjunc and subread-align aligners soft-clip any read bases that cannot be mapped anywhere in the reference genome.

Secondly, the -r and -R arguments of subjunc (and subread-align) can each only accept one file. It is on our to-do list to support multiple input files. At present, you will have to run subjunc twice to align your paired reads and unpaired reads separately. However, if your reads are unpaired, you should map them as single-end reads rather than paired-end reads. Only when your reads are paired should you use both -r and -R options.
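As a rough sketch of the separate runs described above: the input file names, index name, and options are copied from the command in the original post (so they reflect Subread 1.4.x and may differ in other releases), while the output names and the `run`/`align_all` helpers are hypothetical. Because -r accepts a single file, the two unpaired files are aligned in two single-end runs here; that split is an assumption, not something stated in the reply. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

INDEX=hg19_index_for_Subread   # index name taken from the original post
THREADS=16

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

align_all() {
  # Paired reads: mate 1 goes to -r, mate 2 goes to -R.
  run subjunc -T "$THREADS" --gzFASTQinput -i "$INDEX" \
      -r trimmed_paired_1.fastq.gz -R trimmed_paired_2.fastq.gz \
      -o paired.bam -u -H --BAMoutput

  # Unpaired reads: single-end mode, so only -r is given (one file per run).
  for mate in 1 2; do
    run subjunc -T "$THREADS" --gzFASTQinput -i "$INDEX" \
        -r "trimmed_unpaired_${mate}.fastq.gz" \
        -o "unpaired_${mate}.bam" -u -H --BAMoutput
  done
}

align_all
```

Running it as-is lists three subjunc commands for inspection; `DRY_RUN=0 bash script.sh` would execute them.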

Lastly, after your alignments are done, you can merge your mapping results into one BAM file and then provide it to featureCounts. featureCounts automatically detects unpaired reads and then counts them properly. Note that you need to specify the -p option when running featureCounts so that you can count fragments instead of reads. A fragment is a pair of reads.
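The merge-and-count step might look like the sketch below. Several things here are assumptions: the BAM names are hypothetical outputs of separate paired and single-end runs, `hg19_genes.gtf` and `counts.txt` are placeholder names, and `samtools cat` is used as one possible way to combine the files because it concatenates them without reordering reads (a sorted workflow would more likely use `samtools merge`). The featureCounts call assumes a version that takes input files as positional arguments; recent releases may also need an extra option besides -p to count fragments, so check the usage message of your version. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

merge_and_count() {
  # Combine the paired and single-end alignments into one BAM file.
  # All inputs must come from the same reference index.
  run samtools cat -o merged.bam paired.bam unpaired_1.bam unpaired_2.bam

  # -p asks featureCounts to count fragments rather than individual reads.
  # hg19_genes.gtf is a placeholder for your annotation file.
  run featureCounts -p -T 16 -a hg19_genes.gtf -o counts.txt merged.bam
}

merge_and_count
```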

Hope this helps.

Wei

snt...@gmail.com

Jan 4, 2014, 6:42:24 PM

to sub...@googlegroups.com

Thank you for the clarifications.

The ability to pass multiple files, and both paired and unpaired read data, to a single subread-align/subjunc command would be a time-saver for many. I hope the next Subread release has this feature.

My reads have substantial adapter contamination, and for one set of samples the overall read quality is only modest. I am not sure whether Subread handles such cases well (TopHat2 does not), so I use Trimmomatic to clean the read data first.
