Mapping single-end and paired-end reads together


James Blachly

Oct 1, 2014, 10:12:32 PM
to rna-...@googlegroups.com
Dear STAR community,

When I trim my paired-end data, I end up with paired left and right reads, plus left mates which have lost their mate (e.g. mate was too short after adapter or quality trimming), plus right mates which have lost their mate. This is a total of 4 files.

Does STAR have the capability to map all four together, or must I run two jobs and combine the output?

My preferred downstream tool, eXpress, can handle a BAM file with (left+right)+(unpaired left)+(unpaired right) without a problem. I've been using bowtie2 for this, but I'd like to switch to STAR, and mapping all four files together would simplify my workflow.

Thanks all in advance, and thanks Alex for your hard work.


Alexander Dobin

Oct 2, 2014, 11:17:24 AM
to rna-...@googlegroups.com
Hi James,

You would have to run 2 (or even 3) jobs. When mapping single-end reads, STAR will *not* set the paired-end specific FLAG bits:
0x1 template having multiple segments in sequencing
0x8 next segment in the template unmapped
0x40 the first segment in the template
0x80 the last segment in the template
I am not sure if this would create a problem for eXpress - does it need to know that these single-end alignments were originally paired-end?

If you run separately (unpaired left) and (unpaired right), you can set these bits in the FLAG manually:
0x1+0x8+0x40 for (unpaired left), and 0x1+0x8+0x80 for (unpaired right).
I think I will code an option that allows an arbitrary number to be added to the FLAG.
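Until such an option exists, the bits can be set by post-processing the SAM output. The following is only a minimal sketch, not part of STAR: the function name `add_flag_bits` and the constant names are made up for illustration, and it assumes tab-separated SAM text with the FLAG in column 2.

```python
# Paired-end FLAG bits listed above (values as defined in the SAM format).
PAIRED = 0x1          # template having multiple segments in sequencing
MATE_UNMAPPED = 0x8   # next segment in the template unmapped
FIRST_IN_PAIR = 0x40  # the first segment in the template
LAST_IN_PAIR = 0x80   # the last segment in the template

# Combinations suggested for the two singleton files.
UNPAIRED_LEFT = PAIRED | MATE_UNMAPPED | FIRST_IN_PAIR
UNPAIRED_RIGHT = PAIRED | MATE_UNMAPPED | LAST_IN_PAIR


def add_flag_bits(sam_line: str, bits: int) -> str:
    """Return sam_line with `bits` OR-ed into its FLAG field (column 2).

    Header lines (starting with '@') are returned unchanged.
    Hypothetical helper for illustration only.
    """
    if sam_line.startswith("@"):
        return sam_line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    fields[1] = str(int(fields[1]) | bits)
    return "\t".join(fields) + newline
```

Applied to every line of the (unpaired left) SAM with `UNPAIRED_LEFT`, and of the (unpaired right) SAM with `UNPAIRED_RIGHT`, this leaves all other fields untouched. Using OR rather than plain addition keeps the result correct even if a bit happens to be set already.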

Cheers
Alex

James Blachly

Oct 2, 2014, 2:58:01 PM
to rna-...@googlegroups.com
Thanks Alex. I don't know how eXpress handles reads with the 0x8 bit set, as I run bowtie2 with the --no-discordant option, so there should not be any of those in my output.

It does not need to know that singletons were originally paired, except inasmuch as I believe a right mate that has become a singleton should be reverse-complemented before mapping with RF dUTP protocols, while with FR protocols it is the left mate that has become a singleton that should be.
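For reference, reverse-complementing a singleton read before mapping could look like the sketch below. It is only an illustration: the helper names are hypothetical, a FASTQ record is assumed to be a 4-tuple of (name, sequence, separator, qualities), and only A/C/G/T/N bases are handled.

```python
# Translation table for complementing bases; anything outside
# A/C/G/T/N (upper or lower case) is passed through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def revcomp_fastq_record(record):
    """Reverse-complement one FASTQ record.

    `record` is assumed to be (name, sequence, separator, qualities).
    The quality string is reversed so each score stays with its base.
    """
    name, seq, sep, qual = record
    return (name, reverse_complement(seq), sep, qual[::-1])
```

Which file needs this treatment depends on the library protocol, as described above.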

An arbitrary flag setting would be great, though.

He2len

Feb 11, 2015, 11:10:27 AM
to rna-...@googlegroups.com
Hello,

I also end up with paired and unpaired reads after trimming.
For tophat, the advice is to map the paired-end reads first and then map the unpaired reads, providing the junctions file from the paired-end mapping.
I'm wondering if the same holds true for STAR: is there an advantage to passing the SJ.out.tab file (first 4 columns) from the paired-end mapping to the --sjdbFileChrStartEnd option when mapping my unpaired reads?
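To make concrete what I mean by "first 4 columns", here is a rough sketch of the extraction. It rests on assumptions that should be checked against the manual for the STAR version in use: that SJ.out.tab is tab-separated with chromosome, intron start, intron end and strand in columns 1-4, that strand is coded there as 0/1/2, and that the junction list expects ./+/- instead. The function name `sj_to_sjdb` is made up.

```python
# Assumed mapping from the numeric strand code in SJ.out.tab
# to the character expected in the junction list.
STRAND = {"0": ".", "1": "+", "2": "-"}


def sj_to_sjdb(lines):
    """Keep chr, start, end, strand from SJ.out.tab-style lines.

    Returns a list of tab-separated 4-column strings. Lines with fewer
    than 4 columns are skipped. Hypothetical helper for illustration.
    """
    out = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4:
            continue
        chrom, start, end, strand = cols[:4]
        out.append("\t".join([chrom, start, end, STRAND.get(strand, strand)]))
    return out
```

The returned lines would be written to a file and that file passed to the option.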

Thanks a lot,
Helen

Alexander Dobin

Feb 12, 2015, 11:31:58 PM
to rna-...@googlegroups.com
Hi Helen,

If you want to use the detected splice junctions for mapping, you would need to re-generate the genome, as in the 2-pass operation. Please have a look at this post: https://groups.google.com/d/msg/rna-star/rBQK-ujtSh8/viTPg1UFKl8J
If you are using good annotations and are not concerned with very careful quantification of novel junctions, the 2-pass approach will provide only a limited mapping improvement.

Cheers
Alex