STAR with Exome sequencing

João Fadista

Aug 29, 2013, 8:03:43 AM

to rna-star

Hi,

Just wondering if there are any metrics available comparing STAR with BWA when aligning exome sequencing (or whole-genome sequencing) data.

Thanks,

João

Alexander Dobin

Aug 31, 2013, 8:08:02 PM

to rna-...@googlegroups.com

Hi João,

STAR was designed for aligning RNA-seq data and is not optimized for DNA-seq mapping. In particular, the "mapping" quality values are very naive, which may give you trouble with SNP callers. Some users have managed to obtain decent results with STAR; see this post and the comparison on bioplanet within it. Of course, STAR's advantage is its very high speed.
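To illustrate what "naive" means here, below is a rough sketch of a hit-count-based MAPQ, assuming the scheme described in the STAR manual (255 for unique mappers, `int(-10*log10(1 - 1/Nmap))` for multimappers). The function name `hit_count_mapq` is made up for this example and is not part of STAR.

```python
import math


def hit_count_mapq(n_loci: int, unique_value: int = 255) -> int:
    """Toy MAPQ that depends only on how many loci a read maps to.

    Assumption: this mirrors the scheme described in the STAR manual;
    it is an illustration, not STAR's source code.
    """
    if n_loci < 1:
        raise ValueError("n_loci must be >= 1")
    if n_loci == 1:
        # Unique mappers all get the same fixed value, regardless of
        # how close the next-best candidate alignment was.
        return unique_value
    # Multimappers: Phred-scale the chance of picking the wrong locus,
    # treating every reported locus as equally likely.
    return int(-10 * math.log10(1 - 1 / n_loci))


for n in (1, 2, 3, 4, 5):
    print(n, hit_count_mapq(n))
```

Note that the value carries no information about alignment scores or mismatches, so a SNP caller cannot use it to down-weight a unique but marginal alignment.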

Cheers

Alex

Alexander Predeus

Apr 14, 2016, 11:57:54 AM

to rna-star

I meant to ask the same question. The thing is, most of the ENORMOUS resequencing projects are extremely slow to catch up with the newest assemblies.

I have been told that it's mostly because of alignment issues: it's just too expensive (and time-consuming) to re-align the whole compendium of samples (which is now approaching 100k for ExAC, for example).

Of course, things are even more dramatic for WGS: the files are just huge and alignment is not cheap (even though arguably it's better to assemble WGS reads rather than align them, or to use some combination of the two).

It seems like an excellent niche for STAR - most of the reads are the same length and the alignment can be lightning fast if properly optimized (I mean, aligning 2x150bp exome reads is a piece of cake anyway, isn't it? Most of them map uniquely with great confidence).

The only issue is proper estimation of mapping qualities. I wonder if there is a way to cut corners there.

Alexander Dobin

Apr 14, 2016, 3:58:06 PM

to rna-star

Hi Alexander,

Accurate calculation of mapping qualities is complicated, since it requires knowing the "next best alignment" very well.
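As a toy illustration of why the next-best alignment matters (this is a generic Phred-scaled posterior, not the formula used by BWA or STAR; `toy_mapq`, `scale` and `cap` are hypothetical names and parameters chosen for the example):

```python
import math


def toy_mapq(scores, scale=1.0, cap=60):
    """Toy MAPQ from the alignment scores of all candidate placements.

    scores: alignment scores (higher is better), one per candidate locus.
    scale:  hypothetical factor turning score differences into log-odds.
    cap:    hypothetical maximum MAPQ to report.
    """
    if not scores:
        raise ValueError("need at least one candidate alignment")
    best = max(scores)
    # Relative weight of each candidate; the best one gets weight 1.
    weights = [math.exp((s - best) / scale) for s in scores]
    total = sum(weights)
    # Probability that the best-scoring placement is NOT the true one.
    p_wrong = (total - 1.0) / total
    if p_wrong <= 0.0:
        return cap
    return min(cap, int(round(-10 * math.log10(p_wrong))))


print(toy_mapq([100]))        # only one candidate found
print(toy_mapq([100, 99]))    # close runner-up
print(toy_mapq([100, 100]))   # exact tie
```

If the search stops early and never finds the runner-up, the read looks like the single-candidate case and gets the maximum value, which is exactly the overconfidence that hurts SNP calling.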

I think STAR (with default parameters) is too greedy compared to BWA, leading to a higher error rate. For applications such as SNP detection this would be a showstopper.

The parameters can probably be tweaked to reduce the error rate at the expense of speed, but I have not done that.