Remove Duplicates


bold...@gmail.com

May 5, 2015, 1:06:55 PM
to rna-...@googlegroups.com
Hi Alex,

Can I add the --bamRemoveDuplicatesType option to my command line while mapping, or should I do it in a second step with --runMode inputAlignmentsFromBAM?

Thanks


Francois

Joshua Bradley

May 8, 2015, 11:26:36 AM
to rna-...@googlegroups.com
I think it has to be done in a second step. Also, whether or not your data is paired-end may make a difference. It's not clear to me from the documentation how STAR defines a duplicate. I personally use Picard to remove duplicates (this is also part of the GATK Best Practices pipeline).

Alexander Dobin

May 8, 2015, 12:00:55 PM
to rna-...@googlegroups.com, jgbra...@gmail.com
Hi Francois, Joshua,

Joshua is right: at the moment it has to be done separately from mapping/sorting, using the --runMode inputAlignmentsFromBAM option after the mapping and sorting are done.
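[Editorial sketch] The two-step workflow described here could look roughly like the following. This is a minimal sketch, not a command taken from the thread: `--inputBAMfile`, the `UniqueIdentical` value, and the `Aligned.sortedByCoord.out.bam` output name are recalled from the STAR manual and should be checked against your STAR version, and the function name, paths, and prefixes are hypothetical. The code only builds and prints the commands; it does not run STAR.

```python
import shlex


def star_two_step_commands(genome_dir, fastq_r1, fastq_r2, prefix, threads=4):
    """Return (mapping_cmd, dedup_cmd) as argument lists; nothing is executed."""
    # Step 1: map and coordinate-sort in a single STAR run.
    mapping_cmd = [
        "STAR",
        "--runMode", "alignReads",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_r1, fastq_r2,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]
    # Assumption: STAR writes the sorted BAM as <prefix>Aligned.sortedByCoord.out.bam.
    sorted_bam = prefix + "Aligned.sortedByCoord.out.bam"

    # Step 2: a separate STAR run that reads the sorted BAM and handles duplicates.
    # Assumption: "UniqueIdentical" is a valid value for your STAR version.
    dedup_cmd = [
        "STAR",
        "--runMode", "inputAlignmentsFromBAM",
        "--inputBAMfile", sorted_bam,
        "--bamRemoveDuplicatesType", "UniqueIdentical",
        "--outFileNamePrefix", prefix + "dedup_",
    ]
    return mapping_cmd, dedup_cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in star_two_step_commands("genome/", "r1.fq", "r2.fq", "sample_"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to pass them to `subprocess.run` one after the other, so that the second step only starts once mapping and sorting have finished.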
Reads are considered duplicates if their alignment starts (after extending over soft-clipped bases) and their CIGARs (i.e. indels and junctions) coincide; mismatches in the sequence are allowed.
I am not sure how that definition is different from Picard or samtools.
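[Editorial sketch] One way to read the definition above is as a comparison key built from the soft-clip-extended start and the CIGAR. The code below is an illustration under stated assumptions, not STAR's actual implementation: it handles single alignments only (no mate pairing or strand), folds soft clips into the adjacent match block, and ignores hard clips and padding. All function names are hypothetical.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def duplicate_key(chrom, pos, cigar):
    """Key that is equal for alignments treated as duplicates in this sketch."""
    # Assumption: hard clips and padding carry no information for the comparison.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op not in ("H", "P")]

    # Extend the alignment start leftwards over a leading soft clip.
    start = pos
    if ops and ops[0][1] == "S":
        start -= ops[0][0]

    # Treat soft clips (and =/X) as matched bases, then merge adjacent M blocks,
    # so that only indels (I, D) and junctions (N) distinguish two CIGARs.
    merged = []
    for length, op in ops:
        if op in ("S", "=", "X"):
            op = "M"
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + length, op)
        else:
            merged.append((length, op))
    return chrom, start, tuple(merged)


def group_duplicates(alignments):
    """Group (name, chrom, pos, cigar) records; return groups with more than one read."""
    groups = defaultdict(list)
    for name, chrom, pos, cigar in alignments:
        groups[duplicate_key(chrom, pos, cigar)].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Under these assumptions, a read aligned at position 105 with CIGAR `5S45M` gets the same key as one aligned at position 100 with `50M`, while `20M1D30M` or `20M1000N30M` at position 100 get different keys because the indel or junction differs. The sequence itself never enters the key, which is how mismatches are allowed.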

Cheers
Alex

Felix Schlesinger

May 8, 2015, 1:16:35 PM
to rna-...@googlegroups.com, jgbra...@gmail.com
I think samtools and Picard do not require indels in the CIGAR to match for a pair to be considered a duplicate.

The other tricky question with duplicates is always how to handle chimeric pairs and (chimeric-) split reads. 

ymc...@gmail.com

May 11, 2015, 11:36:08 AM
to rna-...@googlegroups.com
I have three questions about STAR's remove duplicates function:

1) Can it be multithreaded so that it is faster than Picard?
2) Does it work for PE reads that span the breakpoint of a genome rearrangement/fusion gene?
3) Can I use it on a BAM of mapped DNA reads?

Thanks a lot in advance.
Yee Man

On Wednesday, May 6, 2015 at 1:06:55 AM UTC+8, bold...@gmail.com wrote:

Alexander Dobin

May 14, 2015, 4:43:00 PM
to rna-...@googlegroups.com, ymc...@gmail.com
Hi Yee Man,

1. The duplicate removal is not multi-threaded at the moment. Do you have a speed comparison with Picard?
2. It will not work for any type of "chimeric alignments".
3. It should work for DNA reads (except for fusions).

Cheers
Alex