Using -D option

Tejaswi Yarra

Feb 25, 2015, 11:58:05 AM
to ea-u...@googlegroups.com
Hello,

I have a question about removing duplicates from the sequences using the -D option.
I am unsure as to what the number N represents.

The documentation indicates:
-D N Remove duplicate reads : Read_1 has an identical N bases (0)

If I use "-D 50", does that mean Read_1 has 50 identical bases? Identical to what, exactly?
How can I actually use this option to remove duplicate sequences from my data?

Thank you in advance for your help, and apologies if my question is very amateurish.

Regards,
Teja.

Jason Powers

Apr 2, 2015, 11:16:30 AM
to ea-u...@googlegroups.com
Teja,

The N here refers to the number of bases examined for duplication.

Let's say your reads are single-end FASTQ, each 50 nt long.

If you set N to 25, then it compares only the first 25 bases of each read (the 25mer starting from the first base). Reads that share the same leading 25mer count as duplicates, and the extra copies are tossed.
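
To make the idea concrete, here is a rough Python sketch of prefix-based duplicate removal as described above. It is not the tool's actual code: the function name `dedup_by_prefix`, the in-memory tuples, and the choice to keep the first read seen for each prefix are all assumptions made for illustration.

```python
def dedup_by_prefix(records, n):
    """Keep one read per distinct n-base prefix.

    records: iterable of (header, sequence, quality) tuples.
    Assumption: the first read seen for each prefix is the one kept.
    Reads shorter than n are compared on their full length.
    """
    seen = set()
    kept = []
    for header, seq, qual in records:
        key = seq[:n]  # only the first n bases are examined
        if key in seen:
            continue  # same leading n-mer as an earlier read: drop it
        seen.add(key)
        kept.append((header, seq, qual))
    return kept


# Toy reads (hypothetical data, 10 nt each)
reads = [
    ("r1", "ACGTACGTAA", "IIIIIIIIII"),
    ("r2", "ACGTACGTCC", "IIIIIIIIII"),  # same first 8 bases as r1
    ("r3", "TTGTACGTAA", "IIIIIIIIII"),
]

kept_8 = [h for h, _, _ in dedup_by_prefix(reads, 8)]
kept_10 = [h for h, _, _ in dedup_by_prefix(reads, 10)]
print(kept_8)   # r2 is dropped: its first 8 bases match r1
print(kept_10)  # nothing is dropped: all full-length reads differ
```

So a smaller N is more aggressive (reads only need to agree over a short prefix to count as duplicates), while an N equal to the read length only removes reads that are identical across the whole read.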

Tejaswi Yarra

Jun 5, 2015, 11:38:46 AM
to ea-u...@googlegroups.com
Thank you very much for the explanation, Jason!
Very sorry for the late reply, I had not checked back here in a long time.