
RP

Jan 12, 2017, 9:58:11 AM
to CLARK Users
Hey Rachid,

I have a question regarding the methods section of your BMC publication.

After hashing the target sequences, "CLARK then removes any k-mer that appears in more than one target".
Then it states that "k-mers in the index may be removed based on their number of occurrences", which is accomplished through the -t option of the CLARK command.

-t <minFreqTarget>,     minimum of k-mer frequency/occurrence for the discriminative k-mers:  integer, >=0.
The default value is 0. For example, for 1 (or, 2), the program will discard any 
discriminative k-mer that appears only once (or, less than twice)
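The first indexing step can be sketched in Python to make the two different counts explicit. This is a minimal illustration under stated assumptions, not CLARK's implementation: the function names are hypothetical, it counts forward-strand k-mers only (ignoring reverse complements), and it simply keeps each k-mer's within-target count next to the cross-target filter. It shows that "appears in more than one target" is a count of distinct targets, so a k-mer that survives can still occur many times inside its own target.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every k-mer in a sequence (forward strand only, a simplifying assumption)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def target_specific_kmers(targets: dict[str, str], k: int) -> dict[str, Counter]:
    """Hypothetical helper: keep only k-mers found in exactly one target,
    together with how many times each occurs inside that target."""
    per_target = {name: count_kmers(seq, k) for name, seq in targets.items()}
    # Number of distinct targets containing each k-mer (NOT total occurrences).
    n_targets = Counter()
    for counts in per_target.values():
        n_targets.update(counts.keys())
    return {
        name: Counter({km: c for km, c in counts.items() if n_targets[km] == 1})
        for name, counts in per_target.items()
    }


targets = {"A": "AAAAACG", "B": "TTACGTT"}  # toy sequences, made up for illustration
specific = target_specific_kmers(targets, k=3)
print(dict(specific["A"]))  # {'AAA': 3, 'AAC': 1}
```

Here ACG is dropped because both targets contain it, while AAA is kept as specific to A even though it occurs three times there.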

So my question is: wouldn't setting -t >= 1 remove ALL k-mers in the target hash, since every remaining target k-mer has an occurrence count of 1 across all targets?
Also, wouldn't a k-mer that occurs more than once not be discriminative, rendering the 'discriminative k-mers' term in the -t option description meaningless?
Basically, I don't see the point of the -t option based on the protocol for target indexing.

Thanks,
RP

Rachid

Jan 15, 2017, 11:02:06 PM
to CLARK Users
Dear Robert,
As described in the BMC Genomics paper, the occurrence of k-mers is evaluated across targets and within targets.
A k-mer can be specific to a target but still appear many times (e.g., 1000 times) within that target. The "-t" option lets you set the minimum number of times each target-specific k-mer must appear within its associated target. Does that make sense?
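To illustrate why a frequency threshold does not wipe out the index, here is a minimal Python sketch of filtering target-specific k-mers by their within-target count. It assumes the "at least N times" reading from the reply above; the help text's wording for 1 versus 2 leaves the exact boundary CLARK uses unclear, so treat the comparison as an assumption. The function name and the toy counts are hypothetical.

```python
from collections import Counter


def apply_min_freq(specific: dict[str, Counter], min_count: int) -> dict[str, Counter]:
    """Hypothetical filter: drop target-specific k-mers seen fewer than
    min_count times within their own target.
    ASSUMPTION: 'at least min_count' semantics; min_count=0 keeps everything."""
    return {
        name: Counter({km: c for km, c in counts.items() if c >= min_count})
        for name, counts in specific.items()
    }


# Made-up counts after the cross-target step: every k-mer here is already
# specific to one target, but the within-target counts differ.
specific = {
    "A": Counter({"AAA": 3, "AAC": 1}),
    "B": Counter({"TAC": 1, "CGT": 2}),
}
filtered = apply_min_freq(specific, 2)
print(dict(filtered["A"]), dict(filtered["B"]))  # {'AAA': 3} {'CGT': 2}
```

A threshold above 1 removes only the k-mers that are rare inside their own target; specific k-mers that recur within that target remain in the index.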
Cheers,
Rachid