Hi Florian,
Filtering for rare kmers works like this:
1) The kmer_filter program computes the kmer frequencies across the
whole data set you submit to the program.
2) For each read, the read is kmerized, and the median kmer coverage is
computed.
3) A sliding window, of size k, is moved across the read. If at any
point the kmer coverage within the window drops below the coverage limit
(default is 15% of the median read coverage), then the read is trimmed
or dropped.
The effect of this is to identify putative sequencing errors by finding
a run of kmers that has a kmer coverage far below the median coverage
for that read, as sequencing errors will spawn runs of low-coverage kmers.
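
To make the three steps concrete, here is a minimal Python sketch. It is not the kmer_filter source, just an illustration under some assumptions: the function names are made up, reads are plain strings, and "coverage drops within the window" is interpreted as k consecutive kmers all falling below the limit. The real program may define the window test differently.

```python
from collections import Counter
from statistics import median

def kmerize(read, k):
    """Return every overlapping kmer of length k in the read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def count_kmers(reads, k):
    """Step 1: kmer frequencies across the whole data set."""
    counts = Counter()
    for read in reads:
        counts.update(kmerize(read, k))
    return counts

def first_rare_window(read, counts, k, min_frac=0.15):
    """Steps 2 and 3: return the index of the first kmer of the first
    window of k consecutive kmers that all fall below min_frac times the
    read's median kmer coverage, or None if there is no such window.
    (Assumed interpretation of the window test, see the note above.)"""
    cov = [counts[kmer] for kmer in kmerize(read, k)]
    if len(cov) < k:
        return None  # read too short to hold a full window
    limit = min_frac * median(cov)
    for start in range(len(cov) - k + 1):
        if all(c < limit for c in cov[start:start + k]):
            return start
    return None

# Toy data: 20 clean reads plus one read carrying a single-base error.
reads = ["AAAAAAAAAAAA"] * 20 + ["AAAAAAACAAAAAAA"]
counts = count_kmers(reads, 3)
hit = first_rare_window("AAAAAAACAAAAAAA", counts, 3)
```

A caller would then either trim the read at the returned position or drop it entirely when the result is not None.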
Abundant filtering is simpler: it just checks whether a read contains more
than the specified number of repetitive kmers and, if so, drops the read.
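
As a rough Python sketch of that check (again an illustration, not the actual kmer_filter code): here a kmer is assumed to count as repetitive when its data-set-wide frequency exceeds a cutoff. The function name and the parameters abundant_count and max_abundant are hypothetical.

```python
from collections import Counter

def has_too_many_abundant_kmers(read, counts, k, abundant_count, max_abundant):
    """Return True if the read holds more than max_abundant kmers whose
    data-set-wide count exceeds abundant_count (assumed definition of a
    repetitive kmer)."""
    n_abundant = 0
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] > abundant_count:
            n_abundant += 1
    return n_abundant > max_abundant

# Toy kmer table: "AAA" is highly repetitive, "ACG" is not.
toy_counts = Counter({"AAA": 500, "ACG": 3})
drop = has_too_many_abundant_kmers("AAAAAACG", toy_counts, 3,
                                   abundant_count=100, max_abundant=3)
```

A read for which this returns True would be dropped.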
Both filters can be quite useful depending on the downstream program you
plan to use. I would recommend trying your assemblies with and without
filtering to be able to compare the differences.
The same advice applies to choosing a kmer normalization limit. I have had
good results with a 40x limit myself, but you can find other limits specified
in the literature for other normalizing programs (like khmer).
Let us know how the program works out for you.
Best,
julian
Florian Mauffrey wrote:
> Hello,
>
> I'm using the kmer_filter command of Stacks to filter/normalize my reads
> prior to /de novo/ assembly with Trinity. My questions are: how