Questions about kmer_filter command


Florian Mauffrey

Oct 31, 2014, 12:01:39 PM
to stacks...@googlegroups.com
Hello,

I'm using the kmer_filter command of Stacks to filter/normalize my reads prior to de novo assembly with Trinity. My questions are: how does filtering based on rare or abundant kmers work?
The choice of rare/abundant filtering surely depends on the type of reads I have, but how do I choose between rare, abundant, or both?
Finally, how do I determine the depth for the kmer normalization option?

Thank you!

Sincerely,

Florian

Julian Catchen

Nov 10, 2014, 8:21:31 PM
to stacks...@googlegroups.com, mauffre...@gmail.com
Hi Florian,

Filtering for rare kmers works like this:

1) The kmer_filter program computes the kmer frequencies across the
whole data set you submit to the program.

2) For each read, the read is kmerized, and the median kmer coverage is
computed.

3) A sliding window, of size k, is moved across the read. If at any
point the kmer coverage within the window drops below the coverage limit
(default is 15% of the median read coverage), then the read is trimmed
or dropped.

The effect of this is to identify putative sequencing errors by finding
a run of kmers that has a kmer coverage far below the median coverage
for that read, as sequencing errors will spawn runs of low-coverage kmers.
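
As a rough illustration of the three steps above, here is a minimal
Python sketch. It is not the Stacks implementation: the function names
(kmerize, count_kmers, has_rare_run) and the min_frac parameter are
made up for this example, and it assumes one particular reading of the
sliding window, namely that a read is flagged when k consecutive kmers
all fall below 15% of that read's median kmer coverage.

```python
from collections import Counter
from statistics import median


def kmerize(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Step 1: kmer frequencies across the whole data set."""
    counts = Counter()
    for read in reads:
        counts.update(kmerize(read, k))
    return counts


def has_rare_run(read, counts, k, min_frac=0.15):
    """Steps 2 and 3: flag a read containing k consecutive rare kmers.

    Assumption: "rare" means coverage below min_frac * the read's median
    kmer coverage, and a full window of k such kmers marks the read.
    """
    coverage = [counts[kmer] for kmer in kmerize(read, k)]
    if not coverage:
        return False
    limit = min_frac * median(coverage)
    run = 0
    for c in coverage:
        run = run + 1 if c < limit else 0
        if run >= k:
            return True
    return False


good = "ACGTACGTAC"
bad = "ACGTAGGTAC"  # same read with one substituted base
reads = [good] * 20 + [bad]
counts = count_kmers(reads, 3)
flagged = [r for r in reads if has_rare_run(r, counts, 3)]
```

In this toy data set the single substituted base creates three kmers
that occur only once, while the rest of the read's kmers occur about 40
times, so only the read carrying the substitution ends up in flagged.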

Abundant filtering is simpler: it just checks whether a read contains
more than the specified number of repetitive kmers and, if so, drops
the read.
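
The abundant filter can be sketched the same way. Again this is only
an illustration under assumptions: the function name is_abundant and
the parameters abundant_freq (how often a kmer must occur in the data
set to count as repetitive) and max_abundant (how many such kmers a
read may contain) are hypothetical names chosen for the example, not
taken from the Stacks source.

```python
from collections import Counter


def is_abundant(read, counts, k, abundant_freq, max_abundant):
    """Return True if the read holds more than max_abundant repetitive kmers.

    Assumption: a kmer is "repetitive" when it occurs at least
    abundant_freq times across the whole data set.
    """
    n_repetitive = sum(
        1
        for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] >= abundant_freq
    )
    return n_repetitive > max_abundant


k = 4
reads = ["AAAAAAAA"] * 5 + ["ACGTTGCA"]

# Kmer frequencies across the whole data set.
counts = Counter()
for read in reads:
    counts.update(read[i:i + k] for i in range(len(read) - k + 1))

kept = [r for r in reads if not is_abundant(r, counts, k, 10, 3)]
```

Here the poly-A reads consist entirely of one kmer that occurs 25 times
in the data set, so they are dropped and only the last read is kept.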

Both filters can be quite useful depending on the downstream program you
plan to use. I would recommend trying your assemblies with and without
filtering to be able to compare the differences.

The same advice applies for choosing kmer normalization. I have had good
results with a 40x limit myself, but you can find other limits specified
in the literature for other normalizing programs (like khmer).
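
To give a feel for what a depth limit does, here is a small sketch of
median-based read normalization in the style popularized by khmer's
digital normalization. It is a swapped-in technique for illustration,
and kmer_filter may normalize differently. The function name normalize
and the parameter max_depth are assumptions made for this example.

```python
from collections import Counter
from statistics import median


def normalize(reads, k, max_depth):
    """Keep a read only while its kmers are still below max_depth.

    Reads are processed in order; kmer counts come from reads already
    kept, so redundant reads are discarded once a region is "full".
    """
    kept = []
    seen = Counter()
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read shorter than k, nothing to evaluate
        if median(seen[kmer] for kmer in kmers) < max_depth:
            kept.append(read)
            seen.update(kmers)
    return kept


# 20 copies of one read plus a single read from elsewhere, limit of 5x.
reads = ["ACGTTGCAAG"] * 20 + ["GGGGCCCCTTTT"]
kept = normalize(reads, 4, 5)
```

With a limit of 5, the first five copies of the redundant read are kept
and the remaining fifteen are discarded, while the low-coverage read is
untouched. A 40x limit works the same way with a higher ceiling.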

Let us know how the program worked out for you--

Best,

julian



Ana Maria Millan

Jan 19, 2018, 11:13:37 AM
to Stacks
Hello,

I'm using kmer_filter from Stacks to visualize the error profiles of my RAD-seq data, but for a reason I can't work out, I am not getting any output files. The program runs and prints the message below, but I can't find any output file. Thanks in advance.

Using a kmer size of 15
Filtering out reads by identifying rare kmers: Off.
Filtering out reads by identifying abundant kmers: Off.
Normalizing read depth: Off.
Found 2 input file(s).
Found 0 paired input file(s).



Best regards,

Ana Maria  