Maximum homopolymers


Jini

Sep 19, 2014, 11:22:03 AM
to qiime...@googlegroups.com
Hi group!

I used Mothur to trim a set of 454 sequences (used min and max length, qaverage=25, maxambig=0). I also used Chimera Slayer and filter.seqs (in Mothur). I then imported a subset of these sequences into QIIME (I wanted to build some bootstrapped trees). I've noticed that the default trimming in QIIME removes any sequence containing a homopolymer >6 bp in length. I'm wondering why that's the default and whether I should make it less stringent. It removed another 3000 sequences from my data set (which had already been trimmed in Mothur). I read somewhere that I should check how long I expect the homopolymers in my sequences to be. A quick look on BLAST makes it seem like I should expect homopolymers of 4 bp... not sure if this helps.

If anyone's got insight into whether eliminating homopolymers is necessary (and what maximum length to set), I'd appreciate it. I've looked at a couple of articles, but haven't really found what I'm looking for.

Thanks!

Katherine Amato

Sep 19, 2014, 12:11:50 PM
to qiime...@googlegroups.com
Hi Jini,

This is a good question. Here's some information that hopefully helps a little. 454 pyrosequencing tends to miscall the length of homopolymer runs, and the problem starts to appear at around 6 bases, hence our default for split_libraries.py. If you are expecting longer homopolymers in the region/taxa you sequenced, then relaxing the parameter would probably be justified.

Best,
Katie
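
One way to judge whether relaxing the cutoff is justified is to look at how the longest homopolymer run is distributed across the reads before filtering. The following is only a minimal sketch in plain Python, not part of QIIME or mothur: the helper names (longest_homopolymer, read_fasta, homopolymer_histogram, count_removed) and the three example sequences are made up for illustration, and it assumes a read is discarded when its longest run is strictly greater than the cutoff. The actual option name and exact behaviour should be checked with `split_libraries.py -h`.

```python
from collections import Counter
from itertools import groupby


def longest_homopolymer(seq):
    """Return the length of the longest run of one repeated base in seq."""
    return max((sum(1 for _ in run) for _, run in groupby(seq.upper())), default=0)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def homopolymer_histogram(records):
    """Count sequences by the length of their longest homopolymer run."""
    return Counter(longest_homopolymer(seq) for _, seq in records)


def count_removed(hist, max_homopolymer):
    """Number of sequences whose longest run exceeds max_homopolymer.

    Assumption: a read is dropped when its longest run is strictly
    greater than the cutoff.
    """
    return sum(n for length, n in hist.items() if length > max_homopolymer)


# Hypothetical example reads; in practice, pass an open FASTA file handle
# to read_fasta instead of this list.
example_lines = [
    ">seq1", "ACGT" + "A" * 7 + "CGT",
    ">seq2", "ACGTACG" + "T" * 4 + "ACG",
    ">seq3", "G" * 8 + "ACGT",
]

if __name__ == "__main__":
    hist = homopolymer_histogram(read_fasta(example_lines))
    for cutoff in (4, 6, 8):
        print("cutoff", cutoff, "would remove", count_removed(hist, cutoff), "reads")
```

If most of the 3000 removed reads have a longest run of 7 or 8 bases in a region where the reference sequences show runs of only about 4, that points to sequencing error rather than real biology, and the default cutoff is probably doing its job.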