Changing K-mer length


Miles Buchwaldt

Feb 19, 2015, 5:25:14 PM
to trinityrn...@googlegroups.com
I am using Trinity to assemble RNA-Seq data from a polyploid plant species. Polyploidy means a great deal of homology between genes, which makes it difficult for assemblers using relatively short k-mer lengths to distinguish between homologs. I would like to test how using longer k-mers affects assembly quality under these conditions.

I understand that Trinity, in its current form, does not allow changing the k-mer length. So I am wondering whether there are plans to implement this feature in the near future, whether there is a test or older build of Trinity that has it, or whether there is a way to add this functionality to Trinity myself.

Regards,
Miles

Brian Haas

Feb 19, 2015, 5:52:52 PM
to Miles Buchwaldt, trinityrn...@googlegroups.com
Hi Miles,

The latest version of Trinity has a parameter

   --KMER_SIZE

where you can change it from the default (25), but it can't go higher than 32 (max) because the k-mers are initially stored in 64-bit integers with 2-bit encoding (2 bits per base, so at most 64 / 2 = 32 bases).

Note, separating out alleles and paralogs is a challenge and Trinity will often generate chimeras between them unless there's good pair coverage across their entire lengths and there's significant variation in sequence along their lengths.  Longer reads definitely help too.

best,

~brian
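
To make the 32-base limit mentioned above concrete, here is a minimal Python sketch of packing a k-mer into a 64-bit word at 2 bits per base. It is an illustration only, not Trinity's actual code: the names `pack_kmer`, `unpack_kmer`, and the particular base-to-bits mapping are assumptions made for this example.

```python
# Illustrative sketch only; not taken from the Trinity source.
# Assumed mapping of bases to 2-bit codes (any fixed mapping would do).
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"

WORD_BITS = 64
BITS_PER_BASE = 2
MAX_K = WORD_BITS // BITS_PER_BASE  # 32 bases fit in one 64-bit word


def pack_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer that fits in 64 bits."""
    if len(kmer) > MAX_K:
        raise ValueError(f"k={len(kmer)} does not fit in {WORD_BITS} bits")
    value = 0
    for base in kmer:
        # Shift the bases packed so far left by 2 bits, then append this base.
        value = (value << BITS_PER_BASE) | ENCODE[base]
    return value


def unpack_kmer(value: int, k: int) -> str:
    """Recover the k-mer string from its packed integer form."""
    bases = []
    for shift in range(BITS_PER_BASE * (k - 1), -1, -BITS_PER_BASE):
        bases.append(DECODE[(value >> shift) & 0b11])
    return "".join(bases)
```

With this scheme a 32-mer of all `T` uses every one of the 64 bits, and a 33-mer has nowhere to go, which is why a fixed-width integer representation caps k at 32.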




--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Jack Cui

Apr 26, 2015, 9:58:39 AM
to trinityrn...@googlegroups.com, buchw...@gmail.com
Hi Brian,

I've seen more than two papers using Trinity for metatranscriptome assembly, and they showed that Trinity outperformed DBG-based assemblers such as MetaVelvet and others, so I gave it a try with my dataset (88M paired-end 125 bp reads). I ran two separate assemblies, one with k-mer = 25 and one with k-mer = 32. The former finished in about 13.5 h, while the latter has still not finished (>80 h).

I wonder whether we can set the k-mer size to an EVEN number? Actually, when I found that the k-mer = 32 run had consumed an abnormally long time, I started another assembly job yesterday with k-mer = 31, and it has not finished yet either. Is this difference expected, or do I need to do some other optimization?


Best,

Jack
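
Some background on the even-k question: de Bruijn graph assemblers commonly prefer odd k because an even-length k-mer can be identical to its own reverse complement, whereas an odd-length one cannot. Whether this matters inside Trinity is not stated in this thread. The sketch below simply counts such self-complementary k-mers for small k; the function names are made up for this example.

```python
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_self_complementary(k: int) -> int:
    """Count k-mers equal to their own reverse complement (brute force)."""
    count = 0
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        if kmer == reverse_complement(kmer):
            count += 1
    return count


if __name__ == "__main__":
    # Brute force is only practical for small k; it is enough to see the pattern.
    for k in range(3, 7):
        print(k, count_self_complementary(k))
```

For odd k the middle base would have to be its own complement, which no base is, so the count is zero. For even k the first half determines the second, giving 4^(k/2) such k-mers.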




Brian Haas

Apr 26, 2015, 5:21:04 PM
to Jack Cui, trinityrn...@googlegroups.com, Miles Buchwaldt
Hi Jack,

In all honesty, we made the k-mer value adjustable, but it hasn't been rigorously evaluated yet and has been considered an experimental feature. We should probably make that more apparent in the usage info. All our production work with Trinity still uses k=25 (the original default). I hadn't imagined it causing serious performance problems, but that could certainly be the case here.

Also, we haven't had the time or resources to explore use of Trinity for metatranscriptomics. It's good to know that others have demonstrated Trinity to be useful for this application, but it was never one of our targeted use cases.

best,

~brian


lani.g...@gmail.com

Nov 21, 2016, 6:56:32 PM
to trinityrnaseq-users
I have also encountered a similar problem: the k-mer = 25 assembly finishes fine, but adjusting the k-mer size to 30, 31, or 32 results in the assembly stalling in the Chrysalis stage indefinitely. I looked in the recent release notes and didn't immediately find anything mentioning k-mer size. Does that mean that adjusting the k-mer value is still considered an experimental feature at this point?

Thank you,
Lani

Brian Haas

Nov 21, 2016, 7:03:38 PM
to lani.g...@gmail.com, trinityrnaseq-users
Hi Lani,

Is this happening with the very latest release (2.3.2)? The k-mer length in Chrysalis is actually fixed, and only the other parts of Trinity use the different k-mer size, so I wouldn't expect Chrysalis to have a problem with it. If you're having trouble with it in the latest release, I'll happily explore it further. Although changing the k-mer length is a feature in Trinity, we don't tend to change it in practice. It's provided as an exploratory feature, mostly for troubleshooting certain edge cases or otherwise tough transcripts.

best,

~brian
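
For anyone who wants to compare several k values as discussed in this thread, a small Python helper can assemble the command line and reject values above the 32 limit before a long job is launched. This is a sketch under assumptions: only `--KMER_SIZE` comes from this thread; the other flags (`--seqType`, `--left`, `--right`, `--max_memory`, `--CPU`, `--output`) should be checked against the usage info of your installed Trinity version, and the file names, memory, and CPU values are placeholders.

```python
from typing import List

MAX_KMER_SIZE = 32      # upper limit stated in this thread
DEFAULT_KMER_SIZE = 25  # Trinity's default, per this thread


def build_trinity_command(
    left: str,
    right: str,
    kmer_size: int = DEFAULT_KMER_SIZE,
    max_memory: str = "50G",   # placeholder value
    cpu: int = 8,              # placeholder value
    output: str = "trinity_out_dir",
) -> List[str]:
    """Build (but do not run) a Trinity argument list for paired-end FASTQ reads."""
    if not 1 <= kmer_size <= MAX_KMER_SIZE:
        raise ValueError(f"kmer_size must be between 1 and {MAX_KMER_SIZE}")
    return [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--output", output,
        "--KMER_SIZE", str(kmer_size),
    ]


if __name__ == "__main__":
    # Hypothetical input files; one output directory per k value.
    for k in (25, 31, 32):
        cmd = build_trinity_command(
            "reads_1.fq", "reads_2.fq", kmer_size=k, output=f"trinity_k{k}"
        )
        print(" ".join(cmd))
```

The helper only prints the commands; passing the list to `subprocess.run` would launch the assembly. Writing each k to its own output directory keeps the runs from overwriting each other and makes run times easy to compare.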

