Arguments in TransDecoder

Dapeng Wang

Apr 5, 2017, 7:49:26 AM
to TransDecoder-users
Hi there,

I am using TransDecoder to predict protein sequences for transcripts, and I am a bit confused about the arguments of TransDecoder.Predict.

What will happen if I turn on all four of these arguments: --retain_long_orfs 90, --retain_pfam_hits, --retain_blastp_hits and --single_best_orf?

My goal is to select a single best ORF for each input sequence: prefer ORFs with Pfam or blastp support and, for sequences that have no Pfam or blastp support, choose the longest ORF. Is the above set of arguments correct for that?

Another question: is there any link between the --retain_long_orfs in TransDecoder.Predict and -m in TransDecoder.LongOrfs? Is it necessary to keep them consistent in some way?



Many thanks for your help,

Regards,

Tom

Brian Haas

Apr 5, 2017, 7:55:38 PM
to Dapeng Wang, TransDecoder-users, trinityrn...@googlegroups.com
responses below

On Wed, Apr 5, 2017 at 7:49 AM, Dapeng Wang <wang...@gmail.com> wrote:
> What will happen if I turn on all four of these arguments: --retain_long_orfs 90, --retain_pfam_hits, --retain_blastp_hits and --single_best_orf?

I expect it should retain any ORFs that meet the defined criteria.  In the case where multiple candidate ORFs on the same transcript contig meet those criteria, the 'best' ORF is selected according to:

  --single_best_orf                      Retain only the single best ORF per transcript.

 (Best is defined as having (optionally pfam and/or blast support) and longest orf)

Think of this as a two-column sort: first by pfam|blast support, then by length in descending order.  Only the top entry should be retained.
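To make the two-column sort concrete, here is a minimal Python sketch. It is not TransDecoder's actual code. The `Orf` class, its field names, and the example ORFs are hypothetical, and it assumes Pfam and blastp support collapse into a single yes/no column as the "pfam|blast" wording suggests.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    """Hypothetical record for one candidate ORF on a transcript."""
    orf_id: str
    length: int               # ORF length; only the relative order matters here
    has_pfam: bool = False    # assumed flag: ORF has a Pfam hit
    has_blastp: bool = False  # assumed flag: ORF has a blastp hit


def single_best_orf(orfs):
    """Pick the top entry of a two-column sort: support first, then length."""
    def rank(orf):
        supported = orf.has_pfam or orf.has_blastp  # column 1: pfam|blast
        return (supported, orf.length)              # column 2: length
    # max() over the tuple equals sorting in descending order and taking the top.
    return max(orfs, key=rank)


# Three made-up candidate ORFs on the same transcript.
candidates = [
    Orf("orf_a", 450),                   # longest, but no homology support
    Orf("orf_b", 300, has_pfam=True),    # supported, shorter
    Orf("orf_c", 360, has_blastp=True),  # supported, longer
]

best = single_best_orf(candidates)
print(best.orf_id)  # orf_c
```

Under these assumptions a supported ORF always outranks an unsupported one, even a longer one, and length only breaks ties among ORFs with the same support status. When no ORF has support, the same function simply returns the longest.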


 
The purpose is to select only one best ORF for each input gene sequence based on the principle that selecting those with pfam or blastp data support and for the rest sequences without pfam and blastp data support choosing the longest one. Is the above command line correct?


yes
 
> Another question: is there any link between the --retain_long_orfs in TransDecoder.Predict and -m in TransDecoder.LongOrfs? Is it necessary to keep them consistent in some way?



The -m value is the shortest ORF that TransDecoder.LongOrfs will consider.  Note that the shorter the ORF, the higher the false positive rate.


 

--
You received this message because you are subscribed to the Google Groups "TransDecoder-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to transdecoder-users+unsub...@googlegroups.com.
To post to this group, send email to transdecoder-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/transdecoder-users/f9b7c115-f389-4de2-9c2e-2bfb9bd97da8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Dapeng Wang

Apr 5, 2017, 8:17:48 PM
to trinityrnaseq-users, wang...@gmail.com, transdeco...@googlegroups.com
Thank you.

For the BLAST+ search, can I use my own protein sequence database instead of the Swiss-Prot (or UniRef90) database mentioned in the manual for the homology analysis and the subsequent ORF prediction?


Thanks,

Tom




Brian Haas

Apr 5, 2017, 8:30:48 PM
to Dapeng Wang, trinityrnaseq-users, TransDecoder-users
Yes, for the blast search you can use whatever you'd like.

best of luck!

~b


Dapeng Wang

Apr 6, 2017, 8:31:09 AM
to trinityrnaseq-users, wang...@gmail.com, transdeco...@googlegroups.com
Hi,

I have successfully run TransDecoder all the way through, but I am not sure how to interpret the naming scheme in the *.pep and *.cds files.

For instance, a sequence named ">TRINITY_DN51741_c2_g5_i2" in the input FASTA file is renamed as follows in the resulting *.pep file:

>TRINITY_DN51741_c2_g5::TRINITY_DN51741_c2_g5_i2::g.1::m.1 TRINITY_DN51741_c2_g5::TRINITY_DN51741_c2_g5_i2::g.1  ORF type:internal len:137 (+) TRINITY_DN51741_c2_g5_i2:3-410(+)

I expected the name to start with the original sequence name from my input file, but the software seems to put only the gene-level part of the name first. Could you please explain the logic of the name from left to right? For example, what is g.1, and what is m.1?

Many thanks,

Tom



Brian Haas

Apr 6, 2017, 8:59:53 AM
to Dapeng Wang, trinityrnaseq-users, TransDecoder-users
It's basically creating a unique accession value for that coding region on that transcript.

The formatting might change in a future release given how much confusion it's been causing for folks.
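
For anyone who just needs the original transcript name back, here is a minimal Python sketch that splits the example header quoted earlier in this thread. The function name and the returned keys are hypothetical, the g.N and m.N labels are treated as opaque because the thread does not define them, and the parsing assumes the exact layout shown in that one example, which may change in a future release.

```python
import re

# The example header quoted earlier in this thread.
HEADER = (
    ">TRINITY_DN51741_c2_g5::TRINITY_DN51741_c2_g5_i2::g.1::m.1 "
    "TRINITY_DN51741_c2_g5::TRINITY_DN51741_c2_g5_i2::g.1  "
    "ORF type:internal len:137 (+) TRINITY_DN51741_c2_g5_i2:3-410(+)"
)


def parse_pep_header(header):
    """Split a header of the layout shown above into its visible parts."""
    accession = header.lstrip(">").split()[0]
    parts = accession.split("::")
    if len(parts) != 4:
        raise ValueError(f"unexpected accession layout: {accession}")
    prefix, transcript_id, g_label, m_label = parts

    orf_type = re.search(r"ORF type:(\S+)", header)
    length = re.search(r"len:(\d+)", header)
    # Last token: transcript:start-end(strand)
    coords = re.search(r"(\S+):(\d+)-(\d+)\(([+-])\)\s*$", header)
    if orf_type is None or length is None or coords is None:
        raise ValueError("header does not match the assumed layout")

    return {
        "prefix": prefix,                # first '::' field of the accession
        "transcript_id": transcript_id,  # original name from the input FASTA
        "g_label": g_label,              # e.g. 'g.1' (meaning not defined here)
        "m_label": m_label,              # e.g. 'm.1' (meaning not defined here)
        "orf_type": orf_type.group(1),
        "len": int(length.group(1)),
        "start": int(coords.group(2)),
        "end": int(coords.group(3)),
        "strand": coords.group(4),
    }


info = parse_pep_header(HEADER)
print(info["transcript_id"])  # TRINITY_DN51741_c2_g5_i2
```

The second "::"-separated field of the accession is the original input name, so mapping predicted peptides back to transcripts only needs that field.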
