My data was non-stranded but I used --SS_lib_type and --jaccard_clip (suggestions required)


bilal....@cemb.edu.pk

Oct 31, 2016, 4:22:32 AM
to trinityrnaseq-users
Hello all 
I recently assembled my RNA-Seq data and am now working on downstream analysis. I treated the reads as strand-specific and used the --SS_lib_type option during assembly, but I have now learned that the data were non-stranded. Should I redo the de novo assembly from scratch, or can I continue with the current one?

Secondly, can the --jaccard_clip option be used for RNA-Seq reads, or is it specific to genomic DNA reads?

Thanks 
--bilal

Brian Haas

Oct 31, 2016, 7:06:26 AM
to bilal....@cemb.edu.pk, trinityrnaseq-users
Hi,

You'll want to redo the assembly without the --SS_lib_type, otherwise you'll end up with a huge amount of redundancy in your current assembly (with the same transcripts being assembled as antisense and sense orientations), and assembled at half the effective sequence coverage.
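
To make the redundancy point concrete, here is a minimal Python sketch (not part of Trinity) that groups contigs which are exact reverse complements of each other. The contig IDs and sequences are hypothetical toy data, and real sense/antisense duplicates will rarely be exact matches, so treat this only as an illustration of the idea:

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def group_by_strand_neutral_key(contigs):
    """Group contig IDs whose sequences are identical or exact reverse complements."""
    groups = {}
    for contig_id, seq in contigs.items():
        seq = seq.upper()
        # Use the lexicographically smaller orientation as the key, so that
        # sense and antisense copies of the same sequence share one group.
        key = min(seq, reverse_complement(seq))
        groups.setdefault(key, []).append(contig_id)
    return groups


# Hypothetical toy contigs: t2 is the reverse complement of t1.
contigs = {
    "t1": "ATGGCCAAGT",
    "t2": "ACTTGGCCAT",
    "t3": "GGGTTTAAAC",
}
redundant = [ids for ids in group_by_strand_neutral_key(contigs).values() if len(ids) > 1]
print(redundant)
```

In an assembly built with --SS_lib_type from non-stranded reads, many transcripts would be expected to show up in pairs like t1/t2 above.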

Use --jaccard_clip only if you're working on a microbial eukaryote that is thought (or known) to have a compact genome. Otherwise, the jaccard clip step is a large computational effort with little impact.

best,

~brian

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-users+unsub...@googlegroups.com.
To post to this group, send email to trinityrnaseq-users@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Brian Haas

Nov 18, 2016, 9:47:38 AM
to Muhammad Bilal Sarwar, trinityrn...@googlegroups.com
Hi,

That script is no longer supported.  We now use this:  $TRINITY_HOME/util/filter_low_expr_transcripts.pl

as described at the bottom of this page:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification
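
For anyone hitting the "transcript ID in the FASTA that wasn't in the RSEM file" error quoted below, a quick way to see which IDs differ is to compare the two files directly. This is a minimal Python sketch, not part of Trinity. It assumes the RSEM output is a tab-delimited table with a header row and the transcript ID in the first column, and that the FASTA ID is the first whitespace-delimited token after ">". The toy records are hypothetical:

```python
import io


def fasta_ids(handle):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    return {
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and len(line.strip()) > 1
    }


def rsem_ids(handle):
    """Collect IDs from the first column of a tab-delimited table with a header row."""
    next(handle, None)  # skip the header line (assumed to be present)
    return {line.split("\t")[0].strip() for line in handle if line.strip()}


def missing_from_rsem(fasta_handle, rsem_handle):
    """Return FASTA IDs that have no matching row in the RSEM table."""
    return sorted(fasta_ids(fasta_handle) - rsem_ids(rsem_handle))


# Hypothetical toy inputs standing in for Trinity.fasta and the RSEM results file.
toy_fasta = io.StringIO(
    ">TRINITY_DN1_c0_g1_i1 len=250\nACGTACGT\n"
    ">TRINITY_DN98_c0_g1_i1 len=300\nGGGTTTAA\n"
)
toy_rsem = io.StringIO(
    "transcript_id\tgene_id\tlength\n"
    "TRINITY_DN1_c0_g1_i1\tTRINITY_DN1_c0_g1\t250\n"
)
missing = missing_from_rsem(toy_fasta, toy_rsem)
print(missing)
```

With real files, replace the `io.StringIO` objects with `open(...)` handles. A non-empty result suggests the quantification was run against a different FASTA than the one being filtered.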

best of luck,

~brian

 

On Thu, Nov 17, 2016 at 6:23 PM, Muhammad Bilal Sarwar <bilal....@cemb.edu.pk> wrote:
>
> Hello Brian,
> I need your help with the following issue.
> Please help me as soon as possible.
>
> I am trying to use the filter_fasta_by_rsem_value.pl script to filter the de novo assembly. I used the following parameters:
>
>  filter_fasta_by_rsem_value.pl
> --rsem_output  
> --fasta Trinity.fasta
> --output
> --filtered_ouput
> --fpkm_cutoff 1.0
>
> but each time I get the following error:
> found a transcript ID (TRINITY_DN98_c0_g1_i1) in the FASTA that wasn't in the RSEM file
>
> How can I resolve this error? All the files I used were unedited.
> What am I doing wrong? Please guide me.
>
> thanks
>
> --
> With best Regards
>
> Muhammad Bilal Sarwar
> Ph.D. Research Fellow
> Plant Genomic Lab
> National Center of Excellence in Molecular Biology
> University of The Punjab, Lahore
> bilal....@cemb.edu.pk
> bilal_...@yahoo.com
> Ph # +92 (323) 6409666

Saurabh Gupta

Nov 14, 2017, 12:25:51 PM
to trinityrnaseq-users
Hi Brian,

A follow-up on the same question. I also assembled using SS at first when my data was unstranded.

1. I have an interesting observation. When I quantify the reads with kallisto against the stranded and unstranded assemblies, I see a 10% difference in the alignment percentage (~70% for unstranded, ~80% for stranded). Is this expected?
2. If we run CD-HIT at 95% on the stranded assembly, that should take care of the redundancy problem, right? The 10% difference seems like a lot to me.

Please let me know if you need more details to reply.

Thanks

regards
Saurabh

Margo Maex

Dec 20, 2017, 11:20:10 AM
to trinityrnaseq-users

We also once accidentally specified our non-strand-specific library as being strand-specific (RF). Remarkably, this assembly produced fewer truncated contigs (so longer 'genuine' contigs, confirmed by RACE-PCR) than a default Trinity run. What is more surprising is that this assembly, constructed with the RF flag, scored better in TransRate (a higher percentage of good read mappings and higher overall TransRate scores) and showed higher BUSCO scores (more complete BUSCOs, but also more duplicated BUSCOs than in our default Trinity run, which is a consequence of the redundancy you are talking about).


Does anyone have an idea why the assembler performs better in several ways (but not all, see higher BUSCO duplications) when strand-specificity is specified (RF), while the library is actually not strand-specific (we are sure about this, the company that did the sequencing also confirmed that they do not use a strand-specific library preparation kit)?


We tried the RF flag with some additional libraries, and we consistently got far fewer truncated contigs. So by specifying strand-specificity, Trinity will construct both sense and antisense transcripts and not merge them. If you merge the sense and antisense transcripts using a tool like Dedupe (BBTools, since I read that CD-HIT does not merge reverse complements), will the assembly harbor additional artifacts because of specifying this strand-specificity, other than the (theoretical) 50% reduction in sequencing depth?


We are planning to make some real strand-specific libraries in the future, so we can actually benefit from the RF flag. But in the meantime, I am wondering whether anyone can get their head around these peculiar results.


Kind regards, 

Margo
