profile of Ex vs. N50 value


Maripaz Celorio

Aug 7, 2017, 8:06:15 AM
to trinityrnaseq-users
Hi,
I recently obtained the "contig Ex90N50 and Ex90 transcript count" for 4 transcriptomes. When plotting Ex vs. N50, I get 3 profiles that look decent (like the one with "more" reads in this tutorial: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Contig-Nx-and-ExN50-stats) and one that looks like the "small" profile in the example plot of the same tutorial. The E90N50 values are around 2 kb; the worst is 1.1 kb.

1. This would suggest that I need more reads to improve my transcriptome assemblies, right?
2. However, I ran in silico normalization before the Trinity assembly, so I assume I must have had enough reads anyway. Is this a correct assumption?
3. Should I run the assemblies without in silico normalization and use that data for downstream analysis instead?

thanks a lot!


Maria
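
For readers following along, here is a minimal sketch of how an ExN50 value can be computed, based on the definition in the tutorial linked above: rank transcripts by expression, keep the top ones that together account for x% of total expression, and take the N50 of their lengths. The function names and the toy numbers are illustrative assumptions, and the boundary handling may differ from Trinity's own contig_ExN50_statistic.pl script.

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L
    contain at least half of the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def exn50(transcripts, x):
    """transcripts: list of (length_bp, expression) pairs (hypothetical input format).
    x: percentage of total expression to cover, e.g. 90 for Ex90N50.
    Returns (N50 of the selected transcripts, number of selected transcripts)."""
    ranked = sorted(transcripts, key=lambda t: t[1], reverse=True)
    target = sum(expr for _, expr in ranked) * x / 100.0
    running = 0.0
    selected = []
    for length, expr in ranked:
        if running >= target:
            break
        selected.append(length)
        running += expr
    return n50(selected), len(selected)


# Toy data: three well-expressed long transcripts plus ten short,
# lowly expressed ones (all values made up for illustration).
toy = [(2000, 500), (1500, 300), (1000, 150)] + [(300, 5)] * 10
print(exn50(toy, 90))   # N50 over the top 90% of expression
print(exn50(toy, 100))  # equivalent to the plain contig N50
```

In this toy set the short, lowly expressed transcripts pull the value down only once x approaches 100, which is why the curve is usually read at Ex90 rather than Ex100.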

Mark Chapman

Aug 7, 2017, 8:10:20 AM
to Maripaz Celorio, trinityrnaseq-users
Hi Maria,

Is this for four projects or one? If it's just one, you would normally do a single assembly.
If it's four, then for one of them it sounds like your sequencing depth isn't enough for a 'thorough' transcriptome, but this doesn't mean it's of no use.
Normalising doesn't have any bearing on whether the data was good in the first place. You could run normalisation on 10 reads, but this doesn't mean 10 reads is sufficient.
What do you mean by downstream? For creating more assemblies you can use the same normalised reads. If you're doing gene expression analysis, use your raw (trimmed) data.

Cheers. Mark



--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-users+unsub...@googlegroups.com.
To post to this group, send email to trinityrnaseq-users@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
Dr. Mark A. Chapman
+44 (0)2380 594396
------------------------------------
Centre for Biological Sciences
University of Southampton
Life Sciences Building 85
Highfield Campus
Southampton
SO17 1BJ
Message has been deleted

Mark Chapman

Aug 9, 2017, 9:03:50 AM
to Maripaz Celorio, trinityrnaseq-users
Hi Maria,
The normalisation just gets rid of over-represented reads: for a highly expressed gene you don't need millions of reads to assemble it. So the normalisation reduces the maximum coverage to 50, which speeds up assembly, but it shouldn't get rid of anything rare or unique.
I don't think AS (alternative splicing) would necessarily make your assembly look 'bad'. The ExN50 plot shouldn't be significantly affected anyway.
For your error, does this occur with just one of the four assemblies or all of them? Can you share the command you ran, please?
Best wishes, Mark
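
To make the point about normalisation concrete, here is a toy sketch of coverage-capped read normalisation. It assumes the scheme described for Trinity's in silico normalisation, where each read is kept with probability min(1, max_cov / C) and C is the median k-mer coverage of that read. The function names, the tiny k, and the made-up reads are illustrative assumptions; this is not Trinity's implementation, which works on much larger k-mers and real k-mer catalogs.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=5, max_cov=50, seed=0):
    """Keep each read with probability min(1, max_cov / median k-mer coverage).
    Assumes every read is at least k bases long."""
    rng = random.Random(seed)
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = []
    for read in reads:
        cov = median(counts[km] for km in kmers(read, k))
        if rng.random() < min(1.0, max_cov / cov):
            kept.append(read)
    return kept


# Made-up reads: one sequence seen 400 times (a highly expressed gene)
# and one sequence seen once (a rare transcript).
abundant = "ACGTTGCAAGGCTA"
rare = "TTTTCCCCGGGGAAAA"
kept = normalize([abundant] * 400 + [rare])
print(kept.count(abundant), kept.count(rare))
```

The rare read has a median coverage of 1, so its retention probability is 1 and it is always kept, while the abundant read is thinned towards the coverage cap. The same logic explains why normalising cannot add information: a sparsely sequenced transcript stays sparsely sequenced.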


On 9 August 2017 at 13:10, Maripaz Celorio <mpce...@gmail.com> wrote:
Hello Mark,

Thanks for your help. Yes, it was for four projects.
I thought that by "normalizing" in several rounds we might run the risk of "losing" reads in some way...
If you can help me straighten out my understanding of the normalization further, I would appreciate it.
Also, would extensive alternative splicing produce something that could be interpreted as a bad-quality assembly?

Another question... I get the following error after running RSEM and trying to get the ExN50 and Ex90N50 statistics:

Error, no seq length for acc: TRINITY_DN31440_c0_g4 at /data/programs/trinityrnaseq-Trinity-v2.3.2/util/misc/contig_ExN50_statistic.pl line 54, <$fh> line 2.

Can you help me fix the issue?

Best regards,

Maria
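
One way to investigate an error like "no seq length for acc" is to check which accessions in the expression matrix have no matching sequence in the assembly FASTA. The sketch below does that on in-memory lines. It assumes a tab-delimited matrix with a header row and the accession in the first column; the helper names, the sample column, and the `_i1` isoform entry are hypothetical. Note that the accession in the error above has no `_i` isoform suffix, so one possibility (an assumption, not a confirmed diagnosis) is that a gene-level matrix was supplied where transcript-level IDs were expected.

```python
def fasta_ids(fasta_lines):
    """Accessions from FASTA header lines (first token after '>')."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def matrix_ids(matrix_lines):
    """First-column accessions of a tab-delimited matrix, skipping the header row."""
    rows = iter(matrix_lines)
    next(rows, None)  # skip header
    return [line.split("\t")[0] for line in rows if line.strip()]


def missing_lengths(matrix_lines, fasta_lines):
    """Accessions present in the matrix but absent from the FASTA."""
    known = fasta_ids(fasta_lines)
    return [acc for acc in matrix_ids(matrix_lines) if acc not in known]


# Hypothetical inputs: the FASTA holds an isoform ID, the matrix a gene ID.
fasta = [">TRINITY_DN31440_c0_g4_i1 len=1200", "ACGT"]
matrix = ["\tsample_A", "TRINITY_DN31440_c0_g4\t12.5"]
print(missing_lengths(matrix, fasta))
```

With real files, the same functions can be fed `open(path)` handles after stripping newlines; if every matrix accession is reported as missing, the mismatch is systematic rather than a single corrupted entry.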






