Hi everyone,
I hope you're doing well.
I am working on a de novo assembly using Trinity with 21 paired-end libraries. I've performed a few test runs using different parameters and obtained the following statistics:
Assembly 1
Trinity parameters:
--seqType fq --left Rm01_1.trimmed.fq.gz, ..., Rm21_1.trimmed.fq.gz --right Rm01_2.trimmed.fq.gz, ..., Rm21_2.trimmed.fq.gz --CPU 6 --max_memory 20G
Total Trinity 'genes': 12,887
Total Trinity transcripts: 40,183
GC content: 62.96%
Stats for all transcripts:
N50: 7,327 bp
Median contig length: 1,999 bp
Average contig length: 3,542.02 bp
Total assembled bases: 142,328,846
Stats for only the longest isoform per gene:
N50: 5,580 bp
Median contig length: 328 bp
Average contig length: 1,478.32 bp
Total assembled bases: 19,051,173
Assembly 2
Trinity parameters:
Same as above, with --min_kmer_cov 2
Total Trinity 'genes': 8,464
Total Trinity transcripts: 36,175
GC content: 63.05%
Stats for all transcripts:
N50: 7,226 bp
Median contig length: 2,825 bp
Average contig length: 4,093.31 bp
Total assembled bases: 148,075,523
Stats for only the longest isoform per gene:
N50: 5,960 bp
Median contig length: 394.5 bp
Average contig length: 2,042.10 bp
Total assembled bases: 17,284,338
Assembly 3
Trinity parameters:
Same as above, with --min_kmer_cov 3 --jaccard_clip
Total Trinity 'genes': 9,741
Total Trinity transcripts: 41,759
GC content: 62.89%
Stats for all transcripts:
N50: 3,406 bp
Median contig length: 1,889 bp
Average contig length: 2,360.00 bp
Total assembled bases: 98,551,080
Stats for only the longest isoform per gene:
N50: 3,045 bp
Median contig length: 1,216 bp
Average contig length: 1,730.70 bp
Total assembled bases: 16,858,754
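In the first two assemblies, the N50 values and general metrics look excellent. However, for the longest isoform per gene, the median contig length drops sharply compared with the all-transcript set: from 1,999 bp to 328 bp in Assembly 1, and from 2,825 bp to 394.5 bp in Assembly 2. In contrast, Assembly 3 has a lower N50 (3,406 bp for all transcripts, 3,045 bp for the longest isoforms) but a much higher median length for the longest isoform per gene (1,216 bp).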
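To illustrate how a high N50 and a low median can coexist, here is a minimal Python sketch. The contig lengths in `toy_lengths` are hypothetical toy values, not taken from my assemblies, and the `n50` helper is my own illustration of the usual N50 definition rather than Trinity's code.

```python
from statistics import median


def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() needs at least one contig length")


# Hypothetical toy data: many short fragments plus a few long contigs.
toy_lengths = [300] * 70 + [6000] * 30

print("N50:   ", n50(toy_lengths))
print("Median:", median(toy_lengths))
```

In this toy set the few long contigs hold most of the bases, so they determine the N50, while the many short fragments determine the median. A similar mix of a few long contigs and many short single-isoform 'genes' might explain the pattern in Assemblies 1 and 2, though I have not verified this.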
I am concerned that the decrease in isoform lengths could affect downstream analyses such as functional annotation and differential expression. I would greatly appreciate your thoughts and feedback on these results.
Best regards,