Request for Feedback on Trinity de novo Assembly Results


Miguel

Jul 17, 2025, 3:09:39 PM
to trinityrnaseq-users

Hi everyone,

I hope you're doing well.

I am working on a de novo assembly using Trinity with 21 paired-end libraries. I've performed a few test runs using different parameters and obtained the following statistics:


Assembly 1
Trinity parameters:
--seqType fq --left Rm01_1.trimmed.fq.gz, ..., Rm21_1.trimmed.fq.gz --right Rm01_2.trimmed.fq.gz, ..., Rm21_2.trimmed.fq.gz --CPU 6 --max_memory 20G

  • Total Trinity 'genes': 12,887

  • Total Trinity transcripts: 40,183

  • GC content: 62.96%

Stats for all transcripts:

  • N50: 7,327 bp

  • Median contig length: 1,999 bp

  • Average contig length: 3,542.02 bp

  • Total assembled bases: 142,328,846

Stats for only the longest isoform per gene:

  • N50: 5,580 bp

  • Median contig length: 328 bp

  • Average contig length: 1,478.32 bp

  • Total assembled bases: 19,051,173


Assembly 2
Trinity parameters:
Same as above, with --min_kmer_cov 2

  • Total Trinity 'genes': 8,464

  • Total Trinity transcripts: 36,175

  • GC content: 63.05%

Stats for all transcripts:

  • N50: 7,226 bp

  • Median contig length: 2,825 bp

  • Average contig length: 4,093.31 bp

  • Total assembled bases: 148,075,523

Stats for only the longest isoform per gene:

  • N50: 5,960 bp

  • Median contig length: 394.5 bp

  • Average contig length: 2,042.10 bp

  • Total assembled bases: 17,284,338


Assembly 3
Trinity parameters:
Same as above, with --min_kmer_cov 3 --jaccard_clip

  • Total Trinity 'genes': 9,741

  • Total Trinity transcripts: 41,759

  • GC content: 62.89%

Stats for all transcripts:

  • N50: 3,406 bp

  • Median contig length: 1,889 bp

  • Average contig length: 2,360.00 bp

  • Total assembled bases: 98,551,080

Stats for only the longest isoform per gene:

  • N50: 3,045 bp

  • Median contig length: 1,216 bp

  • Average contig length: 1,730.70 bp

  • Total assembled bases: 16,858,754


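In the first two assemblies, I observed excellent N50 values and general metrics. However, I noticed a dramatic decrease in the median length when only the longest isoform per gene is considered (328 bp and 394.5 bp, versus 1,999 bp and 2,825 bp across all transcripts). In contrast, the third assembly, while having a lower N50, shows a significantly higher median length for the longest isoform per gene (1,216 bp).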
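To illustrate why a high N50 and a low median can coexist, here is a minimal Python sketch. The contig lengths are made up for illustration and are not taken from any of the assemblies above; `n50` is a hypothetical helper, not part of Trinity.

```python
from statistics import median


def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy data: many short contigs plus a few long ones.
lengths = [300] * 6 + [400, 5000, 6000, 8000]

print("N50:   ", n50(lengths))     # 6000
print("Median:", median(lengths))  # 300.0
```

A few long contigs carry most of the bases, so they set the N50, while the median is determined by the more numerous short contigs. The two numbers therefore answer different questions, and neither one alone says much about assembly quality.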

I am concerned that the decrease in isoform lengths could affect downstream analyses such as functional annotation and differential expression. I would greatly appreciate your thoughts and feedback on these results.

Best regards,

Brian Haas

Jul 18, 2025, 1:25:50 PM
to Miguel, trinityrnaseq-users
Hi,

We have some documentation here on various QC stats we use for Trinity:
https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment

I'd focus more on the number of full-length transcripts and the ExN50 values - an N50 calc that's expression-aware.
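As a rough illustration of the expression-aware idea, the Python sketch below ranks transcripts by expression, keeps the most highly expressed ones until they account for a chosen percentage of total expression, and computes N50 on that subset only. It is a simplified sketch, not Trinity's own implementation (the wiki page linked above describes the supported workflow). The function names, the `(length, expression)` tuple layout, and the toy values are all assumptions made for the example.

```python
def n50(lengths):
    """N50 of a list of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def exn50(transcripts, pct):
    """N50 restricted to the most highly expressed transcripts that
    together account for `pct` percent of total expression.

    transcripts: list of (length_bp, expression) tuples (assumed layout).
    """
    ranked = sorted(transcripts, key=lambda t: t[1], reverse=True)
    total_expr = sum(expr for _, expr in ranked)
    if total_expr <= 0:
        raise ValueError("total expression must be positive")
    target = total_expr * pct / 100.0

    selected = []
    cumulative = 0.0
    for length, expr in ranked:
        if cumulative >= target:
            break
        selected.append(length)
        cumulative += expr
    return n50(selected)


# Hypothetical toy data: three well-expressed long transcripts and
# forty short transcripts with very low expression.
transcripts = [(3000, 500.0), (2500, 300.0), (2000, 150.0)]
transcripts += [(300, 1.25)] * 40

print("E90N50: ", exn50(transcripts, 90))   # 2500
print("E100N50:", exn50(transcripts, 100))  # 300
```

In this toy case the plain N50 over all transcripts (equivalent to 100% of expression) is dragged down by many barely expressed short contigs, while the value at 90% of expression reflects the transcripts that carry most of the signal.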

BUSCO is also super useful.

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 