multi species comparative transcriptomics and differential expression analyses

Emily T

Jan 22, 2024, 3:20:03 PM

to trinityrnaseq-users

Hello everyone,

I want to do differential expression analyses on six species of closely related fish (with no reference genomes) in the same family, where there are three clades which contain a large and a small species:

Clade 1: Species A (large) vs Species B (small)

Clade 2: Species C (large) vs Species D (small)

Clade 3: Species E (large) vs Species F (small)

I have muscle and liver tissue for 3 individuals per species. The idea is to try and find differentially expressed genes involved in body size that are common across all 3 clades (i.e. is there convergence on a macroevolutionary scale?).

I am worried that I will be comparing apples to oranges because they are different species, so my initial thought was to run a separate de novo assembly for each of the six species in Trinity (combining all tissues and biological replicates per species). Then I read here that I should perhaps run TransDecoder to convert the Trinity output FASTA files to protein sequences, then run OrthoFinder to search for single-copy orthologs between each species pair (i.e. A vs B). Then I think I can somehow do differential expression on the OrthoFinder output (?)
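To make the "single-copy orthologs per species pair" step concrete, here is a minimal sketch in Python. It assumes an OrthoFinder-style `Orthogroups.tsv` layout (first column `Orthogroup`, one column per species, genes within a cell separated by commas); the species column names and gene IDs below are made up for illustration, so check them against your own output.

```python
import csv
import io


def single_copy_orthogroups(orthogroups_tsv, species):
    """Return {orthogroup_id: {species: gene_id}} for orthogroups that
    contain exactly one gene in every species listed."""
    reader = csv.DictReader(io.StringIO(orthogroups_tsv), delimiter="\t")
    result = {}
    for row in reader:
        members = {}
        for sp in species:
            # A cell holds zero or more gene IDs separated by commas.
            genes = [g.strip() for g in (row.get(sp) or "").split(",") if g.strip()]
            if len(genes) != 1:
                break  # missing or duplicated in this species: not single copy
            members[sp] = genes[0]
        else:
            result[row["Orthogroup"]] = members
    return result


# Hypothetical three-row table for one species pair (A vs B).
EXAMPLE = (
    "Orthogroup\tspeciesA\tspeciesB\n"
    "OG0000001\tA_g1.p1\tB_g7.p1\n"
    "OG0000002\tA_g2.p1, A_g3.p1\tB_g8.p1\n"
    "OG0000003\tA_g4.p1\t\n"
)

pairs = single_copy_orthogroups(EXAMPLE, ["speciesA", "speciesB"])
print(pairs)
```

Only the first orthogroup survives here: the second has two copies in species A and the third is absent from species B. Running the same function with all six species names would give the stricter six-way set.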

Am I on the right track? This is a very new field for me, so wanted to seek out some advice.

Thank you!

Mark Chapman

Jan 22, 2024, 7:44:53 PM

to Emily T, trinityrnaseq-users

Hi Emily,

We tried to do this, but just for one species pair, so if you follow this you'd probably do it three times.

Essentially, we found that doing one assembly and mapping reads from both species to it was a worse approach than assembling the two species separately, mapping within species, and then identifying orthogroups.

https://academic.oup.com/g3journal/article/13/10/jkad158/7227612

There's probably no one-size-fits-all approach, but it is important to be aware that different approaches give you different results (we even found a different biological interpretation of the results).

Best wishes, Mark


Emily Troyer

Jan 22, 2024, 7:44:57 PM

to Mark Chapman, trinityrnaseq-users

Mark,

Thank you! Those pipeline results are incredibly useful.

Many thanks,
Emily

Lada Jovović

Feb 13, 2024, 7:22:33 AM

to trinityrnaseq-users

Hi Emily,

I have a similar experimental design and had the same idea for the pipeline. Also, I am a super beginner in bioinformatics so I was excited to see someone is doing similar research :)

What I found is that:

- OrthoFinder is useful because it gives you a nice output of single-copy orthologues (SCOs), but it is very conservative. Depending on the number of species and the divergence between them, it might give you only a small number of SCOs, so keep in mind that the genes involved in the trait you are interested in might not come up (happened to me!). Think about including some additional species, which you might find in the TSA database

- there are other ways of inferring orthologs; check this out

- you might also want to remove redundancy from your transcriptomes (with CD-HIT or Corset) before converting them to proteins with TransDecoder

- for gene expression, I also think some kind of scaling or normalization factor should be used in cross-species comparisons, because even once you have defined orthologues, things such as gene length differences between species should be taken into account. I am looking into the SCBN package for R (I'm not sure how widely it is recognized in the scientific community); also check this paper

- I like these papers on looking for convergence in gene expression, so maybe check them out

https://academic.oup.com/icb/article/58/3/398/4995854

https://pubmed.ncbi.nlm.nih.gov/29788330/

- if you wish to search for selection signatures in your data think about making a phylogenetic time tree with MCMCTree (PAML package)
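To illustrate the point above about gene length differences between species, here is a minimal sketch of a TPM-style length scaling computed separately within each species. This is only the generic TPM formula, not the SCBN method, and the counts and lengths are invented for the example.

```python
def tpm(counts, lengths):
    """Transcripts-per-million: divide each count by the feature length,
    then rescale so the values within one sample sum to one million."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    total = sum(rates.values())
    return {g: rate / total * 1e6 for g, rate in rates.items()}


# Hypothetical data: identical read counts in both species, but the
# ortholog of OG1 is twice as long in the small species.
counts = {"OG1": 100, "OG2": 100}
lengths_large = {"OG1": 1000, "OG2": 1000}
lengths_small = {"OG1": 2000, "OG2": 1000}

tpm_large = tpm(counts, lengths_large)
tpm_small = tpm(counts, lengths_small)
print(tpm_large)
print(tpm_small)
```

With raw counts the two species look identical for OG1, but once each species' own ortholog length is used, OG1 comes out lower in the small species. Whether this kind of scaling is sufficient for a cross-species comparison is a separate question that the sketch does not settle.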

Transcriptomic cross-species comparison is challenging and not something I would recommend for beginners in bioinformatics such as myself, so please take into account that I am still learning and developing my pipelines, and these are just some thoughts... Might be wrong on smoothing, but hopefully not. :)

Good luck!

Lada

Emily Troyer

Feb 20, 2024, 5:49:52 PM

to Mark Chapman, trinityrnaseq-users

Hi Mark, thanks again for linking your paper. I've now assembled my species-specific transcriptomes and successfully run OrthoFinder. I have a question about how to set up the gene_to_trans flag file when I quantify gene expression, similar to what you did in the paper: "transcripts from the same orthogroup identified as such using the gene_to_trans flag, therefore quantifying transcript expression at the level of the orthogroup."

I have my list of one-to-one orthologs between species from the OrthoFinder output, but it's not quite clear to me how to translate this into the gene_to_trans flag file needed for the 'align_and_estimate_abundance.pl' script. Would you be able to link an example?
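For reference, one possible way to build such a file, as a minimal sketch rather than the method used in the paper. It assumes the mapping file is plain text with one `gene_id<TAB>transcript_id` pair per line (in Trinity's script the option is, as far as I know, spelled `--gene_trans_map`; check the script's usage message), that the orthogroup ID is used in place of the gene ID, and that OrthoFinder was run on TransDecoder peptides whose IDs are the Trinity transcript ID plus a `.p1`-style suffix. The IDs and species names below are hypothetical.

```python
import re


def orthogroup_gene_trans_map(single_copy, species):
    """Build 'orthogroup<TAB>transcript' lines for one species.

    single_copy: {orthogroup_id: {species_name: protein_id}}
    """
    lines = []
    for og_id in sorted(single_copy):
        protein_id = single_copy[og_id][species]
        # Assumption: protein IDs are the transcript ID plus '.p<N>'.
        transcript_id = re.sub(r"\.p\d+$", "", protein_id)
        lines.append(f"{og_id}\t{transcript_id}")
    return lines


# Hypothetical one-to-one orthologs for one species pair.
single_copy = {
    "OG0000001": {"speciesA": "TRINITY_DN100_c0_g1_i1.p1",
                  "speciesB": "TRINITY_DN731_c0_g1_i2.p1"},
    "OG0000002": {"speciesA": "TRINITY_DN205_c1_g2_i3.p2",
                  "speciesB": "TRINITY_DN88_c0_g1_i1.p1"},
}

lines = orthogroup_gene_trans_map(single_copy, "speciesA")
print("\n".join(lines))
```

One map would be written per species, so that the gene-level tables from both species share the same orthogroup IDs as row names. Transcripts that are not in any one-to-one orthogroup are not covered by this sketch; whether they need their own entries (for example, keeping their original Trinity gene IDs) depends on the quantification tool and is worth checking.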

Thanks!
