Simulation - Big gene family

52 views
Skip to first unread message

maxboys91

unread,
Sep 21, 2021, 7:21:23 AM9/21/21
to SimPhy
Hi ! 

First, thanks a lot for the SimPhy program that is very user friendly. 

Here is my question : 

I am PhD student (CNRS - France) working on olfactory receptor genes and I observed that 2 sub-families of these genes were co-evolving (correlation of the number of genes in a species tree). I would like to test the probabilities to observe this pattern only by chance. To do so, I want to simulate the evolution of gene families with a given or simulated species tree. 
I ran some simulations with SimPhy with this command, that (If I understand correctly) will generate 3 different gene families evolution scenario, for 100 different species trees : 

./SimPhy_1.0.2/bin/simphy_mac64 -sb f:0.000001 -ld f:0.0000005 -lb f:0.0000005 -lt f:0 -rs 100 -rl U:3,3 -rg 1 -o SimPhy_test -sp f:10000 -su f:0.00001 -sg f:1 -sl U:185,185 -st f:1000000 -om 1 -v 2 -od 1 -op 1 -oc 1 -on 1 -cs 50 -ll 1000 

Than I extracted the number of genes present in each of the species using R (let's say that "g_trees1.trees" is the first simulation for the first species tree) :

>library("ape")
>mytree <- read.tree("g_trees1.trees")
>all_labels <- mytree$tip.label
>sp_labels <- gsub("_.*","",all_labels)
> table(sp_labels)

With this command, I know that the species 89 have a 2 genes while the species 96 have 4 genes.

I am however facing some issues: 

First, which parameter should I change for all my species to be represented by at least 1 genes in the gene tree ? I tried to modify -ll and -ls without any success. 

The gene families I want to generate should also be way bigger (In the real dataset, I am working on olfactory receptor genes with hundreds of genes in each species). How can I achieve this ?

And finally, is that a good way to answer my initial goal ? 

Thanks a lot for your help and advices ,

Maxime Policarpo







Diego M.

unread,
Oct 12, 2021, 12:42:02 PM10/12/21
to SimPhy
Dear Maxime,

I am glad you are finding SimPhy useful for your project.

Answers to your specific questions:

Q0) I ran some simulations with SimPhy with this command, that (If I understand correctly) will generate 3 different gene families evolution scenario, for 100 different species trees
A0) That's correct.

Q1) First, which parameter should I change for all my species to be represented by at least 1 genes in the gene tree ? I tried to modify -ll and -ls without any success.
A1) SimPhy does not have an option to force scenarios in which all species carry at least one copy of the evolving gene family. However, I can think of two alternatives you could consider to use SimPhy under this requirement. The most optimal for you would depend on your specific biological question. i) Do not simulate gene loss: if you fix the loss rate to 0 ( -ld f:0 ) all species will have at least one copy from the gene family of interest. However, this may not be realistic to answer your specific biological problem. ii) Subsample the species tree to analyze only species with >1 gene copy.

Q2)  The gene families I want to generate should also be way bigger (In the real dataset, I am working on olfactory receptor genes with hundreds of genes in each species). How can I achieve this ?
A2) That will depend on the ratio between gain and loss rate (lb and ld) and the evolutionary time (species tree height and number of species).

Q3) And finally, is that a good way to answer my initial goal ? 
A3) It sounds like a good strategy to me.
  
I hope this helps,

DM

Reply all
Reply to author
Forward
0 new messages