On Apr 4, 2025, at 11:20 AM, Flora Uesseler <fues...@gmail.com> wrote:
Hello Dr. Hahn,

Thank you for your reply! I can give more information; maybe then the problem becomes easier to pinpoint. My dataset (clustered with OrthoMCL, following the steps in your CAFE tutorial) has around 11 thousand clusters, so to get a similar size I simulate 10k gene families in each replicate. (Actually, I mistyped in my previous message: so far I only have 100 simulated datasets; the other 900+ are still running.)

Here's the code where I 1) simulate, 2) run the global model on the simulated data, and 3) run the 2-lambda model on the simulated data:
LAMBDA=0.00076025704930212  # single (global) lambda

# For some reason running the simulation with an error model always fails, so simulate without one:
$CAFE5 -s10000 -i ../all_by_all/filtered_cafe_input.txt -t $TREE -l $LAMBDA -o sim10000_single_lambda_${SLURM_ARRAY_TASK_ID}

# Run the global (1-lambda) model on the simulated data:
$CAFE5 -i sim10000_single_lambda_${SLURM_ARRAY_TASK_ID}/simulation.txt -t $TREE -l $LAMBDA -e${ERROR} -o sim10000_single_lambda_${SLURM_ARRAY_TASK_ID}/results

LAMBDA=0.0030770389046543,0.0005562107568264  # 2-lambda
# Run the 2-lambda model on the simulated data (CAFE5 expects -m, not -l, for comma-separated multiple lambdas):
$CAFE5 -i sim10000_single_lambda_${SLURM_ARRAY_TASK_ID}/simulation.txt -t $TREE -m $LAMBDA -e${ERROR} -y $LAMBDA_TREE -o sim10000_single_lambda_${SLURM_ARRAY_TASK_ID}/results_multi_lambda_Galapagos
The lambda values come from running the global vs. the 2-lambda model on my real data (and checking for convergence across 3 separate runs), like this:

# Global (1-lambda) model on the real data:
$CAFE5 -i ../all_by_all/filtered_cafe_input.txt -t $TREE --cores $SLURM_CPUS_PER_TASK -e${ERROR} -o results_with_errormodel

# 2-lambda model on the real data:
$CAFE5 -i ../all_by_all/filtered_cafe_input.txt -t $TREE --cores $SLURM_CPUS_PER_TASK -y $LAMBDA_TREE -e${ERROR} -o results_with_error
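(For reference, a sketch of how the three runs can be compared for convergence — assuming each run wrote to its own output directory, e.g. results_with_errormodel_run{1,2,3} (hypothetical names), and that the fitted rate appears on a "Lambda:" line in Base_results.txt:)

# Compare the fitted lambda across the three independent runs; the estimates should agree:
grep "Lambda:" results_with_errormodel_run{1,2,3}/Base_results.txt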
From my 100 simulations I took the "Model Base Final Likelihood (-lnL):" values from Base_results.txt (100 each: one set from the runs under the global-lambda model and one from the runs under the 2-lambda model). I attach a plot of the distribution to this mail.
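(A sketch of the extraction, assuming the directory layout from the commands above and 100 array indices; the grep pattern is the literal line from Base_results.txt:)

# Collect the -lnL values across all simulation replicates, one file per model:
for i in $(seq 1 100); do
  grep "Model Base Final Likelihood (-lnL):" sim10000_single_lambda_${i}/results/Base_results.txt
done | awk '{print $NF}' > lnL_global.txt
for i in $(seq 1 100); do
  grep "Model Base Final Likelihood (-lnL):" sim10000_single_lambda_${i}/results_multi_lambda_Galapagos/Base_results.txt
done | awk '{print $NF}' > lnL_2lambda.txt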
You wrote: "Regardless of the details, the 2-rate model should always fit slightly better, and so have the same sign as the contrast on your real data. The fact that this isn't true suggests something is amiss."
This is true for my real data: there I get -lnL 81508 for the global-rate model and 80730 for the 2-rate model, so the likelihood shows a better fit for the 2-rate model. But in my simulations the opposite holds: the -lnL values are around 16,000 for the global-rate model and around 18,000 for the 2-rate model.
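(In likelihood-ratio terms: since the files hold -lnL, the statistic is LRT = 2*(-lnL_global - (-lnL_2rate)). On the real data that is 2*(81508 - 80730) = 1556, positive as expected; on the simulations it comes out around 2*(16000 - 18000) = -4000, which should be impossible for nested models. A sketch of computing it per replicate from the two files extracted above:)

# Per-replicate LRT statistic; negative values flag exactly the problem described:
paste lnL_global.txt lnL_2lambda.txt | awk '{ print 2*($1 - $2) }' > LRT_sims.txt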
Unfortunately I was unable to understand the structure and purpose of the root distribution file. I know there are examples on the GitHub page, but since I did not understand them, I could not recreate the file from my data to use in the simulations. Maybe not using it is what makes my simulations behave so weirdly? If you think so too, I would be very grateful for an explanation of the root distribution, so I can try it.
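(My current guess from the GitHub examples — please correct me if this is wrong — is that the file has two tab-separated columns, a root family size and the number of simulated families that should start at that size, passed with -f/--rootdist, e.g. something like:

# root_dist.txt — size<TAB>count; the filename and counts here are made-up placeholders:
1	2500
2	1800
3	1200

$CAFE5 -s10000 -f root_dist.txt -t $TREE -l $LAMBDA -o sim_with_rootdist
)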
Thank you very much, and have a nice weekend,
F
<LRT_distribution_simulations.png>