Issues parallelizing gl.LDNe()


João Pedro Fontenelle

Mar 19, 2025, 9:48:35 AM
to dartR
Hello everyone,

Thanks for accepting my request to join this discussion group.

I have been using dartR for a while, but recently I've run into an issue that I am having a hard time addressing.

I've built a model that simulates demographic variation in populations that rapidly contract or expand. One of the metrics I am interested in is Ne, inferred from the generated .vcf files.

I have different demography treatments and, for each, several replicates. For each replicate, I "sample" vcf files on different occasions, and that is the challenge I am facing. As a result, I have over 15000 vcf files that I need to explore to infer Ne.

My idea was to deploy gl.LDNe() in parallel to speed up the process. I've scripted this in two ways:

1) Opening nodes for the replicates, and deploying gl.LDNe() for the samples within each replicate in a serialized manner.

Node1 -> Replicate1 -> sample1, sample2, sample3,..., sampleN.
Node2 -> Replicate2 -> sample1, sample2, ....
...
NodeN -> ReplicateN -> sample1, sample2,....


2) Opening nodes for the samples:

Replicate1 -> Node1 -> Sample1
           -> Node2 -> Sample2
           -> Node3 -> Sample3
           ...
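
The two layouts above can be sketched as follows. This is a minimal sketch, not the actual scripts: `estimate_ne` and the file names are hypothetical stand-ins for the real per-sample work (reading a vcf and calling gl.LDNe()), so the example runs without dartR installed.

```r
library(parallel)

# Hypothetical stand-in for the per-sample work (read a vcf, call gl.LDNe()).
# It only returns the length of the file name so the sketch runs anywhere.
estimate_ne <- function(vcf_file) nchar(vcf_file)

# Hypothetical input: one character vector of vcf paths per replicate.
replicates <- list(
  rep1 = c("rep1_s1.vcf", "rep1_s2.vcf", "rep1_s3.vcf"),
  rep2 = c("rep2_s1.vcf", "rep2_s2.vcf")
)

# mclapply() relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Layout 1: one worker per replicate, samples processed serially inside it.
by_replicate <- mclapply(
  replicates,
  function(samples) lapply(samples, estimate_ne),
  mc.cores = n_cores
)

# Layout 2: replicates processed serially, one worker per sample.
by_sample <- lapply(
  replicates,
  function(samples) mclapply(samples, estimate_ne, mc.cores = n_cores)
)
```

Both layouts produce one result per sample, grouped by replicate; they differ only in which level of the loop is handed to the workers.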

With both approaches, I face the same issue: some of the samples are not properly read or processed. I get the following message from dartR:


Population 1 [Pop1_i0] Data of sample XXX end too soon. No population is run!

XXX varies. It is not the same every time I try the inference. For example, in a simple case with 10 samples in a replicate, in one attempt this "error" can occur in sample 3 and sample 10. Running it again with the same dataset and approach, the "error" can be in samples 1 and 2. Or 7. Or none.

I've searched the discussion group and couldn't find any topic that has discussed this.

Because the sample that presents the "error" varies from attempt to attempt, I believe the problem is not with my data. In fact, if I don't parallelize it, I don't get any issues.

I've tried to parallelize using parallel/mclapply and using doMC/plyr::llply, both with the same results.
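
Both parallel::mclapply and doMC rely on forking. One possible explanation, which is an assumption and not something confirmed in this thread, is that forked workers share the parent's tempdir(), so any intermediate file written there under a fixed name could be overwritten by another worker mid-run. Whether gl.LDNe() stages its files that way would need to be checked against its source. The shared temp directory itself can be verified with a minimal sketch:

```r
library(parallel)

# mclapply() forks; on Windows only a single core is supported.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Ask each worker which session temp directory it sees.
worker_dirs <- unlist(mclapply(1:4, function(i) tempdir(), mc.cores = n_cores))

# Forked children inherit the parent's R session, including its temp directory,
# so every worker reports the same path as the parent.
shared <- all(worker_dirs == tempdir())
print(shared)
```

If the workers do share one directory, a randomly varying set of failures from run to run is what a race on a common file would look like.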

I am using the most recent version of dartR.popgen available on CRAN.

Has anyone seen anything related to this before? Any insights?

Thank you very much for your help!

Cheers

JP




Bernd.Gruber

Mar 19, 2025, 6:29:22 PM
to da...@googlegroups.com, Isobel.Walcott, Robyn.Shaw, Richard.Duncan
Hi JP

Sounds like an interesting project. We did something similar (simulating heaps of trajectories) but looked at historic population sizes.

Would be interesting to see your results. 

Regarding parallelizing: in principle that should work. To understand the problem better, can you let me know which dartRverse versions you are using? Run dartRverse_install().

Also, are you running the parallel version under Windows, Linux or Mac?

Cheers Bernd
---------


On 20 Mar 2025, at 00:48, João Pedro Fontenelle <font...@gmail.com> wrote:


--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dartr/54e00dfc-6aee-4e64-a131-198d4bf570c9n%40googlegroups.com.

João Pedro Fontenelle

Mar 20, 2025, 9:38:00 AM
to da...@googlegroups.com, Isobel.Walcott, Robyn.Shaw, Richard.Duncan
Hi Bernd,

Good to hear that I am not the only one with somewhat crazy ideas like this hahaha

I will make sure to share the results once they are good to go.

This is what we have:

> dartRverse_install()

dartRverse packages:
 dartR.base      0.65  | CRAN: 1.0.5 | Github: 1.0.5 (main) | 1.0.5 (beta) | 1.0.5 (dev)
 dartR.data      1.0.2 | CRAN: 1.0.8 | Github: 1.0.8 (main) | 1.0.8 (beta) | 1.0.8 (dev)
 dartR.popgen    1.0.0 | CRAN: 1.0.0 | Github: 1.0.3 (main) | 1.0.3 (beta) | 1.0.5 (dev)
 dartR.sim       ---   | CRAN: 0.70  | Github: 0.70  (main) | 0.89  (beta) | 0.94  (dev)
 dartR.spatial   ---   | CRAN: 0.78  | Github: 0.78  (main) | 0.89  (beta) | 0.89  (dev)
 dartR.captive   ---   | CRAN: 1.0.2 | Github: 1.0.2 (main) | 1.0.2 (beta) | 1.0.2 (dev)
 dartR.sexlinked ---   | CRAN: 1.0.5 | Github: 1.0.5 (main) | 1.0.5 (beta) | 1.0.5 (dev)

I am running this on Linux, mainly on an Ubuntu cluster, but I also test smaller tasks on my Linux (Arch-based) laptop.

Yeah, I have reached out to a few colleagues to double-check the parallelization, and they all gave me the same feedback: in theory it should work... My wild guess is that, for some reason, the process of reading the genlight is cut short when parallelizing. Is that what the "error" message refers to? It is intriguing that the failures are not consistent. I've tested with larger (more individuals) and smaller (fewer individuals) datasets, and that doesn't seem to be the problem either.

Thanks for the help!

Cheers

JP



--

_______{: }---'-

João Pedro (JP) Fontenelle, PhD (he/him)

OG-CO Postdoctoral Fellow in Genome Data Science

João Pedro Fontenelle

Mar 20, 2025, 9:40:01 AM
to da...@googlegroups.com, Isobel.Walcott, Robyn.Shaw, Richard.Duncan
Ah, quick follow up:

I thought the problem might be temporary files being created with the same name, so I even added a "random number" to the output file name, but no luck:

dartR.popgen::gl.LDNe(glight, outfile = paste0("LD", floor(runif(1, 0, 66666)), ".txt"), singleton.rm = FALSE, neest.path = neestdir, critical = c(0, 0.05), plot.out = FALSE)
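
A random suffix can still collide across 15000 files, and it only renames the final output, not any intermediate files the function may write. A possible alternative, sketched under assumptions, is to derive the file name from the replicate and sample IDs and to run the tasks on a PSOCK cluster, where each worker is a separate R process with its own tempdir(). The task table and `run_task` below are hypothetical placeholders. Whether this resolves the gl.LDNe() failures is untested here.

```r
library(parallel)

# Start two independent R worker processes (PSOCK), not forks.
cl <- makeCluster(2)

# Each worker reports its own session temp directory.
worker_dirs <- unlist(clusterEvalQ(cl, tempdir()))

# Hypothetical task table: a deterministic, collision-free outfile per sample.
tasks <- data.frame(replicate = rep(1:2, each = 3), sample = rep(1:3, times = 2))
tasks$outfile <- sprintf("LD_rep%03d_sample%03d.txt", tasks$replicate, tasks$sample)

# Placeholder for the real per-sample call. A real run would load the genlight
# on the worker and pass task_outfile as the outfile argument of gl.LDNe().
run_task <- function(task_outfile) file.path(tempdir(), task_outfile)

planned_paths <- unlist(parLapply(cl, tasks$outfile, run_task))

stopCluster(cl)
```

Because the workers are separate processes, dartR.popgen and any objects they need would have to be loaded or exported on each worker (for example with clusterEvalQ() and clusterExport()) before calling the real function.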

