--
You received this message because you are subscribed to the Google Groups "westpa-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to westpa-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/westpa-users/789c5316-6d8f-4833-91d3-c62dc6f5a2a2%40googlegroups.com.
Hi David,

That error typically arises when the system doesn't have the resources available to create another thread, or when you're about to hit some sort of system-imposed maximum. I don't recall the default for WM_N_WORKERS; it's possible it might be 64 (or 128) if it's pulling that info from the OS. I wonder if AWS has some limitations on the number of threads that can be created?

Judging from a Google search, it seems that the amount of virtual memory is involved in the calculation of the thread limit for a particular process. The following command on a Linux machine should tell you how many threads you can have:

cat /proc/sys/kernel/threads-max

On my home machine, this yields 514054. Given the stats for the p3.16xlarge instance, I'd be surprised if this number is very low, but it's possible we're somehow trying to create far more threads than we anticipated. Do you have any log files from your run that include the output of the env command? It also wouldn't hurt to provide the result of threads-max.

Since the run goes for a few iterations and then fails out, perhaps threads are lingering or not being re-used the way we think they are. Is membrane_prod.py code you've written? Are you attempting to fork or use multiprocessing within that script?

Best,
Audrey
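In case it's useful, here is a rough sketch of the limit checks mentioned above (assuming a standard Linux procfs; the `<PID>` placeholder is hypothetical and would be a real worker PID):

```shell
# System-wide thread limit (the threads-max value mentioned above)
threads_max=$(cat /proc/sys/kernel/threads-max)
echo "kernel.threads-max = $threads_max"

# Per-user limit on processes/threads -- often the limit actually hit
# before the system-wide one
echo "ulimit -u = $(ulimit -u)"

# Thread count of a specific running process (replace <PID>):
#   ps -o nlwp= -p <PID>
```

Comparing `ulimit -u` against the total thread count across all workers after a few iterations would show whether threads are accumulating rather than being re-used.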
On Thu, 13 Feb 2020 at 08:13, David LeBard <david...@eyesopen.com> wrote:
Hi WESTPA folks,

I am trying to run a WESTPA simulation on a single AWS p3.16xlarge instance that is packed with 64 hyperthreaded cores and 8 V100 GPUs, and runs Amazon's flavor of Linux. Unfortunately, if I run the simulation as I normally would, by setting CUDA_VISIBLE_DEVICES to be a modulus of the WM_PROCESS_ID, I can run for a few iterations but then reproducibly hit this strange error and the simulation stops:

+ python /home/ec2-user/membrane/common_files/membrane_prod.py
ERROR; return code from pthread_create() is 11
Error detail: Resource temporarily unavailable

It seems I can mitigate the problem by setting the WM_N_WORKERS variable to 2x the number of GPUs (i.e. 16 workers), but this seems like it should be unnecessary and might not be the actual fix. I have tried both the threads and processes work managers, and both have this problem.

Has anyone else run into similar issues? Do you think it could be due to the hyperthreading of the cores, and if so, should I turn that "feature" off? Or are there better fixes out there that others know about?

I should also mention that I have run this simulation successfully across 4x K80s on a local GPU node, and on a single GTX 1080 on my local workstation, without any issues.

Thanks in advance,
David
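For reference, the per-worker GPU assignment described above can be sketched roughly like this (a hypothetical wrapper-script fragment; NUM_GPUS and the fallback value are placeholders, while WM_PROCESS_ID is the variable WESTPA's work manager exports per worker):

```shell
# Map each WESTPA worker onto one of the instance's 8 GPUs by taking
# its worker ID modulo the GPU count.
WM_PROCESS_ID=${WM_PROCESS_ID:-0}   # fallback for running outside WESTPA
NUM_GPUS=8
export CUDA_VISIBLE_DEVICES=$(( WM_PROCESS_ID % NUM_GPUS ))
echo "worker $WM_PROCESS_ID -> GPU $CUDA_VISIBLE_DEVICES"
```

With 16 workers, each GPU is shared by exactly two workers (IDs 0 and 8 map to GPU 0, and so on), which may be relevant to why WM_N_WORKERS=16 behaves differently here.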
Could it be a memory issue when it's spawning the new processes? A simple way to check would be to watch the "top" command as your code runs and keep an eye on the memory usage column.
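A non-interactive version of that check, for logging alongside the run (assuming standard Linux procfs and procps tools; `<PID>` is a placeholder):

```shell
# One-shot snapshots instead of watching interactive top:
# total and available memory straight from the kernel
grep -E 'MemTotal|MemAvailable' /proc/meminfo

# the same via procps, in megabytes
free -m

# Resident (RSS) and virtual (VSZ) memory of one process, in KB
# (replace <PID> with the worker's actual PID):
#   ps -o rss=,vsz= -p <PID>
```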
From: westpa...@googlegroups.com <westpa...@googlegroups.com> on behalf of David LeBard <david...@eyesopen.com>
To view this discussion on the web visit https://groups.google.com/d/msgid/westpa-users/72e5746b-be59-4444-bd52-67d67f59724d%40googlegroups.com.