Hi all,
I'm running into a similar problem, possibly the same one, so I thought I would reply in this thread. I read the discussion above and the related issue on GitHub, and yesterday I updated WarpTools to dev34, which, if I understood correctly, contains the parallelization fix. (I'm not 100% sure about that, though: 'pip list' shows warp 1.0.4, but during the install I did see 2.0.0dev34 scroll by.) Some example errors are pasted below. The failure seems to occur at random: sometimes I get lucky and it doesn't happen, but at other times the same command on the same GPU node fails.
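For anyone wanting to double-check which build is actually active in the environment, here is a minimal sketch. It assumes the package was installed through conda (the `conda-bld/warp_...` paths in the traces below suggest a conda build) and that the conda package is named `warp`; both are assumptions, not confirmed facts. It only parses the plain-text output of `conda list`. The helper names `parse_conda_list` and `conda_version` are made up for illustration.

```python
import subprocess


def parse_conda_list(text):
    """Map package name -> version from the plain-text output of `conda list`."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header lines conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Columns are: name, version, build, (optional) channel.
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def conda_version(package):
    """Return the version conda reports for `package`, or None if unavailable."""
    try:
        result = subprocess.run(
            ["conda", "list"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # conda is not on PATH, or the command failed.
        return None
    return parse_conda_list(result.stdout).get(package)
```

Calling `conda_version("warp")` inside the activated environment should return the version string conda has recorded, which can then be compared with what 'pip list' shows; the two tools may be reporting different packages that happen to share a name.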
Example 1
MCore --population m/tric.population --refine_particles
Loading population... Done
Creating directories... Done
Spawning workers... Done
Preparing for refinement – this will take a few minutes per species
Preparing refinement requisites...
1/1
Performing refinement
Preparing population for data source tric...Done
Loading gain reference for tric... Done
Refining all series in data source...
612/612
Commiting changes in tric...Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception.Unhandled exception. System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190
System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190
System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190
System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190
Aborted (core dumped)
Example 2
(warp) mlast@fmg58 slurm:5830122 [HeLa_MPA_merged]: MCore --population m/tric.population --iter 0 --perdevice_refine 8
Loading population... Done
Creating directories... Done
Spawning workers... Done
Preparing for refinement – this will take a few minutes per species
Preparing refinement requisites...
1/1
Performing refinement
Preparing population for data source tric...Done
Loading gain reference for tric... Done
Refining all series in data source...
612/612
Commiting changes in tric...Unhandled exception.Unhandled exception.Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190
Real-time signal 0
Example 3 (the command in this example did complete successfully once in a previous run, but most of the time it raises this error)
(warp) mlast@fmg58 slurm:5830122 [HeLa_MPA_merged]: MCore --population m/tric.population --iter 0 --perdevice_refine 1
Loading population... Done
Creating directories... Done
Spawning workers... Done
Preparing for refinement – this will take a few minutes per species
Preparing refinement requisites...
1/1
Performing refinement
Preparing population for data source tric...Done
Loading gain reference for tric... Done
Refining all series in data source...
612/612
Commiting changes in tric...Unhandled exception.Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190
System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190
System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190System.NotImplementedException: The method or operation is not implemented.
at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/MCore/MCore.cs:line 599
at Warp.WorkerWrapper.ReportDeath() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 264
at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /home/runner/micromamba/envs/package-build/conda-bld/warp_1750964440471/work/WarpLib/WorkerWrapper.cs:line 190
Aborted (core dumped)
Thanks in advance for the help!
Best wishes,
Mart
On Sunday, 29 June 2025 at 11:11:20 UTC+1, Bin Cai wrote: