MCore failed after running for a while

HUGO MUÑOZ HERNÁNDEZ

Oct 27, 2025, 4:36:01 AM
to Warp
Hi everyone! 

I am running an MCore job on a workstation (Ubuntu 22.04.5 LTS, GNU/Linux 6.8.0-85-generic x86_64) with four RTX A5500 24 GB GPUs, and I get this error after the job has been running for a while.

1 - Can you help me to solve the issue? 
2 - Is there a way to continue the job from there? 

Thanks in advance, 
Hugo 



MCore --population M_populations/Cs_W_R_M.population --iter 1 --refine_imagewarp 3x3 --refine_particles --devicelist  0 1 2 3 --perdevice_refine 1 --port -1
Loading population... Done
Creating directories... Done
Spawning workers... Done
Preparing for refinement – this will take a few minutes per species
Preparing refinement requisites...
1/1
Performing refinement
Preparing population for data source Cs_W_R_M...Done
Loading gain reference for Cs_W_R_M... Done
Refining all series in data source...
1584/2108
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuFFT error: CUFFT_INTERNAL_ERROR at /programs/x86_64-linux/warp/2.0.0dev36/warp/NativeAcceleration/gtom/src/FFT/FFT.cu:23

Unhandled exception. Unhandled exception. System.NotImplementedException: The method or operation is not implemented.
   at MCore.MCore.WorkerDied(Object sender, EventArgs e) in /programs/x86_64-linux/warp/2.0.0dev36/warp/MCore/MCore.cs:line 599
   at Warp.WorkerWrapper.ReportDeath() in /programs/x86_64-linux/warp/2.0.0dev36/warp/WarpLib/WorkerWrapper.cs:line 264
   at Warp.WorkerWrapper.<StartHeartbeat>b__14_0() in /programs/x86_64-linux/warp/2.0.0dev36/warp/WarpLib/WorkerWrapper.cs:line 190
/programs/share/capsules/lib/job.sh: line 126:  4872 Aborted                 (core dumped) "$SB_EXECFILE" "$@"

Alister Burt

Oct 27, 2025, 8:54:50 AM
to HUGO MUÑOZ HERNÁNDEZ, Warp
Hi Hugo,

A cuFFT error usually means you have run out of GPU memory.

Things that determine memory usage:
- pixel spacing at which you’re doing refinement
- number of particles per image 
- parameters you’re refining

Cheers,

Alister

Sent from mobile - apologies for brevity
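
As an aside to the reply above: one way to check whether memory is the limiting factor is to watch per-GPU usage while the job runs. The sketch below is not part of Warp or MCore. It assumes `nvidia-smi` is on the PATH and parses its CSV query output; the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration.

```python
import subprocess

# Standard nvidia-smi query; with "nounits" the memory values are plain MiB numbers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn 'index, used, total' lines into a list of dicts (values in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        gpus.append(
            {"index": index, "used": used, "total": total, "free": total - used}
        )
    return gpus


def query_gpu_memory():
    """Run nvidia-smi once and return the parsed per-GPU memory usage."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(out.stdout)
```

Calling `query_gpu_memory()` every few seconds from a second terminal while the refinement runs would show whether any card approaches its 24 GB limit shortly before the crash.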

--
You received this message because you are subscribed to the Google Groups "Warp" group.
To unsubscribe from this group and stop receiving emails from it, send an email to warp-em+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/warp-em/3652efa0-9eaf-41b8-8ef6-58abbc1e7a1en%40googlegroups.com.

Alister Burt

Oct 27, 2025, 8:55:16 AM
to HUGO MUÑOZ HERNÁNDEZ, Warp
Oh, and there is no way to continue from a partially completed job as far as I know.

Sent from mobile - apologies for brevity

HUGO MUÑOZ HERNÁNDEZ

Oct 27, 2025, 10:30:39 AM
to Alister Burt, Warp
Thank you for your quick response, Alister!

I am still not fully familiar with MCore. How could I reduce the number of particles per image? I have roughly 300 per image, but I wonder whether the refinement would be less accurate with fewer particles per image. I am aiming for really high resolution.

Best,
Hugo 

Alister Burt

Oct 27, 2025, 11:07:47 AM
to HUGO MUÑOZ HERNÁNDEZ, Warp
It’s not a parameter you have much control over unless you have significantly overpicked. We sometimes see people who have many thousands of (not real) particles per tilt series that they push through without critically considering whether or not their picks are sane :-)


Sent from mobile - apologies for brevity
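
As an aside to the reply above: a rough sanity check for overpicking is to count particles per tilt series and look at the outliers. This is a minimal sketch, not a Warp or MCore feature. It assumes you have already extracted one tilt-series name per particle from your particle metadata; the function names and the threshold are hypothetical.

```python
from collections import Counter


def particles_per_series(series_names):
    """Count particles per tilt series, given one series name per particle."""
    return Counter(series_names)


def flag_overpicked(counts, max_per_series):
    """Return the series whose particle count exceeds max_per_series, sorted by name."""
    return sorted(name for name, n in counts.items() if n > max_per_series)
```

For example, `flag_overpicked(particles_per_series(names), 1000)` lists every series with more than 1000 picks. The cutoff depends on the sample and is yours to choose.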

Hamidreza Rahmani

Oct 27, 2025, 11:39:43 AM
to Alister Burt, HUGO MUÑOZ HERNÁNDEZ, Warp
Hi Hugo, 

Is any of your GPUs the display GPU, by any chance? I had this problem, and limiting myself to the "free" GPUs, or even dropping to 1 GPU after a reboot, helped. We have ATX 5000s and I can go up to a box size of 458^3 using --cpu_memory.

Best,
Hamid

HUGO MUÑOZ HERNÁNDEZ

Oct 27, 2025, 11:54:36 AM
to Hamidreza Rahmani, Warp
Hi Hamid,

It is very kind of you to offer your advice. This workstation sits in a server room without a display, and my box size is 320, but I had a similar intuition: I sent the job to a single GPU using the "cpu_memory" option, after a reboot (I don't know why, but after the crash MCore still had a few ghost jobs running).


I am hopeful, but it will take some time. So far no crash at 36/2108.


Best,
Hugo 
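
For reference, the single-GPU rerun described above could be assembled as follows. This is a sketch under assumptions: every flag is copied from the command in the first post, and `--cpu_memory` is added as named in the replies and assumed to take no value. The helper `single_gpu_command` is hypothetical, and the exact command that was rerun is not shown in the thread.

```python
import shlex


def single_gpu_command(population, device=0):
    """Build an MCore argument list restricted to one GPU.

    Flags are taken from the command in the original post, plus
    --cpu_memory as mentioned in the replies (assumed to take no value).
    """
    return [
        "MCore",
        "--population", population,
        "--iter", "1",
        "--refine_imagewarp", "3x3",
        "--refine_particles",
        "--devicelist", str(device),
        "--perdevice_refine", "1",
        "--cpu_memory",
        "--port", "-1",
    ]


cmd = single_gpu_command("M_populations/Cs_W_R_M.population")
print(shlex.join(cmd))  # shell-ready string; pass `cmd` to subprocess.run to launch
```

Building the argument list in one place makes it easy to change only the device index between test runs.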

Alister Burt

Oct 27, 2025, 1:56:02 PM
to HUGO MUÑOZ HERNÁNDEZ, Hamidreza Rahmani, Warp
2000 tilt series!!!!

Good luck and sorry there isn't an easier method for resuming from crashes - if you hit crashes again it might be worth running a bunch of test jobs on small subsets to identify problematic tilt series

Cheers,

Alister
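
The subset-testing idea above can be sketched as a repeated halving over the list of tilt series. How to restrict an MCore run to a subset is not covered in this thread, so that step is left to `runs_ok`, a hypothetical callable that launches a test job on the given subset and returns True if it finishes without crashing. All names here are illustrative.

```python
def chunks(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def find_failing(items, runs_ok):
    """Narrow down the items that make `runs_ok` fail by repeated halving.

    Assumes failures are reproducible and caused by individual items; a crash
    that only occurs for a particular combination of items is not isolated.
    """
    if runs_ok(items):
        return []
    if len(items) == 1:
        return list(items)
    mid = len(items) // 2
    return find_failing(items[:mid], runs_ok) + find_failing(items[mid:], runs_ok)
```

With around 2000 series and one bad one, halving needs on the order of a dozen levels of test jobs, whereas `chunks` is the simpler choice if you would rather run fixed-size batches in parallel.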
