RELION 3D Refine Crashes with WARP Subtomograms in Maximisation


Euan William Pyle

unread,
Dec 12, 2024, 2:46:56 PM12/12/24
to Warp
Hi all,
I was in two minds about which mailing list to post to first, so sorry if this is the wrong place to ask. It's got three parts, so I'll break it down.

I'm currently refining a fairly large number of particles at bin1 in RELION 4, using the classic WARP-style 3D subtomograms as opposed to the RELION 5-style 2D stacks.

1. On the final iteration of 3D refine, RELION keeps crashing on the maximisation step (see pasted run.out and run.err at the bottom). This happens consistently, across different commits of RELION 4, even if I use GPUs with huge amounts of memory, reduce the threads to 1, use --free_gpu_memory 4000, or try other memory-saving tricks. I still think it may be a memory issue, however, as other refinements, e.g. at bin2 and bin4, worked fine. Has anyone come across this and found a way around it? FWIW, RELION is crashing at the stage of outputting files called run_half1_class001_data_image.mrc, run_half1_class001_data_real.mrc, and run_half1_class001_data_weights.mrc, which I have never seen before; I think they are temporary files that are deleted after the generation of the _unfiltered.mrc.

2. Instead of letting the final iteration complete, I tried, as a bit of a cheat, to use relion_reconstruct to get the half maps and full map from the _data.star of the final iteration:

relion_reconstruct --i final_data.star_from_refine --o out.mrc --ctf --3d_rot

The reconstructed particle looks like a ribosome (yay), but at much lower resolution and quality than the maps from the refinement iterations (boo). Something is going wrong; am I missing a flag?
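In case it helps with diagnosis, I could also reconstruct the two gold-standard halves separately; a sketch of what I have in mind (assuming relion_reconstruct's --subset flag selects particles by rlnRandomSubset — guarded so it does nothing on machines without RELION on the PATH):

```shell
# Sketch: reconstruct each gold-standard half separately via --subset
# (assumption: --subset filters particles on rlnRandomSubset).
# Guarded so the snippet is a no-op without RELION installed.
star=final_data.star
if command -v relion_reconstruct >/dev/null 2>&1; then
    relion_reconstruct --i "$star" --o half1.mrc --ctf --3d_rot --subset 1
    relion_reconstruct --i "$star" --o half2.mrc --ctf --3d_rot --subset 2
else
    echo "relion_reconstruct not on PATH; commands shown for reference only"
fi
```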

3. I tried using the 2D stacks export tool and refining in RELION 5, but I believe the particles are being extracted in the wrong place. Using the exact same .star to define coordinates, WARP extracts 3D subtomograms correctly (the averaged particle for each tomogram is a white blob, as expected), whereas I don't see the same in the 2D stacks. Does WARP require the input for 2D stacks to be centred coordinates rather than CoordinateXYZ? That's the only reason I can think of.
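A quick way to check which coordinate convention a star file actually declares is to grep its header. Self-contained toy example (the demo file and its values below are made up for illustration; real files come from picking/WARP):

```shell
# Self-contained demo: write a toy star file, then list which
# coordinate-related columns its header declares.
cat > particles_demo.star <<'EOF'
data_particles

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnCoordinateZ #3
100.0 200.0 50.0
EOF
grep -E '^_rln(Coordinate|Origin)' particles_demo.star
```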

Sorry for the essay. I'm crowdsourcing knowledge because this has been tricky to troubleshoot: the dataset is so large that each test takes a few days.

Thanks,
Euan




run.err:
corrupted size vs. prev_size
[gpu47:06576] *** Process received signal ***
[gpu47:06576] Signal: Aborted (6)
[gpu47:06576] Signal code:  (-6)
[gpu47:06576] [ 0] /lib64/libpthread.so.0(+0x12d20)[0x7fffe3fafd20]
[gpu47:06576] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7fffe32fd52f]
[gpu47:06576] [ 2] /lib64/libc.so.6(abort+0x127)[0x7fffe32d0e65]
[gpu47:06576] [ 3] /lib64/libc.so.6(+0x8f727)[0x7fffe333e727]
[gpu47:06576] [ 4] /lib64/libc.so.6(+0x96a2c)[0x7fffe3345a2c]
[gpu47:06576] [ 5] /lib64/libc.so.6(+0x972d6)[0x7fffe33462d6]
[gpu47:06576] [ 6] /lib64/libc.so.6(+0x97465)[0x7fffe3346465]
[gpu47:06576] [ 7] /lib64/libc.so.6(+0x99b28)[0x7fffe3348b28]
[gpu47:06576] [ 8] /lib64/libc.so.6(+0x9a65b)[0x7fffe334965b]
[gpu47:06576] [ 9] /lib64/libc.so.6(+0x9b6fa)[0x7fffe334a6fa]
[gpu47:06576] [10] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(fftw_malloc_plain+0x15)[0x7fffea70b0c5]
[gpu47:06576] [11] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x39c6d)[0x7fffea70cc6d]
[gpu47:06576] [12] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x3a493)[0x7fffea70d493]
[gpu47:06576] [13] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x3aeca)[0x7fffea70deca]
[gpu47:06576] [14] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(fftw_mkplan_d+0xf)[0x7fffea70e34f]
[gpu47:06576] [15] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(fftw_mkplan_f_d+0x59)[0x7fffea70e3c9]
[gpu47:06576] [16] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x40f89)[0x7fffea713f89]
[gpu47:06576] [17] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x3ab4e)[0x7fffea70db4e]
[gpu47:06576] [18] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x3adf2)[0x7fffea70ddf2]
[gpu47:06576] [19] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(fftw_mkplan_d+0xf)[0x7fffea70e34f]
[gpu47:06576] [20] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x46bec)[0x7fffea719bec]
[gpu47:06576] [21] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x3ab4e)[0x7fffea70db4e]
[gpu47:06576] [22] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x3adf2)[0x7fffea70ddf2]
[gpu47:06576] [23] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(fftw_mkplan_d+0xf)[0x7fffea70e34f]
[gpu47:06576] [24] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x41b83)[0x7fffea714b83]
[gpu47:06576] [25] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x3ab4e)[0x7fffea70db4e]
[gpu47:06576] [26] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x3adf2)[0x7fffea70ddf2]
[gpu47:06576] [27] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(fftw_mkplan_d+0xf)[0x7fffea70e34f]
[gpu47:06576] [28] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x46bec)[0x7fffea719bec]
[gpu47:06576] [29] /g/easybuild/x86_64/Rocky/8/genoa/software/FFTW.MPI/3.3.10-gompi-2023a/lib/libfftw3.so.3(+0x3ab4e)[0x7fffea70db4e]
[gpu47:06576] *** End of error message ***



run.out:
Expectation iteration 15
2.58/7.01 hrs ......................~~,_,">                                   [oo]
4.35/6.85 hrs .....................................~~(,_,">
5.67/6.79 hrs ..................................................~~(,_,>
6.77/6.77 hrs ............................................................~~(,_,">
 Averaging half-reconstructions up to 40 Angstrom resolution to prevent diverging orientations ...
 Note that only for higher resolutions the FSC-values are according to the gold-standard!
 Calculating gold-standard FSC ...
 Maximization ...
000/??? sec ~~(,_,">    


Alister Burt

unread,
Dec 12, 2024, 4:55:40 PM12/12/24
to Euan William Pyle, Warp
Hey Euan,

Sounds like a memory thing in RELION and you’ve tried the usual things to solve it… how big is your box? I know Pranav often uses much higher numbers for --free_gpu_memory…

Re: the unsuccessful 2D export — the code for dealing with coordinates is shared between 2D and 3D export and they take the same inputs, so I think there is some user error in your 2D export test… could you try it again and show the commands you used and the per-tilt-series averages you got in each case?

Cheers,

Alister

Sent from mobile - apologies for brevity


Alister Burt

unread,
Dec 12, 2024, 4:57:13 PM12/12/24
to Euan William Pyle, Warp
Oh, also: we only guarantee compatibility with RELION 5 - you can run refinements with 3D subtomos from the single particle GUI

Cheers,

Alister

Sent from mobile - apologies for brevity


Euan William Pyle

unread,
Dec 13, 2024, 3:57:44 AM12/13/24
to Warp
Hi Alister, 
Thanks - you were right about the stacks. I thought I couldn't see the blob I was supposed to in the average.mrcs; what I actually needed to do was increase the thickness in slicer to see it. Before, I was only looking at a single 2D slice.

Running the job again now with a high --free_gpu_memory. The box is 192, so not outrageously large, and there are ~77k particles. Will update when the job finishes.
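For a sense of scale, a back-of-envelope for one padded volume at this box size (assumptions: RELION's default 2x padding and single-precision complex voxels at 8 bytes each; a real refinement holds several such volumes plus weight arrays, so the true footprint is a multiple of this):

```shell
# Back-of-envelope memory for one padded complex volume
# (assumptions: 2x padding, single-precision complex = 8 bytes/voxel)
box=192
pad=$((box * 2))                  # padded box edge: 384
bytes=$((pad * pad * pad * 8))    # complex float voxels
echo "$((bytes / 1024 / 1024)) MiB per padded complex volume"
```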

Thanks,
Euan

Euan William Pyle

unread,
Dec 15, 2024, 11:23:18 AM12/15/24
to Warp
Cool, so the problem is now solved. I'll write up what I found here in case anyone else runs into a similar problem.

TL;DR: RELION refine was crashing on the maximisation step with 3D subtomograms; I used 2D stacks instead, and it's now fixed.

1. No matter what I tried to save memory in RELION, it always crashed in the maximisation step of the final iteration. I still think it's a memory problem, so I switched to 2D stacks in particle export instead of 3D subtomograms. Initially I thought the particle extraction for 2D stacks was wrong, as I couldn't see a particle in the *_average.mrcs file; I just needed to open slicer in IMOD and increase the thickness, which showed there was no problem.

2. I saw strange behaviour in my first refinements with the stacks (another reason I thought something was going wrong): the first few refinements looked like junk. This was probably something to do with the greyscale of the reference (even though my reference was RELION-generated), because when I set "Ref. map is on absolute greyscale?" to No, the refinements suddenly looked much better. So, if I switch from 3D subtomos to 2D stacks again in future, I'll always do this.

3. 2D stacks processed (much!!!) faster than 3D subtomograms (for obvious reasons), so I'll always use them for large datasets (whether large in box size or in particle number).

4. Structure went to Nyquist, I love M. 
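On point 2 above: my understanding (worth double-checking against the RELION GUI help) is that answering No to the absolute-greyscale question just adds one flag to the relion_refine command line, so the fix can be scripted too. A sketch — the command is only composed here, not executed:

```shell
# Assumption (worth double-checking): "Ref. map is on absolute
# greyscale?  No" in the GUI corresponds to --firstiter_cc
# (cross-correlation in the first iteration) on the command line.
greyscale_opt="--firstiter_cc"
echo "relion_refine ... $greyscale_opt"
```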

Alister Burt

unread,
Dec 16, 2024, 12:11:36 AM12/16/24
to Euan William Pyle, Warp
Love to hear it, Euan - thanks for reporting back! Sounds like you’ll need smaller pixels moving forwards!

Cheers,

Alister

Sent from mobile - apologies for brevity
