align.AlignTraj memory issue

205 views
Skip to first unread message

John-Paul Alao

unread,
Apr 16, 2021, 10:52:49 AM4/16/21
to MDnalysis discussion
Hi, I'm looping through different trajectories, after like 2 loops, my memory is full so I had to consider using "in_memory=False" instead of "in_memory=True".  However, i got the error message       OSError: DCD file does not exist

My code is below, Please how do resolve this? Or is there an efficient way to clear the memory after each loop?

    aligner = align.AlignTraj(u, ref,
                              select='protein and name CA',
                              filename='aligned_traj.dcd',
                              in_memory=False).run()
    value = mda.Universe('protein.psf', 'aligned_traj.dcd')

Thanks.


Tamas Hegedus

unread,
Apr 16, 2021, 1:39:30 PM4/16/21
to mdnalysis-...@googlegroups.com
I have the feeling that your code is incomplete and can not be understood.
Please provide it with the full code including the loop. It is also good to indicate, which line throw the OSError.
--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/e806f829-d061-45cf-9079-dadc1e4f19cdn%40googlegroups.com.

Alao, John-Paul

unread,
Apr 16, 2021, 4:09:40 PM4/16/21
to mdnalysis-...@googlegroups.com
#https://userguide.mdanalysis.org/stable/examples/analysis/alignment_and_rms/rmsf.html

uni = {'AN': u, 'APP' : u1 , 'PPI' : u2, 'PPA': u3, 'AO': u4}
df_collection = []
for key, value in uni.items():
   
    protein = value.select_atoms("protein")
    prealigner = align.AlignTraj(value, value, select="protein and name CA",
                                 in_memory=False).run()
# ref = average structure
    ref_coordinates = value.trajectory.timeseries(asel=protein).mean(axis=1)
# Make a reference structure (need to reshape into a1-frame "trajectory").
    ref = mda.Merge(protein).load_new(ref_coordinates[:, None, :],
                                  order="afc")

#Align to ref using in memory=True
  #  aligner = align.AlignTraj(value, ref,
   #                           select='protein and name CA',
    #                          in_memory=True).run()

## Align without using in memory


 aligner = align.AlignTraj(u, ref,
                           select='protein and name CA',
                           filename='aligned_traj.dcd',
                           in_memory=False).run()
  u = mda.Universe('5uls-anp/prep/namd/step3_charmm2namd.psf', 'aligned_traj.dcd')
    c_alphas = value.select_atoms('protein and name CA')
    R = rms.RMSF(c_alphas).run()
   
    df=pd.DataFrame(c_alphas.resids)
    df['rmsf'] = R.rmsf
    df_collection.append(df)
    c_alphas.atoms.write('CA_{0}.pdb'.format(key))



This is the full code and the bolded and underlined part is what gave the error OSError: DCD file does not exist.

Using in_memory=True gave the error MemoryError: Unable to allocate 6.38 GiB for an array with shape (285542, 2000, 3) and data type float32 after the second loop iteration.

Thank you in advance.


Tamas Hegedus

unread,
Apr 16, 2021, 5:11:10 PM4/16/21
to mdnalysis-...@googlegroups.com
Important: I have never done alignment w MDAanlysis; I use the md engine tools (e.g. gromacs tools) to create a merged and aligned trajectory then I use that one for subsequent analysis.


uni = {'AN': u, 'APP' : u1 , 'PPI' : u2, 'PPA': u3, 'AO': u4}
df_collection = []
for key, value in uni.items():
   
    protein = value.select_atoms("protein")
    prealigner = align.AlignTraj(value, value, select="protein and name CA",
                                 in_memory=False).run()
Why do you do this "prealignment"?
Why do you align each trajectory to "itself"? Why not use the first frame of one of the  universes for reference?
If the in_memory is False, you should provide a file name, not?


# ref = average structure
    ref_coordinates = value.trajectory.timeseries(asel=protein).mean(axis=1)
# Make a reference structure (need to reshape into a1-frame "trajectory").
    ref = mda.Merge(protein).load_new(ref_coordinates[:, None, :],
                                  order="afc")
This ref seems to be in memory.


 aligner = align.AlignTraj(u, ref,
                           select='protein and name CA',
                           filename='aligned_traj.dcd',
                           in_memory=False).run()
  u = mda.Universe('5uls-anp/prep/namd/step3_charmm2namd.psf', 'aligned_traj.dcd')
You create the dcd file with AlignTraj. And I think that you write out only "u". It seems that you put the prealigned frames into ref. You want to save that (possibly). Did you check, if the dcd file was created? Is it in the current directory.
I guess the error says that AlignTraj did not create the dcd file.

**************
I would simplify the code. It seems that in the for cycle you read all the universes into the memory.
So I think this should work, pseudo-code:
uref = Universe(ref.psf)
umobile = Universe(ref.psf, [dcd, dcd1, dcd2, dcd3, dcd4])
AlignTraj(umobile, uref, "name CA", in_memory=True, verbose=True)
# OR if you get memory error here then:
#   AlignTraj(umobile, uref, "name CA", filename="alignedtrj.dcd", in_memory=False, verbose=True)
#   del uref ; del umobile
#   umobile = Universe(ref.psf, alignedtrj.dcd)
R = rms.RMSF(umobile.select_atoms("name CA"), verbose=True)


Oliver Beckstein

unread,
Apr 16, 2021, 8:20:35 PM4/16/21
to mdnalysis-discussion
Hi John-Paul,

Comments below:

On Apr 16, 2021, at 1:09 PM, Alao, John-Paul <ala...@miamioh.edu> wrote:


You can try enabling logging to get more output:

import MDAnalysis as mda
mda.start_logging()

uni = {'AN': u, 'APP' : u1 , 'PPI' : u2, 'PPA': u3, 'AO': u4}
df_collection = []
for key, value in uni.items():
   
    protein = value.select_atoms("protein")

the “prealigner” approach is outdated. You can use align.AverageStructure . But for completeness sake I’ll comment on the code:

 
    prealigner = align.AlignTraj(value, value, select="protein and name CA",
                                 in_memory=False).run()

If you don’t use in_memory=True then the next line will NOT create the average superimposed structure that is needed for the RMSF calculation.

# ref = average structure
    ref_coordinates = value.trajectory.timeseries(asel=protein).mean(axis=1)

The above line relies on the trajectory being superimposed. You either have to write out a superimposed trajectory and the average it or you need to use in-memory.

To make this work you need to use in_memory=True.



# Make a reference structure (need to reshape into a1-frame "trajectory").
    ref = mda.Merge(protein).load_new(ref_coordinates[:, None, :],
                                  order="afc")

#Align to ref using in memory=True
  #  aligner = align.AlignTraj(value, ref,
   #                           select='protein and name CA',
    #                          in_memory=True).run()

## Align without using in memory

 aligner = align.AlignTraj(u, ref,
                           select='protein and name CA',
                           filename='aligned_traj.dcd',
                           in_memory=False).run()

Did you check that this step wrote the output file?

Is u for some reason a Universe with an in-memory representation? Find out by printing

print(u.trajectory)

and see if it says MemoryReader.

If it’s a MemoryReader then AlignTraj will change the trajectory in memory only and ignore the filename but that’s not really documented. I had to look at the source to confirm this behavior.
 

  u = mda.Universe('5uls-anp/prep/namd/step3_charmm2namd.psf', 'aligned_traj.dcd')
    c_alphas = value.select_atoms('protein and name CA')
    R = rms.RMSF(c_alphas).run()
   
    df=pd.DataFrame(c_alphas.resids)
    df['rmsf'] = R.rmsf
    df_collection.append(df)
    c_alphas.atoms.write('CA_{0}.pdb'.format(key))



This is the full code and the bolded and underlined part is what gave the error OSError: DCD file does not exist.

To debug, run the code line by line interactively and check after each step that you get what you expect to get.


Using in_memory=True gave the error MemoryError: Unable to allocate 6.38 GiB for an array with shape (285542, 2000, 3) and data type float32 after the second loop iteration.

This sounds as if your in-memory universe was not cleaned up in time. You can try to explicitly deleting it and forcing Python to garbage collect the memory. Something like


import gc

del u
gc.collect()


However, that’s no guarantee. It’s really up to Python when it frees memory.


Hope that helps,
Oliver



Thank you in advance.


On Fri, Apr 16, 2021 at 1:39 PM Tamas Hegedus <biohe...@gmail.com> wrote:
I have the feeling that your code is incomplete and can not be understood.
Please provide it with the full code including the loop. It is also good to indicate, which line throw the OSError.

On 4/16/21 4:51 PM, John-Paul Alao wrote:
Hi, I'm looping through different trajectories, after like 2 loops, my memory is full so I had to consider using "in_memory=False" instead of "in_memory=True".  However, i got the error message       OSError: DCD file does not exist

My code is below, Please how do resolve this? Or is there an efficient way to clear the memory after each loop?

    aligner = align.AlignTraj(u, ref,
                              select='protein and name CA',
                              filename='aligned_traj.dcd',
                              in_memory=False).run()
    value = mda.Universe('protein.psf', 'aligned_traj.dcd')

Thanks.


--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/e806f829-d061-45cf-9079-dadc1e4f19cdn%40googlegroups.com.


--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/2240a991-9d42-010d-4b9c-5a87ed51b3a0%40gmail.com.

--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.

--
Oliver Beckstein (he/his/him)


MDAnalysis – a NumFOCUS fiscally sponsored project




Reply all
Reply to author
Forward
0 new messages