Hoping to Please Check in on HydrogenBondAnalysis Questions

281 views
Skip to first unread message

Dina Sharon

unread,
Oct 4, 2021, 10:11:24 PM10/4/21
to mdnalysis-...@googlegroups.com

Dear MDAnalysis Team,

 

I thank you so much for making MDAnalysis an excellent toolkit, which is very helpful in my graduate school research! If possible, I am hoping to please check in on some questions about HydrogenBondAnalysis. I am including a code example below this email, which contains the observations noted in this email.

 

As the following code example below my email demonstrates, when I select water and specify acceptor atoms and hydrogen atoms in water molecules with the new module, I obtain fewer hydrogen bonds than when I select water and do not specify atoms with the new module. (If I understood section 4.3.1.4’s donors_sel bullet point in this link correctly, I should not specify donors, so I did not include donors_sel.) I was not sure how to interpret this, because the only hydrogens participating in hydrogen bonds when water is selected would be expected to be the water hydrogens, and the only acceptors participating in hydrogen bonds when water is selected would be expected to be the water oxygens. Also, the older hydrogen bond analysis package provides similar results to the new version in the with-atom-specifications run. (The older hydrogen bond analysis package appeared to produce the same results in my example both when I selected water/specified oxygens in both selections and when I selected water/did not specify atoms.) Furthermore, I checked hydrogen bonding participants in the output, and I found some unexpected atom designations in the output of the new version’s without-atom-specifications run. The code and printouts below have more details.

 

I wanted to please check in: would you recommend using the new version with or without selections? What types of checks would you suggest conducting to make sure my analysis includes all relevant hydrogen bonds and only relevant hydrogen bonds?

 

Also, it appears that, with some methods, there are fewer water-water hydrogen bonds than might be expected per water molecule: there are 3267 (newer method, specifying atoms)/5038 (newer method, not specifying atoms)/3781 (older method) hydrogen bonds, and 3444 water molecules. The code below has more information on this. Counting each hydrogen bond twice because two water molecules participate, I calculated 1.90 (newer method, specifying atoms)/2.93 (newer method, not specifying atoms)/2.20 (older method) water-water hydrogen bonds per water molecule (2*hydrogen bond count/3444 water molecules). While this preliminary initial analysis neglects protein-water hydrogen bonds, I was not sure if there would be, on average, over 1 protein-water hydrogen bond per water molecule across all the water molecules in the system, which would be the approximate amount needed to bring the two lower values of water hydrogen bond counts closer to the water hydrogen bond counts reported around room temperature in sources such as this paper and this paper. (I was assuming this simulation I analyzed in the code below was conducted around room temperature: I could not immediately find the temperature in the intestinal fatty acid binding protein simulation information.) If possible, I wanted to please check in: what could be causing this (I realize it may be due to the force field), and are there any settings you would suggest changing in the hydrogen bond analysis I am running?

 

Last, as I was running these checks, in the older module's documentation (section 4.3.2.1), I saw a mention that the output contains “the identities of donor and acceptor heavy-atoms…”, but my printout (below) is reporting only hydrogen atoms in the donor position. I wanted to please check in on what could be causing this situation (in light of my preliminary initial inspections thus far, I ran my checks for the older module to check for hydrogen, not oxygen, in the donor position of the output).

 

I would be very grateful for any advice or suggestions you could please provide on the best way to run the new module, and on checks I can conduct to make sure the new module is including all relevant hydrogen bonds and only relevant hydrogen bonds. I really appreciate all that you do to make MDAnalysis an excellent toolkit!

 

With very many thanks,

Dina

 

#Import modules (received deprecation warning)

import MDAnalysis.analysis

from MDAnalysis import analysis

from MDAnalysis.analysis import hydrogenbonds

from MDAnalysis.analysis.hydrogenbonds import hbond_analysis

from MDAnalysis.analysis import hbonds

import MDAnalysisData

from MDAnalysisData import datasets

 

#Download test data (ref: https://www.mdanalysis.org/MDAnalysisData/usage.htmlhttps://www.mdanalysis.org/MDAnalysisData/ifabp_water.html#module-MDAnalysisData.ifabp_waterhttps://figshare.com/articles/dataset/Molecular_dynamics_trajectory_of_I-FABP_for_testing_and_benchmarking_solvent_dynamics_analysis/7058030/1)

ifabp_data = MDAnalysisData.ifabp_water.fetch_ifabp_water()

test_univ = MDAnalysis.Universe(ifabp_data.topology, ifabp_data.trajectory)

 

#Select waters and determine atom names and water count

waters = test_univ.select_atoms("resname TIP3")

unique_water_atom_names = list(set([water_check.name for water_check in waters]))

print(f"Unique water atom names {unique_water_atom_names}")

print(f"Water molecule count {len(waters) / 3}")

 

Printout:

Unique water atom names ['H1', 'H2', 'OH2']

Water molecule count 3444.0

 

#Create lists of atom ids of oxygen atoms and hydrogen atoms

oxygen_list = []

hydrogen_list = []

 

#For each atom, add its index to the appropriate list

for water_atom_check in waters:

    name_of_atom = water_atom_check.name

    id_of_atom = water_atom_check.index

    if (name_of_atom == "OH2"): #Collect oxygens

        oxygen_list.append(id_of_atom)

    if (name_of_atom in ["H1", "H2"]): #Collect hydrogens

        hydrogen_list.append(id_of_atom)


#Run H bond analyses for older and new modules, with and without specifying atoms (trying different selections)

#When running all 8 analyses together, received a deprecation warning and several selection warnings such as, “SelectionWarning: No acceptors found in selection 2. You might have to specify a custom 'acceptors' keyword. Selection will update so continuing with fingers crossed.

# warnings.warn(errmsg, category=SelectionWarning)”

 

step_size = 100000 #Make large for testing purposes

 

#New module

#ref: https://docs.mdanalysis.org/stable/documentation_pages/analysis/hydrogenbonds.htmlhttps://userguide.mdanalysis.org/dev/examples/analysis/hydrogen_bonds/hbonds.html and other webpages in this section were very helpful 

 

#New module, specifying atoms: OH2 and H*

w_w_hb_new = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ, acceptors_sel = "resname TIP3 and name OH2", hydrogens_sel = "resname TIP3 and name H*", d_h_a_angle_cutoff = 150.0)

w_w_hb_new.run(step = step_size)

 

#New module, specifying atoms: OH2 and OH2

w_w_hb_o_o_new = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ, acceptors_sel = "resname TIP3 and name OH2", hydrogens_sel = "resname TIP3 and name OH2", d_h_a_angle_cutoff = 150.0)

w_w_hb_o_o_new.run(step = step_size)

 

#New module, specifying atoms: H* and H*

w_w_hb_h_h_new = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ, acceptors_sel = "resname TIP3 and name H*", hydrogens_sel = "resname TIP3 and name H*", d_h_a_angle_cutoff = 150.0)

w_w_hb_h_h_new.run(step = step_size)

 

#New module, not specifying atoms

w_w_hb_no_at_new = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ, acceptors_sel = "resname TIP3", hydrogens_sel = "resname TIP3", d_h_a_angle_cutoff = 150.0)

w_w_hb_no_at_new.run(step = step_size)

 

#Older module

#ref: https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/hbond_analysis.html, set angle to match new module

 

#Older module, specifying atoms- OH2 and OH2

w_w_hb_old = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and name OH2", "resname TIP3 and name OH2", angle = 150.0)

w_w_hb_old.run(step = step_size)

 

#Older module, specifying atoms- try OH2 and H*

w_w_hb_o_h_old = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and name OH2", "resname TIP3 and name H*", angle = 150.0)

w_w_hb_o_h_old.run(step = step_size)


#Older module, specifying atoms- try H* and H*

w_w_hb_h_h_old = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and name H*", "resname TIP3 and name H*", angle = 150.0)

w_w_hb_h_h_old.run(step = step_size)


#Older module, not specifying atoms

w_w_hb_no_at_old = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3", "resname TIP3", angle = 150.0)

w_w_hb_no_at_old.run(step = step_size)

 

#Print H bond counts obtained with the different methods

print("HB with specifying atoms-new module, O/H")

print(w_w_hb_new.count_by_time())

print("HB with specifying atoms-new module, O/O")

print(w_w_hb_o_o_new.count_by_time())

print("HB with specifying atoms-new module, H/H")

print(w_w_hb_h_h_new.count_by_time())

print("HB without specifying atoms-new module")

print(w_w_hb_no_at_new.count_by_time())

print("HB with specifying atoms-old module, O/O")

print(w_w_hb_old.count_by_time())

print("HB with specifying atoms-old module, O/H")

print(w_w_hb_o_h_old.count_by_time())

print("HB with specifying atoms-old module, H/H")

print(w_w_hb_h_h_old.count_by_time())

print("HB without specifying atoms-old module")

print(w_w_hb_no_at_old.count_by_time())

 

Printout:
HB with specifying atoms-new module, O/H
[3267]
HB with specifying atoms-new module, O/O
[129]
HB with specifying atoms-new module, H/H
[1018]
HB without specifying atoms-new module
[5038]
HB with specifying atoms-old module, O/O
[(6.99999938, 3781)]
HB with specifying atoms-old module, O/H
[(6.99999938, 0)]
HB with specifying atoms-old module, H/H
[(6.99999938, 0)]
HB without specifying atoms-old module
[(6.99999938, 3781)]

 

#Proceed with two versions each of the old and new methods: one without specifying atoms, and the one with specifying atoms that leads to the largest H bond count

h_bond_tracker_dictionary_for_printout = {

   "new_module_specif_atoms" : w_w_hb_new,

   "new_module_no_specif_atoms" : w_w_hb_no_at_new,

   "old_module_specif_atoms" : w_w_hb_old,

   "old_module_no_specif_atoms" : w_w_hb_no_at_old

}

 

#Check HB participants in the new methods

for new_hb_method in ["new_module_specif_atoms", "new_module_no_specif_atoms"]:

    print(new_hb_method)

    

    #For each HB, see if acceptor and donor are O, and if hydrogen is H

    #Ref https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/hydrogenbonds.html#module-MDAnalysis.analysis.hydrogenbonds.hbond_analysis section 4.3.1.2 describes output

    for hb_entry in h_bond_tracker_dictionary_for_printout[new_hb_method].hbonds:

        donor_o_index = int(hb_entry[1])

        h_index = int(hb_entry[2])

        acceptor_o_index = int(hb_entry[3])

        

        #Check against the lists created earlier

        #Print if there are atoms in unexpected places for their elements (H as an acceptor or donor, O as hydrogen)

        if donor_o_index not in oxygen_list:

            print(f"Donor O unexpected, hb entry {hb_entry}")

        if h_index not in hydrogen_list:

            print(f"H unexpected, hb entry {hb_entry}")

        if acceptor_o_index not in oxygen_list:

            print(f"Acceptor O unexpected, hb entry {hb_entry}")

            

    print("---Finished method's analysis")

 

Printout (truncated):

new_module_specif_atoms
---Finished method's analysis
new_module_no_specif_atoms
Acceptor O unexpected, hb entry [   0.         2125.         2126.         2175.            2.97198387
  152.90867857]
Donor O unexpected, hb entry [0.00000000e+00 2.13800000e+03 2.13700000e+03 7.39400000e+03
 2.85263921e+00 1.53409566e+02]
H unexpected, hb entry [0.00000000e+00 2.13800000e+03 2.13700000e+03 7.39400000e+03
 2.85263921e+00 1.53409566e+02]
Acceptor O unexpected, hb entry [0.00000000e+00 2.13800000e+03 2.13700000e+03 7.39400000e+03
 2.85263921e+00 1.53409566e+02]
Acceptor O unexpected, hb entry [   0.         2143.         2144.         2165.            2.84679047
  159.81574163]
Donor O unexpected, hb entry [0.00000000e+00 2.16200000e+03 2.16100000e+03 7.22400000e+03
 2.55687637e+00 1.50581809e+02]
…(truncated)

 

#Check HB participants in the older methods

for old_hb_method in ["old_module_specif_atoms", "old_module_no_specif_atoms"]:

    print(old_hb_method)

    

    #For each HB, see if acceptor is O, and if donor is H

    #Ref https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/hbond_analysis.html section 4.3.2.1 describes output- this mentions O heavy atoms are recorded for donors, but I saw all Hs in the printout, so I checked for Hs in the donor position, will ask about this (example is below)

    for hb_entry in h_bond_tracker_dictionary_for_printout[old_hb_method].timeseries:

        for time_point in hb_entry:

            

            #Obtain indices of what believe should be O and H atoms

            h_atom_index_from_hb_data = time_point[0]

            o_atom_index_from_hb_data = time_point[1]

            #Remove the residue name, only keeping the atom name

            h_atom_name_from_hb_data = time_point[2].split(":")[1] 

            o_atom_name_from_hb_data = time_point[3].split(":")[1]

            

            #Record if there are any unexpected indices or names- if I understand correctly from inspecting output (will ask about documentation), the first entry is the H atom, and the second is the O atom

            #Index check

            if h_atom_index_from_hb_data not in hydrogen_list: 

                print(f"H atom index unexpected {time_point}")

            if o_atom_index_from_hb_data not in oxygen_list:

                print(f"O atom index unexpected {time_point}")

                

            #Name check

            if h_atom_name_from_hb_data not in ["H1", "H2"]: 

                print(f"H atom name unexpected {time_point}")

            if o_atom_name_from_hb_data != "OH2":

                print(f"O atom name unexpected {time_point}")

                

    print("---Finished method's analysis")

 

Printout:

old_module_specif_atoms
---Finished method's analysis
old_module_no_specif_atoms
---Finished method's analysis

 

#Print first entries of the timeseries to check for the identity of the atom in the donor position: always see hydrogen

#Ref https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/hbond_analysis.html section 4.3.2.1 helped in indexing output for printing

print(h_bond_tracker_dictionary_for_printout["old_module_no_specif_atoms"].timeseries[0][0:10])

 

Printout:

[[2114, 8188, 'TIP31:H1', 'TIP31788:OH2', 2.1173231298732356, 157.5229441087807], [2117, 7381, 'TIP32:H1', 'TIP31519:OH2', 2.023594066321529, 150.38991779620892], [2118, 2113, 'TIP32:H2', 'TIP31:OH2', 1.880757802970006, 164.9775655924464], [2120, 3274, 'TIP33:H1', 'TIP3150:OH2', 1.8798327138461597, 156.84897397248378], [2123, 2134, 'TIP34:H1', 'TIP38:OH2', 1.764895047279124, 157.45442248282995], [2126, 7393, 'TIP35:H1', 'TIP31523:OH2', 1.7613017537338087, 151.4520287466912], [2132, 7288, 'TIP37:H1', 'TIP31488:OH2', 1.6248297266146112, 169.58821103916455], [2139, 2143, 'TIP39:H2', 'TIP311:OH2', 1.9794347894818085, 169.16618238543683], [2144, 2164, 'TIP311:H1', 'TIP318:OH2', 2.006974967081382, 167.85959055741296], [2145, 8344, 'TIP311:H2', 'TIP31840:OH2', 2.1473081409748302, 160.91020279907468]]

Paul Smith

unread,
Oct 6, 2021, 4:40:54 AM10/6/21
to MDnalysis discussion
Hi Dina,

I'll try to answer all your questions, but let me know if anything is still not clear!

    > If I understood section 4.3.1.4’s donors_sel bullet point in this link correctly, I should not specify donors, so I did not include donors_sel.

Yep, that's correct. By not specifying the donor_sel, the donor atoms will be identified based on the topology (i.e. we know the donors are bonded to the hydrogen atoms selected in hydrogen_sel)

    > I wanted to please check in: would you recommend using the new version with or without selections? What types of checks would you suggest conducting to make sure my analysis includes all relevant hydrogen bonds and only relevant hydrogen bonds?

Your first method of calculating hydrogen bonds:

    #New module, specifying atoms: OH2 and H*
    w_w_hb_new = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ, acceptors_sel = "resname TIP3 and name OH2", hydrogens_sel = "resname TIP3 and name H*", d_h_a_angle_cutoff = 150.0)
    w_w_hb_new.run(step = step_size)

is correct. In other methods where, for example, you set acceptors_sel = "resname TIP3" you are including too many (and incorrect) atoms in the acceptors_sel. The new HBA module doesn't check whether the atoms you supply to acceptors_sel are actually acceptors - it simply assumes they are. So when you provide this selection, it assumes that OH2, H1, and H2 are all acceptors atoms. This is why you see more hydrogen bonds found. It is also why you see the unexpected acceptors/hydrogens/donors in your results.

As a quick comparison, I used the VMD hbonds plugin to find water-water hydrogen bonds at the first frame in this system, and obtained around 3000 hydrogen bonds. I'm not sure why it doesn't match exactly with the new HBA module as I'm not familiar with the implementation details of the VMD plugin. Nonetheless, they both give approximately the same number of hydrogen bonds.

In terms of checks you can do, the count_by_type() method of HBA is an efficient way of checking all pairs of atoms involved in hydrogen bonds.

    > it appears that, with some methods, there are fewer water-water hydrogen bonds than might be expected per water molecule

As you mention, in bulk water the number of hydrogen bonds per water molecule should be around 3. However, this is a relatively small box of water with a protein in it. The effects of the protein on the hydrogen bond network of water can be surprisingly long-range - it is not just the water molecules immediately next the to protein that have water-water hydrogen bonds broken. Depending on the surface (e.g. protein, lipid bilayer, graphene), the disruption of the hydrogen bond network of water can easily extend past 10 Angstrom. So this is why you are seeing fewer hydrogen bonds per water molecule than you would expect in bulk water.

    > and are there any settings you would suggest changing in the hydrogen bond analysis I am running?

Unless your selection strings include e.g. selecting water atoms within a cutoff radius of your protein, you can set update_selection=False. This will give you a performance boost by not creating the atom groups at every frame.

Also, it seems like you have been reading the documentation to learn how to use the new HBA tool (which is a great idea, thanks for reading it before asking questions!). However, there are newer tutorials in the MDA UserGuide that go into more detail and are hopefully clearer and easier to understand than the documentation (https://userguide.mdanalysis.org/2.0.0-dev0/examples/analysis/hydrogen_bonds/hbonds.html). If you haven't seen these yet, I would definitely recommend taking a look.

I hope that helps!

Cheers,
Paul

Dina Sharon

unread,
Nov 1, 2021, 12:34:59 PM11/1/21
to mdnalysis-...@googlegroups.com

Dear Paul,

 

I thank you so much for sharing these very helpful answers with me! I really appreciate your sharing these very thorough and thoughtful answers and providing very helpful information on the tutorials, and I apologize for the extreme delay in my following up. I was looking into protein hydrogen bonds and trying to figure out something on that topic before responding. I am including a code example below this email, which contains the observations noted in this email. If possible, I hope to please check in on a few follow-ups.

 

It is very helpful to know about not specifying donors_sel, about specifying the atoms with the new module, about count_by_type(), and about update_selection=False. I wanted to please confirm: in the old module, is it correct that selection1_type can be set without specifying atoms (simply selecting whole waters)? My code checks below this email indicate this is the case (obtaining identical results when selecting oxygens and whole waters, whether or not one selection is specified as a donor or acceptor), and I just wanted to please confirm to be safe.

 

Also, on the topic of the older module, as I was running these checks, in the older module's documentation (section 4.3.2.1), I saw a mention that the output contains “the identities of donor and acceptor heavy-atoms…”, but my printout in the code example below is reporting only hydrogen atoms in the donor position. I wanted to please check in on what could be causing this situation.

 

I also was looking into protein hydrogen bonds and seeing if I could analyze individual residues. To automate, I am trying to loop over several residues. I noticed that proline, which does not have any hydrogen bond donors, appears to yield many hydrogen bonds when it is input as a donor through this automation, including hydrogen bonds which do not appear to contain a proline atom as a participant in the count_by_type() output (I have a code example below). I wanted to please check in: are there options I can select to prevent this, or is it better to modify my code for the special case of proline? More generally, do I need to ensure I avoid empty selections with the new module (including updating selections’ being temporarily empty)? I had updating selections’ being temporarily empty in the old MDAnalysis.analysis.hbonds.HydrogenBondAnalysis module, and I have not yet found any issues with this. To be safe, I wanted to please check in: could there be issues with this in the old MDAnalysis.analysis.hbonds.HydrogenBondAnalysis module?

 

I really appreciate your sharing information about the count of water-water hydrogen bonds. I also ran some sensitivity analyses and saw how the d_h_a_angle_cutoff and d_a_cutoff settings can change the water-water hydrogen bond count, and I may change these values from the default in my analysis.

 

I thank you so much for helping me with my questions and making MDAnalysis an excellent toolkit!

 

With very many thanks,

Dina

 

#Import modules (received deprecation warning)

import MDAnalysis.analysis

from MDAnalysis import analysis

from MDAnalysis.analysis import hydrogenbonds

from MDAnalysis.analysis.hydrogenbonds import hbond_analysis

from MDAnalysis.analysis import hbonds

import MDAnalysisData

from MDAnalysisData import datasets

from MDAnalysis.tests.datafiles import PSF, DCD

#Run H bond analyses with and without specifying atoms and selection types (trying different combinations to see what happens, selecting a few waters for simplicity)

 

step_size = 100000 #make large for testing purposes

 

#Old module

#ref: https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/hbond_analysis.html, set angle to match new module (ref: https://docs.mdanalysis.org/stable/documentation_pages/analysis/hydrogenbonds.html)

 

#Old module, not specifying atoms, no donor/acceptor designation

w_w_hb_no_at_specify_no_sel_specify = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and (resid 1000 or resid 2000 or resid 3000)", "resname TIP3 and not (resid 1000 or resid 2000 or resid 3000)", angle = 150.0)

w_w_hb_no_at_specify_no_sel_specify.run(step = step_size)

 

#Old module, specifying water Hs, no donor/acceptor designation

w_w_hb_h_specify_no_sel_specify = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and name H* and (resid 1000 or resid 2000 or resid 3000)", "resname TIP3 and name H* and not (resid 1000 or resid 2000 or resid 3000)", angle = 150.0)

w_w_hb_h_specify_no_sel_specify.run(step = step_size)

 

#Old module, specifying water Os, no donor/acceptor designation

w_w_hb_o_specify_no_sel_specify = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and name OH2 and (resid 1000 or resid 2000 or resid 3000)", "resname TIP3 and name OH2 and not (resid 1000 or resid 2000 or resid 3000)", angle = 150.0)

w_w_hb_o_specify_no_sel_specify.run(step = step_size)

 

#Old module, not specifying atoms, 1st selection is donor

w_w_hb_no_at_specify_1_sel_d_specify = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and (resid 1000 or resid 2000 or resid 3000)", "resname TIP3 and not (resid 1000 or resid 2000 or resid 3000)", selection1_type = "donor", angle = 150.0)

w_w_hb_no_at_specify_1_sel_d_specify.run(step = step_size, selection1_type = "donor")

 

#Old module, specifying water Hs, 1st selection is donor

w_w_hb_h_specify_1_sel_d_specify = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and name H* and (resid 1000 or resid 2000 or resid 3000)", "resname TIP3 and name H* and not (resid 1000 or resid 2000 or resid 3000)", selection1_type = "donor", angle = 150.0)

w_w_hb_h_specify_1_sel_d_specify.run(step = step_size, selection1_type = "donor")

 

#Old module, specifying water Os, 1st selection is donor

w_w_hb_o_specify_1_sel_d_specify = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and name OH2 and (resid 1000 or resid 2000 or resid 3000)", "resname TIP3 and name OH2 and not (resid 1000 or resid 2000 or resid 3000)", selection1_type = "donor", angle = 150.0)

w_w_hb_o_specify_1_sel_d_specify.run(step = step_size, selection1_type = "donor")

 

#Old module, not specifying atoms, 1st selection is acceptor

w_w_hb_no_at_specify_1_sel_a_specify = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and (resid 1000 or resid 2000 or resid 3000)", "resname TIP3 and not (resid 1000 or resid 2000 or resid 3000)", selection1_type = "acceptor", angle = 150.0)

w_w_hb_no_at_specify_1_sel_a_specify.run(step = step_size, selection1_type = "acceptor")

 

#Old module, specifying water Hs, 1st selection is acceptor

w_w_hb_h_specify_1_sel_a_specify = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and name H* and (resid 1000 or resid 2000 or resid 3000)", "resname TIP3 and name H* and not (resid 1000 or resid 2000 or resid 3000)", selection1_type = "acceptor", angle = 150.0)

w_w_hb_h_specify_1_sel_a_specify.run(step = step_size, selection1_type = "acceptor")

 

#Old module, specifying water Os, 1st selection is acceptor

w_w_hb_o_specify_1_sel_a_specify = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ, "resname TIP3 and name OH2 and (resid 1000 or resid 2000 or resid 3000)", "resname TIP3  and name OH2 and not (resid 1000 or resid 2000 or resid 3000)", selection1_type = "acceptor", angle = 150.0)

w_w_hb_o_specify_1_sel_a_specify.run(step = step_size, selection1_type = "acceptor")

 

#Received warnings

#/Users/dinasharon/opt/anaconda3/lib/python3.7/site-packages/MDAnalysis/analysis/hbonds/hbond_analysis.py:580: DeprecationWarning: This class is deprecated as of MDAnalysis version 1.0 and will be removed in version 2.0.Please use MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis instead.
#  category=DeprecationWarning
#/Users/dinasharon/opt/anaconda3/lib/python3.7/site-#packages/MDAnalysis/analysis/hbonds/hbond_analysis.py:673: SelectionWarning: No donors found in selection 1. You might have to specify a custom 'donors' keyword. Selection will update so continuing with fingers crossed.
#  warnings.warn(errmsg, category=SelectionWarning)
#/Users/dinasharon/opt/anaconda3/lib/python3.7/site-packages/MDAnalysis/analysis/hbonds/hbond_analysis.py:673: SelectionWarning: No acceptors found in selection 1. You might have to specify a custom 'acceptors' keyword. Selection will update so continuing with fingers crossed.
#  warnings.warn(errmsg, category=SelectionWarning)
#/Users/dinasharon/opt/anaconda3/lib/python3.7/site-packages/MDAnalysis/analysis/hbonds/hbond_analysis.py:673: SelectionWarning: No acceptors found in selection 2. You might have to specify a custom 'acceptors' keyword. Selection will update so continuing with fingers crossed.
#  warnings.warn(errmsg, category=SelectionWarning)
#/Users/dinasharon/opt/anaconda3/lib/python3.7/site-packages/MDAnalysis/analysis/hbonds/hbond_analysis.py:673: SelectionWarning: No donors found in selection 2. You might have to specify a custom 'donors' keyword. Selection will update so continuing with fingers crossed.
#  warnings.warn(errmsg, category=SelectionWarning)

 

#Print H bond counts with the different methods

print("HB without specifying atoms or donor/acceptor-old module")

print(w_w_hb_no_at_specify_no_sel_specify.count_by_time())

print("HB with specifying water H, without specifying donor/acceptor-old module")

print(w_w_hb_h_specify_no_sel_specify.count_by_time())

print("HB with specifying water O, without specifying donor/acceptor-old module")

print(w_w_hb_o_specify_no_sel_specify.count_by_time())

print("HB without specifying atoms, 1st selection as donor-old module")

print(w_w_hb_no_at_specify_1_sel_d_specify.count_by_time())

print("HB with specifying water H, 1st selection as donor-old module")

print(w_w_hb_h_specify_1_sel_d_specify.count_by_time())

print("HB with specifying water O, 1st selection as donor-old module")

print(w_w_hb_o_specify_1_sel_d_specify.count_by_time())

print("HB without specifying atoms, 1st selection as acceptor-old module")

print(w_w_hb_no_at_specify_1_sel_a_specify.count_by_time())

print("HB with specifying water H, 1st selection as acceptor-old module")

print(w_w_hb_h_specify_1_sel_a_specify.count_by_time())

print("HB with specifying water O, 1st selection as acceptor-old module")

print(w_w_hb_o_specify_1_sel_a_specify.count_by_time())

 

#Printout
HB without specifying atoms or donor/acceptor-old module
[(6.99999938, 4)]
HB with specifying water H, without specifying donor/acceptor-old module
[(6.99999938, 0)]
HB with specifying water O, without specifying donor/acceptor-old module
[(6.99999938, 4)]
HB without specifying atoms, 1st selection as donor-old module
[(6.99999938, 3)]
HB with specifying water H, 1st selection as donor-old module
[(6.99999938, 0)]
HB with specifying water O, 1st selection as donor-old module
[(6.99999938, 3)]
HB without specifying atoms, 1st selection as acceptor-old module
[(6.99999938, 1)]
HB with specifying water H, 1st selection as acceptor-old module
[(6.99999938, 0)]
HB with specifying water O, 1st selection as acceptor-old module
[(6.99999938, 1)]

 

#Print first entries of the timeseries to check for the identity of the atom in the donor position

print(w_w_hb_no_at_specify_no_sel_specify.timeseries)

 

#Printout

[[[5825, 3643, 'TIP31000:H1', 'TIP3273:OH2', 2.6942279189473073, 167.43563772955233], [8825, 8833, 'TIP32000:H1', 'TIP32003:OH2', 1.907703766019566, 167.75383184967944], [11825, 11761, 'TIP33000:H1', 'TIP32979:OH2', 2.079647130504742, 163.60065906508382], [8721, 8824, 'TIP31965:H2', 'TIP32000:OH2', 1.9168825375069918, 154.7278701372644]]]

 

#B New module amino acid hydrogen bond checks

 

#Download test will use for amino acids

#Ref: https://userguide.mdanalysis.org/1.0.0/testing.html#

test_univ_for_amino_acids = MDAnalysis.Universe(PSF, DCD)

 

#Check hydrogen bonds between a few amino acids and the entire protein

#What are the selections with guess_hydrogens and guess_acceptors?

#What are the count_by_type() and count_by_time() results?

#Ref: https://userguide.mdanalysis.org/2.0.0-dev0/examples/analysis/hydrogen_bonds/hbonds-selections.html,  https://docs.mdanalysis.org/stable/documentation_pages/analysis/hydrogenbonds.html

#Many thanks to Paul Smith for sharing the tutorial and for advice on update_selections = False and count_by_type()!

 

step_size_for_amino_acids = 50000 #large step size for testing

 

#Iterate over several residues(only selecting two in addition to proline for brevity)

for amino_acid in ["GLY", "LYS", "PRO"]:

   

    print("++++++++++++")

   

    #Set up string

    residue_string = f"protein and resname {amino_acid}"

    print(residue_string)

   

    #First- residue as donor

    print("residue as donor")

    hb_check_residue_as_donor = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_amino_acids, update_selections = False)

   

    #Residue is donor, acceptors are all other protein atoms

    hb_check_residue_as_donor.hydrogens_sel = hb_check_residue_as_donor.guess_hydrogens(residue_string)

    hb_check_residue_as_donor.acceptors_sel = hb_check_residue_as_donor.guess_acceptors("protein")

   

    #Report selections

    print("hydrogens")

    print(hb_check_residue_as_donor.hydrogens_sel)

    print("acceptors")

    print(hb_check_residue_as_donor.acceptors_sel)

   

    #Run analysis and print out results

    hb_check_residue_as_donor.run(step = step_size_for_amino_acids)

    print("count by type")

    print(hb_check_residue_as_donor.count_by_type())

    print("count by time")

    print(hb_check_residue_as_donor.count_by_time())

    print("------")

   

    #Second- residue as acceptor

    print("residue as acceptor")

    hb_check_residue_as_acceptor = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_amino_acids, update_selections = False)

    hb_check_residue_as_acceptor.hydrogens_sel = hb_check_residue_as_acceptor.guess_hydrogens("protein")

    hb_check_residue_as_acceptor.acceptors_sel = hb_check_residue_as_acceptor.guess_acceptors(residue_string)

   

    #Report selections

    print("hydrogens")

    print(hb_check_residue_as_acceptor.hydrogens_sel)

    print("acceptors")

    print(hb_check_residue_as_acceptor.acceptors_sel)

   

    #Run analysis and print out results

    hb_check_residue_as_acceptor.run(step = step_size_for_amino_acids)

    print("count by type")

    print(hb_check_residue_as_acceptor.count_by_type())

    print("count by time")

    print(hb_check_residue_as_acceptor.count_by_time())

    print("------")

    print("++++++++++++")

 

#Printout

++++++++++++
protein and resname GLY
residue as donor
hydrogens
(resname GLY and name HN)
acceptors
(resname ALA and name O) or (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname ASP and name O) or (resname ASP and name OD1) or (resname ASP and name OD2) or (resname CYS and name O) or (resname GLN and name NE2) or (resname GLN and name O) or (resname GLN and name OE1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLY and name O) or (resname GLY and name OT1) or (resname GLY and name OT2) or (resname HSD and name NE2) or (resname HSD and name O) or (resname ILE and name O) or (resname LEU and name O) or (resname LYS and name O) or (resname MET and name O) or (resname PHE and name O) or (resname PRO and name O) or (resname SER and name O) or (resname SER and name OG) or (resname THR and name O) or (resname THR and name OG1) or (resname TYR and name O) or (resname TYR and name OH) or (resname VAL and name O)
count by type
[['GLY:54' 'ASP:70' '1']
 ['GLY:54' 'HSD:70' '1']]
count by time
[2]
------
residue as acceptor
hydrogens
(resname ALA and name HN) or (resname ARG and name HE) or (resname ARG and name HH11) or (resname ARG and name HH12) or (resname ARG and name HH21) or (resname ARG and name HH22) or (resname ARG and name HN) or (resname ASN and name HD21) or (resname ASN and name HD22) or (resname ASN and name HN) or (resname ASP and name HN) or (resname CYS and name HN) or (resname GLN and name HE21) or (resname GLN and name HE22) or (resname GLN and name HN) or (resname GLU and name HN) or (resname GLY and name HN) or (resname HSD and name HD1) or (resname HSD and name HN) or (resname ILE and name HN) or (resname LEU and name HN) or (resname LYS and name HN) or (resname LYS and name HZ1) or (resname LYS and name HZ2) or (resname LYS and name HZ3) or (resname MET and name HN) or (resname MET and name HT1) or (resname MET and name HT2) or (resname MET and name HT3) or (resname PHE and name HN) or (resname SER and name HG1) or (resname SER and name HN) or (resname THR and name HG1) or (resname THR and name HN) or (resname TYR and name HH) or (resname TYR and name HN) or (resname VAL and name HN)
acceptors
(resname GLY and name O) or (resname GLY and name OT1) or (resname GLY and name OT2)
count by type
[['ARG:57' 'GLY:70' '2']
 ['LEU:54' 'GLY:70' '1']
 ['LYS:56' 'GLY:72' '1']
 ['TYR:73' 'GLY:72' '1']]
count by time
[5]
------
++++++++++++
++++++++++++
protein and resname LYS
residue as donor
hydrogens
(resname LYS and name HN) or (resname LYS and name HZ1) or (resname LYS and name HZ2) or (resname LYS and name HZ3)
acceptors
(resname ALA and name O) or (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname ASP and name O) or (resname ASP and name OD1) or (resname ASP and name OD2) or (resname CYS and name O) or (resname GLN and name NE2) or (resname GLN and name O) or (resname GLN and name OE1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLY and name O) or (resname GLY and name OT1) or (resname GLY and name OT2) or (resname HSD and name NE2) or (resname HSD and name O) or (resname ILE and name O) or (resname LEU and name O) or (resname LYS and name O) or (resname MET and name O) or (resname PHE and name O) or (resname PRO and name O) or (resname SER and name O) or (resname SER and name OG) or (resname THR and name O) or (resname THR and name OG1) or (resname TYR and name O) or (resname TYR and name OH) or (resname VAL and name O)
count by type
[['LYS:54' 'ASP:72' '1']
 ['LYS:54' 'ILE:70' '1']
 ['LYS:56' 'ASP:72' '7']
 ['LYS:56' 'GLU:70' '1']
 ['LYS:56' 'GLU:72' '6']
 ['LYS:56' 'GLY:72' '1']
 ['LYS:56' 'ILE:70' '1']
 ['LYS:56' 'LEU:70' '1']
 ['LYS:56' 'TYR:73' '1']]
count by time
[20]
------
residue as acceptor
hydrogens
(resname ALA and name HN) or (resname ARG and name HE) or (resname ARG and name HH11) or (resname ARG and name HH12) or (resname ARG and name HH21) or (resname ARG and name HH22) or (resname ARG and name HN) or (resname ASN and name HD21) or (resname ASN and name HD22) or (resname ASN and name HN) or (resname ASP and name HN) or (resname CYS and name HN) or (resname GLN and name HE21) or (resname GLN and name HE22) or (resname GLN and name HN) or (resname GLU and name HN) or (resname GLY and name HN) or (resname HSD and name HD1) or (resname HSD and name HN) or (resname ILE and name HN) or (resname LEU and name HN) or (resname LYS and name HN) or (resname LYS and name HZ1) or (resname LYS and name HZ2) or (resname LYS and name HZ3) or (resname MET and name HN) or (resname MET and name HT1) or (resname MET and name HT2) or (resname MET and name HT3) or (resname PHE and name HN) or (resname SER and name HG1) or (resname SER and name HN) or (resname THR and name HG1) or (resname THR and name HN) or (resname TYR and name HH) or (resname TYR and name HN) or (resname VAL and name HN)
acceptors
(resname LYS and name O)
count by type
[['ALA:54' 'LYS:70' '1']
 ['GLU:54' 'LYS:70' '1']]
count by time
[2]
------
++++++++++++
++++++++++++
protein and resname PRO
residue as donor
hydrogens
 
acceptors
(resname ALA and name O) or (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname ASP and name O) or (resname ASP and name OD1) or (resname ASP and name OD2) or (resname CYS and name O) or (resname GLN and name NE2) or (resname GLN and name O) or (resname GLN and name OE1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLY and name O) or (resname GLY and name OT1) or (resname GLY and name OT2) or (resname HSD and name NE2) or (resname HSD and name O) or (resname ILE and name O) or (resname LEU and name O) or (resname LYS and name O) or (resname MET and name O) or (resname PHE and name O) or (resname PRO and name O) or (resname SER and name O) or (resname SER and name OG) or (resname THR and name O) or (resname THR and name OG1) or (resname TYR and name O) or (resname TYR and name OH) or (resname VAL and name O)
count by type
[['ALA:54' 'GLU:70' '1']
 ['ALA:54' 'ILE:70' '1']
 ['ALA:54' 'LYS:70' '1']
 ['ALA:54' 'MET:70' '2']
 ['ALA:54' 'TYR:70' '1']
 ['ARG:54' 'ASP:72' '1']
 ['ARG:54' 'TYR:70' '1']
 ['ARG:54' 'VAL:70' '1']
 ['ARG:57' 'ARG:70' '2']
 ['ARG:57' 'ASP:72' '12']
 ['ARG:57' 'GLU:72' '6']
 ['ARG:57' 'GLY:70' '2']
 ['ARG:57' 'THR:70' '1']
 ['ARG:57' 'VAL:70' '1']
 ['ASN:54' 'HSD:70' '1']
 ['ASP:54' 'GLU:70' '1']
 ['ASP:54' 'ILE:70' '1']
 ['ASP:54' 'VAL:70' '1']
 ['GLN:55' 'ASP:72' '1']
 ['GLN:55' 'GLU:70' '1']
 ['GLU:54' 'ALA:70' '2']
 ['GLU:54' 'ARG:70' '1']
 ['GLU:54' 'LYS:70' '1']
 ['GLY:54' 'ASP:70' '1']
 ['GLY:54' 'HSD:70' '1']
 ['HSD:54' 'ARG:70' '1']
 ['HSD:54' 'ASN:70' '1']
 ['ILE:54' 'ASP:70' '1']
 ['ILE:54' 'LEU:70' '1']
 ['ILE:54' 'MET:70' '1']
 ['ILE:54' 'PHE:70' '1']
 ['ILE:54' 'TYR:70' '1']
 ['LEU:54' 'GLY:70' '1']
 ['LEU:54' 'ILE:70' '1']
 ['LEU:54' 'LEU:70' '1']
 ['LEU:54' 'VAL:70' '1']
 ['LYS:54' 'ASP:72' '1']
 ['LYS:54' 'ILE:70' '1']
 ['LYS:56' 'ASP:72' '7']
 ['LYS:56' 'GLU:70' '1']
 ['LYS:56' 'GLU:72' '6']
 ['LYS:56' 'GLY:72' '1']
 ['LYS:56' 'ILE:70' '1']
 ['LYS:56' 'LEU:70' '1']
 ['LYS:56' 'TYR:73' '1']
 ['SER:73' 'ILE:70' '1']
 ['THR:54' 'ASP:72' '1']
 ['THR:73' 'ARG:70' '1']
 ['THR:73' 'ASP:70' '1']
 ['THR:73' 'ASP:72' '2']
 ['THR:73' 'GLU:72' '1']
 ['TYR:54' 'ARG:70' '1']
 ['TYR:54' 'LEU:70' '1']
 ['TYR:73' 'ASP:72' '2']
 ['TYR:73' 'GLU:72' '1']
 ['TYR:73' 'GLY:72' '1']
 ['VAL:54' 'ASP:72' '1']
 ['VAL:54' 'GLU:70' '1']]
count by time
[91]
------
residue as acceptor
hydrogens
(resname ALA and name HN) or (resname ARG and name HE) or (resname ARG and name HH11) or (resname ARG and name HH12) or (resname ARG and name HH21) or (resname ARG and name HH22) or (resname ARG and name HN) or (resname ASN and name HD21) or (resname ASN and name HD22) or (resname ASN and name HN) or (resname ASP and name HN) or (resname CYS and name HN) or (resname GLN and name HE21) or (resname GLN and name HE22) or (resname GLN and name HN) or (resname GLU and name HN) or (resname GLY and name HN) or (resname HSD and name HD1) or (resname HSD and name HN) or (resname ILE and name HN) or (resname LEU and name HN) or (resname LYS and name HN) or (resname LYS and name HZ1) or (resname LYS and name HZ2) or (resname LYS and name HZ3) or (resname MET and name HN) or (resname MET and name HT1) or (resname MET and name HT2) or (resname MET and name HT3) or (resname PHE and name HN) or (resname SER and name HG1) or (resname SER and name HN) or (resname THR and name HG1) or (resname THR and name HN) or (resname TYR and name HH) or (resname TYR and name HN) or (resname VAL and name HN)
acceptors
(resname PRO and name O)
count by type
[]
count by time
[0]
------
++++++++++++

 


--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/f9be860d-02fd-431c-a3e6-24ad2db4802en%40googlegroups.com.

Paul Smith

unread,
Nov 7, 2021, 11:40:29 AM11/7/21
to MDnalysis discussion

Hi Dina,

You're welcome! I'll try to answer your remaining questions, but again please let me know if anything is not clear:

in the old module, is it correct that selection1_type can be set without specifying atoms (simply selecting whole waters)? 

Yep, that's right. The previous module has a set of pre-defined acceptor and donor atom names for various forcefields. For example,

    hbonds.HydrogenBondAnalysis.DEFAULT_DONORS['CHARMM27']

will return the following tuple:

('NE2', 'ND2', 'OH', 'NZ', 'NH1', 'NE', 'N', 'ND1', 'SG', 'OW', 'NE1', 'NH2', 'OG', 'OG1', 'OH2')

To determine the donor atoms, the previous module will take your selection string, create an atom group from it, and from this atom group will select any atoms that are in the default donor list. So all atoms not on the donor list, such as hydrogen atoms, will be ignored when identifying donor atoms.

> Also, on the topic of the older module, as I was running these checks, in the older module's documentation (section 4.3.2.1), I saw a mention that the output contains “the identities of donor and acceptor heavy-atoms…”, but my printout in the code example below is reporting only hydrogen atoms in the donor position. I wanted to please check in on what could be causing this situation.

 I think this is an issue with the documentation. Looking at the code for the previous module, the following data is stored for each hydrogen bond at a given frame:

[

    h.index,

    a.index,

    (h.resname, h.resid, h.name),

    (a.resname, a.resid, a.name),

    dist,

    angle

]

Information about the hydrogen and acceptor atoms is being stored (rather than about donor and acceptor atoms). This will be because unique hydrogen bonds can be identified by hydrogen-acceptor pairs but not necessarily by donor-acceptor pairs (because a donor atom may have multiple hydrogen atoms). So your results are correct, but the returned information is not what you expected. Sorry for the confusion!

> I noticed that proline, which does not have any hydrogen bond donors, appears to yield many hydrogen bonds when it is input as a donor through this automation, including hydrogen bonds which do not appear to contain a proline atom as a participant in the count_by_type() output 

Thanks for raising this! I've raised the issue on github and submitted pull-request to fix it.

The issue, as you mentioned, is that proline has no hydrogen atoms to donate. This means that this line:

    hb_check_residue_as_donor.hydrogens_sel = hb_check_residue_as_donor.guess_hydrogens(residue_string)

assigns hydrogens_sel to be an empty string. Currently, if an empty selection string is passed to HydrogenBondAnalysis, the hydrogen atoms will be identified using guess_hydrogens applied to the entire system. So this is why you incorrectly end up with a bunch of hydrogen bonds that don't involve proline. This issue will be fixed in the next release of MDAnalysis, but for now I'm afraid the easiest solution might be to check if your hydrogen_sel (and acceptors_sel) is an empty string, and if it is then set it to be None instead.

Thanks again for raising this issue, and for your thorough comparison of the different versions of the module!

Cheers,
Paul

Dina Sharon

unread,
Nov 23, 2021, 8:48:30 PM11/23/21
to mdnalysis-...@googlegroups.com

Dear Paul,

 

I thank you so much for sharing these very helpful answers! I really appreciate your looking into my questions so carefully. I thank you for explaining how the earlier module does not require atom specifications when using selection1_type and your sharing this CHARMM27 example and explaining how atoms are selected. It is very helpful to know this about the earlier module’s documentation and the output format: I am very grateful to you for looking into the code and what is recorded. I thank you so much for following up on the proline question and for making these updates on GitHub. I appreciate your explaining what happens when an empty string is used and how selections can be checked. I apologize for the extreme delay in my responding to your very helpful email: I wanted to look into temporarily empty updating selections a bit before responding. I am including a code example below this email, which contains the observations noted in this email. If possible, I hope to please check in on a few follow-ups.

 

For both the older and new modules, I checked what would happen when an updating selection sometimes contains atoms and sometimes does not contain atoms, as the code example below shows. If I am interpreting the output correctly, it seems that, when not using guess_acceptors, there are no hydrogen bonds when there are no water molecules in a temporarily empty updating selection, for both the older and the new modules, for both a situation where the updating selection is empty in the first frame and a situation where the updating selection is not empty in the first frame. However, it seems that, when using guess_acceptors, there are hydrogen bonds when there are no water molecules in a temporarily empty updating selection, for the new module, for both a situation where the updating selection is empty in the first frame and a situation where the updating selection is not empty in the first frame. (The hydrogen bond counts, while being non-zero for frames with 0 water molecules in the selection both when the updating selection is empty in the first frame and when the updating selection is not empty in the first frame, also differ slightly in these two cases with guess_acceptors and include different atom types.) The example below is definitely an artificial example, to illustrate the concept, because here guess_acceptors is not as necessary as it is, for instance, in the case of a protein selection where there are different relevant atom types.

 

If possible, I hope to please confirm: is it safe to conclude from the results of this example that, in contrast to the proline case, if an updating selection is sometimes empty and sometimes not and guessing is not used, then there should not be issues with the output obtained previously from the older or the new hydrogen bond modules? Or are there further checks you would suggest before concluding this? Meanwhile, if guessing is used, I see how there would be issues with the output, as in the proline case. Will this situation with updating selections which are sometimes empty and with guessing also be changed in the next release, in addition to situations I asked about in my previous message where a selection is empty in every frame? For checking, would iterating through frames and finding the count of atoms for a selection for each analyzed frame, as I do in the code below, be the best way to see if a selection is ever empty, or is there a better approach?

 

Also, as I was running these checks, I obtained a count_by_type() error for the older module with these updating selections, as I show at the bottom of my code example below. I wanted to please check in: what could be causing this error?

 

I really appreciate your helping me with my questions and all you do to make MDAnalysis such a wonderful toolkit!

 

With very many thanks,

Dina

 

#Import modules (received deprecation warning)

import MDAnalysis.analysis

from MDAnalysis import analysis

from MDAnalysis.analysis import hydrogenbonds

from MDAnalysis.analysis.hydrogenbonds import hbond_analysis

from MDAnalysis.analysis import hbonds

import MDAnalysisData

from MDAnalysisData import datasets

from MDAnalysis.tests.datafiles import PSF, DCD

 

#Download test data (ref: https://www.mdanalysis.org/MDAnalysisData/usage.html,  https://www.mdanalysis.org/MDAnalysisData/ifabp_water.html#module-MDAnalysisData.ifabp_water,  https://figshare.com/articles/dataset/Molecular_dynamics_trajectory_of_I-FABP_for_testing_and_benchmarking_solvent_dynamics_analysis/7058030/1)

ifabp_data = MDAnalysisData.ifabp_water.fetch_ifabp_water()

test_univ_for_updating_selections = MDAnalysis.Universe(ifabp_data.topology, ifabp_data.trajectory)

 

#Set stride, since having only a few frames will facilitate inspection

stride_for_analysis = 50

 

#Select waters so close to the protein that the selection will sometimes be empty (water_nearby_selection), also select protein residues involved in these contacts in the second selection (protein_nearby_selection) ref https://userguide.mdanalysis.org/stable/selections.html?highlight=selections

water_nearby_selection = "resname TIP3 and same resid as (resname TIP3 and around 0.55 protein)"

protein_nearby_selection = "protein and same resid as (protein and around 0.55 resname TIP3)"

#From a check, I found the protein resids with water nearby in the 1st frame (the subsequent code has more details), and in the water_nearby_selection_not_first_frame selection I exclude these resids to create a selection which is empty in the 1st frame but not empty in all frames

water_nearby_selection_not_first_frame = "resname TIP3 and same resid as (resname TIP3 and around 0.55 (protein and not (resid 19 or resid 25 or resid 26 or resid 28 or resid 44 or resid 94 or resid 98)))"

 

#Iterate over the frames, showing how water_nearby_selection and water_nearby_selection_not_first_frame sometimes contain 0 atoms, and how water_nearby_selection contains atoms in the 1st frame while water_nearby_selection_not_first_frame does not contain atoms in the 1st frame

#Also show the protein resids, which I checked to create water_nearby_selection_not_first_frame

#Trajectory iteration ref: https://userguide.mdanalysis.org/stable/examples/quickstart.html#Working-with-trajectories

for fr_check in test_univ_for_updating_selections.trajectory[::stride_for_analysis]:

   

    #Print water selections' atom counts in the frame

    print(f"Frame {fr_check.frame} time {fr_check.time} count of water_nearby_selection water atoms {len(test_univ_for_updating_selections.select_atoms(water_nearby_selection))} count of water_nearby_selection_not_first_frame water atoms {len(test_univ_for_updating_selections.select_atoms(water_nearby_selection_not_first_frame))}")

   

    #Obtain resids of the protein residues near water in the frame

    #Only unique residues are reported, ref: https://www.geeksforgeeks.org/python-get-unique-values-list/

    unique_protein_residues_near_water = sorted(list(set(p_atom.resid for p_atom in test_univ_for_updating_selections.select_atoms(protein_nearby_selection))))

    print("Unique protein resids:")

    print(unique_protein_residues_near_water)

    print("----")

 

#Printout

Frame 0 time 6.99999938344013 count of water_nearby_selection water atoms 33 count of water_nearby_selection_not_first_frame water atoms 0

Unique protein resids:

[19, 25, 26, 28, 44, 94, 98]

----

Frame 50 time 56.99999497944106 count of water_nearby_selection water atoms 18 count of water_nearby_selection_not_first_frame water atoms 6

Unique protein resids:

[25, 26, 79, 94, 129]

----

Frame 100 time 106.99999057544198 count of water_nearby_selection water atoms 6 count of water_nearby_selection_not_first_frame water atoms 6

Unique protein resids:

[79, 100]

----

Frame 150 time 156.9999861714429 count of water_nearby_selection water atoms 0 count of water_nearby_selection_not_first_frame water atoms 0

Unique protein resids:

[]

----

Frame 200 time 206.99998176744384 count of water_nearby_selection water atoms 18 count of water_nearby_selection_not_first_frame water atoms 6

Unique protein resids:

[25, 94, 99, 100]

----

Frame 250 time 256.99997736344477 count of water_nearby_selection water atoms 18 count of water_nearby_selection_not_first_frame water atoms 12

Unique protein resids:

[24, 25, 29, 96, 98]

----

Frame 300 time 306.9999729594457 count of water_nearby_selection water atoms 12 count of water_nearby_selection_not_first_frame water atoms 12

Unique protein resids:

[96, 97]

----

Frame 350 time 356.9999685554466 count of water_nearby_selection water atoms 0 count of water_nearby_selection_not_first_frame water atoms 0

Unique protein resids:

[]

----

Frame 400 time 406.99996415144756 count of water_nearby_selection water atoms 3 count of water_nearby_selection_not_first_frame water atoms 0

Unique protein resids:

[26]

----

Frame 450 time 456.9999597474485 count of water_nearby_selection water atoms 3 count of water_nearby_selection_not_first_frame water atoms 3

Unique protein resids:

[29]

----

 

#New module, for these selections that, as shown above, sometimes contain atoms and sometimes does not, both specifying oxygens and trying guess_acceptors

#Many thanks to Paul Smith for sharing the tutorial and for advice on when to use update_selections = False and on count_by_type()!

#Here I want the selections to update, so I keep the default update_selections value of True

 

#Specify oxygens, for a selection which is not empty in the 1st frame

oxygen_nearby_selection = f"name OH2 and ({water_nearby_selection})"

print(f"Acceptor selection for nearby oxygens, specifying oxygens, is {oxygen_nearby_selection}")

w_w_hb_new_updating_specify = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, acceptors_sel = oxygen_nearby_selection, hydrogens_sel = "resname TIP3 and name H*", d_h_a_angle_cutoff = 150.0)

w_w_hb_new_updating_specify.run(step = stride_for_analysis)

 

#Guess acceptors, for a selection which is not empty in the 1st frame

w_w_hb_new_updating_guess = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, hydrogens_sel = "resname TIP3 and name H*", d_h_a_angle_cutoff = 150.0)

w_w_hb_new_updating_guess.acceptors_sel = w_w_hb_new_updating_guess.guess_acceptors(water_nearby_selection)

print(f"Acceptor selection for nearby oxygens, guessing acceptors, is {w_w_hb_new_updating_guess.acceptors_sel}")

w_w_hb_new_updating_guess.run(step = stride_for_analysis)

 

#Specify oxygens, for a selection which is empty in the 1st frame

oxygen_nearby_selection_not_first_frame = f"name OH2 and ({water_nearby_selection_not_first_frame})"

print(f"Acceptor selection for nearby oxygens, for the selection prepared to be empty in the 1st frame, specifying oxygens, is {oxygen_nearby_selection_not_first_frame}")

w_w_hb_new_updating_zero_first_specify = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, acceptors_sel = oxygen_nearby_selection_not_first_frame, hydrogens_sel = "resname TIP3 and name H*", d_h_a_angle_cutoff = 150.0)

w_w_hb_new_updating_zero_first_specify.run(step = stride_for_analysis)

 

#Guess acceptors, for a selection which is empty in the 1st frame

w_w_hb_new_updating_zero_first_guess = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, hydrogens_sel = "resname TIP3 and name H*", d_h_a_angle_cutoff = 150.0)

w_w_hb_new_updating_zero_first_guess.acceptors_sel = w_w_hb_new_updating_zero_first_guess.guess_acceptors(water_nearby_selection_not_first_frame)

print(f"Acceptor selection for nearby oxygens, for the selection prepared to be empty in the 1st frame, guessing acceptors, is {w_w_hb_new_updating_zero_first_guess.acceptors_sel}")

w_w_hb_new_updating_zero_first_guess.run(step = stride_for_analysis)

 

#Printout

Acceptor selection for nearby oxygens, specifying oxygens, is name OH2 and (resname TIP3 and same resid as (resname TIP3 and around 0.55 protein))
Acceptor selection for nearby oxygens, guessing acceptors, is (resname TIP3 and name OH2)
Acceptor selection for nearby oxygens, for the selection prepared to be empty in the 1st frame, specifying oxygens, is name OH2 and (resname TIP3 and same resid as (resname TIP3 and around 0.55 (protein and not (resid 19 or resid 25 or resid 26 or resid 28 or resid 44 or resid 94 or resid 98))))
Acceptor selection for nearby oxygens, for the selection prepared to be empty in the 1st frame, guessing acceptors, is 

 

#Print results

print("All waters near the protein, specifying oxygens, hydrogen bond results (new module)")

print("count by time")

print(w_w_hb_new_updating_specify.count_by_time())

print("count by type")

print(w_w_hb_new_updating_specify.count_by_type())

print("frames")

print(w_w_hb_new_updating_specify.frames)

print("times")

print(w_w_hb_new_updating_specify.times)

print("----")

print("All waters near the protein, guessing acceptors, hydrogen bond results (new module)")

print("count by time")

print(w_w_hb_new_updating_guess.count_by_time())

print("count by type")

print(w_w_hb_new_updating_guess.count_by_type())

print("frames")

print(w_w_hb_new_updating_guess.frames)

print("times")

print(w_w_hb_new_updating_guess.times)

print("----")

print("Waters near the protein, for the selection prepared to be empty in the 1st frame, specifying oxygens, hydrogen bond results (new module)")

print("count by time")

print(w_w_hb_new_updating_zero_first_specify.count_by_time())

print("count by type")

print(w_w_hb_new_updating_zero_first_specify.count_by_type())

print("frames")

print(w_w_hb_new_updating_zero_first_specify.frames)

print("times")

print(w_w_hb_new_updating_zero_first_specify.times)

print("----")

print("Waters near the protein, for the selection prepared to be empty in the 1st frame, guessing acceptors, hydrogen bond results (new module)")

print("count by time")

print(w_w_hb_new_updating_zero_first_guess.count_by_time())

print("count by type")

print(w_w_hb_new_updating_zero_first_guess.count_by_type())

print("frames")

print(w_w_hb_new_updating_zero_first_guess.frames)

print("times")

print(w_w_hb_new_updating_zero_first_guess.times)

 

#Printout

All waters near the protein, specifying oxygens, hydrogen bond results (new module)
count by time
[8 4 1 0 4 3 4 0 0 0]
count by type
[['TIP3:75' 'TIP3:75' '24']]
frames
[  0  50 100 150 200 250 300 350 400 450]
times
[  6.99999938  56.99999498 106.99999058 156.99998617 206.99998177
 256.99997736 306.99997296 356.99996856 406.99996415 456.99995975]
----
All waters near the protein, guessing acceptors, hydrogen bond results (new module)
count by time
[3267 3187 3196 3120 3183 3163 3135 3010 3045 2986]
count by type
[['TIP3:75' 'TIP3:75' '31292']]
frames
[  0  50 100 150 200 250 300 350 400 450]
times
[  6.99999938  56.99999498 106.99999058 156.99998617 206.99998177
 256.99997736 306.99997296 356.99996856 406.99996415 456.99995975]
----
Waters near the protein, for the selection prepared to be empty in the 1st frame, specifying oxygens, hydrogen bond results (new module)
count by time
[0 1 1 0 0 2 4 0 0 0]
count by type
[['TIP3:75' 'TIP3:75' '8']]
frames
[  0  50 100 150 200 250 300 350 400 450]
times
[  6.99999938  56.99999498 106.99999058 156.99998617 206.99998177
 256.99997736 306.99997296 356.99996856 406.99996415 456.99995975]
----
Waters near the protein, for the selection prepared to be empty in the 1st frame, guessing acceptors, hydrogen bond results (new module)
count by time
[3447 3366 3357 3283 3342 3327 3302 3171 3204 3133]
count by type
[['TIP3:75' 'ALA:70' '35']
 ['TIP3:75' 'ARG:70' '8']
 ['TIP3:75' 'ASN:55' '1']
 ['TIP3:75' 'ASN:70' '142']
 ['TIP3:75' 'ASP:70' '42']
 ['TIP3:75' 'ASP:72' '245']
 ['TIP3:75' 'GLN:70' '27']
 ['TIP3:75' 'GLU:70' '54']
 ['TIP3:75' 'GLU:72' '676']
 ['TIP3:75' 'GLY:70' '77']
 ['TIP3:75' 'HSE:52' '11']
 ['TIP3:75' 'HSE:70' '8']
 ['TIP3:75' 'ILE:70' '20']
 ['TIP3:75' 'LEU:70' '28']
 ['TIP3:75' 'LYS:70' '48']
 ['TIP3:75' 'MET:70' '17']
 ['TIP3:75' 'PHE:70' '30']
 ['TIP3:75' 'SER:70' '16']
 ['TIP3:75' 'SER:73' '18']
 ['TIP3:75' 'THR:70' '17']
 ['TIP3:75' 'THR:73' '61']
 ['TIP3:75' 'TIP3:75' '31292']
 ['TIP3:75' 'TYR:70' '6']
 ['TIP3:75' 'TYR:73' '13']
 ['TIP3:75' 'VAL:70' '40']]
frames
[  0  50 100 150 200 250 300 350 400 450]
times
[  6.99999938  56.99999498 106.99999058 156.99998617 206.99998177
 256.99997736 306.99997296 356.99996856 406.99996415 456.99995975]

 

#Conduct hydrogen bonding analyses with the earlier module

#Many thanks to Paul Smith for explaining atoms do not need to be specified with the earlier module, in his Google group 11/7/21 post!

#Therefore I am only conducting analyses with entire water molecules as selections

#Ref: https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/hbond_analysis.html, set angle to match new module (ref: https://docs.mdanalysis.org/stable/documentation_pages/analysis/hydrogenbonds.html)

w_w_hb_old_updating = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ_for_updating_selections, water_nearby_selection, "resname TIP3", selection1_type = "acceptor", angle = 150.0)

w_w_hb_old_updating.run(step = stride_for_analysis, selection1_type = "acceptor")

w_w_hb_old_updating_zero_first = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ_for_updating_selections, water_nearby_selection_not_first_frame, "resname TIP3", selection1_type = "acceptor", angle = 150.0)

w_w_hb_old_updating_zero_first.run(step = stride_for_analysis, selection1_type = "acceptor")

 

#Warnings (abbreviating paths)

…/hbond_analysis.py:580: DeprecationWarning: This class is deprecated as of MDAnalysis version 1.0 and will be removed in version 2.0.Please use MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis instead.
  category=DeprecationWarning
…/hbond_analysis.py:673: SelectionWarning: No acceptors found in selection 1. You might have to specify a custom 'acceptors' keyword. Selection will update so continuing with fingers crossed.
  warnings.warn(errmsg, category=SelectionWarning)
…/hbond_analysis.py:673: SelectionWarning: No donors found in selection 2. You might have to specify a custom 'donors' keyword. Selection will update so continuing with fingers crossed.
  warnings.warn(errmsg, category=SelectionWarning)

 

 

#Report results

print("All waters near the protein hydrogen bond results (earlier module)")

print("count by time")

print(w_w_hb_old_updating.count_by_time())

print("----")

print("Waters near the protein, for the selection prepared to be empty in the 1st frame, hydrogen bond results (earlier module)")

print("count by time")

print(w_w_hb_old_updating_zero_first.count_by_time())

 

#Printout

All waters near the protein hydrogen bond results (earlier module)
count by time
[(  6.99999938, 9) ( 56.99999498, 6) (106.99999058, 1) (156.99998617, 0)
 (206.99998177, 7) (256.99997736, 5) (306.99997296, 4) (356.99996856, 0)
 (406.99996415, 0) (456.99995975, 0)]
----
Waters near the protein, for the selection prepared to be empty in the 1st frame, hydrogen bond results (earlier module)
count by time
[(  6.99999938, 0) ( 56.99999498, 1) (106.99999058, 1) (156.99998617, 0)
 (206.99998177, 1) (256.99997736, 4) (306.99997296, 4) (356.99996856, 0)
 (406.99996415, 0) (456.99995975, 0)]

 

print("count by type for w_w_hb_old_updating")

print(w_w_hb_old_updating.count_by_type())

 

#Output (abbreviating paths)

count by type for w_w_hb_old_updating
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-e6bb5399f115> in <module>
      1 print("count by type for w_w_hb_old_updating")
----> 2 print(w_w_hb_old_updating.count_by_type())
 
…/hbond_analysis.py in count_by_type(self)
   1264         # patch in donor heavy atom names (replaces '?' in the key)
   1265         h2donor = self._donor_lookup_table_byindex()
-> 1266         r.donor_heavy_atom[:] = [h2donor[idx] for idx in r.donor_index]
   1267 
   1268         return r
 
…/hbond_analysis.py in <listcomp>(.0)
   1264         # patch in donor heavy atom names (replaces '?' in the key)
   1265         h2donor = self._donor_lookup_table_byindex()
-> 1266         r.donor_heavy_atom[:] = [h2donor[idx] for idx in r.donor_index]
   1267 
   1268         return r
 
KeyError: 4347
 

 

print("count by type for w_w_hb_old_updating_zero_first")

print(w_w_hb_old_updating_zero_first.count_by_type())

 

#Output (abbreviating paths)

count by type for w_w_hb_old_updating_zero_first
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-d44d24315fcb> in <module>
      1 print("count by type for w_w_hb_old_updating_zero_first")
----> 2 print(w_w_hb_old_updating_zero_first.count_by_type())
 
…/hbond_analysis.py in count_by_type(self)
   1264         # patch in donor heavy atom names (replaces '?' in the key)
   1265         h2donor = self._donor_lookup_table_byindex()
-> 1266         r.donor_heavy_atom[:] = [h2donor[idx] for idx in r.donor_index]
   1267 
   1268         return r
 
…/hbond_analysis.py in <listcomp>(.0)
   1264         # patch in donor heavy atom names (replaces '?' in the key)
   1265         h2donor = self._donor_lookup_table_byindex()
-> 1266         r.donor_heavy_atom[:] = [h2donor[idx] for idx in r.donor_index]
   1267 
   1268         return r
 
KeyError: 4647

Paul Smith

unread,
Dec 13, 2021, 12:58:22 PM12/13/21
to MDnalysis discussion
Hi Dina,

I'm sorry for the very slow reply - I've not had much time to look through your questions until now. The different behaviour you're seeing is due to the different selection strings you use:

1) Acceptor selection for nearby oxygens, specifying oxygens, is:
"name OH2 and (resname TIP3 and same resid as (resname TIP3 and around 0.55 protein))"

2) Acceptor selection for nearby oxygens, guessing acceptors, is:
"(resname TIP3 and name OH2)"

3) Acceptor selection for nearby oxygens, for the selection prepared to be empty in the 1st frame, specifying oxygens, is:
"name OH2 and (resname TIP3 and same resid as (resname TIP3 and around 0.55 (protein and not (resid 19 or resid 25 or resid 26 or resid 28 or resid 44 or resid 94 or resid 98))))"

4) Acceptor selection for nearby oxygens, for the selection prepared to be empty in the 1st frame, guessing acceptors, is 
""

The approach to use here is your first selection string, although you could simplify it to:

    "name OH2 and around 0.55 protein"
 
At each frame, this will find all OH2 atoms within the specified cutoff distance of the protein.

Your second selection string is selecting all water molecules, regardless of whether they are near the protein or not. So you are finding all water-water hydrogen bonds too, and thus find many more hydrogen bonds with this selection.

In your third selection string, you specify that you always want to consider only resids 19, 25, 26, 28, 44, 94, 98, and ignore all other residues of the protein. So this is why you see far fewer hydrogen bonds for this selection.

And your fourth selection string is empty, which creates the same problem you brought up last time in that an empty selection string passed to acceptors_sel will cause the acceptors to be guessed from all of the atoms (water and protein). If you have an empty selection string there is actually no need to run the HBA analysis, as it is guaranteed that you will find zero hydrogen bonds.

When you are creating complex selection strings with updating atom groups, the best thing to do before running the HBA analysis is to check that the string is selecting the atoms you want at each frame. 

As for the KeyError you get with the previous HBA analysis, it looks like this could be a bug in the code. It would be great if you could open an issue for this on github, an include a minimal example that produces the error (i.e. include the code you have above for the previous HBA module).

Sorry again for the slow reply!

Cheers,
Paul

Daniel Madulu SHADRACK

unread,
Dec 13, 2021, 6:20:37 PM12/13/21
to mdnalysis-...@googlegroups.com
Dear MDanalysis,
I want to extract a water molecules with in a cutoff along the x coordinate only, I have tried it but I get either an error or I get all the water along the xyz coordinate. I have checked the manual but I have not succeeded, the code I am using is here below.

ag_updating = u.select_atoms("prop x < 2 and prop x < 4", updating=True)
with mda.Writer('j.xyz', ag.n_atoms) as w:
     for ts in u.trajectory:
        w.write(ag)
     

Your help is appreciated.
Daniel.


--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/f9be860d-02fd-431c-a3e6-24ad2db4802en%40googlegroups.com.


--
Daniel M Shadrack
Department of Health and Biomedical sciences
Nanomedicine & Computational Chemistry
NM-AIST
P.O.Box 447
Arusha, Tanzania.

Oliver Beckstein

unread,
Dec 13, 2021, 6:28:36 PM12/13/21
to mdnalysis-discussion
Hi Daniel,

Welcome to the MDAnalysis mailing list!

> On Dec 13, 2021, at 3:46 PM, Daniel Madulu SHADRACK <shad...@nm-aist.ac.tz> wrote:
>
> Dear MDanalysis,
> I want to extract a water molecules with in a cutoff along the x coordinate only, I have tried it but I get either an error

What is the error?

> or I get all the water along the xyz coordinate.

Do you want to get waters that are within 2Å < z < 4Å ?

One problem is that this number of waters will almost certainly change. MDAnalysis cannot process trajectories with changing number of particles.

Do you need a trajectory or could you just do some analysis on these waters?

> I have checked the manual but I have not succeeded, the code I am using is here below.
>
> ag_updating = u.select_atoms("prop x < 2 and prop x < 4", updating=True)

At the moment your condition just says “x < 2” (and note that these are in Ångströms. If you wanted “z between 2 and 4 Å” then you would use ""prop x > 2 and prop x < 4”).

> with mda.Writer('j.xyz', ag.n_atoms) as w:
> for ts in u.trajectory:
> w.write(ag)

Instead of writing, first try just analyzing, e.g., count the number of water molecules:

ag_updating = u.select_atoms("prop x > 2 and prop x < 4", updating=True)
for ts in u.trajectory:
print(ts.frame, ag_updating.n_atoms)

See if this gives you what you expect (namely at least some numbers).

If this is not helpful then please try to explain in more detail what you want to achieve.

Best,
Oliver


>
>
> Your help is appreciated.
> Daniel.

--
Oliver Beckstein (he/his/him)

email: orbe...@mdanalysis.org
twitter: @orbeckst

MDAnalysis – a NumFOCUS fiscally sponsored project
https://www.mdanalysis.org/





Daniel Madulu SHADRACK

unread,
Dec 14, 2021, 1:37:43 AM12/14/21
to mdnalysis-...@googlegroups.com
Dear Oliver 
Thank you for the feedback. My problems is.

 I want to do a mean first passage time of water. And, I did not find if I can do it directly..., Therefore, I decided to do the following.

first. I want to calculate the time the water croses the boundary, let say -xl and +xl of the X coordinate.

Therefore, I want to get a trajectory of water, for example at 2A>= X <=4A, and plot them as a function of time i.e X(t) Vs time. 

Second, I estimate the passage of each water molecule. 

Now. How do I select some water molecule along the X coordinate? And plot them as function of time. 

Eg for now I am interested with getting at least one molecule of water and plot it against time.

That's is my problems

Your assistance is much appreciated
Daniel.




--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.

Dina Sharon

unread,
Dec 23, 2021, 9:19:20 PM12/23/21
to mdnalysis-...@googlegroups.com

Dear Paul,

 

I thank you so much for helping me and sharing these very helpful updates with me! I really appreciate your explaining considerations in these different selection strings and how these different selection strings lead to different results. Thank you for analyzing these different situations! I appreciate your suggestion on KeyError next steps, and I have opened an issue for this on GitHub. If possible, I am hoping to please check in on a few follow-ups. I am including a code example below this email, which contains observations noted in this email.

 

I apologize that I am not certain if I am understanding some aspects of selections, and I hope to please check in on your analysis of different selections. I really appreciate your explaining how using the atom name and distance cutoff in a HydrogenBondAnalysis selection string such as “name OH2 and (resname TIP3 and same resid as (resname TIP3 and around 0.55 protein))” will help lead to finding the hydrogen bonds in which waters very close to the protein accept hydrogen bonds. I apologize if I am misunderstanding: looking through your suggestion, comparing my selection "name OH2 and (resname TIP3 and same resid as (resname TIP3 and around 0.55 protein))" and your suggested "name OH2 and around 0.55 protein", would the latter include water oxygens which are not within 0.55 Å of the protein but which are covalently bonded to hydrogens which are within 0.55 Å of the protein, as I believe the former does?

 

Also, to follow up on your mentioning “In your third selection string, you specify that you always want to consider only resids 19, 25, 26, 28, 44, 94, 98, and ignore all other residues of the protein. So this is why you see far fewer hydrogen bonds for this selection.”, would my third selection string "name OH2 and (resname TIP3 and same resid as (resname TIP3 and around 0.55 (protein and not (resid 19 or resid 25 or resid 26 or resid 28 or resid 44 or resid 94 or resid 98))))" select waters very near only protein residues 19, 25, 26, 28, 44, 94, and 98, or waters very near all protein residues except 19, 25, 26, 28, 44, 94, and 98 (ignoring waters very near this set of residues, but considering waters very near all other residues)? I apologize if I am misunderstanding!

 

I wanted to please check in on cases where the updating selection has varying atom names, as my code example below shows. This could be relevant for a case where the updating selection involves protein atoms within a certain distance cutoff from another selection. In my code below I use the example of finding all of the protein-protein hydrogen bonds in which a residue very close to water is an acceptor. I changed my code example from 11/23/21 to have a temporarily empty updating selection that includes protein atoms, not water atoms. Here, as in the water case, guess_acceptors leads to finding hydrogen bonds for frames where the updating selection (which is not always empty) is temporarily empty, as my earlier message discusses in more detail. As the code example below shows, if I understand correctly, it appears that I can obtain the output of interest by running guess_acceptors on the entire protein, obtaining a protein acceptor string, and then, in a subsequent HydrogenBondAnalysis command, using this protein acceptor string with my updating selection and not running guess_acceptors.

 

If possible, I wanted to please check in:

 

Is it safe to conclude from the results of this example that, in contrast to the proline case, if an updating selection is sometimes empty and sometimes not and guessing acceptors or hydrogens is not used, then there should not be issues with the output obtained previously from the older or the new hydrogen bond modules? Or are there further checks you would suggest before concluding this?

 

Should I be concerned by differences in the new (with specifying, not guessing) module’s and older module’s hydrogen bond count outputs, or are these due to implementation considerations?

 

If guessing acceptors or hydrogens is used, is it correct to conclude the atom selection must be non-empty at every frame for guessing to work correctly and the output to be correct? If so, if I understand correctly, if there is a chance the selection is ever empty, I should be checking the selection in each frame, iterating through the frames and finding the selection size and atoms, before running HydrogenBondAnalysis: is this correct? Are there other guessing acceptors or hydrogens considerations to keep in mind with updating selections?

 

If I find an updating selection is empty in at least one frame, is a possible next step then to run guess_acceptors/hydrogens and substitute the resulting string into my updating selection, and then, in a subsequent HydrogenBondAnalysis command, to use this revised updating selection and not run guess_acceptors/hydrogens? Or would you recommend a different approach?

 

I really appreciate your helping me with my questions and all you do to make MDAnalysis such a wonderful toolkit! I send all best wishes for very happy holidays and a very happy new year!

 

With very many thanks,

Dina

 

#Import modules (received deprecation warning)

import MDAnalysis.analysis

from MDAnalysis import analysis

from MDAnalysis.analysis import hydrogenbonds

from MDAnalysis.analysis.hydrogenbonds import hbond_analysis

from MDAnalysis.analysis import hbonds

import MDAnalysisData

from MDAnalysisData import datasets

from MDAnalysis.tests.datafiles import PSF, DCD

 

#Download test data (ref: https://www.mdanalysis.org/MDAnalysisData/usage.html,   https://www.mdanalysis.org/MDAnalysisData/ifabp_water.html#module-MDAnalysisData.ifabp_water,   https://figshare.com/articles/dataset/Molecular_dynamics_trajectory_of_I-FABP_for_testing_and_benchmarking_solvent_dynamics_analysis/7058030/1)

ifabp_data = MDAnalysisData.ifabp_water.fetch_ifabp_water()

test_univ_for_updating_selections = MDAnalysis.Universe(ifabp_data.topology, ifabp_data.trajectory)

 

#Set stride, since having only a few frames will facilitate inspection

stride_for_analysis = 50

 

#Select protein residues so that the selection will sometimes be empty ref https://userguide.mdanalysis.org/stable/selections.html?highlight=selections

protein_nearby_selection = "protein and same resid as (protein and around 0.55 resname TIP3)"

 

#From a check, I found the protein resids with water nearby in the 1st frame (the subsequent code has more details), and in the protein_nearby_selection_not_first_frame selection I exclude these resids to create a selection which is empty in the 1st frame but not empty in all frames

protein_nearby_selection_not_first_frame = "protein and (same resid as (protein and around 0.55 resname TIP3)) and (not (resid 19 or resid 25 or resid 26 or resid 28 or resid 44 or resid 94 or resid 98))"

 

#Iterate over the frames, showing how protein_nearby_selection and protein_nearby_selection_not_first_frame sometimes contain 0 atoms, and how protein_nearby_selection contains atoms in the 1st frame while protein_nearby_selection_not_first_frame does not contain atoms in the 1st frame

#Also show the protein resids, which I checked

for fr_check in test_univ_for_updating_selections.trajectory[::stride_for_analysis]:

 

    #Print protein selections' atom counts in the frame

    print(f"Frame {fr_check.frame} time {fr_check.time} count of protein_nearby_selection atoms {len(test_univ_for_updating_selections.select_atoms(protein_nearby_selection))} count of protein_nearby_selection_not_first_frame atoms {len(test_univ_for_updating_selections.select_atoms(protein_nearby_selection_not_first_frame))}")

 

    #Obtain resids of the protein residues near water in the frame

    #Only unique residues are reported, ref: https://www.geeksforgeeks.org/python-get-unique-values-list/

    unique_protein_residues_near_water = sorted(list(set(p_atom.resid for p_atom in test_univ_for_updating_selections.select_atoms(protein_nearby_selection))))

    print("Unique protein resids:")

    print(unique_protein_residues_near_water)

    print("----")

 

#Printout

Frame 0 time 6.99999938344013 count of protein_nearby_selection atoms 114 count of protein_nearby_selection_not_first_frame atoms 0
Unique protein resids:
[19, 25, 26, 28, 44, 94, 98]
----
Frame 50 time 56.99999497944106 count of protein_nearby_selection atoms 90 count of protein_nearby_selection_not_first_frame atoms 36
Unique protein resids:
[25, 26, 79, 94, 129]
----
Frame 100 time 106.99999057544198 count of protein_nearby_selection atoms 36 count of protein_nearby_selection_not_first_frame atoms 36
Unique protein resids:
[79, 100]
----
Frame 150 time 156.9999861714429 count of protein_nearby_selection atoms 0 count of protein_nearby_selection_not_first_frame atoms 0
Unique protein resids:
[]
----
Frame 200 time 206.99998176744384 count of protein_nearby_selection atoms 67 count of protein_nearby_selection_not_first_frame atoms 29
Unique protein resids:
[25, 94, 99, 100]
----
Frame 250 time 256.99997736344477 count of protein_nearby_selection atoms 82 count of protein_nearby_selection_not_first_frame atoms 52
Unique protein resids:
[24, 25, 29, 96, 98]
----
Frame 300 time 306.9999729594457 count of protein_nearby_selection atoms 28 count of protein_nearby_selection_not_first_frame atoms 28
Unique protein resids:
[96, 97]
----
Frame 350 time 356.9999685554466 count of protein_nearby_selection atoms 0 count of protein_nearby_selection_not_first_frame atoms 0
Unique protein resids:
[]
----
Frame 400 time 406.99996415144756 count of protein_nearby_selection atoms 16 count of protein_nearby_selection_not_first_frame atoms 0
Unique protein resids:
[26]
----
Frame 450 time 456.9999597474485 count of protein_nearby_selection atoms 22 count of protein_nearby_selection_not_first_frame atoms 22
Unique protein resids:
[29]
----

 

#Analyze hydrogen bonds between protein residues in which a protein residue very close to water is an acceptor

#New module, for these selections that, as shown above, sometimes contain atoms and sometimes do not, trying guess_acceptors

#Ref: https://userguide.mdanalysis.org/2.0.0-dev0/examples/analysis/hydrogen_bonds/hbonds-selections.html,   https://docs.mdanalysis.org/stable/documentation_pages/analysis/hydrogenbonds.html

#Many thanks to Paul Smith for sharing the tutorial and for advice on when to use update_selections = False and on count_by_type()!

#Here I want the selections to update, so I keep the default update_selections value of True

 

#Guess acceptors, for a selection which is not empty in the 1st frame

#Also need to guess protein hydrogens

p_hb_new_updating_guess = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, d_h_a_angle_cutoff = 150.0)

p_hb_new_updating_guess.acceptors_sel = p_hb_new_updating_guess.guess_acceptors(protein_nearby_selection)

p_hb_new_updating_guess.hydrogens_sel = p_hb_new_updating_guess.guess_hydrogens("protein")

print(f"Acceptors selection for residues, guessing acceptors, for the selection prepared not to be empty in the 1st frame, is {p_hb_new_updating_guess.acceptors_sel}")

print("----------")

print(f"Hydrogens selection is {p_hb_new_updating_guess.hydrogens_sel}")

print("----------")

p_hb_new_updating_guess.run(step = stride_for_analysis)

 

#Guess acceptors, for a selection which is empty in the 1st frame

#Also need to guess protein hydrogens

p_hb_new_updating_zero_first_guess = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, d_h_a_angle_cutoff = 150.0)

p_hb_new_updating_zero_first_guess.acceptors_sel = p_hb_new_updating_zero_first_guess.guess_acceptors(protein_nearby_selection_not_first_frame)

p_hb_new_updating_zero_first_guess.hydrogens_sel = p_hb_new_updating_zero_first_guess.guess_hydrogens("protein")

print(f"Acceptors selection for residues, guessing acceptors, for the selection prepared to be empty in the 1st frame, is {p_hb_new_updating_zero_first_guess.acceptors_sel}")

print("----------")

print(f"Hydrogens selection is {p_hb_new_updating_zero_first_guess.hydrogens_sel}")

print("----------")

p_hb_new_updating_zero_first_guess.run(step = stride_for_analysis)

 

#Printout

Acceptors selection for residues, guessing acceptors, for the selection prepared not to be empty in the 1st frame, is (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLY and name O) or (resname LYS and name O) or (resname VAL and name O)
----------
Hydrogens selection is (resname ALA and name HN) or (resname ALA and name HT1) or (resname ALA and name HT2) or (resname ALA and name HT3) or (resname ARG and name HE) or (resname ARG and name HH11) or (resname ARG and name HH12) or (resname ARG and name HH21) or (resname ARG and name HH22) or (resname ARG and name HN) or (resname ASN and name HD21) or (resname ASN and name HD22) or (resname ASN and name HN) or (resname ASP and name HN) or (resname GLN and name HE21) or (resname GLN and name HE22) or (resname GLN and name HN) or (resname GLU and name HN) or (resname GLY and name HN) or (resname HSE and name HE2) or (resname HSE and name HN) or (resname ILE and name HN) or (resname LEU and name HN) or (resname LYS and name HN) or (resname LYS and name HZ1) or (resname LYS and name HZ2) or (resname LYS and name HZ3) or (resname MET and name HN) or (resname PHE and name HN) or (resname SER and name HG1) or (resname SER and name HN) or (resname THR and name HG1) or (resname THR and name HN) or (resname TRP and name HE1) or (resname TRP and name HN) or (resname TYR and name HH) or (resname TYR and name HN) or (resname VAL and name HN)
----------
Acceptors selection for residues, guessing acceptors, for the selection prepared to be empty in the 1st frame, is 
----------
Hydrogens selection is (resname ALA and name HN) or (resname ALA and name HT1) or (resname ALA and name HT2) or (resname ALA and name HT3) or (resname ARG and name HE) or (resname ARG and name HH11) or (resname ARG and name HH12) or (resname ARG and name HH21) or (resname ARG and name HH22) or (resname ARG and name HN) or (resname ASN and name HD21) or (resname ASN and name HD22) or (resname ASN and name HN) or (resname ASP and name HN) or (resname GLN and name HE21) or (resname GLN and name HE22) or (resname GLN and name HN) or (resname GLU and name HN) or (resname GLY and name HN) or (resname HSE and name HE2) or (resname HSE and name HN) or (resname ILE and name HN) or (resname LEU and name HN) or (resname LYS and name HN) or (resname LYS and name HZ1) or (resname LYS and name HZ2) or (resname LYS and name HZ3) or (resname MET and name HN) or (resname PHE and name HN) or (resname SER and name HG1) or (resname SER and name HN) or (resname THR and name HG1) or (resname THR and name HN) or (resname TRP and name HE1) or (resname TRP and name HN) or (resname TYR and name HH) or (resname TYR and name HN) or (resname VAL and name HN)
----------

 

#Print results

print("Selected protein residues, guessing acceptors, for the selection prepared to not be empty in the 1st frame, hydrogen bond results (new module)")

print("count by time")

print(p_hb_new_updating_guess.count_by_time())

print("count by type")

print(p_hb_new_updating_guess.count_by_type())

print("frames")

print(p_hb_new_updating_guess.frames)

print("times")

print(p_hb_new_updating_guess.times)

print("----")

 

print("Selected protein residues, guessing acceptors, for the selection prepared to be empty in the 1st frame, hydrogen bond results (new module)")

print("count by time")

print(p_hb_new_updating_zero_first_guess.count_by_time())

print("count by type")

print(p_hb_new_updating_zero_first_guess.count_by_type())

print("frames")

print(p_hb_new_updating_zero_first_guess.frames)

print("times")

print(p_hb_new_updating_zero_first_guess.times)

 

#Printout

Selected protein residues, guessing acceptors, for the selection prepared to not be empty in the 1st frame, hydrogen bond results (new module)
count by time
[21 28 18 19 25 28 27 28 24 25]
count by type
[['ALA:54' 'ARG:70' '2']
 ['ALA:54' 'GLY:70' '5']
 ['ARG:54' 'ASN:70' '2']
 ['ARG:54' 'LYS:70' '6']
 ['ARG:57' 'ASN:70' '9']
 ['ARG:57' 'GLU:72' '14']
 ['ASN:55' 'ASN:70' '1']
 ['ASN:55' 'GLU:70' '2']
 ['GLN:54' 'ARG:70' '7']
 ['GLN:55' 'GLY:70' '7']
 ['GLU:54' 'GLU:70' '7']
 ['GLU:54' 'LYS:70' '21']
 ['GLY:54' 'ARG:70' '1']
 ['GLY:54' 'GLU:70' '1']
 ['GLY:54' 'LYS:70' '3']
 ['HSE:54' 'LYS:70' '2']
 ['ILE:54' 'ARG:70' '3']
 ['ILE:54' 'GLU:70' '9']
 ['ILE:54' 'GLY:70' '7']
 ['LEU:54' 'ARG:70' '6']
 ['LEU:54' 'ASN:70' '5']
 ['LEU:54' 'VAL:70' '4']
 ['LYS:54' 'ASN:70' '2']
 ['LYS:54' 'GLU:70' '13']
 ['LYS:54' 'LYS:70' '15']
 ['LYS:54' 'VAL:70' '2']
 ['LYS:56' 'ASN:70' '2']
 ['LYS:56' 'GLU:72' '27']
 ['PHE:54' 'ASN:70' '1']
 ['SER:54' 'ASN:70' '1']
 ['SER:54' 'GLU:70' '7']
 ['SER:54' 'LYS:70' '6']
 ['SER:73' 'ASN:70' '1']
 ['THR:54' 'GLU:72' '1']
 ['THR:54' 'LYS:70' '5']
 ['THR:54' 'VAL:70' '15']
 ['THR:73' 'GLU:72' '5']
 ['TYR:54' 'VAL:70' '4']
 ['VAL:54' 'VAL:70' '12']]
frames
[  0  50 100 150 200 250 300 350 400 450]
times
[  6.99999938  56.99999498 106.99999058 156.99998617 206.99998177
 256.99997736 306.99997296 356.99996856 406.99996415 456.99995975]
----
Selected protein residues, guessing acceptors, for the selection prepared to be empty in the 1st frame, hydrogen bond results (new module)
count by time
[109 128 115 119 118 119 116 124 111 124]
count by type
[['ALA:54' 'ARG:70' '2']
 ['ALA:54' 'GLY:70' '5']
 ['ALA:54' 'TIP3:75' '7']
 ['ALA:54' 'TYR:70' '6']
 ['ALA:56' 'ASP:72' '2']
 ['ALA:56' 'GLN:70' '1']
 ['ALA:56' 'TIP3:75' '19']
 ['ARG:54' 'ASN:70' '2']
 ['ARG:54' 'GLN:70' '5']
 ['ARG:54' 'LEU:70' '8']
 ['ARG:54' 'LYS:70' '6']
 ['ARG:57' 'ASN:70' '9']
 ['ARG:57' 'ASP:72' '44']
 ['ARG:57' 'GLN:70' '4']
 ['ARG:57' 'GLU:72' '14']
 ['ARG:57' 'ILE:70' '8']
 ['ARG:57' 'SER:70' '6']
 ['ARG:57' 'TIP3:75' '98']
 ['ARG:57' 'TYR:73' '2']
 ['ASN:54' 'ASP:70' '4']
 ['ASN:54' 'ASP:72' '1']
 ['ASN:54' 'TIP3:75' '24']
 ['ASN:55' 'ALA:70' '1']
 ['ASN:55' 'ASN:70' '1']
 ['ASN:55' 'ASP:72' '1']
 ['ASN:55' 'GLU:70' '2']
 ['ASN:55' 'SER:73' '1']
 ['ASN:55' 'TIP3:75' '30']
 ['ASP:54' 'ASP:72' '2']
 ['ASP:54' 'ILE:70' '9']
 ['ASP:54' 'TIP3:75' '5']
 ['GLN:54' 'ARG:70' '7']
 ['GLN:54' 'ASP:72' '7']
 ['GLN:54' 'TIP3:75' '2']
 ['GLN:55' 'GLY:70' '7']
 ['GLN:55' 'TIP3:75' '5']
 ['GLU:54' 'GLU:70' '7']
 ['GLU:54' 'ILE:70' '8']
 ['GLU:54' 'LYS:70' '21']
 ['GLU:54' 'SER:70' '1']
 ['GLU:54' 'THR:70' '3']
 ['GLU:54' 'TIP3:75' '19']
 ['GLY:54' 'ALA:70' '8']
 ['GLY:54' 'ARG:70' '1']
 ['GLY:54' 'GLU:70' '1']
 ['GLY:54' 'ILE:70' '5']
 ['GLY:54' 'LEU:70' '5']
 ['GLY:54' 'LYS:70' '3']
 ['GLY:54' 'TIP3:75' '15']
 ['GLY:54' 'TRP:70' '4']
 ['HSE:51' 'TIP3:75' '2']
 ['HSE:54' 'LYS:70' '2']
 ['ILE:54' 'ARG:70' '3']
 ['ILE:54' 'GLU:70' '9']
 ['ILE:54' 'GLY:70' '7']
 ['ILE:54' 'MET:70' '4']
 ['ILE:54' 'THR:70' '2']
 ['ILE:54' 'TIP3:75' '9']
 ['LEU:54' 'ARG:70' '6']
 ['LEU:54' 'ASN:70' '5']
 ['LEU:54' 'PHE:70' '10']
 ['LEU:54' 'THR:70' '8']
 ['LEU:54' 'TIP3:75' '1']
 ['LEU:54' 'TRP:70' '9']
 ['LEU:54' 'TYR:70' '3']
 ['LEU:54' 'VAL:70' '4']
 ['LYS:54' 'ASN:70' '2']
 ['LYS:54' 'GLU:70' '13']
 ['LYS:54' 'LYS:70' '15']
 ['LYS:54' 'SER:70' '8']
 ['LYS:54' 'THR:70' '5']
 ['LYS:54' 'TIP3:75' '8']
 ['LYS:54' 'VAL:70' '2']
 ['LYS:56' 'ASN:70' '2']
 ['LYS:56' 'ASP:72' '8']
 ['LYS:56' 'GLU:72' '27']
 ['LYS:56' 'THR:73' '4']
 ['LYS:56' 'TIP3:75' '240']
 ['MET:54' 'TIP3:75' '5']
 ['MET:54' 'TYR:70' '3']
 ['PHE:54' 'ASN:70' '1']
 ['PHE:54' 'LEU:70' '11']
 ['PHE:54' 'PHE:70' '12']
 ['PHE:54' 'SER:73' '2']
 ['PHE:54' 'TIP3:75' '7']
 ['SER:54' 'ASN:70' '1']
 ['SER:54' 'GLU:70' '7']
 ['SER:54' 'LYS:70' '6']
 ['SER:73' 'ASN:70' '1']
 ['SER:73' 'TIP3:75' '30']
 ['THR:54' 'GLU:72' '1']
 ['THR:54' 'ILE:70' '6']
 ['THR:54' 'LYS:70' '5']
 ['THR:54' 'THR:70' '11']
 ['THR:54' 'TIP3:75' '6']
 ['THR:54' 'VAL:70' '15']
 ['THR:73' 'ASP:72' '9']
 ['THR:73' 'GLU:72' '5']
 ['THR:73' 'THR:73' '2']
 ['THR:73' 'TIP3:75' '61']
 ['TRP:54' 'LEU:70' '9']
 ['TRP:54' 'TIP3:75' '5']
 ['TRP:58' 'PHE:70' '2']
 ['TRP:58' 'TIP3:75' '2']
 ['TYR:54' 'ALA:70' '7']
 ['TYR:54' 'TIP3:75' '1']
 ['TYR:54' 'VAL:70' '4']
 ['TYR:73' 'TIP3:75' '23']
 ['VAL:54' 'LEU:70' '3']
 ['VAL:54' 'THR:70' '16']
 ['VAL:54' 'TIP3:75' '14']
 ['VAL:54' 'TYR:70' '2']
 ['VAL:54' 'VAL:70' '12']]
frames
[  0  50 100 150 200 250 300 350 400 450]
times
[  6.99999938  56.99999498 106.99999058 156.99998617 206.99998177
 256.99997736 306.99997296 356.99996856 406.99996415 456.99995975]
 
#What are all the acceptor names?
#Run HydrogenBondAnalysis guess_acceptors in order to obtain acceptor names, which I will use in my new selections
p_check_acc = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, d_h_a_angle_cutoff = 150.0)
p_check_acc.acceptors_sel = p_check_acc.guess_acceptors("protein")
print(p_check_acc.acceptors_sel)
 
#Printout
(resname ALA and name O) or (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname ASP and name O) or (resname ASP and name OD1) or (resname ASP and name OD2) or (resname GLN and name NE2) or (resname GLN and name O) or (resname GLN and name OE1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLU and name OT1) or (resname GLU and name OT2) or (resname GLY and name O) or (resname HSE and name ND1) or (resname HSE and name O) or (resname ILE and name O) or (resname LEU and name O) or (resname LYS and name O) or (resname MET and name O) or (resname PHE and name O) or (resname SER and name O) or (resname SER and name OG) or (resname THR and name O) or (resname THR and name OG1) or (resname TRP and name NE1) or (resname TRP and name O) or (resname TYR and name O) or (resname TYR and name OH) or (resname VAL and name O)
 
 
#Now create selections with the acceptor names replacing "protein", revising protein_nearby_selection and protein_nearby_selection_not_first_frame
protein_nearby_selection_specify_acceptors = "((resname ALA and name O) or (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname ASP and name O) or (resname ASP and name OD1) or (resname ASP and name OD2) or (resname GLN and name NE2) or (resname GLN and name O) or (resname GLN and name OE1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLU and name OT1) or (resname GLU and name OT2) or (resname GLY and name O) or (resname HSE and name ND1) or (resname HSE and name O) or (resname ILE and name O) or (resname LEU and name O) or (resname LYS and name O) or (resname MET and name O) or (resname PHE and name O) or (resname SER and name O) or (resname SER and name OG) or (resname THR and name O) or (resname THR and name OG1) or (resname TRP and name NE1) or (resname TRP and name O) or (resname TYR and name O) or (resname TYR and name OH) or (resname VAL and name O)) and same resid as (protein and around 0.55 resname TIP3)"
protein_nearby_selection_specify_acceptors_not_first_frame = "((resname ALA and name O) or (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname ASP and name O) or (resname ASP and name OD1) or (resname ASP and name OD2) or (resname GLN and name NE2) or (resname GLN and name O) or (resname GLN and name OE1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLU and name OT1) or (resname GLU and name OT2) or (resname GLY and name O) or (resname HSE and name ND1) or (resname HSE and name O) or (resname ILE and name O) or (resname LEU and name O) or (resname LYS and name O) or (resname MET and name O) or (resname PHE and name O) or (resname SER and name O) or (resname SER and name OG) or (resname THR and name O) or (resname THR and name OG1) or (resname TRP and name NE1) or (resname TRP and name O) or (resname TYR and name O) or (resname TYR and name OH) or (resname VAL and name O)) and same resid as (protein and around 0.55 resname TIP3) and (not (resid 19 or resid 25 or resid 26 or resid 28 or resid 44 or resid 94 or resid 98))"
 
#Specify acceptors, for a selection which is not empty in the 1st frame
#Also need to guess protein hydrogens
p_hb_new_updating_specify = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, d_h_a_angle_cutoff = 150.0)
p_hb_new_updating_specify.acceptors_sel = protein_nearby_selection_specify_acceptors
p_hb_new_updating_specify.hydrogens_sel = p_hb_new_updating_specify.guess_hydrogens("protein")
p_hb_new_updating_specify.run(step = stride_for_analysis)
 
#Specify acceptors, for a selection which is empty in the 1st frame
#Also need to guess protein hydrogens
p_hb_new_updating_zero_first_specify = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, d_h_a_angle_cutoff = 150.0)
p_hb_new_updating_zero_first_specify.acceptors_sel = protein_nearby_selection_specify_acceptors_not_first_frame
p_hb_new_updating_zero_first_specify.hydrogens_sel = p_hb_new_updating_zero_first_specify.guess_hydrogens("protein")
p_hb_new_updating_zero_first_specify.run(step = stride_for_analysis)
 
#Print results from these specifying protein acceptor HydrogenBondAnalysis analyses
print("Selected protein residues, specifying acceptors, for the selection prepared to not be empty in the 1st frame, hydrogen bond results (new module)")
print("count by time")
print(p_hb_new_updating_specify.count_by_time())
print("count by type")
print(p_hb_new_updating_specify.count_by_type())
print("frames")
print(p_hb_new_updating_specify.frames)
print("times")
print(p_hb_new_updating_specify.times)
print("----")
 
print("Selected protein residues, specifying acceptors, for the selection prepared to be empty in the 1st frame, hydrogen bond results (new module)")
print("count by time")
print(p_hb_new_updating_zero_first_specify.count_by_time())
print("count by type")
print(p_hb_new_updating_zero_first_specify.count_by_type())
print("frames")
print(p_hb_new_updating_zero_first_specify.frames)
print("times")
print(p_hb_new_updating_zero_first_specify.times)
 
#Printout
Selected protein residues, specifying acceptors, for the selection prepared to not be empty in the 1st frame, hydrogen bond results (new module)
count by time
[2 2 0 0 1 2 2 0 0 0]
count by type
[['ALA:54' 'ARG:70' '1']
 ['ARG:54' 'ASN:70' '1']
 ['ARG:57' 'ASP:72' '2']
 ['ARG:57' 'GLU:72' '1']
 ['LEU:54' 'VAL:70' '1']
 ['LYS:54' 'LYS:70' '1']
 ['LYS:54' 'VAL:70' '1']
 ['LYS:56' 'ASN:70' '1']]
frames
[  0  50 100 150 200 250 300 350 400 450]
times
[  6.99999938  56.99999498 106.99999058 156.99998617 206.99998177
 256.99997736 306.99997296 356.99996856 406.99996415 456.99995975]
----
Selected protein residues, specifying acceptors, for the selection prepared to be empty in the 1st frame, hydrogen bond results (new module)
count by time
[0 1 0 0 0 2 2 0 0 0]
count by type
[['ARG:54' 'ASN:70' '1']
 ['ARG:57' 'ASP:72' '2']
 ['LYS:54' 'LYS:70' '1']
 ['LYS:56' 'ASN:70' '1']]
frames
[  0  50 100 150 200 250 300 350 400 450]
times
[  6.99999938  56.99999498 106.99999058 156.99998617 206.99998177
 256.99997736 306.99997296 356.99996856 406.99996415 456.99995975]
 

 

#Conduct hydrogen bonding analyses with the earlier module

#Many thanks to Paul Smith for explaining atoms do not need to be specified with the earlier module, in his Google group 11/7/21 post!

#Therefore I am only conducting analyses with entire protein residues selected

p_hb_old_updating = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ_for_updating_selections, protein_nearby_selection, "protein", selection1_type = "acceptor", angle = 150.0)

p_hb_old_updating.run(step = stride_for_analysis, selection1_type = "acceptor")

p_hb_old_updating_zero_first = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ_for_updating_selections, protein_nearby_selection_not_first_frame, "protein", selection1_type = "acceptor", angle = 150.0)

p_hb_old_updating_zero_first.run(step = stride_for_analysis, selection1_type = "acceptor")

 

#Received warnings, details in 11/23/21 message

 

#Report results

print("Protein selection, sometimes empty, but not in the 1st frame, hydrogen bond results (earlier module)")

print("count by time")

print(p_hb_old_updating.count_by_time())

print("----")

print("Protein selection, sometimes empty, including in the 1st frame, hydrogen bond results (earlier module)")

print("count by time")

print(p_hb_old_updating_zero_first.count_by_time())

 

#Printout

Protein selection, sometimes empty, but not in the 1st frame, hydrogen bond results (earlier module)
count by time
[(  6.99999938, 4) ( 56.99999498, 4) (106.99999058, 2) (156.99998617, 0)
 (206.99998177, 2) (256.99997736, 4) (306.99997296, 2) (356.99996856, 0)
 (406.99996415, 1) (456.99995975, 0)]
----
Protein selection, sometimes empty, including in the 1st frame, hydrogen bond results (earlier module)
count by time
[(  6.99999938, 0) ( 56.99999498, 2) (106.99999058, 2) (156.99998617, 0)
 (206.99998177, 1) (256.99997736, 3) (306.99997296, 2) (356.99996856, 0)
 (406.99996415, 0) (456.99995975, 0)]

 


Paul Smith

unread,
Dec 30, 2021, 10:40:50 AM12/30/21
to MDnalysis discussion
Hi Dina,

Sorry for the confusion of what I wrote about your selection strings. For your first selection string, you're right that:

 "name OH2 and (resname TIP3 and same resid as (resname TIP3 and around 0.55 protein))"

and:

"name OH2 and around 0.55 protein"

would select different atoms. However, your original string could be simplified to:

"name OH2 and same residue as around 0.55 protein"

And for you third selection string, I missed the "not" in the selection, so you're correct that residues [19, 25, 26, 28, 44, 94, 98] will be ignored.

In terms of your questions about empty selections, there are two different things to consider:

i) Passing a selection string that sometimes results in an empty atom group (e.g. "name NA and around 3 protein"). This is absolutely fine to do. The atoms will be guessed at each frame using the string you supply.

ii) Passing an empty selection string. Currently,  if you pass an empty string to hydrogens_sel, then hydrogens will be guessed from all atoms in the universe (likewise if you pass an empty string to acceptors_sel). This is what happened in the case of your proline selection - you passed an empty string as the hydrogen selection. If you had instead passed e.g. "resname PRO and name H*" as the hydrogen selection string, then HBA would have performed the expected analysis (and found 0 hydrogen bonds as proline has no hydrogen atoms to donate). 

About the differences in count_by_type methods of the previous and current HBA classes, I'm not sure where these differences arise. It would be very useful if you could identify a hydrogen bond that is found by the previous implementation but not by the current one (or vice versa), then calculate by hand whether there is indeed a hydrogen bond (based on the geometric criteria you set). If it turns out to be an issue with the current implementation then that is definitely something I would need to look into.

Sorry again for the confusion in my previous email, let me know if anything is still unclear!

Cheers,
Paul

Dina Sharon

unread,
Jan 25, 2022, 4:58:29 PM1/25/22
to mdnalysis-...@googlegroups.com

Dear Paul,

 

I thank you so much for sharing these very helpful updates with me! I apologize for the delay in my following up on your very helpful email, which I really appreciated.

 

I really appreciate your answering my questions about selection strings and your advice on condensing selection strings.

 

Looking at (i) and (ii) in your email, my results (in my 11/23 and 12/23 messages’ code) suggest, as you indicated, using a sometimes empty selection directly in HBA (without guessing) works as expected.

 

However, looking at (i) in your email, regarding “Passing a selection string that sometimes results in an empty atom group, when I input a selection that is sometimes empty and sometimes not, and I use guessing,  it looks like the output of guess_acceptors is not what I would have expected. Many protein acceptors are output if the input to guess_acceptors is not empty in the first frame, and an empty string is output if the input to guess_acceptors is empty in the first frame (but is not empty in all frames). This looks to lead to unexpected results, with hydrogen bonds appearing for frames where the selection is empty when I use guessing and many more hydrogen bonds appearing when I use guessing than when I do not use guessing.

 

To be more specific, here is a minimal example (adapted from my 11/23 and 12/23 messages’ code) where I compare guessing acceptors and specifying acceptors for a selection which is not empty in the first frame:

 

#Import modules (received deprecation warning)

import MDAnalysis.analysis

from MDAnalysis import analysis

from MDAnalysis.analysis import hydrogenbonds

from MDAnalysis.analysis.hydrogenbonds import hbond_analysis

import MDAnalysisData

from MDAnalysisData import datasets

from MDAnalysis.tests.datafiles import PSF, DCD

 

#Download test data (ref: https://www.mdanalysis.org/MDAnalysisData/usage.html,  https://www.mdanalysis.org/MDAnalysisData/ifabp_water.html#module-MDAnalysisData.ifabp_water,  https://figshare.com/articles/dataset/Molecular_dynamics_trajectory_of_I-FABP_for_testing_and_benchmarking_solvent_dynamics_analysis/7058030/1)

ifabp_data = MDAnalysisData.ifabp_water.fetch_ifabp_water()

test_univ_for_updating_selections = MDAnalysis.Universe(ifabp_data.topology, ifabp_data.trajectory)

 

#Set stride, since having only a few frames will facilitate inspection

stride_for_analysis = 50

 

#Select protein residues so that the selection will sometimes be empty (as code in my 12/23 message shows) ref https://userguide.mdanalysis.org/stable/selections.html?highlight=selections

protein_nearby_selection = "protein and same resid as (protein and around 0.55 resname TIP3)"

 

#Analyze hydrogen bonds between protein residues in which a protein residue very close to water is an acceptor

#New module, for this selection that, as shown in earlier code, sometimes contains atoms and sometimes does not

#Trying guess_acceptors and specifying acceptors

#Ref: https://userguide.mdanalysis.org/2.0.0-dev0/examples/analysis/hydrogen_bonds/hbonds-selections.html, https://docs.mdanalysis.org/stable/documentation_pages/analysis/hydrogenbonds.html

#Many thanks to Paul Smith for sharing the tutorial and for advice on when to use update_selections = False and on count_by_type()!

#Here I want the selections to update, so I keep the default update_selections value of True

 

#A. Guess acceptors, for a selection which is not empty in the 1st frame

#Also need to guess protein hydrogens

p_hb_new_updating_guess = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, d_h_a_angle_cutoff = 150.0)

p_hb_new_updating_guess.acceptors_sel = p_hb_new_updating_guess.guess_acceptors(protein_nearby_selection)

p_hb_new_updating_guess.hydrogens_sel = p_hb_new_updating_guess.guess_hydrogens("protein")

print(f"Acceptors selection for residues, guessing acceptors, for the selection prepared not to be empty in the 1st frame, is {p_hb_new_updating_guess.acceptors_sel}")

print("----------")

p_hb_new_updating_guess.run(step = stride_for_analysis)

print("count by time, guessing acceptors")

print(p_hb_new_updating_guess.count_by_time())

 

#B. Specify acceptors: set acceptors_sel as all the acceptors within 0.55 of water, not using guess_acceptors

#The code in my 12/23 message shows how I obtained this acceptors string

p_hb_new_updating_specify = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, acceptors_sel = "((resname ALA and name O) or (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname ASP and name O) or (resname ASP and name OD1) or (resname ASP and name OD2) or (resname GLN and name NE2) or (resname GLN and name O) or (resname GLN and name OE1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLU and name OT1) or (resname GLU and name OT2) or (resname GLY and name O) or (resname HSE and name ND1) or (resname HSE and name O) or (resname ILE and name O) or (resname LEU and name O) or (resname LYS and name O) or (resname MET and name O) or (resname PHE and name O) or (resname SER and name O) or (resname SER and name OG) or (resname THR and name O) or (resname THR and name OG1) or (resname TRP and name NE1) or (resname TRP and name O) or (resname TYR and name O) or (resname TYR and name OH) or (resname VAL and name O)) and same resid as (protein and around 0.55 resname TIP3)", d_h_a_angle_cutoff = 150.0)

p_hb_new_updating_specify.hydrogens_sel = p_hb_new_updating_specify.guess_hydrogens("protein")

p_hb_new_updating_specify.run(step = stride_for_analysis)

print("count by time, specifying acceptors")

print(p_hb_new_updating_specify.count_by_time())

 

#Printout

Acceptors selection for residues, guessing acceptors, for the selection prepared not to be empty in the 1st frame, is (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLY and name O) or (resname LYS and name O) or (resname VAL and name O)
----------
count by time, guessing acceptors
[21 28 18 19 25 28 27 28 24 25]
count by time, specifying acceptors
[2 2 0 0 1 2 2 0 0 0]

 

 

Code in my 11/23 and 12/23 messages contains more details, and I would be happy to provide more information.

 

If possible, I wanted to please check in: am I correct in my interpretation of the code’s doing what I think it is doing?

 

If so, I would be eager to please ask some follow-up questions, but your email described different behavior than what I believe I am seeing in my code, so I first wanted to please see if I am coding and understanding things correctly or if there is a mistake somewhere in my code or interpretation.

 

Regarding differences between the two modules, I am glad to report that, on closer inspection, my checks thus far indicate the differences I observed in my earlier email were due to different settings. The differences went away when I revised my code to have the two modules’ settings match (changing distance_type). I noticed that the default distance settings are different between the two modules. The hydrogen atom-acceptor atom distance is the default for the older module, and the heavy donor atom-acceptor atom distance is the default for the new module, if I am understanding the documentation and meanings of distance_type and d_a_cutoff correctly. When I reran my analysis changing the older module’s settings to be using the heavy donor atom-acceptor atom distance, the modules’ output then matched (both hydrogen bond counts and identities of atoms involved). My code below has more information.

 

I really appreciate your help, and I hope that 2022 has started very well for you!

 

With very many thanks,

Dina

 

#Code is adapted from earlier messages’ code

#Import modules (received deprecation warning)

import MDAnalysis.analysis

from MDAnalysis import analysis

from MDAnalysis.analysis import hydrogenbonds

from MDAnalysis.analysis.hydrogenbonds import hbond_analysis

from MDAnalysis.analysis import hbonds

import MDAnalysisData

from MDAnalysisData import datasets

from MDAnalysis.tests.datafiles import PSF, DCD

 

#Download test data (ref: https://www.mdanalysis.org/MDAnalysisData/usage.html,  https://www.mdanalysis.org/MDAnalysisData/ifabp_water.html#module-MDAnalysisData.ifabp_water,  https://figshare.com/articles/dataset/Molecular_dynamics_trajectory_of_I-FABP_for_testing_and_benchmarking_solvent_dynamics_analysis/7058030/1)

ifabp_data = MDAnalysisData.ifabp_water.fetch_ifabp_water()

test_univ_for_updating_selections = MDAnalysis.Universe(ifabp_data.topology, ifabp_data.trajectory)

 

#Set stride, since having only a few frames will facilitate inspection

stride_for_analysis = 50

 

#Select protein residues so that the selection will sometimes be empty (as code in my 12/23 message shows) ref https://userguide.mdanalysis.org/stable/selections.html?highlight=selections

protein_nearby_selection = "protein and same resid as (protein and around 0.55 resname TIP3)"

 

#Analyze hydrogen bonds between protein residues in which a protein residue very close to water is an acceptor

#New module, is being used for a selection that, as shown in earlier code, sometimes contains atoms and sometimes does not

#Ref: https://userguide.mdanalysis.org/2.0.0-dev0/examples/analysis/hydrogen_bonds/hbonds-selections.html,  https://docs.mdanalysis.org/stable/documentation_pages/analysis/hydrogenbonds.html

#Many thanks to Paul Smith for sharing the tutorial and for advice on when to use update_selections = False and on count_by_type()!

#Here I want the selections to update, so I keep the default update_selections value of True

 

#What are all the acceptors' names?

#Run HydrogenBondAnalysis guess_acceptors in order to obtain acceptor names, which I will use in my selection

p_check_acc = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, d_h_a_angle_cutoff = 150.0)

p_check_acc.acceptors_sel = p_check_acc.guess_acceptors("protein")

print(p_check_acc.acceptors_sel)

 

#Printout

(resname ALA and name O) or (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname ASP and name O) or (resname ASP and name OD1) or (resname ASP and name OD2) or (resname GLN and name NE2) or (resname GLN and name O) or (resname GLN and name OE1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLU and name OT1) or (resname GLU and name OT2) or (resname GLY and name O) or (resname HSE and name ND1) or (resname HSE and name O) or (resname ILE and name O) or (resname LEU and name O) or (resname LYS and name O) or (resname MET and name O) or (resname PHE and name O) or (resname SER and name O) or (resname SER and name OG) or (resname THR and name O) or (resname THR and name OG1) or (resname TRP and name NE1) or (resname TRP and name O) or (resname TYR and name O) or (resname TYR and name OH) or (resname VAL and name O)
 

#Now create a selection with the acceptors' names

protein_nearby_selection_specify_acceptors = "((resname ALA and name O) or (resname ARG and name NE) or (resname ARG and name NH1) or (resname ARG and name NH2) or (resname ARG and name O) or (resname ASN and name ND2) or (resname ASN and name O) or (resname ASN and name OD1) or (resname ASP and name O) or (resname ASP and name OD1) or (resname ASP and name OD2) or (resname GLN and name NE2) or (resname GLN and name O) or (resname GLN and name OE1) or (resname GLU and name O) or (resname GLU and name OE1) or (resname GLU and name OE2) or (resname GLU and name OT1) or (resname GLU and name OT2) or (resname GLY and name O) or (resname HSE and name ND1) or (resname HSE and name O) or (resname ILE and name O) or (resname LEU and name O) or (resname LYS and name O) or (resname MET and name O) or (resname PHE and name O) or (resname SER and name O) or (resname SER and name OG) or (resname THR and name O) or (resname THR and name OG1) or (resname TRP and name NE1) or (resname TRP and name O) or (resname TYR and name O) or (resname TYR and name OH) or (resname VAL and name O)) and same resid as (protein and around 0.55 resname TIP3)"

 

#Run the hydrogen bond analysis

#Specify acceptors for this selection, which is not empty in the 1st frame

#Also need to guess protein hydrogens

p_hb_new_updating_specify = MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis(test_univ_for_updating_selections, d_h_a_angle_cutoff = 150.0)

p_hb_new_updating_specify.acceptors_sel = protein_nearby_selection_specify_acceptors

p_hb_new_updating_specify.hydrogens_sel = p_hb_new_updating_specify.guess_hydrogens("protein")

p_hb_new_updating_specify.run(step = stride_for_analysis)

 

#Print results from the HydrogenBondAnalysis run

print("New module hydrogen bond results")

print("count by time")

print(p_hb_new_updating_specify.count_by_time())

print("frames")

print(p_hb_new_updating_specify.frames)

print("times")

print(p_hb_new_updating_specify.times)

print("hbonds")

print(p_hb_new_updating_specify.hbonds)

print("hydrogen bonds at each time")

 

#Report donor and acceptor identities for each timestep

#Reference for output parsing: https://docs.mdanalysis.org/stable/documentation_pages/analysis/hydrogenbonds.html

#Reference for finding atom information: https://userguide.mdanalysis.org/2.0.0-dev0/examples/analysis/hydrogen_bonds/hbonds.html

#Using singular resname/resid/name, while code block 25 in the link above had these as plural

#Iterate over times and then hydrogen bonds for each time

for index_of_time, time_of_frame in enumerate(p_hb_new_updating_specify.times):

    frame_for_time = p_hb_new_updating_specify.frames[index_of_time]

    print(f"time {time_of_frame} frame {frame_for_time}")

   

    #What are the hydrogen bonds for this time? Obtain and report

    hb_for_time_new = [hb_check for hb_check in p_hb_new_updating_specify.hbonds if hb_check[0] == frame_for_time]

    if len(hb_for_time_new) > 0:

        for hb_reported in hb_for_time_new:

            donor_h_residue_name = test_univ_for_updating_selections.atoms[int(hb_reported[2])].resname

            donor_h_residue_number = test_univ_for_updating_selections.atoms[int(hb_reported[2])].resid

            donor_h_atom_name = test_univ_for_updating_selections.atoms[int(hb_reported[2])].name

            acceptor_residue_name = test_univ_for_updating_selections.atoms[int(hb_reported[3])].resname

            acceptor_residue_number = test_univ_for_updating_selections.atoms[int(hb_reported[3])].resid

            acceptor_atom_name = test_univ_for_updating_selections.atoms[int(hb_reported[3])].name

            print(f"donor {donor_h_residue_name}{donor_h_residue_number} {donor_h_atom_name} acceptor {acceptor_residue_name}{acceptor_residue_number} {acceptor_atom_name}")

    else:

        print("no hydrogen bonds found")

    print("------")

 

#Output

New module hydrogen bond results
count by time
[2 2 0 0 1 2 2 0 0 0]
frames
[  0  50 100 150 200 250 300 350 400 450]
times
[  6.99999938  56.99999498 106.99999058 156.99998617 206.99998177
 256.99997736 306.99997296 356.99996856 406.99996415 456.99995975]
hbonds
[[   0.          468.          470.          313.            2.60758887
   169.63272923]
 [   0.          521.          522.          472.            2.89945973
   160.45028328]
 [  50.           89.           90.         2074.            2.93543587
   170.44653357]
 [  50.          495.          496.          426.            2.85273691
   158.57545713]
 [ 200.          473.          474.          410.            2.83455926
   158.1246932 ]
 [ 250.          443.          445.          389.            2.68570714
   157.44969231]
 [ 250.          449.          450.          394.            2.95985009
   155.60296768]
 [ 300.         1514.         1515.         1550.            2.6547475
   174.83006305]
 [ 300.         1520.         1521.         1549.            2.52265617
   172.55628686]]
hydrogen bonds at each time
time 6.99999938344013 frame 0
donor ARG28 HH22 acceptor GLU19 OE2
donor ALA32 HN acceptor ARG28 O
------
time 56.99999497944106 frame 50
donor LYS7 HN acceptor LYS129 O
donor LEU30 HN acceptor VAL26 O
------
time 106.99999057544198 frame 100
no hydrogen bonds found
------
time 156.9999861714429 frame 150
no hydrogen bonds found
------
time 206.99998176744384 frame 200
donor LYS29 HN acceptor VAL25 O
------
time 256.99997736344477 frame 250
donor LYS27 HZ2 acceptor ASN24 OD1
donor ARG28 HN acceptor ASN24 O
------
time 306.9999729594457 frame 300
donor ARG95 HE acceptor ASP97 OD2
donor ARG95 HH21 acceptor ASP97 OD1
------
time 356.9999685554466 frame 350
no hydrogen bonds found
------
time 406.99996415144756 frame 400
no hydrogen bonds found
------
time 456.9999597474485 frame 450
no hydrogen bonds found
------

 

#Conduct hydrogen bonding analyses with the earlier module
#Many thanks to Paul Smith for explaining atoms do not need to be specified with the earlier module, in his Google group 11/7/21 post!
#Therefore I am only conducting analyses with entire protein residues selected
#Ref: https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/hbond_analysis.html, set angle to match new module (ref: https://docs.mdanalysis.org/stable/documentation_pages/analysis/hydrogenbonds.html)
p_hb_old_updating = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ_for_updating_selections, protein_nearby_selection, "protein", selection1_type = "acceptor", angle = 150.0)
p_hb_old_updating.run(step = stride_for_analysis, selection1_type = "acceptor")
 
#Also have an analysis changing distance_type to align with the new module
p_hb_old_updating_heavy_distance = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(test_univ_for_updating_selections, protein_nearby_selection, "protein", selection1_type = "acceptor", angle = 150.0, distance_type = "heavy")
p_hb_old_updating_heavy_distance.run(step = stride_for_analysis, selection1_type = "acceptor")
 
#Report results for both analyses
#Report donor and acceptor identities for each timestep (iterate over each timeseries)
print("Earlier module, standard (hydrogen) distance setting, hydrogen bond results")
print("count by time")
print(p_hb_old_updating.count_by_time())
print("timeseries")
print(p_hb_old_updating.timeseries)
print("timesteps")
print(p_hb_old_updating.timesteps)
print("hydrogen bonds at each time")
print("------")
for time_of_frame, timestep_information in zip(p_hb_old_updating.timesteps, p_hb_old_updating.timeseries):
    print(f"time {time_of_frame}")
    if len(timestep_information) > 0:
        for hb_old_report in timestep_information:
            print(f"donor {hb_old_report[2]} acceptor {hb_old_report[3]}")
    else:
        print("no hydrogen bonds found")
    print("------")
    
print("********************************")
 
print("Earlier module, heavy distance setting, hydrogen bond results")
print("count by time")
print(p_hb_old_updating_heavy_distance.count_by_time())
print("timeseries")
print(p_hb_old_updating_heavy_distance.timeseries)
print("timesteps")
print(p_hb_old_updating_heavy_distance.timesteps)
print("hydrogen bonds at each time")
print("------")
for time_of_frame_heavy, timestep_information_heavy in zip(p_hb_old_updating_heavy_distance.timesteps, p_hb_old_updating_heavy_distance.timeseries):
    print(f"time {time_of_frame_heavy}")
    if len(timestep_information_heavy) > 0:
        for hb_old_report_heavy in timestep_information_heavy:
            print(f"donor {hb_old_report_heavy[2]} acceptor {hb_old_report_heavy[3]}")
    else:
        print("no hydrogen bonds found")
    print("------")
 
#Printout
Earlier module, standard (hydrogen) distance setting, hydrogen bond results
count by time
[(  6.99999938, 4) ( 56.99999498, 4) (106.99999058, 2) (156.99998617, 0)
 (206.99998177, 2) (256.99997736, 4) (306.99997296, 2) (356.99996856, 0)
 (406.99996415, 1) (456.99995975, 0)]
timeseries
[[[470, 313, 'ARG28:HH22', 'GLU19:OE2', 1.6176969938864014, 169.63272953839035], [474, 410, 'LYS29:HN', 'VAL25:O', 2.9368162637413975, 160.49644581123985], [496, 426, 'LEU30:HN', 'VAL26:O', 2.513965069410512, 178.2421063554173], [522, 472, 'ALA32:HN', 'ARG28:O', 1.9406786621042384, 160.45028361463346]], [[90, 2074, 'LYS7:HN', 'LYS129:O', 1.9475959881604479, 170.44653345938542], [474, 410, 'LYS29:HN', 'VAL25:O', 2.094796160801959, 155.33987392780452], [496, 426, 'LEU30:HN', 'VAL26:O', 1.9012893373719628, 158.57545691660252], [1480, 1260, 'LYS94:HN', 'THR79:O', 2.069889148268329, 162.5464669240155]], [[1480, 1260, 'LYS94:HN', 'THR79:O', 2.6061235816430974, 161.59742006595067], [1502, 1595, 'ARG95:HN', 'LYS100:O', 2.0722698055968287, 161.18954344288508]], [], [[474, 410, 'LYS29:HN', 'VAL25:O', 1.8848988955992887, 158.1246931272562], [1502, 1595, 'ARG95:HN', 'LYS100:O', 2.130053621798112, 154.09016878624382]], [[445, 389, 'LYS27:HZ2', 'ASN24:OD1', 1.6954430179706532, 157.44969204191966], [450, 394, 'ARG28:HN', 'ASN24:O', 2.023088186963373, 155.6029677709066], [532, 494, 'HSE33:HN', 'LYS29:O', 2.5648595265194674, 157.56489175922837], [1575, 1561, 'LYS100:HN', 'ASN98:OD1', 2.117270301980385, 163.33464338830598]], [[1515, 1550, 'ARG95:HE', 'ASP97:OD2', 1.6572856360613153, 174.83006310736693], [1521, 1549, 'ARG95:HH21', 'ASP97:OD1', 1.5277540489215113, 172.55628678910588]], [], [[496, 426, 'LEU30:HN', 'VAL26:O', 2.4782158795698277, 155.3432565675348]], []]
timesteps
[6.99999938344013, 56.99999497944106, 106.99999057544198, 156.9999861714429, 206.99998176744384, 256.99997736344477, 306.9999729594457, 356.9999685554466, 406.99996415144756, 456.9999597474485]
hydrogen bonds at each time
------
time 6.99999938344013
donor ARG28:HH22 acceptor GLU19:OE2
donor LYS29:HN acceptor VAL25:O
donor LEU30:HN acceptor VAL26:O
donor ALA32:HN acceptor ARG28:O
------
time 56.99999497944106
donor LYS7:HN acceptor LYS129:O
donor LYS29:HN acceptor VAL25:O
donor LEU30:HN acceptor VAL26:O
donor LYS94:HN acceptor THR79:O
------
time 106.99999057544198
donor LYS94:HN acceptor THR79:O
donor ARG95:HN acceptor LYS100:O
------
time 156.9999861714429
no hydrogen bonds found
------
time 206.99998176744384
donor LYS29:HN acceptor VAL25:O
donor ARG95:HN acceptor LYS100:O
------
time 256.99997736344477
donor LYS27:HZ2 acceptor ASN24:OD1
donor ARG28:HN acceptor ASN24:O
donor HSE33:HN acceptor LYS29:O
donor LYS100:HN acceptor ASN98:OD1
------
time 306.9999729594457
donor ARG95:HE acceptor ASP97:OD2
donor ARG95:HH21 acceptor ASP97:OD1
------
time 356.9999685554466
no hydrogen bonds found
------
time 406.99996415144756
donor LEU30:HN acceptor VAL26:O
------
time 456.9999597474485
no hydrogen bonds found
------
********************************
Earlier module, heavy distance setting, hydrogen bond results
count by time
[(  6.99999938, 2) ( 56.99999498, 2) (106.99999058, 0) (156.99998617, 0)
 (206.99998177, 1) (256.99997736, 2) (306.99997296, 2) (356.99996856, 0)
 (406.99996415, 0) (456.99995975, 0)]
timeseries
[[[470, 313, 'ARG28:HH22', 'GLU19:OE2', 2.607588520002456, 169.63272953839035], [522, 472, 'ALA32:HN', 'ARG28:O', 2.8994611587219636, 160.45028361463346]], [[90, 2074, 'LYS7:HN', 'LYS129:O', 2.93543583739373, 170.44653345938542], [496, 426, 'LEU30:HN', 'VAL26:O', 2.8527368634405255, 158.57545691660252]], [], [], [[474, 410, 'LYS29:HN', 'VAL25:O', 2.8345592279052743, 158.1246931272562]], [[445, 389, 'LYS27:HZ2', 'ASN24:OD1', 2.685707037489828, 157.44969204191966], [450, 394, 'ARG28:HN', 'ASN24:O', 2.959850019197658, 155.6029677709066]], [[1515, 1550, 'ARG95:HE', 'ASP97:OD2', 2.65474743821139, 174.83006310736693], [1521, 1549, 'ARG95:HH21', 'ASP97:OD1', 2.522656111553509, 172.55628678910588]], [], [], []]
timesteps
[6.99999938344013, 56.99999497944106, 106.99999057544198, 156.9999861714429, 206.99998176744384, 256.99997736344477, 306.9999729594457, 356.9999685554466, 406.99996415144756, 456.9999597474485]
hydrogen bonds at each time
------
time 6.99999938344013
donor ARG28:HH22 acceptor GLU19:OE2
donor ALA32:HN acceptor ARG28:O
------
time 56.99999497944106
donor LYS7:HN acceptor LYS129:O
donor LEU30:HN acceptor VAL26:O
------
time 106.99999057544198
no hydrogen bonds found
------
time 156.9999861714429
no hydrogen bonds found
------
time 206.99998176744384
donor LYS29:HN acceptor VAL25:O
------
time 256.99997736344477
donor LYS27:HZ2 acceptor ASN24:OD1
donor ARG28:HN acceptor ASN24:O
------
time 306.9999729594457
donor ARG95:HE acceptor ASP97:OD2
donor ARG95:HH21 acceptor ASP97:OD1
------
time 356.9999685554466
no hydrogen bonds found
------
time 406.99996415144756
no hydrogen bonds found
------
time 456.9999597474485
no hydrogen bonds found
------

 


Reply all
Reply to author
Forward
0 new messages