<< HBonds between fragments/chains >>

I. Camps

unread,

Sep 14, 2020, 2:53:49 PM9/14/20

to MDnalysis discussion

Hello,

I have a system with the following info: the main protein chain (chain name PROA), the ligand (chain name HETA), the solvent (chain name SOLV), and the ions (CLA for Cl and SOD for Na).

I want to compute the hydrogen bonds between the protein and the ligand. I played using different atom selection and segment names, but I am getting the same results.

I searched this group about how MDAnalysis treat the chains. Found the post that state that the chains are treat as segments (link), so I used the command bellow but in the output I got all the Hbonds, not only between the two set I am interested in.

As I am new to MDAnalysis, I would like to know if the output is right.

Also, how do I save to a file this output?

# HBond calculation
hbonds = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(universe=u, between=['segid HETA', 'segid PROA'])
hbonds.run()
hbonds.generate_table()
print(hbonds.table)

[( 0. , 1, 11709, 'SER', 0, 'HT1', 'TIP3', 3245, 'OH2', 1.83234625, 152.95088724) ( 0. , 2, 5778, 'SER', 0, 'HT2', 'TIP3', 1268, 'OH2', 1.75276436, 166.28757966) ( 0. , 3, 49, 'SER', 0, 'HT3', 'ASP', 3, 'OD1', 1.80132921, 151.18005171) ... (24.39521679, 27863, 1942, 'TIP3', 8629, 'H2', 'GLU', 116, 'OE1', 2.96604261, 120.7568776 ) (24.39521679, 27863, 1943, 'TIP3', 8629, 'H2', 'GLU', 116, 'OE2', 1.76321545, 161.65524683) (24.39521679, 27874, 1442, 'TIP3', 8633, 'H1', 'GLU', 85, 'OE1', 1.72845742, 170.76333328)]

Oliver Beckstein

unread,

Sep 15, 2020, 6:26:49 PM9/15/20

to mdnalysis-...@googlegroups.com

Hi,

On Sep 14, 2020, at 11:53 AM, I. Camps <ica...@gmail.com> wrote:

Hello,

I have a system with the following info: the main protein chain (chain name PROA), the ligand (chain name HETA), the solvent (chain name SOLV), and the ions (CLA for Cl and SOD for Na).

I want to compute the hydrogen bonds between the protein and the ligand. I played using different atom selection and segment names, but I am getting the same results.

I searched this group about how MDAnalysis treat the chains. Found the post that state that the chains are treat as segments (link),

Yes, if the input is a PDB with chain information then the chain labels are used for the segments. If you have a PSF as input, the segments in the PSF are directly used for segment labels in the MDAnalysis.

so I used the command bellow but in the output I got all the Hbonds, not only between the two set I am interested in.

You seem to be using the deprecated hbonds.HydrogenBondAnalysis which we are removing. Instead we suggest you start using the new hydrogenbonds.HydrogenBondAnalysis .

I also don’t think that hbonds.HydrogenBondAnalysis supports the “between” keyword argument (where did you see that this should work?) As far as I can tell, the code just ignores it so that’s probably why you don’t see any effect. If you really want to continue using it then you could take the output from hbonds.table and filter it. But I’d rather do this with the new hydrogenbonds.HydrogenBondAnalysis (see below).

As I am new to MDAnalysis, I would like to know if the output is right.

It’s correct but not what you wanted because it clearly contains other h-bonds/

Also, how do I save to a file this output?

As a general rule, you need to decide by yourself in which format you want to store data. That’s a decision that we leave to users. For instance, for a numpy array (such as the output from hbonds.table) I would use numpy.savetxt() or numpy.save(), which tells you more about how to load it again. (numpy.savetxt() can also save as a CSV file if you want that).

# HBond calculation
hbonds = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(universe=u, between=['segid HETA', 'segid PROA'])
hbonds.run()
hbonds.generate_table()
print(hbonds.table)

[( 0. , 1, 11709, 'SER', 0, 'HT1', 'TIP3', 3245, 'OH2', 1.83234625, 152.95088724) ( 0. , 2, 5778, 'SER', 0, 'HT2', 'TIP3', 1268, 'OH2', 1.75276436, 166.28757966) ( 0. , 3, 49, 'SER', 0, 'HT3', 'ASP', 3, 'OD1', 1.80132921, 151.18005171) ... (24.39521679, 27863, 1942, 'TIP3', 8629, 'H2', 'GLU', 116, 'OE1', 2.96604261, 120.7568776 ) (24.39521679, 27863, 1943, 'TIP3', 8629, 'H2', 'GLU', 116, 'OE2', 1.76321545, 161.65524683) (24.39521679, 27874, 1442, 'TIP3', 8633, 'H1', 'GLU', 85, 'OE1', 1.72845742, 170.76333328)]

I suggest you swittch to hydrogenbonds.HydrogenBondAnalysis. (In the development version (which will be MDAnalysis 2.0.0), it also has the between keyword but that’s not yet in the released 1.0.0).

With the released 1.0.0 version I would calculate all hydrogenbonds and then filter the output by going through the output https://docs.mdanalysis.org/1.0.0/documentation_pages/analysis/hydrogenbonds.html#output and only keeping those entries where the donor or acceptor atom id are in the two different segments. The HydrogenBondAnalysis.hbonds attribute is a numpy array so you can save it with numpy.savetxt().

To filter the hydrogen bonds from the data I would do the following. The idea is that for each hydrogen bond we check if the donor is in group A and the acceptor in group B or vice versa. Then we select only the matching hbonds from the data array. This makes use of numpy boolean indexing so if you’re not familiar with these, read more about them under https://numpy.org/doc/stable/user/basics.indexing.html#boolean-or-mask-index-arrays

import MDAnalysis as mda

import MDAnalysis.analysis.hydrogenbonds

import numpy as np

# if you want more information output, execute the next line

mda.start_logging()

u = mda.Universe(PSF, DCD) # provide a topology with bonds and charges!!

A = u.select_atoms(“segid PROA”)

B = u.select_atoms(“segid HETA”)

# calculate ALL h-bonds (even with water): will be slow

# update_selections=False speeds up the run (always safe to

# use when you have static selections or when you use the defaults as here)

hbonds = MDAnalysis.analysis.hydrogenbonds.HydrogenBondAnalysis(universe=u, update_selections=False)

hbonds.run(verbose=True)

# extract the atom indices of donors and acceptors from the H-bond results

# (see https://docs.mdanalysis.org/1.0.0/documentation_pages/analysis/hydrogenbonds.html#output)

donors = hbonds.hbonds[:, 1].astype(int)

acceptors = hbonds.hbonds[:, 3].astype(int)

# array with True for each A->B h-bond

# A.ix and B.ix are the 0-based indices of the atoms

AtoB = np.logical_and(np.isin(donors, A.ix), np.isin(acceptors, B.ix))

# array with True for each B->A h-bond

BtoA = np.logical_and(np.isin(donors, B.ix), np.isin(acceptors, A.ix))

# array with True for either A->B or B->A

AB = np.logical_or(AtoB, BtoA)

# get the H-bond data

hbonds_AB = hbonds.hbonds[AB]

# now analyze these hydrogen bonds

# for instance:

print(“Number of A-B h-bonds {} out of {} in total”.format(len(hbonds_AB), len(hbonds.hbonds)))

# histogram of distances

d_histo, d_edges = np.histogram(hbonds_AB[:, 4], density=True)

# histogram of angles

angles_histo, angles_edges = np.histogram(hbonds_AB[:, 5], density=True)

I only tested the above on a simple trajectory in the test data so you might have to play with the code a little bit. One disadvantage is that it’s slow because it calculates ALL hydrogen bonds. The upcoming 2.0.0 (no release date yet) will have the between keyword so this will be faster. Until then the above might be a work around.

Best,

Oliver

--

Oliver Beckstein (he/him)

orbe...@gmail.com

I. Camps

unread,

Sep 18, 2020, 1:45:44 PM9/18/20

to mdnalysis-...@googlegroups.com

Dear Oliver,

One more time, thank you for your help.

I moved to the developer MDAnalysis version...

After running you code, I got:

Number of A-B h-bonds 0 out of 4067861 in total
/home/icamps/anaconda3/lib/python3.8/site-packages/numpy/lib/histograms.py:908: RuntimeWarning: invalid value encountered in true_divide
  return n/db/n.sum(), bin_edges

But, making the Hbond analysis with VMD, I got:

I will try with the other system.

[]'s,

Camps

--
You received this message because you are subscribed to a topic in the Google Groups "MDnalysis discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mdnalysis-discussion/p1Q6GZuqlEg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/BA8B157B-4E76-4751-9825-DF56B78A99CE%40gmail.com.

Oliver Beckstein

unread,

Sep 18, 2020, 2:04:16 PM9/18/20

to mdnalysis-...@googlegroups.com

Hello Camps,

What version of MDAnalysis are you currently using (what does `print(mda.__version__)` show and how did you install it?). If it is the development branch then did you use then new between keyword (instead of the filtering of the results that I suggested for MDA 1.0.0)? In general, show the code that you’re using.

I’d start by checking that you can see H-bonds inside your domains. Then check that your selections actually properly select your domains of interest. You should always read and check code by thinking what should this variable contain/what should this function compute and then compare to what it actually does.

Comparing to other tools such as VMD is good but is only part of finding a solution because it’s not always clear what the differences between the different tools are (e.g., what is counted as a donor/acceptor).

Oliver

On Sep 18, 2020, at 10:45 AM, I. Camps <ica...@gmail.com> wrote:

Dear Oliver,

One more time, thank you for your help.

I moved to the developer MDAnalysis version...

After running you code, I got:
Number of A-B h-bonds 0 out of 4067861 in total
/home/icamps/anaconda3/lib/python3.8/site-packages/numpy/lib/histograms.py:908: RuntimeWarning: invalid value encountered in true_divide
  return n/db/n.sum(), bin_edges

But, making the Hbond analysis with VMD, I got:

You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/CAJXnfq-KP%3DoECX%3DicJAyKQJV0W6%2BYkyPEjPXdHB965jx5HbTwg%40mail.gmail.com.

I. Camps

unread,

Sep 21, 2020, 11:45:38 PM9/21/20

to MDnalysis discussion

Hello Oliver,

What version of MDAnalysis are you currently using (what does `print(mda.__version__)` show

It returns 2.0.0-dev0

and how did you install it?

I installed using the info from here: https://userguide.mdanalysis.org/1.0.0/contributing_code.html#create-virtual-environment

(but I installed not to a separate environment but to my base environment (because my Jupiter notebooks only work in the base environment)

If it is the development branch then did you use then new between keyword (instead of the filtering of the results that I suggested for MDA 1.0.0)?

Yes. I used:

hbonds = HBA(universe=u, donors_sel=None, hydrogens_sel=None, acceptors_sel=None, between=['segid HETA', 'segid PROA'])
hbonds.run()

and got as output:

<MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis at 0x15212191aac0>

But I am stuck here :(

I don't know how to extract any information now. Also, I tried to generate a table using:

hbonds.generate_table()

and got:

AttributeError: 'HydrogenBondAnalysis' object has no attribute 'generate_table'

I delay in answer you because I was reading the documentation and playing with the examples there.

Regards,

Camps

Oliver Beckstein

unread,

Sep 22, 2020, 1:25:44 PM9/22/20

to mdnalysis-...@googlegroups.com

Hi Camps,

On Sep 21, 2020, at 8:45 PM, I. Camps <ica...@gmail.com> wrote:

Hello Oliver,

What version of MDAnalysis are you currently using (what does `print(mda.__version__)` show
It returns 2.0.0-dev0

Good, that is the development version.

and how did you install it?
I installed using the info from here: https://userguide.mdanalysis.org/1.0.0/contributing_code.html#create-virtual-environment
(but I installed not to a separate environment but to my base environment (because my Jupiter notebooks only work in the base environment)

Good, glad you found the instructions.

(Btw, I have to install my jupyter notebook and ipython into the same environment as my MDAnalysis and then it works for me. But base should work, too.)

If it is the development branch then did you use then new between keyword (instead of the filtering of the results that I suggested for MDA 1.0.0)?
Yes. I used:

hbonds = HBA(universe=u, donors_sel=None, hydrogens_sel=None, acceptors_sel=None, between=['segid HETA', 'segid PROA’])

This line creates a new object, hbonds. The object “knows” how to run the analysis and it contains all the data.

hbonds.run()

This ran the analysis and modified hbonds.

and got as output:

<MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis at 0x15212191aac0>

That is the representation of the hbonds object, which is an instance of the class HydrogenBondAnalysis.

But I am stuck here :(

You now have to work with the hbonds object.

I don't know how to extract any information now. Also, I tried to generate a table using:
hbonds.generate_table()

and got:
AttributeError: 'HydrogenBondAnalysis' object has no attribute ‘generate_table'

generate_table() is not needed anymore and does not exist for the new HydrogenBondAnalysis.

Instead, get all data output in the array

hbonds.hbonds

The link above explains what it contains. It’s a M x 6 numpy array.In brief, each row of the array is a single hydrogen bond at a single time point (frame). It follows the “tidy data” philosophy which makes it easy to group and aggregate with a tool such as pandas (you can just turn the whole array into a DataFrame and work with it if you’re comfortable with pandas).

If you look at my previous email you see how to work with the data but the basic idea is that you pull out those h-bonds that you are interested in. For instance, to do simple statistics:

# histogram of distances

distances = hbonds.hbonds[:, 4]

d_histo, d_edges = np.histogram(distances, density=True)

d_midpoints = 0.5*(d_edges[1:] + d_edges[:-1])

# histogram of angles

angles = hbonds.hbonds[:, 5]

angles_histo, angles_edges = np.histogram(angles, density=True)

a_midpoints = 0.5*(angles_edges[1:] + angles_edges[:-1])

import matplotlib.pyplot as plt

ax1 = plt.subplot(121)

ax1.plot(d_midpoints, d_histo)

ax1.set_xlabel(“H-bond length (Å)”)

ax2 = plt.subplot(121)

ax2.plot(a_midpoints, a_histo)

ax2.set_xlabel(“H-bond angle (deg)”)

Ultimately, you have to work with the hbonds.hbonds array.

There are a few convenience methods (count_by_*()) that are documented for HydrogenBondAnalysis.

I delay in answer you because I was reading the documentation and playing with the examples there.

That’s an excellent start.

If you have suggestions for what documentation should contain please feel free to leave a comment at https://github.com/MDAnalysis/UserGuide/issues/105

Best,

Oliver

Regards,

Camps

Reply all

Reply to author

Forward