going from the automap indices to atom.index()?

Saurabh Srivastava

Mar 9, 2012, 2:52:15 PM
to indig...@googlegroups.com
Hi all,

The API does not document how to retrieve the AAM's number assignment for a given atom. I can see it in the rendered output, but could not find a call to retrieve it.

That is, for a given atom a_i (let's say in a product molecule of a reaction) I can look up what it is mapped to using reaction.atomMappingNumber(a_i). This call tells me which reactant atom a_i is mapped to. But the number does not correspond to the atom.index() values of the reactant atoms. How do I get from the atomMappingNumber to the reactant index() values?

Regards,
saurabh

Mikhail Rybalkin

Mar 9, 2012, 4:59:01 PM
to indig...@googlegroups.com
Hello Saurabh,

Atom-to-atom mapping numbers can have arbitrary values, and the same value can even be used multiple times:

[NH2:6][CH:1]1[CH2:2][CH2:3][CH2:4][CH2:5]1>>[CH2:3]1[CH2:4][CH2:5][CH:1]([CH2:2]1)[NH:6][CH:1]1[CH2:5][CH2:4][CH2:3][CH2:2]1

You need to iterate over the reactant atoms and construct a mapping between their indices and their atom mapping numbers.
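
A minimal sketch of that bookkeeping in Java, written without the Indigo library so it runs on its own. The class name AamIndex and the method byMappingNumber are hypothetical. The arrays stand in for what you would collect by calling reaction.atomMappingNumber(atom) on each atom from molecule.iterateAtoms(); the sketch assumes that atom.index() follows the order of atoms in the SMILES string and that 0 means "unmapped".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AamIndex {
    /**
     * Groups atom indices by AAM number.
     * aamNumbers[i] is the mapping number of the atom whose index() is i.
     * A value of 0 is treated as "unmapped" (an assumption of this sketch).
     */
    public static Map<Integer, List<Integer>> byMappingNumber(int[] aamNumbers) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int atomIndex = 0; atomIndex < aamNumbers.length; atomIndex++) {
            int number = aamNumbers[atomIndex];
            if (number == 0) {
                continue; // skip unmapped atoms
            }
            // the same number may occur several times, so keep a list per number
            groups.computeIfAbsent(number, k -> new ArrayList<>()).add(atomIndex);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Reactant: [NH2:6][CH:1]1[CH2:2][CH2:3][CH2:4][CH2:5]1
        int[] reactant = {6, 1, 2, 3, 4, 5};
        // Product: [CH2:3]1[CH2:4][CH2:5][CH:1]([CH2:2]1)[NH:6][CH:1]1[CH2:5][CH2:4][CH2:3][CH2:2]1
        int[] product = {3, 4, 5, 1, 2, 6, 1, 5, 4, 3, 2};

        Map<Integer, List<Integer>> reactantAtoms = byMappingNumber(reactant);
        Map<Integer, List<Integer>> productAtoms = byMappingNumber(product);

        // Mapping number 1: one reactant atom, two product atoms
        System.out.println(reactantAtoms.get(1)); // prints [1]
        System.out.println(productAtoms.get(1));  // prints [3, 6]
    }
}
```

To go from a product atom to its reactant atoms, read the product atom's mapping number and look it up in the reactant map. The value is a list because one number can label more than one atom.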

Best regards,
Mikhail

Saurabh Srivastava

Mar 9, 2012, 7:03:54 PM
to indig...@googlegroups.com
I understand that each mapping number can be reused multiple times. But it appears that there was a fundamental misunderstanding in my interpretation of what the AAM numbers mean. Your beautiful example clarifies it. Just to confirm: the AAM puts the atoms on each side of the reaction into "bins". E.g., the C's attached to N:6 in your example both go to bin 1, i.e., they are both labelled [CH:1]. And all atoms in a bin *on either side* form an equivalence class. Right? (My false interpretation was that the mapping number is an index into the atom indices on the other side.)

Also, automap does not map hydrogens (after we call reaction.unfoldHydrogens()). Is there a way of mapping the hydrogens or is that not possible in the current mapping algorithm?

Thanks a lot.
Saurabh

Savelyev Alexander

Mar 16, 2012, 8:27:58 AM
to indigo-dev
Hello Saurabh,

Sorry for the late reply. I would not apply such a demanding
mathematical concept as an equivalence class to an automap result,
because I think not all of its conditions are satisfied. In
particular, I doubt that the symmetry relation holds for reaction
molecules: the AAM engine gives priority to the reactant molecules,
so in some cases the result differs if we swap reactants and
products. Anyway, it would be very interesting to examine the
algebraic properties and to view the results from, e.g., a group
theory perspective.
As for hydrogen mapping, you are right: the current algorithm
discards all the hydrogens in a reaction. But you can replace all the
hydrogens with, say, pseudo atoms, call automap(), and then replace
the atoms back. The example below shows how to do this in Java (Python
and C# are similar).

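import java.util.LinkedList;

// 'reaction' is an IndigoObject holding the loaded reaction
// list to store the hydrogen atoms that are temporarily replaced
LinkedList<IndigoObject> hydrogen_atoms = new LinkedList<IndigoObject>();

reaction.unfoldHydrogens();
for (IndigoObject molecule : reaction.iterateMolecules()) {
    for (IndigoObject atom : molecule.iterateAtoms()) {
        if (atom.symbol().equals("H")) {
            hydrogen_atoms.add(atom);
            // turn the hydrogen into a pseudo atom with a unique name
            atom.resetAtom("Hydrogen");
        }
    }
}
// compute the mapping, discarding any existing one
reaction.automap("discard");
// turn the pseudo atoms back into hydrogens
for (IndigoObject h_atom : hydrogen_atoms) {
    h_atom.resetAtom("H");
}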

With best regards,
Alexander

Saurabh Srivastava

Mar 16, 2012, 12:08:06 PM
to indigo-dev
Hi Alexander,

Thanks for the reply. Now I am all the more curious about the exact
algorithm you use for AAM. I looked at http://ggasoftware.com/opensource/resources#algorithms
and it is vague, and I am bad at hunting down the algorithm within the
code. Could you point me to a paper that describes the core concept of
how the AAM matching is computed? I understand you must have augmented
the algorithm significantly to make it work for Indigo, but going back
to the source will help me get started.

Also, for H's I just implemented a simple wavefront heuristic to
assign AAM numbers to unmapped atoms, which seems to work for my
purposes. But thanks for the alternative way of assigning mappings to
them.

saurabh


Savelyev Alexander

Mar 19, 2012, 8:02:58 AM
to indigo-dev
Dear Saurabh,

Unfortunately, there is no description of the AAM algorithm at the
moment. Writing one is on our to-do list, but I cannot predict
exactly when it will be released.
What prevents you from getting started? Why do you think that the
algorithm is vague? An exact algorithm is used together with an
approximate one. Also, for performance, a substructure search is run
before launching the MCS (maximum common substructure) search. The
algorithm has a lot of heuristics to avoid hanging.

With best regards,
Alexander

Saurabh Srivastava

Mar 19, 2012, 8:57:52 PM
to indigo-dev
Hi Alexander, thanks for the reply.

I am not blocked at all. The AAM matching does a pretty good job and I
am building my algorithm on top of it.

I did not mean that the algorithm was vague, but that in
http://ggasoftware.com/opensource/resources#algorithms it is not
explicit which algorithms are part of the AAM implementation.

But now, from your reply here, I can guess that the last paragraph
there is the part about the AAM matching algorithms (of course with
various heuristics to steer clear of the difficult corner cases). I
just want to know the theoretical limitations of the algorithm
implementation I am building on (for citing it appropriately, and also
to know the limits of what I build on top of it).

saurabh


Savelyev Alexander

Mar 25, 2012, 9:23:34 AM
to indigo-dev
Hi Saurabh,

Theoretically, there is no limitation on input reactions. In
practice, however, some reactions are difficult. The following
reactions may be considered 'difficult':
- reactions containing molecules (on both sides) with a large number
of atoms (approximately more than 40-50)
- reactions containing a large number of reactants or products
(approximately more than 5-7)
- reactions containing disconnected molecules (several components).
This issue will be resolved in the near future.
There is an iteration limit in the internal MCS algorithms. There is
also an option ('aam-timeout') that sets a timeout for the algorithm.
Thus, the AAM calculated for 'difficult' reactions may be more or less
incomplete, but in most cases the algorithm generates a complete
mapping. The algorithm was tested for a long time on a huge reaction
dataset to reduce chemical inaccuracies. I should note, though, that
there is still room for improvement. As always, we will be happy to
receive any defect reports or ideas for improvement.

With best regards,
Alexander