Reaction mapping exception


Daniel Lowe

May 27, 2011, 11:18:15 AM
to indigo-bugs
The following fairly simple reaction (two reactants, two products)
produces the exception shown below.

Indigo indigo = new Indigo();
IndigoObject rxn = indigo.createReaction();
rxn.addReactant(indigo.loadMolecule("CC(OC)=O"));
rxn.addReactant(indigo.loadMolecule("[Na+].[OH-]"));
rxn.addProduct(indigo.loadMolecule("CC(O)=O"));
rxn.addProduct(indigo.loadMolecule("C[O-].[Na+]"));
rxn.automap("discard");

Exception in thread "main" com.ggasoftware.indigo.IndigoException:
array: invalid index 1 (size=1)
at com.ggasoftware.indigo.Indigo.checkResult(Indigo.java:42)
at com.ggasoftware.indigo.IndigoObject.automap(IndigoObject.java:178)


On the topic of reaction mapping, I also noticed that with, say, 3
reactants and 1 product the mapper greedily uses atoms from the
reactants in the order they were added to the reaction, even if this
does not lead to the most atoms being mapped onto the product. I am
quite interested in using reaction mapping to determine that a given
reactant is in fact a spectator rather than a reactant. The current
behaviour is not optimal for this, as even spectator molecules usually
have at least 1 atom in common with the product.

Savelyev Alexander

Jun 1, 2011, 10:39:24 AM
to indigo-bugs, Daniel Lowe
Hi Daniel,

First of all, thank you for your interest in the Atom Atom Mapping
improvement. I have fixed the specified bug. The fix will be available
in the upcoming release.

Also, at the moment Indigo does not map lone atoms (a limitation of
the underlying MCS engine, which works with bonds). We are going to
fix that inconvenient behaviour soon.

Last but not least, the 'greedy' mapping is used because of very
common reaction types in which several products can result from one
reactant, for example:

[H][C@:2]12[S:6][C:7]([CH3:11])([CH3:12])[C@@H:3]([N:1]1[C:4](=[O:9])[C@H:5]2[Br:10])[C:8](=[O:14])[O:13][CH3:15]>>[H][C@@:2]12[C@H:5]([Br:10])[C:4](=[O:9])[N:1]1[C@@H:3]([C:8](=[O:14])[O:13][CH3:15])[C:7]([CH3:11])([CH3:12])[S@@:6]2=O.[H][C@@:2]12[C@H:5]([Br:10])[C:4](=[O:9])[N:1]1[C@@H:3]([C:8](=[O:14])[O:13][CH3:15])[C:7]([CH3:11])([CH3:12])[S@:6]2=O

Also, there are well known dissociation reactions, for example:

[CH3:7][C:4](=[O:8])[C:1]1=[CH:3][CH:6]=[CH:9][CH:5]=[CH:2]1.[CH3:23][C:20]1=[CH:17][CH:12]=[C:10]([CH:13]=[CH:18]1)[S:11](=[O:15])(=[O:16])[O:14][Tl]([O:14][S:11](=[O:15])(=[O:16])[C:10]1=[CH:12][CH:17]=[C:20]([CH3:23])[CH:18]=[CH:13]1)[O:14][S:11](=[O:15])(=[O:16])[C:10]1=[CH:12][CH:17]=[C:20]([CH3:23])[CH:18]=[CH:13]1>>[CH3:23][C:20]1=[CH:17][CH:12]=[C:10]([CH:13]=[CH:18]1)[S:11](=[O:15])(=[O:16])[O:14][CH2:7][C:4](=[O:8])[C:1]1=[CH:3][CH:6]=[CH:9][CH:5]=[CH:2]1

where reactants can contain the same fragment multiple times. I will
provide another AAM mode for you, which will forbid some
transformations, so that it will be very simple to identify the fake
reactants, as you described.

Daniel Lowe

Jun 14, 2011, 1:55:25 PM
to indigo-bugs
Thank you for the bug fix. It would be great to hear when any more
progress in this area is made :-)

Daniel


Savelyev Alexander

Jul 8, 2011, 11:16:56 AM
to indigo-bugs
We have fixed the bug with mapping lone atoms. The fix will be
available in the upcoming Indigo release. I would like to clarify
something about the "greedy" mapping logic (the second part of the bug
description). Could you give examples of reaction mappings that you
find unsuitable, together with the mapping you would expect instead?
Sorry for the late reply.

Savelyev Alexander

Jul 14, 2011, 5:21:24 AM
to indigo-bugs, Daniel Lowe
Firstly, thank you very much for the examples you sent and for your
participation.
There was a significant bug in the algorithm logic. I think it
appeared after the Indigo library refactoring. The AAM logic with
reactant permutation was not available, which is why reaction mappings
depended on the reactant order. I have fixed the bug. Of course, the
fix will be available in the upcoming Indigo release (I will post the
exact release here asap).
Another thing was noticed in your reactions. If reaction molecules
consist of decomposed components, AAM does not work properly. I
believe we will fix this bug soon.
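
The reactant-permutation idea mentioned in this reply can be sketched independently of Indigo. This is only an illustration under stated assumptions: the scoring function is a hypothetical stand-in for "number of product atoms mapped for a given reactant order", and the names ReactantOrderSketch and bestOrder are invented for this example. They are not part of the Indigo API, and the real AAM code may work quite differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.ToIntFunction;

final class ReactantOrderSketch {

    /** Tries every reactant order and returns the one with the highest score. */
    static List<String> bestOrder(List<String> reactants,
                                  ToIntFunction<List<String>> score) {
        List<String> work = new ArrayList<>(reactants);
        List<String> best = new ArrayList<>(reactants);
        int[] bestScore = { score.applyAsInt(best) };
        permute(work, 0, score, best, bestScore);
        return best;
    }

    // Classic swap-based permutation; keeps the best-scoring order seen so far.
    private static void permute(List<String> work, int start,
                                ToIntFunction<List<String>> score,
                                List<String> best, int[] bestScore) {
        if (start == work.size()) {
            int s = score.applyAsInt(work);
            if (s > bestScore[0]) {
                bestScore[0] = s;
                best.clear();
                best.addAll(work);
            }
            return;
        }
        for (int i = start; i < work.size(); i++) {
            Collections.swap(work, start, i);
            permute(work, start + 1, score, best, bestScore);
            Collections.swap(work, start, i); // restore before the next choice
        }
    }

    public static void main(String[] args) {
        // Hypothetical order-dependent scorer: a "greedy" mapper that only
        // benefits from whichever reactant happens to come first.
        ToIntFunction<List<String>> greedy = order -> order.get(0).length();
        System.out.println(bestOrder(List.of("O", "CCO", "CC"), greedy));
    }
}
```

Trying all orders grows factorially with the number of reactants, so a sketch like this is only practical for the handful of reactants found in a typical reaction.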

Daniel Lowe

Jul 28, 2011, 1:33:11 PM
to indigo-bugs
On the topic of reaction mapping exceptions, here is another one in
Indigo 1.1-alpha3:

Indigo indigo = new Indigo();
IndigoObject reaction = indigo.createReaction();
reaction.addProduct(indigo.loadMolecule("C1CC1"));
reaction.addReactant(indigo.loadMolecule("C1CCC1C"));
reaction.automap("discard");

gives:
Exception in thread "main" com.ggasoftware.indigo.IndigoException:
pool: access to unused element 1
at com.ggasoftware.indigo.Indigo.checkResult(Indigo.java:42)
at com.ggasoftware.indigo.IndigoObject.automap(IndigoObject.java:178)

When is the next alpha release likely to be?

Savelyev Alexander

Jul 29, 2011, 10:28:24 AM
to indigo-bugs
Thank you for the bug report. The bug has been fixed, and the fix is
already available in 1.1-beta4.
PS. The bug was related to the well-known "delta-Y exchange" problem,
a weakness of the EMCS algorithm. It can appear while searching for
the MCS with 'triangle' molecules. Such molecules are very rare, which
is why the exception had not been noticed before. We have fixed the
bug, but we warn you that AAM may not be as good as expected for
molecules of this kind (with 'triangles', for example 'C1CC1').

Daniel Lowe

Jul 30, 2011, 12:35:02 PM
to indigo-bugs
Thank you for the new release. This release significantly (+19%)
increased the number of reactions for which a complete mapping of
atoms in the product could be found.

However, I also noticed that the AAM appears to be an order of
magnitude slower in this release compared to 1.1-alpha3. Without
further investigation I cannot comment as to whether this increased
computational expense was to be expected.

Summarising the outstanding limitations from my perspective:
Single atoms cannot be mapped
e.g. CC(=O)O + ClS(=O)Cl -> CC(=O)Cl

The more difficult case where the reaction involves a change in bond
order or charge is also outstanding but this can probably only be
tackled by adding greater configurability to determine the level of
match needed for an atom mapping to be made.

e.g. CC=O -> CCO will not identify that the oxygen in both molecules
is the same.


Savelyev Alexander

Aug 1, 2011, 4:08:59 AM
to indigo-bugs
I am glad that you like the new version.
There are some notes about your requests.

1. Single atoms are not mapped now, but I am going to include such a
possibility - only for some hetero-atoms (not carbon), because the
mapping of single atoms is very dangerous and may lead to an incorrect
map.

2. Actually, the current AAM engine supports changes in bond order.
Moreover, a lot of attention was paid to this feature. A user should
set so-called REACTING CENTERS to allow for bond changes. There are
several types of bonds in a reaction: NONE, CENTER, UNCHANGE, CHANGE,
MAKE OR BREAK, etc. (You can find a description, for example, in the
MDL formats help:
http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php).
In the reaction 'CC=O>>CCO' there should be a 'CHANGE' center between
carbon and oxygen (the center can be set for the reactants only: the
Indigo AAM engine supports 'lazy' reacting centers, but in a strictly
correct reaction the centers should be set on both sides). If no
centers are set, the engine treats such bonds as "UNCHANGE".
PS. A reacting center cannot be passed through reaction SMILES, so I
am sending an RXN file with the same reaction below. If you load the
example reaction, the expected mapping will be applied.

The RXN example:
$RXN V3000

The correct reaction without mapping; note the 'RXCTR=8' string, which means 'RC_CHANGE'

M  V30 COUNTS 1 1
M  V30 BEGIN REACTANT
M  V30 BEGIN CTAB
M  V30 COUNTS 3 2 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C -4.4137 0.385 0 0
M  V30 2 O -5.7474 -0.385 0 0
M  V30 3 C -3.08 -0.385 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 3
M  V30 2 2 1 2 RXCTR=8
M  V30 END BOND
M  V30 END CTAB
M  V30 END REACTANT
M  V30 BEGIN PRODUCT
M  V30 BEGIN CTAB
M  V30 COUNTS 3 2 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 5.5137 0.385 0 0
M  V30 2 O 4.18 -0.385 0 0
M  V30 3 C 6.8474 -0.385 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 3
M  V30 2 1 1 2
M  V30 END BOND
M  V30 END CTAB
M  V30 END PRODUCT
M  END

Savelyev Alexander

Aug 1, 2011, 4:13:32 AM
to indigo-bugs
Google inserted an incorrect extra line, so I am resending the reaction.
$RXN V3000

Daniel Lowe

Aug 4, 2011, 2:10:49 PM
to indigo-bugs
In the work I am currently doing I do not know a priori where the
reaction centres are. Is it possible to programmatically set the bond
type of a bond from "UNCHANGE" to one of the options defined in the
Mol3000 specifications?


Savelyev Alexander

Aug 5, 2011, 6:34:00 AM
to indigo-bugs
I think we will soon add the ability to change reacting centers (and
inversions) through our API.

Mikhail Rybalkin

Aug 13, 2011, 3:59:05 AM
to indigo-bugs
Hello Daniel,

We have just uploaded an updated Indigo version, 1.1-beta5, with
additional methods to set and get reacting centers.
For your case you need to set the "unchanged | order_changed"
reacting centers on all bonds.
Code snippet:

IndigoObject rxn2 = indigo.loadReaction("CC=O>>CCO");

for (IndigoObject m: rxn2.iterateMolecules())
    for (IndigoObject b: m.iterateBonds())
        rxn2.setReactingCenter(b, Indigo.RC_UNCHANGED | Indigo.RC_ORDER_CHANGED);

for (IndigoObject m: rxn2.iterateMolecules())
    for (IndigoObject b: m.iterateBonds())
        System.out.println(rxn2.reactingCenter(b));

rxn2.automap("DISCARD");
System.out.println(rxn2.smiles());

Result will be [CH3:1][CH:2]=[O:3]>>[CH3:1][CH2:2][OH:3]
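
For readers unfamiliar with how the reacting-center values combine, here is a small stand-alone sketch of the bit-flag arithmetic. The numeric values are an assumption: they follow the CTfile reacting-center convention (1 = center, 2 = no change, 4 = bond made or broken, 8 = bond order changes), which agrees with the 'RXCTR=8' value used in the RXN example earlier in this thread. Only RC_UNCHANGED and RC_ORDER_CHANGED are named in this thread; the other names, the class ReactingCenterFlags, and the assumption that the Indigo.RC_* constants use these exact values are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

final class ReactingCenterFlags {
    // Assumed values, taken from the CTfile reacting-center convention.
    static final int RC_NOT_CENTER = -1;
    static final int RC_UNMARKED = 0;
    static final int RC_CENTER = 1;
    static final int RC_UNCHANGED = 2;
    static final int RC_MADE_OR_BROKEN = 4;
    static final int RC_ORDER_CHANGED = 8;

    /** Lists the individual flags contained in a combined reacting-center value. */
    static List<String> describe(int flags) {
        List<String> names = new ArrayList<>();
        if (flags == RC_NOT_CENTER) { names.add("NOT_CENTER"); return names; }
        if (flags == RC_UNMARKED)   { names.add("UNMARKED");   return names; }
        if ((flags & RC_CENTER) != 0)         names.add("CENTER");
        if ((flags & RC_UNCHANGED) != 0)      names.add("UNCHANGED");
        if ((flags & RC_MADE_OR_BROKEN) != 0) names.add("MADE_OR_BROKEN");
        if ((flags & RC_ORDER_CHANGED) != 0)  names.add("ORDER_CHANGED");
        return names;
    }

    public static void main(String[] args) {
        // The combination used in the snippet above: 2 | 8 = 10.
        System.out.println(describe(RC_UNCHANGED | RC_ORDER_CHANGED));
        // prints [UNCHANGED, ORDER_CHANGED]
    }
}
```

Combining UNCHANGED with ORDER_CHANGED tells the mapper that a bond may either stay as it is or change its order, which is the permissive setting suggested for reactions whose centers are not known in advance.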

Please note that in the next version we may move the Indigo.RC_***
constants into a separate enum for better readability, as is done
in our wrapper for C#.
Alexander will answer your questions while I am on vacation next
week.

With best regards,
Mikhail Rybalkin

Daniel Lowe

Aug 16, 2011, 1:31:19 PM
to indigo-bugs
Thanks a lot for adding this functionality. This has allowed mappings
of quite a few more reactions, especially reductions and ring-forming
reactions.
As might be expected there are a few false positives (not many though),
e.g. parts of aromatic rings being used as aliphatic chains, but there
is sufficient control that, if this proved a significant problem,
such bonds could be set to RC_UNCHANGED.

Regarding the single atom problem: I fully agree that general-case
single atom mapping is a bad idea. I will probably write a special
case for my purposes which checks for a single unmapped atom that is a
halogen or oxygen and then looks for an unmapped halogen/oxygen-Y where
Y is not carbon/nitrogen, or something like that.

Daniel Lowe

Aug 21, 2011, 3:26:39 PM
to indigo-bugs
Just to check: is it correct that there is currently no way to get a
mapping for a reaction like:
reaction.addProduct(indigo.loadMolecule("CC1=C(C(=O)OC(C)(C)C)C=C(C=C1)C(F)(F)F"));
reaction.addReactant(indigo.loadMolecule("CC(C)([O-])C.[K+]"));
reaction.addReactant(indigo.loadMolecule("CC1=C(C(=O)Cl)C=C(C=C1)C(F)(F)F"));
without manually neutralising the charge on the oxygen prior to atom
mapping?

Savelyev Alexander

Aug 23, 2011, 6:24:21 AM
to indigo-bugs
Yes, you are absolutely right. At the moment there is no way to handle
changes in atom charges in a reaction. But I think we will add such a
possibility soon (an option in the automap() method or something like
that).

Savelyev Alexander

Sep 15, 2011, 8:45:01 AM
to indigo-bugs
New options have been added to the automap() method. The options are
the following:
// "ignore_charges" : do not consider atom charges while searching
// "ignore_isotopes" : do not consider atom isotopes while searching
// "ignore_valence" : do not consider atom valence while searching
// "ignore_radicals" : do not consider atom radicals while searching
The options are separated by spaces and are passed together with the
standard AAM parameters ("discard", "alter", etc.).
For example:
...
reaction.automap("discard ignore_charges ignore_isotopes");
...
The new version will be available in the next release.
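
Since the option string is just space-separated tokens, it can be assembled and sanity-checked before it reaches automap(). The helper below is a minimal sketch, not part of the Indigo API: the class AutomapOptions and its build method are hypothetical, and the set of accepted flags is simply the four options listed in this post.

```java
import java.util.Set;
import java.util.StringJoiner;

final class AutomapOptions {
    // The four "ignore_*" flags named in this thread.
    private static final Set<String> KNOWN_FLAGS = Set.of(
            "ignore_charges", "ignore_isotopes", "ignore_valence", "ignore_radicals");

    /** Joins a mapping mode (e.g. "discard") and optional ignore flags with spaces. */
    static String build(String mode, String... ignoreFlags) {
        StringJoiner joined = new StringJoiner(" ");
        joined.add(mode);
        for (String flag : ignoreFlags) {
            if (!KNOWN_FLAGS.contains(flag)) {
                throw new IllegalArgumentException("Unknown automap flag: " + flag);
            }
            joined.add(flag);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Intended use (requires Indigo): reaction.automap(AutomapOptions.build(...));
        System.out.println(build("discard", "ignore_charges", "ignore_isotopes"));
        // prints discard ignore_charges ignore_isotopes
    }
}
```

Rejecting unknown flags up front catches misspellings such as "ignore_valencies" before they are passed to the library.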

Daniel Lowe

Sep 20, 2011, 6:27:21 AM
to indigo-bugs
I think I may have stumbled upon another bug, although it appears to
be intermittent in nature (the same input doesn't necessarily
reproduce the bug) so I cannot as yet provide the specific input to
reproduce.
The following code gave the exception:
Exception in thread "main" com.ggasoftware.indigo.IndigoException:
core: can not access object #608980: red-black tree: at(): key not
found
at com.ggasoftware.indigo.Indigo.checkResult(Indigo.java:49)
at com.ggasoftware.indigo.IndigoObject.setReactingCenter(IndigoObject.java:305)

when performing reaction.setReactingCenter(b, Indigo.RC_UNCHANGED);
The catch block had not been invoked.

The reason for changing the reaction centers back to their original
value is that the effect of Indigo.RC_ORDER_CHANGED is visible in
depictions of the molecules.

for (IndigoObject m: reaction.iterateMolecules()){
    for (IndigoObject b: m.iterateBonds()){
        reaction.setReactingCenter(b, Indigo.RC_UNCHANGED | Indigo.RC_ORDER_CHANGED);
    }
}
try{
    reaction.automap("discard");
}
catch (Exception e) {
    e.printStackTrace();
    LOG.debug("Indigo reaction mapping failed", e);
    return false;
}
finally {
    // Restore the original reacting centers so depictions are unaffected.
    for (IndigoObject m: reaction.iterateMolecules()){
        for (IndigoObject b: m.iterateBonds()){
            reaction.setReactingCenter(b, Indigo.RC_UNCHANGED);
        }
    }
}

Savelyev Alexander

Sep 21, 2011, 4:12:08 AM
to indigo-bugs
It seems that the exception is raised from the finally block. We will
try to reproduce the problem and fix the issue.

Daniel Lowe

Sep 21, 2011, 4:22:03 PM
to indigo-bugs
I have managed to reproduce this bug. Here is the code to reproduce
it:

Indigo indigo = new Indigo();
for (int i = 0; i < 30000; i++) {
    System.out.println(i);
    IndigoObject reaction = indigo.createReaction();
    reaction.addProduct(indigo.loadMolecule("FC(C(=O)O)(F)F.CC1(N(C(N(C1=O)C=1C=CC(=C(C1)NC(CN1CCSCC1)=O)OC(F)(F)F)=O)CC1=CC=NC=C1)C"));
    reaction.addReactant(indigo.loadMolecule("CC1(N(C(N(C1=O)C=1C=CC(=C(C1)NC(CCl)=O)OC(F)(F)F)=O)CC1=CC=NC=C1)C"));
    reaction.addReactant(indigo.loadMolecule("C1CSCCN1"));
    for (IndigoObject m: reaction.iterateMolecules()){
        for (IndigoObject b: m.iterateBonds()){
            reaction.setReactingCenter(b, Indigo.RC_UNCHANGED | Indigo.RC_ORDER_CHANGED);
        }
    }
    try{
        reaction.automap("discard");
    }
    catch (Exception e) {
        e.printStackTrace();
    }
    finally {
        for (IndigoObject m: reaction.iterateMolecules()){
            for (IndigoObject b: m.iterateBonds()){
                reaction.setReactingCenter(b, Indigo.RC_UNCHANGED);
            }
        }
    }
}

Typically the counter will get to between 150 and 400 before the crash
occurs in the finally block.
e.g. Exception in thread "main"
com.ggasoftware.indigo.IndigoException: core: can not access object
#30756: red-black tree: at(): key not found


Savelyev Alexander

Sep 22, 2011, 5:59:12 AM
to indigo-bugs
Unfortunately, I cannot reproduce the bug. Could you send your OS
details and the exact Indigo version (by calling indigo.version())?

Mikhail Rybalkin

Sep 22, 2011, 8:33:03 AM
to indigo-bugs
Hello, Daniel

We have finally reproduced this issue and are trying to fix it now.

Best regards,
Mikhail


Daniel Lowe

Sep 23, 2011, 9:53:40 AM
to indigo-bugs
The problem definitely can manifest on the 64-bit 1.6 JDK as well.
(This post will make a lot more sense assuming my previous post also
turns up)

Mikhail Rybalkin

Sep 23, 2011, 11:01:54 AM
to indigo-bugs
Daniel,

We have fixed this issue. I have uploaded the Java wrappers under the
version name "1.1-beta6-pre". You can download the updated Java wrappers
from this page: http://ggasoftware.com/download/indigo_next (the Java
wrappers link points to the updated version, while all other links point
to the 1.1-beta5 version).

The official release of version 1.1-beta6, with additional features and
bug fixes, will be announced next week.

The cause of the issue was, again, that Java can finalize an object
while it is still being used by id in the native code. This kind of
issue is mentioned, for example, in the SWIG manual:
http://www.swig.org/Doc1.3/Java.html in "21.4.3.4 The premature garbage
collection prevention parameter for proxy class marshalling". We knew
about this issue before and had some special code to resolve it, but
not all the cases were taken into account.
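
To make the failure mode concrete: if a Java wrapper passes only an integer id to native code, the JVM may consider the wrapper unreachable while the native call is still running, and a finalizer could then free the native object. The sketch below shows one way to guard against that in current Java using Reference.reachabilityFence, which was added in Java 9 and therefore did not exist when this thread was written. It is not the fix that went into Indigo; the Handle class and nativeLookup method are hypothetical stand-ins.

```java
import java.lang.ref.Reference;

final class KeepAliveSketch {

    /** Hypothetical wrapper that owns a native-side object identified by an id. */
    static final class Handle {
        final int id;
        Handle(int id) { this.id = id; }
    }

    /** Stand-in for a native call that only ever sees the raw id. */
    static int nativeLookup(int id) {
        return id * 2;
    }

    static int lookup(Handle handle) {
        try {
            return nativeLookup(handle.id);
        } finally {
            // Keeps the wrapper strongly reachable until the call has returned,
            // so it cannot be finalized while its id is still in use.
            Reference.reachabilityFence(handle);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(new Handle(21))); // prints 42
    }
}
```

Before Java 9, wrappers typically achieved the same effect by passing the wrapper object itself into the native method alongside the id, which is the approach the SWIG manual section cited above describes.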

With best regards,
Mikhail

Daniel Lowe

Sep 23, 2011, 1:57:11 PM
to indigo-bugs
Thanks a lot for the quick fix. I will begin testing with it now.

Daniel


Daniel Lowe

Sep 25, 2011, 7:05:00 PM
to indigo-bugs
Since upgrading I have not encountered the problem again.

In my testing results were identical to the previous version (with the
same options) with the exception that one mapping now fails due to a
valency change on phosphorus (I think this is a bug fix compared to
the previous version)

With ignore_charges and ignore_valence the number of mappable
reactions increased ~8% (1834 -->1976)
with two regressions one of which looks like a bug.

reaction.addProduct(indigo.loadMolecule("O1[C@@H]2[C@H]1C=1C(=CC=3C(=CC(=NC3C1)C)C)OC2(C)C"));
reaction.addReactant(indigo.loadMolecule("CC1(C=CC=2C(=CC=3C(=CC(=NC3C2)C)C)O1)C"));
reaction.addReactant(indigo.loadMolecule("Cl[O-]"));
reaction.addReactant(indigo.loadMolecule("O"));
reaction.automap("discard");

Succeeds due to mapping the water (is that intended???)

whilst
reaction.automap("discard ignore_charges");
fails to match ANYTHING to the epoxide oxygen.

The ideal result would be a mapping to the [O-] although I assume that
doesn't happen due to the 1 atom from a reactant limitation.

The other regression was caused by a fragment which was previously
unsuitable being used rather than one of the fragments being used
twice. This isn't really a bug per se. The input for this is below for
completeness:
reaction.addProduct(indigo.loadMolecule("[N+](=O)([O-])C1=CC=C(C(=O)OCCCCCCOC(C2=CC=C(C=C2)[N+](=O)[O-])=O)C=C1"));
reaction.addReactant(indigo.loadMolecule("[N+](=O)([O-])C1=CC=C(C(=O)O)C=C1"));
reaction.addReactant(indigo.loadMolecule("BrCCCCCCBr"));
reaction.addReactant(indigo.loadMolecule("O=CN(C)C")); // This ideally should not be used by the AAM

What is the algorithm that is used to decide whether a fragment can be
mapped multiple times to the product?
e.g.
why is
reaction.addProduct(indigo.loadMolecule("CCCC.CCCC"));
reaction.addReactant(indigo.loadMolecule("CCCC"));
mappable

whilst:
reaction.addProduct(indigo.loadMolecule("CCC.CCC"));
reaction.addReactant(indigo.loadMolecule("CCC"));

(or more usefully)
reaction.addProduct(indigo.loadMolecule("Cl.Cl"));
reaction.addReactant(indigo.loadMolecule("Cl"));
isn't.

Allowing reactants to map multiple times to the product (and
definitely if they are just present unmodified) seems a highly useful
feature of Indigo's reaction mapping capabilities, as otherwise an
implicit assumption is being enforced that all stoichiometry is 1:1.

Savelyev Alexander

Sep 28, 2011, 5:48:28 AM
to indigo-bugs
I have uploaded the bug fixes, and they will be available in the next
release.


> whilst
> reaction.automap("discard ignore_charges");
> fails to match ANYTHING to the epoxide oxygen.
>

Yes, there was a bug. At the end of the algorithm a special "check"
method is called. The method checks for incorrect atom mappings, for
example a mapping that conflicts with the reacting centers. I have
fixed the method so that it now takes the "ignore_*" flags into
account. Thanks for the bug report.

> The ideal result would be a mapping to the [O-] although I assume that
> doesn't happen due to the 1 atom from a reactant limitation.

Now the mapping matches the expected one.

> The other regression was caused by a fragment which was previously
> unsuitable being used rather than one of the fragments being used
> twice. This isn't really a bug per se.

No, it is not a bug. The aim of the algorithm is to map as many atoms
as possible in a reaction. Also, reactant molecules are given some
preference. I suppose that your task is to determine the unused
reactants in a reaction. Unfortunately, the current algorithm cannot
do this at the moment.
But I can propose a simple algorithm that can be built on top of the
"automap" method. You can remove one reactant at a time in a loop and
find the maximum number of mapped atoms in the product part of the
reaction for each iteration. If the maximum is still reached after
removing one (or several) reactants, then these reactants are not used
in the reaction. The algorithm was devised in a minute, so it can be
improved.
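
The leave-one-out procedure proposed here can be sketched without depending on Indigo. In the sketch below the mapper is abstracted as a function from a list of reactants to the number of product atoms that received a mapping; in practice that function would rebuild the reaction, call automap() and count mapped product atoms, but that part is left out. The names SpectatorSketch, findUnusedReactants and mappedProductAtoms are invented for this example, and only the single-removal case is covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class SpectatorSketch {

    /**
     * Returns the reactants whose removal does not reduce the number of
     * mapped product atoms compared with using all reactants.
     */
    static List<String> findUnusedReactants(List<String> reactants,
                                            ToIntFunction<List<String>> mappedProductAtoms) {
        int baseline = mappedProductAtoms.applyAsInt(reactants);
        List<String> unused = new ArrayList<>();
        for (int i = 0; i < reactants.size(); i++) {
            List<String> reduced = new ArrayList<>(reactants);
            reduced.remove(i); // drop the i-th reactant only
            if (mappedProductAtoms.applyAsInt(reduced) >= baseline) {
                unused.add(reactants.get(i));
            }
        }
        return unused;
    }

    public static void main(String[] args) {
        // Hypothetical mapper: only the acid and the dibromide contribute atoms.
        ToIntFunction<List<String>> mapper = rs ->
                (rs.contains("acid") ? 6 : 0) + (rs.contains("dibromide") ? 4 : 0);
        System.out.println(findUnusedReactants(List.of("acid", "dibromide", "DMF"), mapper));
        // prints [DMF]
    }
}
```

A reactant flagged this way contributes no atoms to the product, which does not by itself mean it plays no role in the reaction.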

>
> What is the algorithm that is used to decide whether a fragment can be
> mapped multiple times to the product?

If a fragment consists of more than 3 atoms it can be multiplied. It
is not a good idea to multiply small fragments since that can corrupt
a mapping. But we can consider the feature request and, say, create an
option where the user can set the number of atoms [or the list of
atoms allowed for multiplying].
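
Stated as code, the rule above is a single threshold test. The sketch assumes that "atoms" means the atoms written in the SMILES (heavy atoms), which is consistent with the earlier observation in this thread that CCCC can be mapped twice while CCC and Cl cannot. The class and method names are illustrative only and do not correspond to anything in Indigo.

```java
final class MultiplyHeuristicSketch {
    /** Fragments must have more than this many atoms to be used more than once. */
    static final int SMALL_FRAGMENT_LIMIT = 3;

    static boolean canBeMultiplied(int atomCount) {
        return atomCount > SMALL_FRAGMENT_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println("CCCC: " + canBeMultiplied(4)); // true
        System.out.println("CCC:  " + canBeMultiplied(3)); // false
        System.out.println("Cl:   " + canBeMultiplied(1)); // false
    }
}
```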

Daniel Lowe

Sep 28, 2011, 8:43:57 AM
to indigo-bugs
> I have uploaded the bug fixes and it will be available in the next
> release
Thanks for the quick fix. I also forgot to add that when the algorithm
is set to ignore_charges and ignore_valence a significant speed-up is
noticed. This somewhat surprised me, as I expected these options
would mean that more potential mappings would need to be evaluated due
to greater flexibility in matching.

On Sep 28, 10:48 am, Savelyev Alexander <asavel...@ggasoftware.com>
wrote:

> No it is not a bug. The algorithm point is to map more atoms in a
> reaction. Also, there is some privilege for reactants molecules. I can
> suppose that your task is to determine unused reactants in a reaction.
> Unfortunately, current algorithm can not do such a approach at the
> moment.
> But I can propose a pure algorithm, that can be done above the
> "automap" method. You can remove a reactant in a cycle and look for
> maximum mapping atoms in the reaction products part for each
> iteration. If maximum is reached by removing one (or several)
> reactants thus these reactants are not used in a reaction. The
> algorithm was created in a minute but it can be improved.
Unfortunately not all reactants necessarily form a part of the product,
e.g. a reactant may be responsible for a bond order change or the
removal of a group.

I am quite happy with the performance of the algorithm as it is now. I
do not believe there are any significant classes of organic reaction,
besides perhaps the formation of acid chlorides using thionyl chloride,
that are not covered, and the occurrence of false mappings is still
sufficiently low.

> If a fragment consists more than 3 atoms it can be multiplied. It is
> not a good solution to multiple small fragments since it can corrupt a
> mapping. But we can consider the feature request and, say, create an
> option there user can determine the number of atoms [or set the atom
> list allowed for multiplying].
I think in general that this is a reasonable heuristic. As an
exception, though, I would argue that if the EXACT molecule is present
on both sides of the reaction then mapping is appropriate. This would
be appropriate for salts such as [ClH]. Performing graph identity on
molecules with 3 atoms or fewer should be computationally cheap (e.g.
canonical smiles), so it's something that could be considered.
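
As a rough illustration of this suggestion, the check reduces to a set intersection once each small component has been turned into a canonical string. The sketch assumes the caller has already produced canonical SMILES for the small components on each side (for example with a toolkit's canonicalisation routine); the class ExactMatchSketch and its method are hypothetical and are not an Indigo feature.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ExactMatchSketch {

    /**
     * Returns the canonical strings that occur among both the reactant and the
     * product components, i.e. molecules present unchanged on both sides.
     */
    static Set<String> presentOnBothSides(List<String> reactantCanonical,
                                          List<String> productCanonical) {
        Set<String> shared = new LinkedHashSet<>(reactantCanonical);
        shared.retainAll(new LinkedHashSet<>(productCanonical));
        return shared;
    }

    public static void main(String[] args) {
        // Inputs are assumed to be canonical already; "Cl" is on both sides.
        System.out.println(presentOnBothSides(List.of("Cl", "CCO"), List.of("CC=O", "Cl")));
        // prints [Cl]
    }
}
```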

Just to check one more thing. Is:
http://wwmm.ch.cam.ac.uk/~dl387/indigoMappingProblems/odduseofthesamereactanttwice.png
Intended behaviour? The actual reaction has an extra reactant.
The cyclic reactant appears to have been multiplied but different
parts of the multiplied reactant are used. I'm sure there are cases
where a reactant performs multiple roles in a reaction but I'm still a
bit dubious about this due to its potential for false positives
(although I suppose with stricter matching criteria such false
mappings would be less common).

Savelyev Alexander

Sep 29, 2011, 5:55:00 AM
to indigo-bugs
> Thanks for the quick fix. I also forgot to add that when the algorithm
> is set to ignore_charges and ignore_valencies that a significant speed-
> up is noticed. This somewhat surprised me as I expected these options
> would mean that more potential mappings would need to be evaluated due
> to greater flexibility in matching.

There is a simple explanation for the speed-up. The AAM method uses
two algorithms: SUB and MCS. Exact substructures in a reaction appear
more frequently. The maximum common substructure algorithm also
depends on the scaffold size. Thus these options increase the number
of possible matchings in a reaction. Moreover, I have been thinking of
changing the default options (for example, making ignore_charges the
default).

> Unfortunately not all reactants necessarily form a part of the product
> e.g. a reactant may be responsible for a bond order change or the
> removal of a group

Usually the automap is used for mapping standard database reactions.
Reactants which e.g. are responsible for a bond order change can be
regarded as catalysts (the RXN format supports catalyst molecules in a
reaction btw)

> I am quite happy with the performance of the algorithm as it is now. I
> do not believe there are any significant classes of organic reaction
> besides perhaps the formation of acid chlorides using thionyl chloride
> that are not covered and the occurrences of false mappings is still
> sufficiently low.

We are pleased that our product is beneficial.

> I think in general that this is a reasonable heuristic. As an
> exception though I would argue that if the EXACT molecule is present
> on both sides of the reaction that mapping is appropriate. This would
> be appropriate for salts such as [ClH]. Performing graph identity on
> molecules with 3 atoms or less should be computationally cheap (e.g.
> canonical smiles) so its something that could be considered.

Thank you for your suggestion. We will add your request to the request
list and, possibly, add the improvement in the near future.

> Just to check one more thing. Is:http://wwmm.ch.cam.ac.uk/~dl387/indigoMappingProblems/odduseofthesame...
> Intended behaviour? The actual reaction has an extra reactant.
> The cyclic reactant appears to have been multiplied but different
> parts of the multiplied reactant are used. I'm sure there are cases
> where a reactant performs multiple roles in a reaction but I'm still a
> bit dubious about this due to its potential for false positives
> (although I suppose with stricter matching criteria such false
> mappings would be less common).

Yes, this is intended behaviour. There are a bunch of examples where
such a heuristic produces the correct mapping (where the correct
mapping is the one taken from a hand-curated scientific database).