R-Group Decomposition

Gerhard en-Naser

Jan 4, 2012, 9:05:32 AM
to indigo-general
Hi there,
I have just tried to use R-Group decomposition with a custom-defined
core structure. However, the results differ from my expectations in two
respects. (All queries are read from mol files; SMILES notation is
used only for the examples in this text.)

1)
Results are incomplete. Is there a function to iterate over all
possible matchings of an R-Group decomposition? E.g. if the scaffold
c1[cH]c[cH][cH][cH]1 is applied to the target structure
c1(-O)[cH]c(-N)[cH][cH][cH]1, I would expect two results:
1. R1=N,R2=O
2. R1=O,R2=N.
I would also expect result sets from all hits if the target structure
contains the scaffold several times with different atoms.
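
To make the expected result concrete, here is a minimal sketch in plain Python. It makes no Indigo calls and is not how Indigo works internally; the names ring_symmetries and enumerate_decompositions are made up for illustration. It assumes a single-ring scaffold, lists the rotations and reflections of the ring, and keeps those in which every hydrogen-blocked scaffold position lands on an unsubstituted target position:

```python
def ring_symmetries(n):
    """All rotations and reflections of an n-membered ring, as index tuples."""
    maps = []
    for shift in range(n):
        maps.append(tuple((i + shift) % n for i in range(n)))  # rotation
        maps.append(tuple((shift - i) % n for i in range(n)))  # reflection
    return maps


def enumerate_decompositions(open_sites, substituents, n=6):
    """Enumerate every R-group assignment for a single-ring scaffold.

    open_sites:   scaffold ring positions that may carry an R-group,
                  listed in R-number order (hypothetical input format).
    substituents: {target ring position: substituent label}.
    """
    results = []
    for m in ring_symmetries(n):
        # Blocked ([cH]) scaffold positions must map to unsubstituted atoms.
        blocked_ok = all(
            m[i] not in substituents for i in range(n) if i not in open_sites
        )
        if blocked_ok:
            results.append(
                {f"R{k + 1}": substituents.get(m[site], "H")
                 for k, site in enumerate(open_sites)}
            )
    return results


# Scaffold open at ring positions 0 and 2; target has O on 0 and N on 2.
for result in enumerate_decompositions([0, 2], {0: "O", 2: "N"}):
    print(result)
```

For the example above this yields the two assignments R1=O,R2=N and R1=N,R2=O, which is the complete result set I would expect from the decomposition.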

2)
IMHO the substructure search used for decomposition behaves a little
strangely.
a) If atoms in the query structure are blocked with hydrogen to avoid R
allocation, the target molecules must carry explicit hydrogens
(e.g. try the above query with c1(-O)cc(-N)ccc1 as target).
b) For query structures with R-Groups (c1(-R1)cc(-R2)ccc1, see
attached molfile) the R-Groups are not used (iteration over R-Group
fragments is empty for R1 and R2). I would like to define the R-Group
number explicitly, so I tried this setting.

Perhaps a different substructure search behaviour would be better for
R-Group decomposition?
-Allow only hydrogen or nothing for undefined neighbours in scaffold
structures (these are don’t-cares in the usual substructure search)
-Allow the user to define R-Groups on core structures. R-Groups match any
atom, hydrogen included.

I’m sorry for my terrible English. I hope I could make myself
understood.


Query as mol-file

  SMMXDraw01041212192D

  8  8  0  0  0  0  0  0  0  0999 V2000
   11.7289  -10.4518    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.7712  -10.4512    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.7520   -9.8615    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.7712  -11.6327    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.7289  -11.6380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.7545  -12.2222    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.7060   -9.8612    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0
   14.7940   -9.8606    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0
  6  4  1  0  0  0  0
  5  6  2  0  0  0  0
  2  3  1  0  0  0  0
  1  5  1  0  0  0  0
  4  2  2  0  0  0  0
  3  1  2  0  0  0  0
  1  7  1  0  0  0  0
  2  8  1  0  0  0  0
M  RGP  2   7   1   8   2
M  END
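
The RGP line of the molfile carries the R-Group numbering: a count, followed by pairs of atom number and R-Group number. As a small illustration, this plain-Python sketch reads those pairs. The function name parse_rgp is hypothetical, and the sketch assumes the fields are separated by whitespace, as they are in the file above:

```python
def parse_rgp(line):
    """Return {atom_number: r_group_number} from one RGP property line."""
    tokens = line.split()
    if tokens[:2] != ["M", "RGP"]:
        raise ValueError("not an RGP property line")
    count = int(tokens[2])  # number of (atom, R-group) pairs on this line
    values = [int(t) for t in tokens[3:3 + 2 * count]]
    if len(values) != 2 * count:
        raise ValueError("truncated RGP property line")
    # Even positions are atom numbers, odd positions are R-group numbers.
    return dict(zip(values[0::2], values[1::2]))


print(parse_rgp("M  RGP  2   7   1   8   2"))  # {7: 1, 8: 2}
```

So atom 7 is R1 and atom 8 is R2, which is the numbering I would like the decomposition to respect.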

Savelyev Alexander

Jan 8, 2012, 12:20:05 PM
to indigo-general
Hello Gerhard,

>
> 1)
> Results are incomplete. Is there a function to iterate over all
> possible matchings of an R-Group-Decomposition? E.g. if scaffold
> c1[cH]c[cH][cH][cH]1 is applied to target structure  c1(-O)[cH]c(-N)
> [cH][cH][cH]1, I would expect two results:
>         1. R1=N,R2=O
>         2. R1=O,R2=N.
> I would also expect result sets from all hits, if target structure
> contains the scaffold several times with different atoms

Yep, iterating through all the possible scaffold matchings would
be useful. We have already explored a similar problem with
undefined match possibilities. I think the new functionality will be
added in one of the next releases. I will let you know.

>
> 2)
> IMHO Substructure search for Decomposition behaves a little bit
> weird.
>         a) If atoms in query structure are blocked with hydrogen,  to avoid R
> allocation, target molecules must be equipped with explicit Hydrogen
> (e.g. try above query with c(-O)1cc(-N)ccc1 as target)

You are right, hydrogen processing is skipped at the moment. I
think the new behavior can easily be implemented in the near future.

>         b) On Query-Structures with R-Groups (c1(-R1)cc(-R2)ccc1, see
> attached molfile) the R-Groups are not used (iteration over R-Group-
> Fragments is empty for R1 and R2). I would like to define explicitly
> the R-Group Number, so I tried this setting.
>
> Perhaps other substructure search behaviour for R-Group Decomposition
> would be better?
>         -Allow only Hydrogen or nothing for undefined  neighbour in scaffold
> structures (don’t cares in usual substructure search)
>         -Allow user to define R-Groups on core structures. R-Groups match any
> atom, Hydrogen included.
>

Here I do not fully understand the issue. Can you give some
examples of a core structure and the expected results?

With best regards,
Alexander

Gerhard en-Naser

Jan 9, 2012, 6:41:00 AM
to indigo-general
Hello Alexander,

> > Perhaps other substructure search behaviour for R-Group Decomposition
> > would be better?
> > -Allow only Hydrogen or nothing for undefined neighbour in scaffold
> > structures (don’t cares in usual substructure search)
> > -Allow user to define R-Groups on core structures. R-Groups match any
> > atom, Hydrogen included.
>
> Here I do not fully understand the issue. Can you specify some
> examples for a core structure and expected results?

Thanks for your reply. I will try to explain myself.
Please keep in mind that all substructure definitions are in MDL mol
format, although they are written as SMILES in this text (and in my bad
English :-) ).
In R-Group decomposition you have to perform a substructure search as
the first step.
I think that this search should behave differently from a plain
substructure search.
If a chemist uses c1ccccc1 in a substructure search, he wants to select
all structures which contain phenyl. Any atom outside the ring has no
effect on the result (c1c(Cl)cccc1 and c12ccccc1cccc2 are both ok).
In contrast, if a chemist defines c1(-R1)cc(-R2)ccc1 as the core
structure for R-Group decomposition, I think he would expect a more
rigid behaviour: an exact match of the substructure on the target
structure. So
c1ccccc1 (benzene), c1cc(Cl)ccc1 (chlorobenzene) and c1(Cl)cc(N)ccc1
(3-chloroaniline)
are valid results. But any ring carrying more than two heavy-atom
substituents (e.g. c1(Cl)cc(N)ccc1C …) should not be considered as a core
during the underlying substructure search.
During decomposition the Rn groups are markers for positions where
variability is allowed: hydrogen or any other atom.
Target atoms which match unmarked atoms in the core must not have
heavy-atom neighbours other than those given by the core structure.

Over the last few days I have tried to program this behaviour with
Indigo in this way:
1. Read the core as a query structure from an Accelrys Draw molfile.
2. Loop over the R-Groups and keep atom number and R number. Exchange R
for [H,A].
3. Do a substructure search with the modified core on the target
structure.
4. Loop over all possible mappings:
4a) Remove mappings in which a target atom is mapped to a non-R-Group
atom of the core and the number of neighbours differs.
4b) Generate a dictionary of Rn atoms (n to atom number) in the target.
Remove the core atoms from the target.
Bind newly generated Rn atoms to the target atoms according to the
dictionary.
4c) Use DFS to find all atoms connected to each Rn. Output the
submolecule.
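
Steps 4a to 4c can be sketched in plain Python on a heavy-atom adjacency graph. This is only an illustration of the idea, not Indigo code: the names extract_r_groups and make_graph are hypothetical, one substructure mapping is assumed to be given already, and each R-Group fragment is assumed not to bond back into the core.

```python
def extract_r_groups(target_adj, mapping, r_sites):
    """Filter one mapping (4a) and collect R-group fragments (4b, 4c).

    target_adj: {atom: set of neighbour atoms}, heavy atoms only.
    mapping:    {core atom: target atom} for the core's ring atoms.
    r_sites:    {core atom: R number} for core atoms that carry an R-group.
    Returns {R number: sorted fragment atoms}, or None if rejected.
    """
    core = set(mapping.values())
    # 4a: an unmarked core atom must not have heavy neighbours outside the core.
    for core_atom, target_atom in mapping.items():
        outside = target_adj[target_atom] - core
        if outside and core_atom not in r_sites:
            return None
    # 4b/4c: from each attachment point, walk away from the core with a DFS.
    fragments = {}
    for core_atom, r_number in r_sites.items():
        stack = list(target_adj[mapping[core_atom]] - core)
        seen = set()
        while stack:
            atom = stack.pop()
            if atom in seen:
                continue
            seen.add(atom)
            stack.extend(target_adj[atom] - core - seen)
        fragments[r_number] = sorted(seen)  # empty list means hydrogen
    return fragments


def make_graph(bonds):
    """Build an undirected adjacency dict from (a, b) bond pairs."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
mapping = {i: i for i in range(6)}
r_sites = {0: 1, 2: 2}  # R1 on ring atom 0, R2 on ring atom 2

# Atom 6 on ring atom 0, a two-atom chain (7, 8) on ring atom 2.
target = make_graph(ring + [(0, 6), (2, 7), (7, 8)])
print(extract_r_groups(target, mapping, r_sites))   # {1: [6], 2: [7, 8]}

# A third substituent on an unmarked ring atom rejects the mapping.
crowded = make_graph(ring + [(0, 6), (2, 7), (7, 8), (4, 9)])
print(extract_r_groups(crowded, mapping, r_sites))  # None
```

The check in 4a is written here as "no heavy neighbours outside the core" rather than as a comparison of neighbour counts; on a heavy-atom graph the two should amount to the same thing.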

Unfortunately it is not possible to modify atoms in query structures
via resetAtom(). Do you have any idea how the exchange in step 2 can be
done?

Thank you for your effort
Gerhard

Mikhail Rybalkin

Jan 29, 2012, 4:13:56 PM
to indigo-general
Dear Gerhard,

In response to your request we have added methods for changing a
query structure. You can use addAtom and resetAtom on the
query molecule, and these methods accept SMARTS expressions.
You can also work with individual constraints using the
atom.removeConstraints, addConstraint, addConstraintOr, and
addConstraintNot methods.

You can also simply modify an existing query atom by adding
additional constraints to it with the addConstraint method.

For example, in your case you need to add a substituents
constraint to a query atom:

atom.addConstraint("substituents", "3")

This is equivalent to appending "X3" to the SMARTS expression.

Other integer constraints are:

"atomic-number"
"charge"
"isotope"
"radical"
"valence"
"connectivity"
"total-bond-order"
"hydrogens"
"substituents"
"ring"
"smallest-ring-size"
"ring-bonds"
"rsite-mask"
"rsite"

Aromaticity constraint:

atom.addConstraint("aromaticity", "aromatic")
or
atom.addConstraint("aromaticity", "aliphatic")

Arbitrary SMARTS constraint:

atom.addConstraintOr("smarts", "[n, c]")

In addition there are the following methods:
atom.removeConstraints("<constraint>")
atom.addConstraintOr("smarts", "[n, c]")
atom.addConstraintNot("atomic-number", "6")

Best regards,
Mikhail

gen

Jan 30, 2012, 5:54:48 PM
to indigo-general
Hello Mikhail,

Thanks for the really nice update. I'm pretty sure your new features
will give me the opportunity to build a custom-made R-Group decomposition.
I will give feedback with sources as fast as possible. However, this
may take a few days. Please be patient.
Thanks again for extending your code.

mederich...@gmail.com

Feb 3, 2012, 5:45:06 AM
to indigo-general
Hi everybody,

We have also run into this problem: not all possible scaffold
matchings are iterated.
So we are interested in a new release which will fix it.

For example, it would be very interesting to select the scaffold
R-Group decomposition which has the largest number of R-sites (to get the
most detailed decomposition).

So please let me know when this is added to the next INDIGO release.

With best regards.

Médérich

Savelyev Alexander

Apr 13, 2012, 6:54:39 AM
to indigo-...@googlegroups.com
Hi everybody,

The new Indigo 1.1 beta11 version is available for download:

http://ggasoftware.com/download/indigo_next

The new version contains the new RGroup Decomposition API, which allows
the user to iterate over all scaffold matches. A more detailed description
can be found here:

https://groups.google.com/group/indigo-general/browse_thread/thread/75281df2f70ec1a

If you have any questions, please let us know.

With best regards,
Alexander

