Thank you for the response.
> Hi @all,
> sorry for my late response.
>
>> As you can notice, a principal problem is the problem with generating
>> so named 'full scaffold' (a scaffold gathers all the Rsites together).
> Sorry I do not understand what 'full scaffold' means.
> A Scaffold generated by maximum common substructure search?
> If so, should these two tasks separated in different API?
> mols = []
> for smiles in structures:
>     mol = indigo.loadMolecule(smiles)
>     mols.append(mol)
> scaffolds = indigo.generateMCSS(mols,other options)
> (Perhaps generate MCSS which did not match any Molecule in mols, but a
> user defined fraction...)
No, a full scaffold is not a scaffold generated by MCSS. The scaffold
detection API will NOT be changed. MCSS is not considered in this topic.
The issue applies ONLY to the RGroup Decomposition.
There are two different types of input queries:
- user defined molecule with RGroups (e.g. c1(R1)cc(R2)ccc1 in your
example)
- simple query molecule, which can be passed from the Scaffold
Detection (e.g. c1ccccc1 in your example)
In the first case the full scaffold is the user-defined molecule
itself, and it cannot be changed during the RGroup Decomposition.
In the second case the full scaffold should be generated by the library,
and it will be returned by the decomposedMoleculeScaffold() method.
I think the examples below will clarify the logic.
PS. I will use your SMILES notation with R1, R2... atoms, which is not
supported by SMILES readers. If someone wants to read it, each
R<Number> string can be replaced by [*:<Number>] with mapping, e.g.
R1 >> [*:1], R2 >> [*:2], etc., so that
c1(R1)cc(R2)ccc1 >> c1([*:1])cc([*:2])ccc1
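The replacement described in the PS can be done with a short regular expression. This is only a sketch in plain Python: the function name to_mapped_smiles is made up for illustration, and the pattern assumes that an upper-case R directly followed by digits never occurs in the string for any other reason.

```python
import re

def to_mapped_smiles(r_smiles):
    """Replace every R<Number> token with [*:<Number>] (hypothetical helper)."""
    # Assumption: 'R' directly followed by digits always denotes an Rsite.
    return re.sub(r"R(\d+)", r"[*:\1]", r_smiles)

print(to_mapped_smiles("c1(R1)cc(R2)ccc1"))  # c1([*:1])cc([*:2])ccc1
```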
Query: c1(R1)cc(R2)ccc1
Molecule: c1(N)cc(O)ccc1
Full scaffold: c1(R1)cc(R2)ccc1
Query: c1ccccc1
Molecule: c1(N)cc(O)ccc1
Full scaffold: c1(R1)cc(R2)ccc1
As I mentioned before, the full scaffold gathers all the Rsites together
and should match all input molecules.
Query: C1CCNCC1
Molecule: NC1CCNCC1
Full scaffold: (R1)C1CCNCC1
Query: C1CCNCC1
Molecule1: OC1CCNCC1
Molecule2: C1CCNC(N)C1
Full scaffold: (R1)C1CCNC(R2)C1
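The way the full scaffold gathers Rsites in the last example can be pictured with a toy model in plain Python. It is only a sketch and not how the library works internally: it assumes the substructure match has already been done, so each molecule is reduced to the set of query atom indices that carry a substituent (the atoms of C1CCNCC1 numbered 0 to 5 in SMILES order). The function name and this representation are made up for illustration.

```python
def gather_rsites(substituted_atoms_per_molecule):
    """Number Rsites in order of first appearance over all molecules."""
    rsites = {}  # query atom index -> Rsite number
    for atoms in substituted_atoms_per_molecule:
        for atom in sorted(atoms):
            if atom not in rsites:
                rsites[atom] = len(rsites) + 1
    return rsites

# Assumed mapping: Molecule1 OC1CCNCC1 is substituted at query atom 0,
# Molecule2 C1CCNC(N)C1 at query atom 4.
print(gather_rsites([{0}, {4}]))  # {0: 1, 4: 2}
```

Under these assumptions R1 lands on atom 0 and R2 on atom 4, which corresponds to (R1)C1CCNC(R2)C1.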
The next step is iterating over the matches. I should note that for
user-defined scaffolds (first case) there are no problems. But in the second
case (and all of the API description in the topic above was for this case),
the full scaffold can differ between matchings. The example
below shows such a possibility.
Query: C1CCCCC1
Molecule: C1CCC(CC1)C1CCC2OC2C1
Full scaffold (possibility 1): ([*:1])C1CCCCC1
Full scaffold (possibility 2): ([*:1])C1CCC2[*:2]C2C1
And we should give the user an opportunity to select the match. In
the example below we will select possibility 2, which has the maximum
RGroup count.
(PS. I have noticed a typo in the code examples in my first message:
inside the match iterating loop, item should be q_match.)
# iterate through all the structures
for smiles in structures:
    try:
        item = deco.processMolecule(indigo.loadMolecule(smiles))
        max_r = 0
        selected_match = None
        # loop over all the matches
        for q_match in item.iterateMatches():
            # remember the match with the maximum RGroup count
            rg_mol = q_match.decomposedMoleculeWithRGroups()
            if rg_mol.countRSites() > max_r:
                max_r = rg_mol.countRSites()
                selected_match = q_match
        # add the selected match to the full scaffold (possibility 2)
        if selected_match is not None:
            selected_match.addMatchToScaffold()
    except Exception as e:
        # error handlers
        pass
Suppose we have a second molecule. If we use the code above, the full
scaffold will be:
Query: C1CCCCC1
Molecule1: C1CCC(CC1)C1CCC2OC2C1
Molecule2: OC1CCCCC1O
Full scaffold: ([*:1])C1CC2[*:2]C2CC1([*:3])
But if we do not use the code above:
Query: C1CCCCC1
Molecule1: C1CCC(CC1)C1CCC2OC2C1
Molecule2: OC1CCCCC1O
Full scaffold: ([*:1])C1CCCCC1([*:2])
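The selection rule used in the loop shown earlier (keep the match with the largest number of RSites) can be tried out without the library on the two possibilities listed before it. This is a sketch in plain Python: count_rsites is a made-up helper that simply counts [*:<Number>] tokens in a string, and it is not the library's countRSites().

```python
import re

def count_rsites(scaffold):
    """Count [*:<Number>] tokens in a scaffold string (hypothetical helper)."""
    return len(re.findall(r"\[\*:\d+\]", scaffold))

candidates = [
    "([*:1])C1CCCCC1",          # possibility 1
    "([*:1])C1CCC2[*:2]C2C1",   # possibility 2
]
# max() returns the first candidate with the highest RSite count
selected = max(candidates, key=count_rsites)
print(selected)  # ([*:1])C1CCC2[*:2]C2C1
```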
>
> I think it would be nice to know if results match different atom
> sets.
> If the user only wants one result per atom set, or one result at all,
> how about prioritizing mappings/atom sets by the canonical SMILES of
> the R-Groups?
> Eg:
> Query: c1(R1)cc(R2)ccc1
> Molecule: c1(-NH)cc(-OH)ccc1
> Results:
> Mapping 1 R1-NH R2-OH
> Mapping2 R1-OH R2-NH
> Generate canonical Smiles from each R-Group. Sort mappings/atom sets
> on R-Groups (R1 highest priority).
> for smiles in structures:
>     # handle current molecule and handle exceptions
>     try:
>         # sorted list of R-Groups (R1,R2..Rn)
>         rGroupList = deco.getRgroups()
>         matches = deco.processMolecule(indigo.loadMolecule(smiles))
>         # loop over all matches
>         for atomSet in matches:
>             for match in atomSet:
>                 s = ''
>                 for rGroupName in rGroupList:
>                     s = (s + rGroupName + ' :' +
>                          match.getCanonicalSmilesFor(rGroupName) + ' ')
>                 print(s)
>         # loop over first match on all atom sets
>         for atomSet in matches:
>             s = ''
>             match = atomSet.getFirstMatch()
>             for rGroupName in rGroupList:
>                 s = (s + rGroupName + ' :' +
>                      match.getCanonicalSmilesFor(rGroupName) + ' ')
>             print(s)
>         # get first match on first atomSet
>         match = matches.getFirstAtomSet().getFirstMatch()
>         s = ''
>         for rGroupName in rGroupList:
>             s = (s + rGroupName + ' :' +
>                  match.getCanonicalSmilesFor(rGroupName) + ' ')
>         print(s)
>     except Exception as e:
>         pass  # error handlers
>
I think I understand what you meant. Yes, the algorithm will iterate only
over the matches that generate molecules with different canonical SMILES
(taking the RGroup numbers into account). Suppose we have the following
example:
Query:C1CCCCC1
Molecule:NC1CCCCC1
There will be only one match:
(R1)C1CCCCC1, R1=N
And the algorithm will skip the matches C1(R1)CCCCC1, C1C(R1)CCCC1,
C1CC(R1)CCC1, etc., because they all give the same molecule.
If we have the example:
Query:C1CCCCC1
Molecule:NC1CCCC(O)C1
There will be two matches:
(R1)C1CCCC(R2)C1, R1=OH, R2=NH2
(R1)C1CCCC(R2)C1, R1=NH2, R2=OH
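The match counts in these two examples can be reproduced with a small counting sketch in plain Python. It is not the library's algorithm: it assumes the attachment sites on the scaffold are already fixed and interchangeable, so that two numberings count as different only when they assign different fragments to R1, R2, and so on. The function name is made up for illustration.

```python
from itertools import permutations

def distinct_assignments(fragments):
    """Return the distinct ways to assign the fragments to R1, R2, ..."""
    # A set removes numberings that give identical (R1, R2, ...) tuples.
    return sorted(set(permutations(fragments)))

print(len(distinct_assignments(["NH2"])))        # NC1CCCCC1 -> 1
print(len(distinct_assignments(["OH", "NH2"])))  # NC1CCCC(O)C1 -> 2
```

With two identical fragments (for example two OH groups) the same function returns a single assignment, since swapping them changes nothing.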
One last thing: your code contains
#sorted list of R-Groups (R1,R2..Rn)
rGroupList=deco.getRgroups()
If we have a simple scaffold query (without RGroups), we cannot get the
RGroups, because we do not know them all at the current iteration.
The same example can be implemented using the given API:
Query: c1(R1)cc(R2)ccc1
Molecule: c1(-NH)cc(-OH)ccc1
# loop over all the structures
for smiles in structures:
    try:
        item = deco.processMolecule(indigo.loadMolecule(smiles))
        # get decomposed molecules (first match)
        rg_mol = item.decomposedMoleculeWithRGroups()
        # add all the matches
        item.addAllMatchesToScaffold()
        # iterate over all the matches
        for q_match in item.iterateMatches():
            # get decomposed molecules (current match)
            rg_mol = q_match.decomposedMoleculeWithRGroups()
            s = ''
            # iterate over RGroups
            for rg in rg_mol.iterateRGroups():
                s = s + 'R' + str(rg.index()) + ':'
                if rg.iterateRGroupFragments().hasNext():
                    rg_frag = rg.iterateRGroupFragments().next()
                    # append canonical smiles for a Rgroup fragment
                    s = s + rg_frag.canonicalSmiles() + '\n'
            # print the RGroups of the current match
            print(s)
    except Exception as e:
        # error handlers
        pass
If I have missed something please let me know.
With best regards,
Alexander
The new decomposition algorithm and the API are under development at the
moment. Unfortunately, user-defined scaffolds are not supported in the
current version. I will announce the release in this topic.
With best regards,
Alexander
> Hi all,
>
>> I should notice, that for user
>> defined scaffolds (first case) there are no problems.
> I don't manage to define my own RgroupScaffold for Rgroup
> decomposition:
> IndigoObject RgScaff = session.loadQueryMolecule("C1CCNCC1[*:1]C1=CC=CC=C1");
> rGroupDecomposition = session.decomposeMolecules(RgScaff, indigoMolList);
>
> I Obtain this Exception:
> com.ggasoftware.indigo.IndigoException: R-Group deconvolution: no
> embeddings obtained
> at com.ggasoftware.indigo.Indigo.checkResult(Indigo.java:57)
> at com.ggasoftware.indigo.Indigo.decomposeMolecules(Indigo.java:421)
>
> My Scaffold should match on all molecules in indigoMolList.
> Maybe my query molecule is not correct.
>
> Someone could help me?
>
> With best regards.
>
> M�d�rich.
We are glad to present the new RGroup Decomposition API. The Indigo
library version (1.1-beta11) with the new API is already available for
download:
http://ggasoftware.com/accept?file=indigo-1.1-beta11%2Findigo-java-1.1-beta11-universal.zip
PS: http://ggasoftware.com/download/indigo_next will be updated soon.
There are several important changes:
- support for iterating over decomposition matches
- support for user-defined scaffolds
- reduced memory usage during the decomposition
All the reported issues have been resolved. But the version is in beta
testing, so we are happy to receive any feedback or comments.
Below, I provide example scripts in Python.
PS. Some function names have changed since my first message.
simple usage
-----------------------------------------------------------------------------------------------------------
# prepare query scaffold
scaffold = indigo.loadQueryMolecule("some structure")
# init decomposition
deco = indigo.createDecomposer(scaffold)
# iterate over all the structures (an iterator may be non-obvious)
for smiles in structures:
    # handle current molecule
    item = deco.processMolecule(indigo.loadMolecule(smiles))
    # get decomposed molecules and add Rsites to full scaffold
    high_mol = item.decomposedMoleculeHighlighted()
    rg_mol = item.decomposedMoleculeWithRGroups()
# get full scaffold with Rsites at the end of the iteration
full_scaf = deco.decomposedMoleculeScaffold()
-----------------------------------------------------------------------------------------------------------
usage with iterating all matches
-----------------------------------------------------------------------------------------------------------
...
# preparations
# iterate over all the structures
for smiles in structures:
    # handle current molecule and handle exceptions
    try:
        item = deco.processMolecule(indigo.loadMolecule(smiles))
        # iterate over all the decompositions
        for q_match in item.iterateDecompositions():
            # get decomposed molecules (current match)
            rg_mol = q_match.decomposedMoleculeWithRGroups()
            # add current match
            deco.addDecomposition(q_match)
    except Exception as e:
        # error handlers
        pass
...
-----------------------------------------------------------------------------------------------------------
User-defined scaffolds with predefined RSites are supported by the
algorithm. You can load the scaffold from a molfile. The SMILES format is
not supported at the moment, but I think we will add SMILES support in
the near future (e.g. with the ...[*:1]...[*:2]... notation).
example with user-defined scaffold
-----------------------------------------------------------------------------------------------------------
# prepare query scaffold (e.g. '(R1)C1CCCC(R2)C1')
scaffold = indigo.loadQueryMoleculeFromFile("query_mol")
# init decomposition
deco = indigo.createDecomposer(scaffold)
# load molecule
mol = indigo.loadMolecule('NC1CCCC(O)C1')
# create deco item
item = deco.processMolecule(mol)
# iterate over all the decompositions
for q_match in item.iterateDecompositions():
    # get decomposed molecule (current match)
    rg_mol = q_match.decomposedMoleculeWithRGroups()
    # print molfile
    print(rg_mol.molfile())
-----------------------------------------------------------------------------------------------------------
In the example above there will be two matches:
(R1)C1CCCC(R2)C1, R1=OH, R2=NH2
(R1)C1CCCC(R2)C1, R1=NH2, R2=OH
If you have any questions, please let us know.