I just came across the same issue. In researching it, I found Casey
Wood's previous post about it, which appears to be unresolved.
I am using Indigo to process records from ChEMBL. I had already found
that Indigo doesn't accept the bad stereochemistry in the ChEMBL 13
data set, so I used RDKit to convert the SDF files into SMILES, then
used Indigo to process those SMILES.
I found two cases where Indigo could not parse the SMILES that RDKit generated.
One is that RDKit supports aromatic Te as a SMILES extension, and
RDKit perceives [te] in 18 structures out of 1,000,000+.
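Since Indigo does not parse these, one way to set them aside before the Indigo step is a plain text check on the SMILES string. This is only a sketch: the function name and the regular expression are my own, not part of RDKit or Indigo, and it assumes aromatic tellurium always shows up as a bracket atom whose symbol is lowercase "te", optionally preceded by an isotope number.

```python
import re

# Hypothetical helper, not an RDKit or Indigo API.
# Matches a bracket atom whose symbol is aromatic "te",
# for example [te], [teH], [125te] or [te+].
_AROMATIC_TE = re.compile(r"\[\d*te[^\]]*\]")


def has_aromatic_te(smiles):
    """Return True if the SMILES string contains an aromatic tellurium atom."""
    return _AROMATIC_TE.search(smiles) is not None


if __name__ == "__main__":
    for smi in ["c1cc[te]c1", "C1CC[Te]C1", "I1c2ccccc2c3ccccc13"]:
        print(smi, has_aromatic_te(smi))
```

Records flagged this way can be written to a separate file rather than passed on to Indigo.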
The other is with the two structures CHEMBL1616388 and CHEMBL1357894.
The SMILES are I1c2ccccc2c3ccccc13 and Cl.I1c2ccccc2c3ccccc13 respectively.
Indigo raises an exception in mol.aromatize():
>>> mol = indigo.loadMolecule("I1c2ccccc2c3ccccc13")
>>> mol.aromatize()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "indigo core", line 3, in indigoAromatize_wrapper
  File "/Users/dalke/ftps/indigo-python-1.1-rc-universal/indigo.py", line 1071, in _checkResult
    raise IndigoException(Indigo._lib.indigoGetLastError())
indigo.IndigoException: 'element: bad valence on I having 2 drawn bonds, charge 0, and 0 radical electrons'
I don't know where those structures come from, and I don't know how
iodine is supposed to have a valence of 2. I can only say that the
ChEMBL data set contains a couple of SMILES strings which Indigo cannot
handle.
If you all think this is a structure error, and that the chemistry is truly
impossible, then I'll notify ChEMBL about it. I don't know enough chemistry
to be able to be certain.
In any case, large data sets contain strange chemistries, and it would
be nice if there was some way to remove the safeties, as it were, and
allow bad chemistry to go through.
Cheers,
Andrew
da...@dalkescientific.com