Input Smiles from V1 and V2 scores of Docking Experiments


soseback

Feb 6, 2008, 12:34:46 PM
to UsefulChem
A question for Rajarshi-

We have been planning our experiments from the output SMILES generated
by your docking algorithm, and it has become a long and tedious
process: the scores give us the SMILES of the predicted Ugi product,
which then has to be analyzed to determine which aldehydes, amines,
carboxylic acids, and isocyanides were used. Since the program uses
input SMILES to create the proposed structures, is there a way to get
it to give us back the SMILES of the reagents that it used for each
Ugi product?

This would save our researchers an immense amount of time, and
eliminate the difficulties we encounter whilst naming the organic
compounds. We have had numerous instances when we look in the stock
room for a chemical using its IUPAC name that we figured out from
looking at the structure of the Ugi product, only to find that the
bottle is under its common name!

Thanks!
Shannon

Rajarshi

Feb 6, 2008, 1:37:45 PM
to UsefulChem
A quick hack resulted in http://www.chembiogrid.org/cheminfo/fpdock/rgts.txt

This is a (large!) plain text file with 5 columns:

serial num, acid smiles, amine smiles, aldehyde smiles, isonitrile
smiles

Note that the SMILES have a ring closure symbol at the end (%90, %91
etc). To get the original SMILES, replace that symbol with the
corresponding functional group.

So for example, the amine entry represented by C%91 can be rewritten
to CN and the aldehyde written as c1ccc(cc1)%92 would be rewritten as
c1ccc(cc1)C=O

You can probably do a quick search and replace: %91 -> N, %92 -> C=O,
%93 -> [N+]#[C-] and %90 -> C(=O)O
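
The search and replace described above could be scripted along these lines. This is a minimal sketch, not the code that produced rgts.txt: the names `PLACEHOLDERS` and `restore_reagent` are made up for illustration, and it assumes a plain text substitution is enough because the placeholder closes a chain or branch in the examples given.

```python
# Placeholder -> functional group, as listed in the message above.
PLACEHOLDERS = {
    "%90": "C(=O)O",     # carboxylic acid
    "%91": "N",          # amine
    "%92": "C=O",        # aldehyde
    "%93": "[N+]#[C-]",  # isonitrile
}


def restore_reagent(fragment: str) -> str:
    """Return the reagent SMILES for a fragment carrying a %9x placeholder."""
    for placeholder, group in PLACEHOLDERS.items():
        fragment = fragment.replace(placeholder, group)
    return fragment


print(restore_reagent("C%91"))           # CN
print(restore_reagent("c1ccc(cc1)%92"))  # c1ccc(cc1)C=O
```

A fragment whose placeholder sits in the middle of a chain would need more care than a text substitution, so it is worth spot-checking a few restored reagents in a structure viewer.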

So for a given docking result, look up the serial number in the 'Name'
column and use it to look up the reagents.
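
The lookup could be done with a small dictionary keyed on the serial number. The sketch below assumes the file is whitespace-delimited with exactly the five columns listed above; the delimiter is not stated in this thread, and `load_reagents` is a hypothetical helper name. The sample row is the one quoted later in this thread for compound 50539.

```python
import io


def load_reagents(handle):
    """Map serial number -> (acid, amine, aldehyde, isonitrile) fragments."""
    table = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        serial, acid, amine, aldehyde, isonitrile = fields
        table[serial] = (acid, amine, aldehyde, isonitrile)
    return table


# Single illustrative row; a real run would pass open("rgts.txt") instead.
sample = io.StringIO(
    "50539 O=C(NC%90)c1ccccc1 CCC%91 c1ccc2c(c1)cc(c3ccccc23)%92 CCCC%93\n"
)
reagents = load_reagents(sample)
print(reagents["50539"][1])  # CCC%91
```

Serial numbers are kept as strings so they match the 'Name' column exactly, including any leading zeros.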

Jean-Claude Bradley

Feb 6, 2008, 1:49:17 PM
to usefu...@googlegroups.com
That was pretty fast Rajarshi - I'll have to make sure you are not a bot at the BCCE :)

It looks like you have over 77K compounds here, so maybe that's not the right library.

Shannon is referring to library 3:
http://usefulchem.wikispaces.com/UClib003

giving docking results:
http://usefulchem.wikispaces.com/D-EXP014


Rajarshi

Feb 6, 2008, 3:14:08 PM
to UsefulChem


On Feb 6, 1:49 pm, "Jean-Claude Bradley"
<jeanclaude.brad...@gmail.com> wrote:
> That was pretty fast Rajarshi - I'll have to make sure you are not a bot at
> the BCCE :)
>
> It looks like you have over 77K compounds here so maybe that's not the right
> library
>
> Shannon is referring to library 3: http://usefulchem.wikispaces.com/UClib003

Aah. The reagent list for the 71K compounds is at
http://www.chembiogrid.org/cheminfo/fpdock/003/rgts.txt

BTW, the SMILES for 2-chloro-5-nitrobenzaldehyde should be
C(=O)c1cc([N+](=O)[O-])ccc1Cl


Jean-Claude Bradley

Feb 6, 2008, 3:27:59 PM
to usefu...@googlegroups.com
Thanks Rajarshi - I updated Library 003 accordingly
http://usefulchem.wikispaces.com/UClib003

Shannon - let us know if this helps
--
Jean-Claude Bradley, Ph. D.
E-Learning Coordinator for the College of Arts and Sciences
Associate Professor of Chemistry

soseback

Feb 6, 2008, 4:43:14 PM
to UsefulChem
That does help, thank you so much!

One thing Dr Bradley that I'm just noticing is that it seems that the
list we have been working from since last term with the ranks is
different than what you just posted? This is the list we have been
working from http://usefulchem.wikispaces.com/D-EXP014 the V1 list of
compounds, more specifically the V1B google document.

I just compared several of the ranks on that V1 list to the output
SMILES corresponding in library 3 and they are not the same. So I
guess this means that having the input SMILES according to the library
3 list does not help us, because it would be more work to figure out
which ones on that list correspond to the different rank on the V1
list.

Let me know what you think.


Rajarshi

Feb 6, 2008, 5:33:30 PM
to UsefulChem

> On Feb 6, 2008, at 4:43 PM, soseback wrote:

> One thing Dr Bradley that I'm just noticing is that it seems that the
> list we have been working from since last term with the ranks is
> different than what you just posted? This is the list we have been
> working from http://usefulchem.wikispaces.com/D-EXP014 the V1 list of
> compounds, more specifically the V1B google document.

I can confirm that the text file at
http://showme.physics.drexel.edu/mirza/DEXP014-V1A.txt matches my
ranked score data file at http://www.chembiogrid.org/cheminfo/fpdock/003/cons-rank-v1.txt

The V1B google doc should match the first 1637 lines of the above file
- I haven't checked rigorously but the first 2 SMILES match

> I just compared several of the ranks on that V1 list to the output
> SMILES corresponding in library 3

Where is the output SMILES that you're looking at?

For example, if I look at the 1st ranked compound in V1, I get its
name as 50539 and its SMILES as

CCCCNC(=O)C(c1cc2ccccc2c3c1cccc3)N(CCC)C(=O)CNC(=O)c4ccccc4

If you then look at http://www.chembiogrid.org/cheminfo/fpdock/003/rgts.txt
and lookup the row for 50539 the rgts are

acid                amine   aldehyde                     isonitrile
O=C(NC%90)c1ccccc1  CCC%91  c1ccc2c(c1)cc(c3ccccc23)%92  CCCC%93

If you join these reagents to get the Ugi product it matches the
SMILES for 50539 in the score file.
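
One way to do the joining, sketched under assumptions: SMILES lets ring closure labels pair up across dot-separated parts, so the four fragments can be attached to a core that carries the same %90 to %93 labels. The core string below is reconstructed from the Ugi connectivity and is not taken from this thread, and `join_ugi` is a hypothetical helper name.

```python
# Assumed Ugi core: acyl carbon (%90), amine nitrogen (%91), the former
# aldehyde carbon (%92), and the isonitrile nitrogen (%93) of the new amide.
UGI_CORE = "C%90(=O)N%91C%92C(=O)N%93"


def join_ugi(acid, amine, aldehyde, isonitrile, core=UGI_CORE):
    """Return a dot-separated SMILES whose matching %9x labels bond
    each fragment to the core."""
    return ".".join([acid, amine, aldehyde, isonitrile, core])


product = join_ugi(
    "O=C(NC%90)c1ccccc1",
    "CCC%91",
    "c1ccc2c(c1)cc(c3ccccc23)%92",
    "CCCC%93",
)
print(product)
```

The result is a valid but non-canonical way of writing the product, so it will not be character-for-character identical to the SMILES in the score file; a cheminformatics toolkit is needed to confirm that the two describe the same molecule.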

Also, if you're working with various subsets of the library I strongly
recommend that you retain the name of the compound (basically its
serial number, which I generated). Thus the V1B document should have the
serial number as a column - otherwise it will be very painful later
on to check stuff against the original raw data.
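
As an illustration of carrying the serial number along when cutting a subset, here is a minimal sketch. It assumes the ranked score file lists the serial number and the SMILES as its first two whitespace-separated columns, in rank order; that layout is a guess, and `ranked_subset` is a hypothetical helper name.

```python
def ranked_subset(score_lines, n):
    """Return the first n data rows as (rank, serial, smiles) tuples."""
    rows = []
    for line in score_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if len(rows) == n:
            break
        rows.append((len(rows) + 1, fields[0], fields[1]))
    return rows


top = ranked_subset(
    ["50539 CCCCNC(=O)C(c1cc2ccccc2c3c1cccc3)N(CCC)C(=O)CNC(=O)c4ccccc4"], 1
)
print(top[0][:2])  # (1, '50539')
```

With the serial number stored next to the rank, any row of a subset can be traced back to the reagent list and the raw docking data.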

-------------------------------------------------------------------
Rajarshi Guha <rg...@indiana.edu>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
After a number of decimal places, nobody gives a damn.

Rajarshi

Feb 6, 2008, 5:38:22 PM
to UsefulChem


On Feb 6, 4:43 pm, soseback <shannon.oseb...@gmail.com> wrote:
> That does help thank you so much!
>
> One thing Dr Bradley that I'm just noticing is that it seems that the
> list we have been working from since last term with the ranks is
> different than what you just posted? This is the list we have been
> working from http://usefulchem.wikispaces.com/D-EXP014 the V1 list of
> compounds, more specifically the V1B google document.
>
> I just compared several of the ranks on that V1 list to the output
> SMILES corresponding in library 3 and they are not the same.

Aah, I think I know what's going on.

First, I assume that when you talk about output SMILES you're looking
at

http://showme.physics.drexel.edu/mirza/UClib003.txt

If so, yes, the SMILES for say the 1st ranked compound (50539) do
differ in the V1B spreadsheet and the above file. But they represent
the same molecule. You can confirm this by pasting the SMILES at
http://www.daylight.com/daycgi/depict

The thing is that none of the SMILES are canonical, that's why they
may differ.
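
The same check can be scripted by canonicalizing both strings and comparing the results. The sketch below uses RDKit, which is an assumption and not a tool mentioned in this thread; `same_molecule` is a hypothetical helper that returns None when RDKit is not installed or a string cannot be parsed.

```python
try:
    from rdkit import Chem  # third-party toolkit, assumed to be installed
except ImportError:
    Chem = None


def same_molecule(smiles_a, smiles_b):
    """Return True or False when both SMILES parse, otherwise None."""
    if Chem is None:
        return None  # no toolkit available, so no verdict
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        return None  # at least one string could not be parsed
    # MolToSmiles writes a canonical SMILES by default.
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)


print(same_molecule("OCC", "CCO"))  # ethanol written two ways
```

Canonical SMILES are only comparable when they come from the same toolkit, so both strings should go through the same program.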

soseback

Feb 6, 2008, 5:40:03 PM
to UsefulChem
Oh OK, that helps a lot. I was not aware of the serial numbers; we have
just been going by the numerical rank in the list, so I guess I was
comparing that to your serial numbers, which of course don't match.
Thank you for clearing that up. We will definitely add the serial
number to the V1B document. Thank you so much for the input SMILES,
it will make our experiments so much easier!



Jean-Claude Bradley

Feb 6, 2008, 7:02:04 PM
to usefu...@googlegroups.com
Shannon - yes there are many numbers and many projects so we have to keep communicating to keep it straight.
Because we can have many different docking studies by many different people on the same libraries, we keep track of the libraries here:
http://usefulchem.wikispaces.com/Libraries  (notice UClib004 with only water soluble Ugi products is pending - could use some help on that :)
and the docking runs on these various libraries all have D-EXP numbers and link back to the original libraries:
http://usefulchem.wikispaces.com/combiugi

Rajarshi is right that we should use the IDs in the original libraries to be consistent, because SMILES can be written in different ways and InChIs can be too long. But I wonder why not use InChIKeys as the identifiers - Rajarshi, any thoughts?


Tony at ChemSpider

Feb 6, 2008, 7:13:04 PM
to UsefulChem
Using InChIKeys will certainly remove the SMILES differences, but don't
forget that you cannot convert InChIKeys back to structures, so you
will need to keep a table of InChIKeys and the original InChI strings
or SMILES so that the relationship is captured somewhere. You could
batch deposit the structures onto ChemSpider and we can issue the
InChIKeys to you (maybe they are already on ChemSpider)? Best wishes



Rajarshi

Feb 6, 2008, 7:21:30 PM
to UsefulChem

On Feb 6, 7:02 pm, "Jean-Claude Bradley"
<jeanclaude.brad...@gmail.com> wrote:
>
> Rajarshi is right that we should use the IDs in the original libraries to be
> consistent because SMILES can be written in different ways and InChIs can be
> too long. But I wonder why not use InChIKeys as the identifiers - Rajarshi
> any thoughts?

We could use InChIKeys - but given that the compounds started with
serial IDs, it'd be tedious to reprocess the raw data to include the
keys. Since all the raw data uses the serial numbers, it's convenient
to talk in terms of them.

Also serial numbers would be the easiest to type out :)

Jean-Claude Bradley

Feb 6, 2008, 7:30:29 PM
to usefu...@googlegroups.com
Using ChemSpider was part of that equation :)
I think we have that whole 71K library already in there, if I remember correctly

But even if they were not in ChemSpider, couldn't Rajarshi's algorithm calculate them from the SMILES?
We would just have to use the InChIKeys in every table from that point forward.


--
Jean-Claude Bradley, Ph. D.
E-Learning Coordinator for the College of Arts and Sciences
Associate Professor of Chemistry
Drexel University

Jean-Claude Bradley

Feb 6, 2008, 7:35:49 PM
to usefu...@googlegroups.com
True, no point in redoing these - well, for the next docking run we can consider it maybe
--
Jean-Claude Bradley, Ph. D.
E-Learning Coordinator for the College of Arts and Sciences
Associate Professor of Chemistry

Rajarshi

Feb 6, 2008, 7:59:16 PM
to UsefulChem


On Feb 6, 7:30 pm, "Jean-Claude Bradley"
<jeanclaude.brad...@gmail.com> wrote:
>
> But even if they were not in ChemSpider, couldn't Rajarshi's algorithm
> calculate them from the SMILES?
> We would just have to use the InChIKeys in every table from that point
> forward.

No need for my algorithm - just run OpenBabel on the SMILES and you
can get the InChIKeys.