Ugi Master Table issues

Jean-Claude Bradley

Apr 2, 2008, 6:55:50 AM
to usefu...@googlegroups.com
Several issues have been raised by people modeling the results in the Ugi Master Table. To make sure everyone is involved in the conversation, I'll address some points here:

1) The table itself can be accessed publicly at:
http://spreadsheets.google.com/pub?key=plwwufp30hfpUERhse9y5Kw
On the wiki (http://usefulchem.wikispaces.com), the link can be found on the CombiUgi page, which is listed in the left navigation bar.

2) Although anyone can view that page in HTML format, it is more convenient to work with the data in edit mode. Scroll to the bottom and click "Edit this page": this lets you sort the sheet itself, add data, or export it to Excel or another format. You need to be invited as a collaborator on the Google Doc to do this, so just ask me for an invite if you don't have access yet.

3) Khalid has just finished double-checking the data for completeness and correctness. If anything doesn't look right, please let us know (and fix it if possible). Google Docs now tracks changes to individual cells in spreadsheets.

4) In the precipitation column, we have added the value "Reactants insoluble" in addition to YES and NO. Rajarshi has decided to merge "Reactants insoluble" with NO in his model because there are not many of these entries yet (about 10). This makes sense in that both are unsuccessful Ugi reactions, although for different reasons. That is fine; let's just remember to state explicitly how these entries were treated when comparing models.
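To make the two labelling choices explicit, here is a minimal Python sketch that maps the precipitation column to class labels with and without the merge. The function name `precipitation_label` and the `merge_insoluble` flag are hypothetical; this is not the code behind any of the models discussed in this thread.

```python
def precipitation_label(value, merge_insoluble=True):
    """Map a precipitation-column value to a class label (hypothetical helper).

    merge_insoluble=True folds "Reactants insoluble" into NO (binary YES/NO);
    merge_insoluble=False keeps it as a separate third class.
    Any other value, including the deprecated "YES?", raises ValueError.
    """
    v = value.strip()
    if v == "YES":
        return "YES"
    if v == "NO":
        return "NO"
    if v == "Reactants insoluble":
        return "NO" if merge_insoluble else "INSOLUBLE"
    raise ValueError(f"unexpected precipitation value: {value!r}")


values = ["YES", "NO", "Reactants insoluble"]
print([precipitation_label(v) for v in values])
print([precipitation_label(v, merge_insoluble=False) for v in values])
```

Reporting which setting a model used makes the binary and three-class results directly comparable.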

5) In the precipitation column, some entries are marked "YES" and some "YES?". I initially used the question mark because a single picture did not show unambiguously that there was a precipitate. This applies to only one image, though, and the question mark should not be used to indicate anything else. To indicate that the Ugi product has not been fully characterized, just leave the yield column blank until it is. The modelers are ignoring the question mark anyway, so let's stop using it altogether.
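A small Python sketch of that cleanup, assuming each row is a dict with hypothetical `id` and `precipitation` keys: it reads "YES?" as "YES" (as the modelers already do) and lists the rows that still need editing in the table.

```python
def clean_question_marks(rows):
    """Return (cleaned_rows, flagged_ids); the row keys are hypothetical."""
    cleaned, flagged = [], []
    for row in rows:
        value = row["precipitation"].strip()
        if value == "YES?":
            # Treat as YES for modeling, but remember the row so it can be fixed.
            flagged.append(row["id"])
            value = "YES"
        cleaned.append({**row, "precipitation": value})
    return cleaned, flagged


rows = [
    {"id": "EXP-A", "precipitation": "YES?"},  # invented example rows
    {"id": "EXP-B", "precipitation": "NO"},
]
cleaned, flagged = clean_question_marks(rows)
print(flagged)  # ['EXP-A']
```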

6) Many entries currently have no value in the precipitation column, specifically those for EXP174. This just means that Khalid is about to run those experiments and will fill in the values as data come in. The only caution is that if any of these experiments are aborted because of errors, the corresponding rows should be removed from the table.
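For modeling, a blank cell should be read as "not run yet" rather than as NO. A minimal Python sketch of holding those rows back, assuming rows are dicts with hypothetical `id` and `precipitation` keys:

```python
def split_by_result(rows):
    """Split rows into (labeled, pending).

    A blank precipitation cell means the experiment has not been run yet,
    so the row is held back from training instead of being counted as NO.
    """
    labeled, pending = [], []
    for row in rows:
        target = labeled if row["precipitation"].strip() else pending
        target.append(row)
    return labeled, pending


rows = [
    {"id": "row-1", "precipitation": "YES"},  # invented example rows
    {"id": "row-2", "precipitation": ""},
    {"id": "row-3", "precipitation": "NO"},
]
labeled, pending = split_by_result(rows)
print([r["id"] for r in labeled])  # ['row-1', 'row-3']
print([r["id"] for r in pending])  # ['row-2']
```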

7) John M has reported extra spaces being added to the precipitation column when exporting to Excel. He was able to fix these manually, though I don't think I have run into that problem myself. If anyone has advice about this, please let us know.
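One possible workaround, sketched in Python under the assumption that the sheet is exported as CSV (the column names are hypothetical): strip whitespace from every header and cell on import, so that "YES " and "YES" compare equal. This only cleans the data after export; it does not explain why the spaces appear.

```python
import csv
import io


def read_table(text):
    """Parse exported CSV text, stripping stray whitespace from headers and cells."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in reader]


sample = "id,precipitation \nEXP-A, YES \nEXP-B,NO\n"  # invented sample export
rows = read_table(sample)
print(rows)
# [{'id': 'EXP-A', 'precipitation': 'YES'}, {'id': 'EXP-B', 'precipitation': 'NO'}]
```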

8) With more people contributing to the project, for example:
http://usefulchem.blogspot.com/2008/04/were-gonna-ugi-all-night.html
we want to make sure there is good quality control on this table.

9) Finally, for anyone interested in modeling: we want to predict which of the following compounds (predicted by Rajarshi to dock with the falcipain-2 V2 receptor, and the next ones on our list to make) are likely to form precipitates:

Cc1ccc(o1)CN(C(c2cccc(c2O)OC)C(=O)NC(C)(C)C)C(=O)CCCC3=CC=C4C=Cc5cccc6c5C4C3C=C6
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cc3ccccc3c4c2cccc4)N(C)C(=O)c5c(cc(cc5O)O)O
CC(C)(C)NC(=O)C(c1cccc(c1O)OC)N(Cc2ccccc2)C(=O)CCCC3=CC=C4C=Cc5cccc6c5C4C3C=C6
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(c3cccc(c3)Cl)C(=O)Cc4ccc5c(c4)OCO5
CC(C)(C)NC(=O)C(c1ccc2ccccc2c1)N(C)C(=O)CCCC3=CC=C4C=Cc5cccc6c5C4C3C=C6
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc3ccccc3c2)N(Cc4ccco4)C(=O)Cc5ccc(c(c5)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cc(cc(c2)OC)OC)N(C)C(=O)c3c(cc(cc3O)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(c3ccccc3)C(=O)Cc4ccc5c(c4)OCO5
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(c(c2)O)O)N(C)C(=O)c3c(cc(cc3O)O)O
CC(C)(C)NC(=O)C(c1ccc(cc1)N(C)C)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
CC(C)(C)NC(=O)C(c1cc2ccccc2c3c1cccc3)N(C)C(=O)CCCC4=CC=C5C=Cc6cccc7c6C5C4C=C7
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(c3ccccc3)C(=O)Cc4ccc(cc4)Cl
Cc1ccc(cc1)C(C(=O)NC(C)(C)C)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
CC(C)(C)NC(=O)C(c1ccc(cc1)C(F)(F)F)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
CCCCNC(=O)C(c1ccccc1O)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
CCCCNC(=O)C(c1ccc(c(c1)O)O)N(C)C(=O)C(Cc2c[nH]c3c2cccc3)NC(=O)OC(C)(C)C
CC(C)(C)NC(=O)C(c1cc2ccccc2c3c1cccc3)N(Cc4ccco4)C(=O)c5cccc(c5O)O
CCCCNC(=O)C(c1cc(cc(c1)OC)OC)N(Cc2ccccc2)C(=O)C#Cc3ccccc3
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc3ccccc3c2)N(Cc4ccc(o4)C)C(=O)Cc5ccc(c(c5)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cccc(c2)O)N(C)C(=O)Cc3ccc(c(c3)O)O
CC(C)(C)NC(=O)C(c1ccc(c(c1)O)O)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
CCCCNC(=O)C(c1ccc(cc1)O)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cccc(c2)O)N(Cc3ccco3)C(=O)Cc4ccc(c(c4)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)Cl)N(Cc3ccco3)C(=O)Cc4ccc(c(c4)O)O
CCCCCCCCCC(=O)N(c1ccccc1Cl)C(c2cccc(c2O)OC)C(=O)NCS(=O)(=O)c3ccc(cc3)C
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2)N(Cc3ccco3)C(=O)Cc4ccc(c(c4)O)O
CC(C)(C)NC(=O)C(c1cccc(c1)OC)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
Cc1ccc(cc1)C(C(=O)NCS(=O)(=O)c2ccc(cc2)C)N(Cc3ccco3)C(=O)Cc4ccc(c(c4)O)O
CC(C)(C)NC(=O)C(c1ccc(cc1)O)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2)N(Cc3ccc(o3)C)C(=O)Cc4ccc(c(c4)O)O
CC(C)(C)NC(=O)C(c1ccc(c(c1)OC)OC)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cccc(c2)OC)N(Cc3ccco3)C(=O)Cc4ccc(c(c4)O)O
CCCCNC(=O)C(c1cc2ccccc2c3c1cccc3)N(C)C(=O)c4cc(ccc4O)[N+](=O)[O-]
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(Cc3ccco3)C(=O)CNC(=O)c4ccccc4
CCCCNC(=O)C(c1ccccc1O)N(C)C(=O)C(Cc2c[nH]c3c2cccc3)NC(=O)OC(C)(C)C
CC(C)(C)NC(=O)C(c1ccc(cc1)OC)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(c(c2)OC)OC)N(C)C(=O)Cc3ccc(c(c3)O)O
CCCN(C(c1ccc(cc1)C)C(=O)NCS(=O)(=O)c2ccc(cc2)C)C(=O)Cc3ccc(c(c3)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc3ccccc3c2)N(C)C(=O)c4c(cc(cc4O)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(c(c2)O)O)N(c3ccccc3)C(=O)C(=C)C
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)Cl)N(Cc3ccc(o3)C)C(=O)Cc4ccc(c(c4)O)O
Cc1ccc(o1)CN(C(c2cc3ccccc3c4c2cccc4)C(=O)NC(C)(C)C)C(=O)Cc5ccc(c(c5)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)OC)N(C)C(=O)c3c(cc(cc3O)O)O
CCC(c1ccccc1)C(=O)N(c2cccc(c2)Cl)C(c3ccc(c(c3)O)O)C(=O)NCS(=O)(=O)c4ccc(cc4)C
CCCCNC(=O)C(c1cccc(c1)O)N(C)C(=O)C(Cc2c[nH]c3c2cccc3)NC(=O)OC(C)(C)C
Cc1ccc(o1)CN(C(c2cc(cc(c2)OC)OC)C(=O)NC(C)(C)C)C(=O)C3(CC(C(C(C3)O)O)O)O
CCCCNC(=O)C(c1ccc(c(c1)OC)OC)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cccc(c2)OC)N(Cc3ccc(o3)C)C(=O)Cc4ccc(c(c4)O)O
CCCC(=O)N(c1cccc(c1)Cl)C(c2ccccc2O)C(=O)NCS(=O)(=O)c3ccc(cc3)C
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(c3cccc(c3)Cl)C(=O)C(c4ccccc4)O
CCCCNC(=O)C(c1ccc(c2c1cc(cc2)OC)OC)N(C)C(=O)c3cc(ccc3O)[N+](=O)[O-]
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(c3ccccc3)C(=O)Cc4cc(ccc4OC)OC
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)OC)N(Cc3ccco3)C(=O)Cc4ccc(c(c4)O)O
CCCCNC(=O)C(c1cc2ccccc2c3c1cccc3)N(Cc4ccc(o4)C)C(=O)c5c(cccc5O)O
CCCCNC(=O)C(c1cc2ccccc2c3c1cccc3)N(Cc4ccco4)C(=O)c5c(cc(cc5O)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(C)C(=O)C#Cc3ccccc3
CCCCC(CC)CCC(C)N(C(c1ccc(cc1)N(C)C)C(=O)NCS(=O)(=O)c2ccc(cc2)C)C(=O)C3(CC(C(C(C3)O)O)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2C)N(C)C(=O)c3ccc(cc3)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(Cc3ccco3)C(=O)C#Cc4ccccc4
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2c(cc(cc2C)C)C)N(c3cccc(c3)Cl)C(=O)C(c4ccccc4)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cc(cc(c2)OC)OC)N(C)C(=O)c3ccc(cc3)O
CCCN(C(c1ccccc1O)C(=O)NCS(=O)(=O)c2ccc(cc2)C)C(=O)Cc3ccc(c(c3)O)O
CCCCCCN(C(c1ccc(c(c1)O)O)C(=O)NCS(=O)(=O)c2ccc(cc2)C)C(=O)C=C(C)C
CC(C)(C)NC(=O)C(c1cc(cc(c1)OC)OC)N(Cc2ccccc2)C(=O)C3(CC(C(C(C3)O)O)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cccc(c2)O)N(C)C(=O)c3c(cc(cc3O)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)O)N(Cc3ccco3)C(=O)Cc4ccc(c(c4)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cccc(c2O)OC)N(C)C(=O)Cc3ccc(c(c3)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc3ccccc3c2)N(c4ccccc4)C(=O)c5c(cc(cc5O)O)O
CCCCNC(=O)C(c1ccccc1C)N(c2cccc(c2)Cl)C(=O)CCCC3=CC=C4C=Cc5cccc6c5C4C3C=C6
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2)N(Cc3ccccc3)C(=O)C(c4ccccc4)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)C(F)(F)F)N(C)C(=O)c3c(cccc3O)O
CC(C)(C)NC(=O)C(c1ccccc1)N(C)C(=O)C(Cc2c[nH]c3c2cccc3)NC(=O)OC(C)(C)C
CC(C)(C)NC(=O)C(c1ccc(c(c1)O)O)N(c2ccccc2)C(=O)Cc3ccc(c(c3)O)O
CCCCNC(=O)C(c1cc2ccccc2c3c1cccc3)N(Cc4ccco4)C(=O)C(Cc5ccccc5)O
CCCCNC(=O)C(c1ccc(c(c1)O)O)N(CCC)C(=O)C(c2ccccc2)c3ccccc3
CCCCNC(=O)C(c1cccc(c1)OC)N(Cc2ccc(o2)C)C(=O)c3c(cc(cc3O)O)O
CCCCCCN(C(c1ccccc1O)C(=O)NC(C)(C)C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
CC(C)(C)NC(=O)C(c1cc2ccccc2c3c1cccc3)N(Cc4ccco4)C(=O)c5c(cc(cc5O)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cccc(c2)OC)N(c3ccccc3)C(=O)c4c(cc(cc4O)O)O
CC(C)(C)NC(=O)C(c1ccc2ccccc2c1)N(c3ccccc3Cl)C(=O)Cc4ccc(c(c4)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(c(c2)O)O)N(Cc3ccc(o3)C)C(=O)Cc4ccc(c(c4)O)O
CCCCNC(=O)C(c1ccccc1)N(c2ccccc2)C(=O)CCCC3=CC=C4C=Cc5cccc6c5C4C3C=C6
CCCCNC(=O)C(c1cccc(c1)OC)N(Cc2ccccc2)C(=O)C(c3ccccc3)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(c3cccc(c3)Cl)C(=O)Cc4cc(ccc4OC)OC
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cccc(c2)O)N(C)C(=O)c3ccc(cc3)O
Cc1ccc(c(c1)C)C(C(=O)NC(C)(C)C)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
CCCCNC(=O)C(c1ccc(c(c1)O)O)N(Cc2ccc(o2)C)C(=O)C(c3ccccc3)c4ccccc4
CCCCCC(=O)N(c1cccc(c1)Cl)C(c2ccccc2O)C(=O)NCS(=O)(=O)c3ccc(cc3)C
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(c(c2)O)O)N(C)C(=O)c3ccc(cc3)[N+](=O)[O-]
CCCCNC(=O)C(c1c(cccc1Cl)C)N(C)C(=O)C(Cc2c[nH]c3c2cccc3)NC(=O)OC(C)(C)C
CCCCCCCCCC(=O)N(c1ccccc1Cl)C(c2cccc(c2)OC)C(=O)NCS(=O)(=O)c3ccc(cc3)C
CCCCCC(=O)N(c1ccccc1)C(c2ccccc2O)C(=O)NCS(=O)(=O)c3ccc(cc3)C
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cc(cc(c2)OC)OC)N(Cc3ccco3)C(=O)Cc4ccc(c(c4)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2c(cccc2Cl)C)N(c3cccc(c3)Cl)C(=O)C(c4ccccc4)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2cccc(c2)OC)N(C)C(=O)c3ccc(cc3)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccccc2O)N(Cc3ccco3)C(=O)Cc4ccc(c(c4)O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)O)N(C)C(=O)c3cccc(c3O)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)OC)N(c3ccccc3Cl)C(=O)c4ccc(cc4)O
Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)Cl)N(C)C(=O)C3(CC(C(C(C3)O)O)O)O




--
Jean-Claude Bradley, Ph. D.
E-Learning Coordinator for the College of Arts and Sciences
Associate Professor of Chemistry
Drexel University

http://drexel-coas-elearning.blogspot.com
http://drexel-coas-talks-mp3-podcast.blogspot.com/
http://usefulchem.blogspot.com

Rajarshi

Apr 2, 2008, 12:12:53 PM
to UsefulChem
I rebuilt a set of models (LDA, recursive partitioning, random forest).
Individually none are particularly good, so the final results are
derived from an ensemble model.

I treated experiments with a "reactants insoluble" classification
as NO. The ensemble prediction on the 65-compound training set
(rows: observed outcome, columns: ensemble prediction) is:

        ens.pred
depv    No  Yes
  No    43    2
  Yes    5   15

When I predict the new set of molecules, I get a number of YESes, but
I'll only report those that were predicted YES (i.e. to precipitate)
by 2 or more of the 3 models.
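Reporting only compounds that at least 2 of the 3 models call YES amounts to a majority vote. A minimal Python sketch of that rule, assuming each model's YES/NO predictions per compound id are already available; the model keys and predictions below are invented for illustration, and this is not necessarily how the ensemble above was built.

```python
def majority_vote(predictions, min_votes=2):
    """Return compound ids predicted YES by at least `min_votes` models.

    `predictions` maps a model name to a {compound_id: "YES"/"NO"} dict.
    """
    votes = {}
    for model_preds in predictions.values():
        for cid, label in model_preds.items():
            votes[cid] = votes.get(cid, 0) + (label == "YES")
    return sorted(cid for cid, n in votes.items() if n >= min_votes)


# Hypothetical predictions for three compounds.
predictions = {
    "lda":    {1: "YES", 2: "NO",  3: "YES"},
    "rpart":  {1: "YES", 2: "YES", 3: "NO"},
    "forest": {1: "NO",  2: "NO",  3: "YES"},
}
print(majority_vote(predictions))  # [1, 3]
```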

Molecules: 6,14,19,72,83,87

id  smiles
6   CC(C)(C)NC(=O)C(c1ccc2ccccc2c1)N(C)C(=O)CCCC3=CC=C4C=Cc5cccc6c5C4C3C=C6
14  Cc1ccc(cc1)C(C(=O)NC(C)(C)C)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5
19  CCCCNC(=O)C(c1cc(cc(c1)OC)OC)N(Cc2ccccc2)C(=O)C#Cc3ccccc3
72  Cc1ccc(cc1)S(=O)(=O)CNC(=O)C(c2ccc(cc2)C(F)(F)F)N(C)C(=O)c3c(cccc3O)O
83  CCCCNC(=O)C(c1ccccc1)N(c2ccccc2)C(=O)CCCC3=CC=C4C=Cc5cccc6c5C4C3C=C6
87  Cc1ccc(c(c1)C)C(C(=O)NC(C)(C)C)N(C)C(=O)CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5


Jean-Claude Bradley

Apr 5, 2008, 11:13:03 AM
to usefu...@googlegroups.com
Rajarshi,
I summarized your predictions on the blog
http://usefulchem.blogspot.com/2008/04/ugi-precipitation-predictions.html

By the way, the SMILES from the original libraries have the phenanthrene ring partially hydrogenated. I think all the entries in the master table have been fixed. So there may be some error in the predictions, but I don't expect the differences to be very large.
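To find entries still carrying the partially hydrogenated ring (from the 1-pyrenebutyric acid component), a crude text check may be enough for this library. The Python sketch below strips ring-closure digits and looks for a signature string derived from the library SMILES quoted in this thread; the signature and function name are assumptions. It is not a substructure search: the same fragment written in a different atom order would be missed, so a cheminformatics toolkit such as RDKit would be the more robust route.

```python
import re

# Digit-free signature of the hydrogenated ring as written in the library SMILES,
# e.g. "...CCCC2=CC=C3C=Cc4cccc5c4C3C2C=C5" (assumption based on this thread).
HYDROGENATED_SIGNATURE = "C=CC=CC=CccccccCCC=C"


def looks_partially_hydrogenated(smiles):
    """Crude text check: True if the digit-stripped SMILES contains the signature."""
    # Ring-closure numbers differ between entries, so drop all digits first.
    return HYDROGENATED_SIGNATURE in re.sub(r"\d", "", smiles)


library_entry = "CC(C)(C)NC(=O)C(c1ccc2ccccc2c1)N(C)C(=O)CCCC3=CC=C4C=Cc5cccc6c5C4C3C=C6"
other_entry = "CCCCNC(=O)C(c1cc(cc(c1)OC)OC)N(Cc2ccccc2)C(=O)C#Cc3ccccc3"
print(looks_partially_hydrogenated(library_entry))  # True
print(looks_partially_hydrogenated(other_entry))    # False
```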

Khalid Mirza

Apr 5, 2008, 11:43:40 AM
to usefu...@googlegroups.com
I think it's one of the central carbons of the PYRENE ring system that is saturated. These were corrected in the master table, both in the component [1-pyrenebutyric acid] slot and in the corresponding Ugi products.
--
Khalid Mirza
Graduate Student
The Bradley Research Group
Department of Chemistry
Drexel University
usefulchem.wikispaces.com

Jean-Claude Bradley

Apr 5, 2008, 11:48:45 AM
to usefu...@googlegroups.com
Thanks Khalid