Several issues have been brought up by people doing modeling on the results from the Ugi Master Table. Just to make sure that everyone is involved in the conversation I'll address some points here:
2) Although anyone can access that page in HTML format it is more convenient to manipulate info in edit mode. Scroll to the bottom and click on "Edit this page" - this will let you sort on the page itself, add data or export as Excel or other format. You need to be invited as a collaborator to the GoogleDoc to do this - just ask me for an invite if you don't have access yet.
3) Khalid has just finished double checking the data for completeness and correctness. If anything doesn't seem right please let us know (and fix it if possible). GoogleDocs now tracks changes for cells in spreadsheets.
4) Under the precipitation column, we have added the value "Reactants insoluble" in addition to YES and NO. Rajarshi has decided to merge the "Reactants insoluble" entries with NO in his model because there are not many of them yet (about 10). This makes sense from the standpoint that they are both unsuccessful Ugi reactions but for different reasons. That is fine - let's just remember to be explicit when comparing models.
5) Under the precipitation column some entries are marked "YES" and some "YES?". I initially used the question mark because it was not clear from one picture that there was a precipitate. This only applies to that one image though and should not be used for any other indication. To indicate that the Ugi product has not been characterized fully, just leave the yield column blank until it is. The modelers are ignoring the question mark anyway so let's stop using it altogether.
6) There are many entries now that have no values under the precipitation column - specifically those for EXP174. All that means is that Khalid is about to do those experiments and will fill them in as data comes in. The only caution there is that if some of the experiments are aborted because of errors, the corresponding rows should be removed from the table.
7) John M has reported some problems with extra spaces getting added to the precipitation column when exporting to Excel. He was able to fix these manually - I don't think I have run into that problem though. If anyone has advice about this please let us know.
9) Finally, for anyone interested in modeling - we want to predict which of the following compounds (predicted by Rajarshi to dock with falcipain-2 V2 receptor and the next ones on our list to make) are likely to form precipitates:
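As a side note on points 4, 5 and 7, the label conventions for the precipitation column could be applied after export with something like the sketch below. It assumes the column arrives as plain text cells; the function name `normalize_precipitate` and the `merge_insoluble` flag are made up for illustration and are not part of the Master Table or of anyone's model.

```python
def normalize_precipitate(raw, merge_insoluble=True):
    """Map one raw precipitation cell to 'YES', 'NO',
    'REACTANTS INSOLUBLE' or None (hypothetical helper)."""
    if raw is None:
        return None
    # Collapse stray whitespace of the kind reported after Excel export (point 7).
    label = " ".join(raw.split())
    # Treat "YES?" like "YES", since the question mark is being retired (point 5).
    label = label.rstrip("?").strip().upper()
    if not label:
        # Blank cell: experiment not run yet (point 6).
        return None
    if label == "REACTANTS INSOLUBLE":
        # Optional merge with NO, as described in point 4.
        return "NO" if merge_insoluble else label
    if label in ("YES", "NO"):
        return label
    raise ValueError(f"unexpected precipitation value: {raw!r}")
```

Keeping the merge behind a flag makes it easy to state explicitly which convention a given model used when comparing results.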
I rebuilt a set of models (LDA, recursive partitioning, random forest).
Individually none are particularly good, so the final results are
derived from an ensemble model.
I considered experiments that had a "reactants insoluble"
classification as NO. The ensemble prediction on the 65 compound
training set is
     ens.pred
depv  No Yes
  No  43   2
  Yes  5  15
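For readers who want summary numbers from the table above, here is a minimal sketch. It assumes the rows (depv) are the observed classes and the columns (ens.pred) are the ensemble predictions; that orientation is not stated in the message. The names `CONFUSION` and `summarize` are illustrative only.

```python
# Counts copied from the table above. Assumption: rows are the observed
# class (depv) and columns are the ensemble prediction (ens.pred).
CONFUSION = {
    ("No", "No"): 43,    # observed No,  predicted No  (true negatives)
    ("No", "Yes"): 2,    # observed No,  predicted Yes (false positives)
    ("Yes", "No"): 5,    # observed Yes, predicted No  (false negatives)
    ("Yes", "Yes"): 15,  # observed Yes, predicted Yes (true positives)
}


def summarize(confusion):
    """Return basic classification metrics for a 2x2 confusion table."""
    tn = confusion[("No", "No")]
    fp = confusion[("No", "Yes")]
    fn = confusion[("Yes", "No")]
    tp = confusion[("Yes", "Yes")]
    total = tn + fp + fn + tp
    return {
        "n": total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of observed Yes recovered
        "specificity": tn / (tn + fp),  # fraction of observed No recovered
        "precision": tp / (tp + fp),    # fraction of predicted Yes that are Yes
    }


if __name__ == "__main__":
    for name, value in summarize(CONFUSION).items():
        print(name, round(value, 3))
```

Under that assumption the counts sum to 65, with accuracy 58/65 (about 0.89), sensitivity 15/20 (0.75) and specificity 43/45 (about 0.96). These are training-set figures, so they probably overstate performance on new compounds.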
When I predict the new set of molecules, I get a number of YES's but
I'll only report those that were predicted to be soluble by 2 or more
models (out of 3).
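The "2 or more models (out of 3)" rule can be sketched as a simple vote count. This is only an illustration, not the code actually used: it assumes each model emits a "Yes"/"No" label and that the vote is taken on the "Yes" class (the message says "soluble", so the intended class is an assumption). The compound names and per-model labels are hypothetical.

```python
def ensemble_vote(model_predictions, min_votes=2, positive="Yes", negative="No"):
    """Return `positive` if at least `min_votes` models predict it,
    otherwise `negative`."""
    votes = sum(1 for label in model_predictions if label == positive)
    return positive if votes >= min_votes else negative


# Hypothetical per-model outputs, ordered as
# (LDA, recursive partitioning, random forest).
example_predictions = {
    "compound_A": ("Yes", "Yes", "No"),
    "compound_B": ("No", "Yes", "No"),
    "compound_C": ("Yes", "Yes", "Yes"),
}

# Keep only compounds that reach the 2-of-3 threshold.
reported = [
    name
    for name, preds in example_predictions.items()
    if ensemble_vote(preds) == "Yes"
]

if __name__ == "__main__":
    print(reported)
```

Raising `min_votes` to 3 would report only unanimous predictions, trading coverage for fewer false positives.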
By the way - the SMILES from the original libraries has the phenanthrene ring partially hydrogenated. In the master table I think all the entries are fixed. So there may be some error with respect to the predictions but I don't expect the differences to be very large.
I think it's one of the central carbons of the PYRENE rings which is saturated. They were corrected in the master table (both in the component [1-pyrenebutyric acid] slot and in the corresponding Ugi products).
On Sat, Apr 5, 2008 at 11:13 AM, Jean-Claude Bradley <jeanclaude.brad...@gmail.com> wrote:
> Rajarshi,
> I summarized your predictions on the blog
>
> By the way - the SMILES from the original libraries has the phenanthrene
> ring partially hydrogenated. In the master table I think all the entries
> are fixed. So there may be some error with respect to the predictions but I
> don't expect the differences to be very large.
>
> On Wed, Apr 2, 2008 at 12:12 PM, Rajarshi <rajarshi.g...@gmail.com> wrote:
> > I rebuilt a set of models (LDA, recursive partitioning, random forest).
> > Individually none are particularly good, so the final results are
> > derived from an ensemble model.
> >
> > I considered experiments that had a "reactants insoluble"
> > classification as NO. The ensemble prediction on the 65 compound
> > training set is
> >
> >      ens.pred
> > depv  No Yes
> >   No  43   2
> >   Yes  5  15
> >
> > When I predict the new set of molecules, I get a number of YES's but
> > I'll only report those that were predicted to be soluble by 2 or more
> > models (out of 3).
>
> --
> Jean-Claude Bradley, Ph. D.
> E-Learning Coordinator for the College of Arts and Sciences
> Associate Professor of Chemistry
> Drexel University