info for CombiUgi collaborators (Crowdsourcing Chemistry)

Jean-Claude Bradley

Dec 26, 2007, 3:22:34 PM
to usefu...@googlegroups.com
I recently posted a request for collaborators (and contributors of any type really) on a simple exercise of predicting which products should precipitate from our Ugi reactions:
http://usefulchem.blogspot.com/2007/12/chemistry-crowdsourcing-with-open.html

Rajarshi has posted some comments that point to some difficulties in currently doing this in the format presented in the post.  I am going to move the conversation to the UsefulChem mailing list, where it is easier to hammer out the details and get this working.

The first request is very reasonable - where are the SMILES lists?

We do have the UsefulChem identifiers of where the compounds are made and the common names of the starting materials in the master table but that is pretty unfriendly for Rajarshi and other people just wanting to crunch numbers.  So I've added a few more columns to the table, including  one for the SMILES of the final product.  We should also add 4 more columns for the SMILES of the starting materials (aldehyde, acid, amine and isonitrile). 

The SMILES should not be that difficult to find in most cases because each molecule in the lab wiki should be linked to a blog entry or a ChemSpider entry with the SMILES data.  But the wiki pages have not all been properly updated so there may be some missing.  Anyway, our lab members should update the table so nobody else has to worry about any of this. 
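A minimal sketch of how the remaining gaps could be listed once the table has the five SMILES columns, assuming the rows are available as dictionaries. The column names and the example rows are hypothetical (the SMILES strings are ordinary textbook reagents, not entries copied from the master table).

```python
# Hypothetical column names for the five SMILES columns discussed above.
SMILES_COLUMNS = ("aldehyde", "acid", "amine", "isonitrile", "product")


def missing_smiles(rows):
    """Return (row_id, column) pairs where a SMILES cell is empty or absent."""
    gaps = []
    for row in rows:
        for column in SMILES_COLUMNS:
            if not (row.get(column) or "").strip():
                gaps.append((row["id"], column))
    return gaps


# Illustrative rows only; these are not taken from the master table.
rows = [
    {"id": "row-1", "aldehyde": "O=Cc1ccccc1", "acid": "CC(=O)O",
     "amine": "CN", "isonitrile": "[C-]#[N+]C(C)(C)C", "product": ""},
    {"id": "row-2", "aldehyde": "O=Cc1ccccc1", "acid": "CC(=O)O",
     "amine": "", "isonitrile": "[C-]#[N+]C(C)(C)C"},
]

for row_id, column in missing_smiles(rows):
    print(f"{row_id}: missing {column} SMILES")
```

A list like this would let whoever fills in the table work through the gaps in order instead of scanning every cell by eye.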

Khalid, you can probably do this more easily - if you are around could you fill in the missing data?

The master table is here:
http://spreadsheets.google.com/pub?key=plwwufp30hfpUERhse9y5Kw

Anyone who would like to edit it just ask me.

Thanks!

--
Jean-Claude Bradley, Ph. D.
E-Learning Coordinator for the College of Arts and Sciences
Associate Professor of Chemistry
Drexel University

http://drexel-coas-elearning.blogspot.com
http://drexel-coas-talks-mp3-podcast.blogspot.com/
http://usefulchem.blogspot.com

Jean-Claude Bradley

Dec 30, 2007, 6:51:47 PM
to usefu...@googlegroups.com
OK - I updated the master table so that all starting materials have SMILES (ChemSpider made that easy from the names).
I also normalized the chemical names so that when you sort them the same compounds have exactly the same names.

Rajarshi - I didn't have time yet to update the Ugi products SMILES - can you do that on your end to do your QSAR model since you have written the algorithm already? 

The "published version" of the master table is here:
http://spreadsheets.google.com/pub?key=plwwufp30hfpUERhse9y5Kw

Anyone can access it but it is a little awkward in that format since it is not possible to sort or change column widths.  Hitting control-A then copying and pasting to Excel should work though.  It is much easier to use the "edit" mode which is in the bottom right of the published version page.  However, you need to be a collaborator on that GoogleDoc to do that - if anyone who doesn't have access wants it just ask.

So the basic problem comes down to this:

In the master table we have 10 Ugi products positive for precipitation and 20 negative.
Can we come up with a model (QSAR) to predict the behavior of the 71,442 products in library3
http://usefulchem.wikispaces.com/UClib003
based on the molecular descriptors of the 30 products in the master table?
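As a reference point for any model built on this table, here is a minimal sketch of the majority-class baseline. With 10 positives and 20 negatives, always predicting "no precipitate" is already right two times out of three, so a useful model has to beat that. The function name is hypothetical; only the two counts come from the message above.

```python
from fractions import Fraction


def majority_baseline_accuracy(n_positive: int, n_negative: int) -> Fraction:
    """Accuracy obtained by always predicting the more common class."""
    total = n_positive + n_negative
    if total == 0:
        raise ValueError("need at least one observation")
    return Fraction(max(n_positive, n_negative), total)


# Counts quoted in the thread: 10 products precipitated, 20 did not.
baseline = majority_baseline_accuracy(10, 20)
print(f"majority-class baseline accuracy: {float(baseline):.3f}")
```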

Rajarshi

Dec 30, 2007, 7:29:13 PM
to UsefulChem
On Dec 30, 6:51 pm, "Jean-Claude Bradley"
<jeanclaude.brad...@gmail.com> wrote:

> Rajarshi - I didn't have time yet to update the Ugi products SMILES - can
> you do that on your end to do your QSAR model since you have written the
> algorithm already?

Ok, I should be able to get that done - it'll probably be after the
5th Jan, though

> In the master table we have 10 Ugi products positive for precipitation and
> 20 negative
> Can we come up with a model (QSAR) to predict the behavior of the 71,442
> products in library3 http://usefulchem.wikispaces.com/UClib003
> based on the molecular descriptors of the 30 products in the master table?

Well, it's certainly doable. The question that must be asked is: does
it make sense to use a model built on 30 examples to
predict 71K other compounds? Now, given that the 71K compounds are
derived from a small library of reagents, the diversity of the final
library may not be too high (an interesting side project), in which
case predictions may be reliable.

But building a model with 30 examples is tricky - it restricts us to
linear regression, which may not be too bad since modeling physical
properties is (usually) a simpler task than modeling biological
properties. But it's something that we will need to see.

Of course, more data will always be much appreciated :)

Jean-Claude Bradley

Dec 30, 2007, 7:45:26 PM
to usefu...@googlegroups.com
Rajarshi - after the 5th is fine - thanks!
If the whole list of 71K is too ambitious, you can take a subset - maybe just all the Ugi products that can be made from the starting materials already in the master table?  It would be preferable for you to select a reaction space where you have better confidence and can suggest key reactions that would challenge your model efficiently.  When the students come back from break they can include those reactions in addition to the falcipain-2 targets.
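The subset suggested above (every Ugi product reachable from the starting materials already in the table) is a plain four-way Cartesian product. A minimal sketch, assuming each component list is just a list of labels; the reagent labels and function names are placeholders, not the contents of the master table.

```python
from itertools import product


def enumerate_ugi_combinations(aldehydes, acids, amines, isonitriles):
    """Return every (aldehyde, acid, amine, isonitrile) combination."""
    return list(product(aldehydes, acids, amines, isonitriles))


def untested(combinations, already_run):
    """Drop combinations that have already been tried in the lab."""
    done = set(already_run)
    return [combo for combo in combinations if combo not in done]


# Placeholder labels; the real lists would come from the master table.
aldehydes = ["aldehyde_1", "aldehyde_2"]
acids = ["acid_1", "acid_2", "acid_3"]
amines = ["amine_1", "amine_2"]
isonitriles = ["isonitrile_1"]

all_combos = enumerate_ugi_combinations(aldehydes, acids, amines, isonitriles)
already_run = [("aldehyde_1", "acid_1", "amine_1", "isonitrile_1")]
candidates = untested(all_combos, already_run)
print(f"{len(all_combos)} possible products, {len(candidates)} not yet tried")
```

The candidate list is what a model would rank in order to propose the next reactions to run.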

Jean-Claude Bradley

Jan 1, 2008, 9:10:31 PM
to usefu...@googlegroups.com
Rajarshi,
I am assuming that you are going to use molecular descriptors for your QSAR analysis.  Why not put those descriptors directly in the master table?  That way others could more readily apply their own models.
It turns out that ChemSpider automatically calculates a number of these descriptors. For factors that would affect solubility in methanol, I would think that logP, number of H-bonding sites and polar surface area are among those of interest.  Which ones were you going to use?
For example see:
http://www.chemspider.com/RecordView.aspx?id=21105565

It would be nice at some point to have these automatically generated on a large scale but it might be tricky to make all of them public.  That may in fact be a deciding point on which descriptors we choose to work with to remain fully open at all times. 

But in the meantime, Tony agreed that we could manually take 30-50 values for the descriptors calculated on the ChemSpider pages.  Maybe he can comment on the licensing restrictions beyond that.  Still that should be enough to get us started here.

So I created another column for a link to the ChemSpider entry and a column for LogP.  I'm asking all of our UsefulChem lab members (or anyone with an interest in contributing) who are available to help fill out this table.  It will require creating records in ChemSpider in most cases and that is pretty intuitive now.

Egon Willighagen

Jan 1, 2008, 9:24:03 PM
to usefu...@googlegroups.com
On Jan 2, 2008 3:10 AM, Jean-Claude Bradley <jeanclaud...@gmail.com> wrote:
> I am assuming that you are going to use molecular descriptors for your QSAR
> analysis. Why not put those descriptors directly in the master table? That
> way others could more readily apply their own models.

That is impossible unless the model was built using those descriptors.
A model built with descriptors from software X version Y cannot be
used with descriptors calculated with software X2
version Z.

> It turns out that ChemSpider automatically calculates a number of these
> descriptors and for factors that would affect solubility in methanol I would
> think that logP, number of H-bonding sites, polar surface area are among
> those of interest.

I was told in the past (from the source), that these values are *not*
open data... so, you cannot use those. This may have changed since
then, though... consequently, those values cannot be used in ONS.
Antony, if you are listening in... please give an update on that.

> Which ones were you going to use?
> For example see:
> http://www.chemspider.com/RecordView.aspx?id=21105565
>
> It would be nice at some point to have these automatically generated on a
> large scale but it might be tricky to make all of them public. That may in
> fact be a deciding point on which descriptors we choose to work with to
> remain fully open at all times.

That's where CDK/JOELib comes in...

> But in the meantime, Tony agreed that we could manually take 30-50 values
> for the descriptors calculated on the ChemSpider pages. Maybe he can
> comment on the licensing restrictions beyond that. Still that should be
> enough to get us started here.

Ah, there's the confirmation of my above comment... :)

BTW, don't expect much from a QSAR/QSPR model of <100 values... or
hardly anything serious at all. So, I see a strong conflict here...

> So I created another column for a link to the ChemSpider entry and a column
> for LogP. I'm asking all of our UsefulChem lab members (or anyone with an
> interest in contributing) who are available to help fill out this table. It
> will require creating records in ChemSpider in most cases and that is pretty
> intuitive now.

I really suggest using the CDK/JOELib for this... possibly via Bioclipse:

http://chem-bla-ics.blogspot.com/2007/10/more-qsar-in-bioclipse-joelib-extension.html

If Rajarshi does not do it before I do, I'll probably have time to do
this late next week.

But again, without at least some 100 compounds, these models will most
likely be practically meaningless... too much numerical freedom to
make a regression... let alone any meaningful correlation... not with
these kinds of general descriptors, anyway...
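The "numerical freedom" concern can be made concrete with residual degrees of freedom and adjusted R-squared, 1 - (1 - R^2)(n - 1)/(n - p - 1), for n compounds and p descriptors. The sketch below is only an illustration: the R-squared of 0.80 and the choice of 10 descriptors are made-up inputs, not results from this data set.

```python
def residual_degrees_of_freedom(n_samples: int, n_descriptors: int) -> int:
    """Degrees of freedom left after fitting an intercept plus p descriptors."""
    return n_samples - n_descriptors - 1


def adjusted_r_squared(r_squared: float, n_samples: int, n_descriptors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = residual_degrees_of_freedom(n_samples, n_descriptors)
    if dof <= 0:
        raise ValueError("model has as many parameters as data points")
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / dof


# Hypothetical fit quality (R^2 = 0.80) with 10 descriptors, at two data set sizes.
for n in (30, 100):
    adj = adjusted_r_squared(0.80, n, 10)
    print(f"n={n}: residual dof={residual_degrees_of_freedom(n, 10)}, adjusted R^2={adj:.3f}")
```

The same nominal fit is penalised much more heavily at 30 compounds than at 100, which is the gap the planned experiments are meant to close.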

Egon

--
----
http://chem-bla-ics.blogspot.com/

Jean-Claude Bradley

Jan 1, 2008, 9:56:48 PM
to usefu...@googlegroups.com
Egon - thanks for the quick feedback

On Jan 1, 2008 9:24 PM, Egon Willighagen <egon.wil...@gmail.com> wrote:

> That is impossible unless the model was built using those descriptors.
> A model built with descriptors from software X version Y cannot be
> used with descriptors calculated with software X2
> version Z.

Really?  Aren't the descriptors like logP and polar surface area independent of any statistical analysis?  Or are you saying that those are not the descriptors actually used in QSAR software packages?


> BTW, don't expect much from a QSAR/QSPR model of <100 values... or
> hardly anything serious at all. So, I see a strong conflict here...
 
Well that's why I'm having the discussion with people who have experience.  The experiments are fairly quick to perform once students get the hang of it.  Hopefully we'll reach that 100 mark this term. 


> So I created another column for a link to the ChemSpider entry and a column
> for LogP.  I'm asking all of our UsefulChem lab members (or anyone with an
> interest in contributing) who are available to help fill out this table.  It
> will require creating records in ChemSpider in most cases and that is pretty
> intuitive now.

> I really suggest using the CDK/JOELib for this... possibly via Bioclipse:
>
> http://chem-bla-ics.blogspot.com/2007/10/more-qsar-in-bioclipse-joelib-extension.html

Awesome.  Yes, the more calculations we can get from the CDK the better.


> If Rajarshi does not do it before I do, I'll probably have time to do
> this late next week.

Thanks!


> But again, without at least some 100 compounds, these models will most
> likely be practically meaningless... too much numerical freedom to
> make a regression... let alone any meaningful correlation... not with
> these kinds of general descriptors, anyway...

At least if we have the models in place we can keep running them as we get more experiments done.


Rajarshi

Jan 1, 2008, 10:27:00 PM
to UsefulChem

On Jan 1, 2008, at 9:10 PM, Jean-Claude Bradley wrote:

> Rajarshi,
> I am assuming that you are going to use molecular descriptors for your
> QSAR analysis.

Correct

> Why not put those descriptors directly in the master table? That way
> others could more readily apply their own models.

I could do that, but beyond rebuilding the model with those compounds
I don't see much utility. More importantly, without access to the
actual descriptor program (code), you can't do much with those values
for new compounds

> It turns out that ChemSpider automatically calculates a number of
> these descriptors and for factors that would affect solubility in
> methanol I would think that logP, number of H-bonding sites, polar
> surface area are among those of interest. Which ones were you going
> to use?

I was going to use some version of logP, at least initially - however
I'm always a little wary of logP since many are parametrized and once
a compound goes beyond the parametrization, the results become
unreliable. Furthermore I think the ChemSpider logP is based on (or
actually is) the ACD logP which is based on a statistical model - I
would rather not use the results of a statistical model to build
another statistical model.

I will be using the CDK descriptors, including constitutional and
topological descriptors (since we're working with SMILES), and will run
them through a feature selection algorithm to get several models, which
I can then investigate.
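To illustrate the general idea of feature selection on a descriptor table, here is a deliberately simple univariate filter that ranks descriptors by the absolute Pearson correlation with the precipitate/no-precipitate outcome. This is a stand-in for illustration only, not the algorithm referred to above and not CDK code; the descriptor names and values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_descriptors(table, outcome):
    """Order descriptor names from most to least correlated with the outcome."""
    scores = {name: abs(pearson(values, outcome)) for name, values in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Invented values: 1 = precipitated, 0 = did not.
outcome = [1, 1, 0, 0, 0, 0]
table = {
    "desc_a": [5, 4, 1, 2, 1, 2],   # tracks the outcome closely
    "desc_b": [1, 2, 1, 2, 1, 2],   # unrelated to the outcome
    "desc_c": [3, 3, 3, 3, 3, 3],   # constant, carries no information
}
print(rank_descriptors(table, outcome))
```

A real workflow would search over subsets of descriptors and validate each candidate model; a filter like this only removes the obviously uninformative columns first.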

> It would be nice at some point to have these automatically generated
> on a large scale but it might be tricky to make all of them public.
> That may in fact be a deciding point on which descriptors we choose to
> work with to remain fully open at all times.

We can certainly calculate the descriptors for all the compounds and
hopefully sometime soon I'll update our descriptor web service and
provide REST-like wrappers (this was something that Ola had requested)
- so it should be usable.


> But in the meantime, Tony agreed that we could manually take 30-50
> values for the descriptors calculated on the ChemSpider pages. Maybe
> he can comment on the licensing restrictions beyond that. Still that
> should be enough to get us started here.

The CDK calculates a wide variety of well known descriptors, so I'll
go with that


-------------------------------------------------------------------
Rajarshi Guha <rg...@indiana.edu>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------

Q: What is a dyslexic, agnostic, insomniac?
A: Someone who lays awake at night wondering if there really is a dog!

Rajarshi

Jan 1, 2008, 10:30:25 PM
to UsefulChem

On Jan 1, 2008, at 9:24 PM, Egon Willighagen wrote:


On Jan 2, 2008 3:10 AM, Jean-Claude Bradley
<jeanclaud...@gmail.com> wrote:

> BTW, don't expect much from a QSAR/QSPR model of <100 values... or
> hardly anything serious at all. So, I see a strong conflict here...

Well, if you stick to OLS, less than a hundred compounds is not too
bad. Granted 30 is pushing it, but the caveat is that one must make
predictions very carefully, keeping in mind the domain of the model


Rajarshi

Jan 1, 2008, 10:36:26 PM
to UsefulChem

On Jan 1, 2008, at 9:56 PM, Jean-Claude Bradley wrote:


> Really? Aren't the descriptors like logP and polar surface area independent of any statistical analysis?

logP calculations are done either by parametrization (atom or group
additive versions such as CLogP or XlogP) or else by statistical
models (the ACD logP)
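A toy sketch of the atom/group additive idea, and of what "going beyond the parametrization" looks like in practice: the estimate is a sum of per-fragment contributions, and a fragment with no tabulated contribution cannot be scored at all. The contribution values below are invented round numbers for illustration; they are not CLogP or XlogP parameters.

```python
# Invented contributions for illustration only; NOT real CLogP/XlogP parameters.
HYPOTHETICAL_CONTRIBUTIONS = {"CH3": 0.5, "CH2": 0.5, "OH": -1.0, "C6H5": 2.0}


class OutsideParametrization(KeyError):
    """Raised when a molecule contains a fragment the table does not cover."""


def additive_logp(fragment_counts, contributions=HYPOTHETICAL_CONTRIBUTIONS):
    """Sum count * contribution over all fragments of a molecule."""
    total = 0.0
    for fragment, count in fragment_counts.items():
        if fragment not in contributions:
            raise OutsideParametrization(fragment)
        total += count * contributions[fragment]
    return total


# An ethanol-like fragment list is covered by the toy table...
print(additive_logp({"CH3": 1, "CH2": 1, "OH": 1}))

# ...but a fragment missing from the table cannot be estimated.
try:
    additive_logp({"CH3": 1, "unparametrized_group": 1})
except OutsideParametrization as err:
    print("no contribution available for", err)
```

Real implementations often fall back to a default or approximate value instead of failing, which is exactly where the results become unreliable.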

> Or are you saying that those are not the descriptors actually used in QSAR software packages?

QSAR packages use one of the above types.

> Well that's why I'm having the discussion with people who have experience. The experiments are fairly quick to perform once students get the hang of it. Hopefully we'll reach that 100 mark this term.

Excellent


Tony at ChemSpider

Jan 1, 2008, 10:56:42 PM
to UsefulChem
Couple of comments....

logP as calculated by ACD/logP is an octanol/water partition
coefficient. That doesn't mean that you can't use the number as an input to
some model, but methanol is not octanol for sure... and try to get
methanol and water to form two layers the way that octanol and water
do. For fun read this one... http://pubs.acs.org/cgi-bin/abstract.cgi/jcisd8/2005/45/i02/abs/ci049643e.html
Rick PREDICTED shifts in methanol using ACD/CNMR predictor, then
said they would be similar to those in octanol. Then he PREDICTED the
shifts in water using ACD/CNMR predictor. Then he compared the predicted
shifts in octanol (or methanol in this case) with the predicted
shifts in water and used the differences to form a
correlation with LogP. Aagagahahah!

So... I'm not sure of the reason to use logP at all. It's not about
solubility... it's about partitioning.

In any case... regarding logP values... I believe they are the best
commercially available and likely outperform any Open Source
applications today: http://www.acdlabs.com/products/phys_chem_lab/logp/competit.html

It's free... http://www.acdlabs.com/download/logp.html but the batch
version is not. The algorithms are proprietary but there is enough
general info out there.... http://www.acdlabs.com/download/publ/qsarpost1.pdf

In regards to using the values from ChemSpider...yes, they can be
used. I've covered this on the blog a number of times. See the FAQ:

RIGHTS TO USE

Am I allowed to use the data I find on ChemSpider in a publication ?

Yes, all data displayed onscreen at ChemSpider can be used in a
publication.

May I use your service in my teaching class ?

Absolutely. We would especially like the academic community to benefit
from the information available on ChemSpider.

May I download the data and use it in my own database(s)?

You have limited rights in this regard. You can only assemble a
database of 5000 structures or less, and their associated properties,
from our database without our permission. You can download up to 1000
structures per day from the website. Please contact us at
feedbackATchemspiderDOTcom to request an extension outside this
constraint. We are willing to provide the ENTIRE database of
ChemSpider structures at your request - the file will consist of InChI
Strings, InChIKeys and ChemSpider IDs. These constraints are under
regular review so please feel free to engage us in conversation.







Tony at ChemSpider

Jan 1, 2008, 11:02:19 PM
to UsefulChem
Different logP prediction algorithms from different groups will give
different values. For the common molecules...octane, benzene, blah,
blah, they will all be within error but "meaningful compounds" will
likely be very different. Also, take care with logP versus logD
distinctions. Most people forget that. LogP numbers, whether from the
CDK or from ChemSpider (either XlogP, AlogPS or ACD/LogP; we have
THREE logP predictors on there!), can be part of your descriptor feed,
and it's all a matter of whether you are after accurate predictions or
segregation. As I recall you are trying to derive segregation -
precipitates or not.
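For readers less familiar with the logP versus logD distinction: logP describes the neutral species only, while logD is the pH-dependent distribution of all species. A minimal sketch using the standard textbook relation for a monoprotic compound, under the assumption that only the neutral form partitions into octanol. The logP, pKa and pH values in the example are hypothetical.

```python
from math import log10


def log_d_monoprotic_acid(log_p: float, pka: float, ph: float) -> float:
    """logD = logP - log10(1 + 10**(pH - pKa)), neutral form partitions only."""
    return log_p - log10(1.0 + 10.0 ** (ph - pka))


def log_d_monoprotic_base(log_p: float, pka: float, ph: float) -> float:
    """logD = logP - log10(1 + 10**(pKa - pH)), neutral form partitions only."""
    return log_p - log10(1.0 + 10.0 ** (pka - ph))


# Hypothetical acid with logP = 3.0 and pKa = 4.5 at three pH values.
for ph in (2.0, 4.5, 7.4):
    print(f"pH {ph}: logD = {log_d_monoprotic_acid(3.0, 4.5, ph):.2f}")
```

For an ionizable compound the two numbers can differ by several log units at a given pH, so it matters which one ends up in a descriptor table.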


Egon Willighagen

Jan 2, 2008, 4:39:33 AM
to usefu...@googlegroups.com
On Jan 2, 2008 3:56 AM, Jean-Claude Bradley <jeanclaud...@gmail.com> wrote:
> Egon - thanks for the quick feedback
>
> On Jan 1, 2008 9:24 PM, Egon Willighagen <egon.wil...@gmail.com> wrote:
> >
> > That is impossible unless the model was built using those descriptors.
> > A model built with descriptors from software X version Y cannot be
> > used with descriptors calculated with software X2
> > version Z.
>
> Really? Aren't the descriptors like logP and polar surface area independent
> of any statistical analysis? Or are you saying that those are not the
> descriptors actually used in QSAR software packages?

Theoretically they are... practically they are not. LogP in
particular often itself is a QSPR model.
Even surface area requires the use of (numerically) identical atomic
radii... and of course
things like atom type perception, etc. One simple minor bug in any
of these will change the values.

This is why projects like the BODR and the chemoinformatics algorithm
ontology are important.
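One practical consequence is that descriptor values are only meaningful together with a record of how they were computed. A minimal sketch, assuming each value is stored with the software name and version that produced it, and that a model refuses inputs with a different provenance. The class names and the software labels ("toolkit_x", "toolkit_x2") are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorValue:
    name: str
    value: float
    software: str   # hypothetical label for the program that computed the value
    version: str

    def provenance(self):
        """Identify a descriptor by name plus the exact tool that computed it."""
        return (self.name, self.software, self.version)


class ProvenanceMismatch(ValueError):
    """Raised when a query descriptor was not computed like the training data."""


def check_compatible(training, query):
    """Raise unless every query descriptor matches a training provenance."""
    trained = {d.provenance() for d in training}
    for d in query:
        if d.provenance() not in trained:
            raise ProvenanceMismatch(
                f"{d.name} from {d.software} {d.version} was not used to build the model"
            )


training = [DescriptorValue("logP", 2.1, "toolkit_x", "1.0")]
same_source = [DescriptorValue("logP", 3.4, "toolkit_x", "1.0")]
other_source = [DescriptorValue("logP", 3.4, "toolkit_x2", "2.0")]

check_compatible(training, same_source)  # same tool and version: accepted
try:
    check_compatible(training, other_source)
except ProvenanceMismatch as err:
    print("rejected:", err)
```

Storing the tool and version next to each column in a shared table would serve the same purpose for people applying their own models.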

> > BTW, don't expect much from a QSAR/QSPR model of <100 values... or
> > hardly anything serious at all. So, I see a strong conflict here...
>
> Well that's why I'm having the discussion with people who have experience.
> The experiments are fairly quick to perform once students get the hang of
> it. Hopefully we'll reach that 100 mark this term.

Indeed.

> > > So I created another column for a link to the ChemSpider entry and a
> > > column for LogP. I'm asking all of our UsefulChem lab members (or anyone
> > > with an interest in contributing) who are available to help fill out this table.
> > > It will require creating records in ChemSpider in most cases and that is
> > > pretty intuitive now.
> >
> > I really suggest using the CDK/JOELib for this... possibly via Bioclipse:
> >
> http://chem-bla-ics.blogspot.com/2007/10/more-qsar-in-bioclipse-joelib-extension.html
>
> Awesome. Yes the more we can get calculations from CDK the better

It is possible to integrate Dragon too, but without funding I do not
have time for that.

Egon Willighagen

Jan 2, 2008, 4:45:28 AM
to usefu...@googlegroups.com
On Jan 2, 2008 4:56 AM, Tony at ChemSpider <tony...@gmail.com> wrote:
> In regards to using the values from ChemSpider...yes, they can be
> used. I've covered this on the blog a number of times. See the FAQ:

They are covered by the same rules/copyright as any other data?
So, UsefulChem can set up an ACD/Labs LogP database for any compound
they are interested in, under an OpenData license?

That's useful for the CDK too, for benchmark purposes. Tony, what
about the experimental values? ACD/Labs extracted a database from
literature I assume?

Tony at ChemSpider

Jan 2, 2008, 10:34:01 AM
to UsefulChem
In my opinion they are covered under the same rules, yes. The values
are predicted values available online for review and can be used by
UsefulChem to set up a database of compounds and predicted ACD/LogP
values. I will confirm this is all acceptable to ACD/Labs (I'd be
shocked if it wasn't) offline in a conversation with JC and the
PhysChem product manager, who lives just down the road from JC.

Experimental values for the derivation of the algorithms were both
extracted from the literature (the data are VERY carefully curated by
experts) and obtained from collaborations.
Version 11 added a lot of new data:
http://www.acdlabs.com/products/phys_chem_lab/logp/new.html

New Model for LogP Prediction

A generation model for prediction of logP is introduced which combines
data for ~12,000 compounds from ACD/Labs' internal training set with
new experimental data from >13,000 pharmaceutical lead compounds. All
internal fragments and incremental values were recreated using the new
combined dataset to provide improved predictions for compounds of
pharmaceutical interest, without sacrificing quality of prediction for
non-pharmaceuticals. Changes to the logP model also impact logD and
solubility modules due to the inclusion of logP in these models.



Jean-Claude Bradley

Jan 2, 2008, 8:00:55 PM
to usefu...@googlegroups.com
On Jan 1, 2008 11:02 PM, Tony at ChemSpider <tony...@gmail.com> wrote:

> Different logP prediction algorithms from different groups will give
> different values. For the common molecules...octane, benzene, blah,
> blah, they will all be within error but "meaningful compounds" will
> likely be very different. Also, take care with logP versus logD
> distinctions. Most people forget that. LogP numbers, whether from the
> CDK or from ChemSpider (either XlogP, AlogPS or ACD/LogP; we have
> THREE logP predictors on there!), can be part of your descriptor feed,
> and it's all a matter of whether you are after accurate predictions or
> segregation. As I recall you are trying to derive segregation -
> precipitates or not.

Yes - the objective is to come up with a model to predict precipitation - or at least suggest which reactions we should carry out next to efficiently get to a reliable model.  I don't know what the relationship should be between logP and solubility in methanol.  I threw it out there as a starting point.  From what I gather from the discussion so far it is difficult to predict which descriptors are relevant for solubility.  I look forward to seeing how Rajarshi tackles this and if anything at all useful can be derived from 30 experiments.



Tony at ChemSpider

Jan 2, 2008, 10:13:11 PM
to UsefulChem
Solubility is not easy to predict. The pharma industry has been
looking for a good aqueous solubility predictor for years. I was
involved in the development of one for a few years -
http://www.acdlabs.com/products/phys_chem_lab/aqsol/

There are others out there but I am not aware of a good open source
one and certainly not one for methanol since there isn't a good
training set to work from...the measurement of solubility is
inherently problematic.

Since you are looking for soluble/insoluble it's easier, but the
training set you have is small. I look forward to seeing Rajarshi's
progress with it. Since Greg is so close to you, you might be
interested in his opinion on this since he spent years in the pharma
industry and has led the Aq Sol product development since joining
ACD/Labs. I promise you he won't be trying to sell you something...he
deeply cares about academia and was teaching at a local college in
Philly (and still teaches there once in a while) and will be more than
willing to help.


Jean-Claude Bradley

Jan 3, 2008, 6:54:24 AM
to usefu...@googlegroups.com

On Jan 2, 2008 10:13 PM, Tony at ChemSpider <tony...@gmail.com> wrote:

> Solubility is not easy to predict. The pharma industry has been
> looking for a good aqueous solubility predictor for years. I was
> involved in the development of one for a few years -
> http://www.acdlabs.com/products/phys_chem_lab/aqsol/

I certainly welcome your thoughts then as we progress on this.

> There are others out there but I am not aware of a good open source
> one and certainly not one for methanol since there isn't a good
> training set to work from...the measurement of solubility is
> inherently problematic.
 
That means this is not a trivial problem and, even if we don't get to a perfect solution, perhaps others can benefit from everyone's open contributions.


> Since you are looking for soluble/insoluble it's easier, but the
> training set you have is small. I look forward to seeing Rajarshi's
> progress with it. Since Greg is so close to you, you might be
> interested in his opinion on this since he spent years in the pharma
> industry and has led the Aq Sol product development since joining
> ACD/Labs. I promise you he won't be trying to sell you something...he
> deeply cares about academia and was teaching at a local college in
> Philly (and still teaches there once in a while) and will be more than
> willing to help.

Sure I would be happy to get together with Greg - thanks for the introduction!




