Descriptor Calculation


Paul Gleeson

May 22, 2012, 9:42:06 PM
to licss...@googlegroups.com
Dear Kevin,
I have had a play with the tool and I am very impressed. It is a very powerful, free/open spreadsheet tool that can do a similar job to expensive cheminformatics software. While I have not tried large datasets, I am sure the program will be particularly useful for academics and students, and I will certainly try to incorporate it into my lectures/labs. Some thoughts I have are:

(1) Would it be possible to output the results from the descriptor calculation as numerical data? I notice that currently the results are written as both text and numerical values in the same cell (e.g. ALOGP gives "ALogP: 0.0; ALogp2: 0.0; AMR: 0.0"). Could this be split over 3 columns so that no additional manipulation is needed to allow values to be plotted, for example?
(2) Would it be possible to add an option to calculate all descriptors or a basic set of descriptors? At present I think one needs to select them all manually.
(3) Looking to the future, would it be very difficult to create the same functionality as an Excel add-in, with a standalone menu option on the menu bar (like ChemAxon software)? The fact that one has to alter the VBA security setting to get the software to work initially is no more difficult than adding a custom add-in to Excel.

PS: in Excel 2007 one appears to need to save the workbook as a .xlsm file to allow the macros to be saved in the spreadsheet. Otherwise the functionality is not present when one reopens the spreadsheet.

Kind regards,

Paul

Kevin Lawson

May 23, 2012, 4:39:38 AM
to licss...@googlegroups.com

On Wednesday, May 23, 2012 2:42:06 AM UTC+1, Paul Gleeson wrote:
> Dear Kevin,
> I have had a play with the tool and I am very impressed. It is a very powerful, free/open spreadsheet tool that can do a similar job to expensive cheminformatics software. While I have not tried large datasets, I am sure the program will be particularly useful for academics and students, and I will certainly try to incorporate it into my lectures/labs. Some thoughts I have are:

Thanks very much for your positive feedback, Paul.

> (1) Would it be possible to output the results from the descriptor calculation as numerical data? I notice that currently the results are written as both text and numerical values in the same cell (e.g. ALOGP gives "ALogP: 0.0; ALogp2: 0.0; AMR: 0.0"). Could this be split over 3 columns so that no additional manipulation is needed to allow values to be plotted, for example?

In principle this would be doable and a good idea - I will have a look at it.  Incidentally, have you explored the 'Insert CDK Descriptor Formula for Selection' button?  This inserts a formula to calculate any of the CDK descriptors, which you can fill down a column and then Copy and Paste-Special Values to convert to raw numbers.
 
> (2) Would it be possible to add an option to calculate all descriptors or a basic set of descriptors? At present I think one needs to select them all manually.
 
I think that ideally it would be good to provide check-boxes for each.  It would be a bit fiddly to program however - I'll take a look
 
> (3) Looking to the future, would it be very difficult to create the same functionality as an Excel add-in, with a standalone menu option on the menu bar (like ChemAxon software)? The fact that one has to alter the VBA security setting to get the software to work initially is no more difficult than adding a custom add-in to Excel.

Actually, it would have been much easier to program it all as an add-in, but there were particular reasons why I didn't want to do that:
  • A key design criterion for me was that the enabled workbooks would be shareable, without any manual installation on each user's part.  The present system achieves that because only the creator of the workbook needs to have programmatic access to the VBA project model.  For people with whom it is shared, installation takes place seamlessly - either via a network installation (if present) or via the internet.  I envisaged, for example, corporate users being able to share with each other (almost whatever the corporate security settings were) and with collaborators.  Automatic installation of add-ins isn't possible as far as I know (except with code in each workbook, which brings us back to the same issue).
  • Add-ins place a load on Excel generally and not just on their target spreadsheets.  For example, to provide features like the structures appearing when hovering over chart data points, it would probably be necessary to track and react to all mouse movements across all workbooks.  Similarly, the structure-handling software in some Excel add-ins requires interception of Excel's main calculation routine, which slows things down.  The 'LI' in LICSS stands for LIghtweight - LICSS is not meant to be a 'selfish' program!
Btw: I wrote more about the LICSS design philosophy in a paper for the Journal of Cheminformatics which you may be interested in: http://www.jcheminf.com/content/4/1/3

> PS: in Excel 2007 one appears to need to save the workbook as a .xlsm file to allow the macros to be saved in the spreadsheet. Otherwise the functionality is not present when one reopens the spreadsheet.

Yes, that's true, though I usually use the old XLS format for backwards compatibility.
 
Best wishes
 
Kevin

Noel O'Boyle

May 23, 2012, 4:48:45 AM
to licss...@googlegroups.com
On 23 May 2012 09:39, Kevin Lawson <klaws...@gmail.com> wrote:
>
> On Wednesday, May 23, 2012 2:42:06 AM UTC+1, Paul Gleeson wrote:

>> (2) Would it be possible to add an option to calculate all descriptors or
>> a basic set of descriptors. At present I think one needs to manually select
>> them all.
>
>
> I think that ideally it would be good to provide check-boxes for each.  It
> would be a bit fiddly to program however - I'll take a look

Just FYI Kevin, I have some (Python) code in Cinfony that might be of
interest here. If you look at the function _getdescdict() at line 54
of cdk.py (https://github.com/cinfony/cinfony/blob/master/cinfony/cdk.py#L54)
it returns a Python dictionary of CDK descriptor names and their
associated classes.

- Noel
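
The general idea of building a name-to-class dictionary by introspection, instead of keeping a hand-copied list, can be sketched in plain Python. This is only an illustration and does not reproduce Cinfony's `_getdescdict()` or any CDK API: the `Descriptor` base class, the two toy descriptors and the helper names are all hypothetical, and the toy descriptors simply count characters in a SMILES string so that the example runs without a chemistry toolkit.

```python
class Descriptor:
    """Hypothetical base class standing in for a descriptor interface."""

    def calculate(self, smiles):
        raise NotImplementedError


class SmilesLength(Descriptor):
    """Toy descriptor: number of characters in the SMILES string."""

    def calculate(self, smiles):
        return len(smiles)


class NitrogenCount(Descriptor):
    """Toy descriptor: number of 'N' or 'n' characters in the SMILES string."""

    def calculate(self, smiles):
        return smiles.count("N") + smiles.count("n")


def get_descriptor_dict():
    """Map each descriptor name to its class, discovered via subclasses."""
    return {cls.__name__: cls for cls in Descriptor.__subclasses__()}


def calculate_all(smiles):
    """Run every registered descriptor on one SMILES string."""
    return {
        name: cls().calculate(smiles)
        for name, cls in get_descriptor_dict().items()
    }


print(calculate_all("c1cccnc1"))
```

With a dictionary like this, a "calculate all" option or a set of check-boxes only needs to loop over the keys, and a newly added descriptor class is picked up without editing any list.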

Kevin Lawson

May 23, 2012, 4:56:56 AM
to licss...@googlegroups.com
Thanks for the info, Noel - cool! - I probably should do something like this instead of the manual copy of class names I have used up to now - Kevin

Kevin Lawson

May 24, 2012, 12:05:43 PM
to licss...@googlegroups.com

On Wednesday, May 23, 2012 2:42:06 AM UTC+1, Paul Gleeson wrote:
> (1) Would it be possible to output the results from the descriptor calculation as numerical data? I notice that currently the results are written as both text and numerical values in the same cell (e.g. ALOGP gives "ALogP: 0.0; ALogp2: 0.0; AMR: 0.0"). Could this be split over 3 columns so that no additional manipulation is needed to allow values to be plotted, for example?

I have just put up a new version of LICSS2.2 on the project site which does exactly this.  You can update your existing enabled sheets in the usual way: run EnableChemicalSpreadsheetV2.2.xls and select just the workbook to update (not any worksheet or chart) before pressing OK.  You should get a "Code Modules updated only" message, after which you can save the new file version and close/reopen it etc.  All feedback gratefully received.

Kevin Lawson

May 25, 2012, 5:33:13 AM
to licss...@googlegroups.com
Dear All
On Wednesday, May 23, 2012 2:42:06 AM UTC+1, Paul Gleeson wrote:
> (1) Would it be possible to output the results from the descriptor calculation as numerical data? I notice that currently the results are written as both text and numerical values in the same cell (e.g. ALOGP gives "ALogP: 0.0; ALogp2: 0.0; AMR: 0.0"). Could this be split over 3 columns so that no additional manipulation is needed to allow values to be plotted, for example?
> (2) Would it be possible to add an option to calculate all descriptors or a basic set of descriptors? At present I think one needs to select them all manually.
 
I have now implemented both these features in the 2.2 development version of LICSS.  You can choose multiple descriptors to calculate from a dialog box.  Each element of the returned descriptor array comes back in its own column - as numbers (if relevant).
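
For anyone who still has cells in the older combined format and wants to split them outside Excel, a minimal Python sketch of the same idea follows. It is not the LICSS implementation (LICSS runs as VBA inside the workbook); the function name `split_descriptor_cell` is made up for illustration, and it assumes the cell text always has the "Name: value; Name: value" shape quoted above.

```python
def split_descriptor_cell(cell):
    """Split 'Name: value; Name: value' text into a dict of floats.

    Values that cannot be read as numbers are stored as None.
    """
    result = {}
    for part in cell.split(";"):
        if ":" not in part:
            continue  # skip empty or malformed fragments
        name, _, raw = part.partition(":")
        try:
            result[name.strip()] = float(raw)
        except ValueError:
            result[name.strip()] = None
    return result


cell = "ALogP: -1.5510000000000002; ALogp2: 2.4056010000000003; AMR: 23.567800000000002"
values = split_descriptor_cell(cell)
print(values)
```

Each key of the returned dictionary corresponds to one output column, so the values can be plotted or sorted as numbers.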
 
Really interested in any feedback on this - download the LICSS-2.2-FullInstaller.exe file to try...
 
Best wishes

Kevin

Paul Gleeson

May 28, 2012, 10:06:50 PM
to licss...@googlegroups.com
Dear Kevin,
I just got round to trying the new functionality and it works fine. Many thanks.

I noticed what I think is a descriptor calculation issue, though. I used a dummy sheet to test the program and noticed that for benzene and pyridine the ALogP value comes back as zero. I think there are two issues here: (a) why does the calculation fail for such simple molecules (the value should be 2 or so based on AlogP), which suggests the program has not run correctly, and (b) why are such descriptor failures (or null values like pKa) not reported with a blank cell, "error" or "N/A" value, rather than 0?

Kind regards,

Paul

ID | Smiles | ALOGP:ALogP | ALOGP:ALogp2 | ALOGP:AMR | ALOGP | activity
1 | c1ccccc1 | 0 | 0 | 0 | ALogP: 0.0; ALogp2: 0.0; AMR: 0.0 | 5
2 | c1cccnc1 | 0 | 0 | 0 | ALogP: 0.0; ALogp2: 0.0; AMR: 0.0 | 6
3 | CCCCCN | -1.551 | 2.405601 | 23.5678 | ALogP: -1.5510000000000002; ALogp2: 2.4056010000000003; AMR: 23.567800000000002 | 4
4 | CO(=O)CCCN | -0.9357 | 0.875534 | 23.6977 | ALogP: -0.9356999999999999; ALogp2: 0.8755344899999997; AMR: 23.697700000000005 | 2
5 | c1ccccc1CCC(=O)O | -0.1426 | 0.020335 | 16.6339 | ALogP: -0.14260000000000025; ALogp2: 0.020334760000000073; AMR: 16.6339 | 7
6 | c1ccccn1CC | 0.3659 | 0.133883 | 12.8885 | ALogP: 0.3659000000000002; ALogp2: 0.13388281000000016; AMR: 12.8885 | 8
7 | NCC | -0.3972 | 0.157768 | 14.2126 | ALogP: -0.3971999999999998; ALogp2: 0.15776783999999983; AMR: 14.2126 | 9
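
To illustrate point (b), here is a minimal Python sketch (not LICSS code) of reporting a suspect result as "N/A" instead of 0. It rests on one loud assumption: a row in which ALogP, ALogp2 and AMR are all exactly 0 is treated as a failed calculation. That is only a heuristic, since a single descriptor could legitimately be 0, and the function name `as_reported` is hypothetical. The three rows are taken from the table above.

```python
# (Smiles, ALogP, ALogp2, AMR) for rows 1-3 of the table above.
rows = [
    ("c1ccccc1", 0.0, 0.0, 0.0),
    ("c1cccnc1", 0.0, 0.0, 0.0),
    ("CCCCCN", -1.551, 2.405601, 23.5678),
]


def as_reported(values):
    """Return 'N/A' for every value if all values are exactly 0.

    Assumption: an all-zero result signals a failed calculation.
    Otherwise the values are returned unchanged.
    """
    if all(v == 0 for v in values):
        return ["N/A"] * len(values)
    return list(values)


for smiles, *vals in rows:
    print(smiles, as_reported(vals))
```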

Kevin Lawson

May 30, 2012, 4:01:34 AM
to licss...@googlegroups.com
Hi Paul
 
Glad the functionality works - it was surprisingly easy to implement!
 
All the descriptors are calculated through the open source CDK (Chemistry Development Kit) Java libraries.  When I originally spotted that there appeared to be issues with ALogP, I checked that the LICSS code was returning the intended values by comparing with Rajarshi Guha's downloadable tool to calculate CDK descriptors: http://rguha.net/code/java/cdkdesc.html. I have now calculated ALogP for > 1000 compounds and, in all cases, the returned values are identical using the two methods.  So the problem appears to be with the CDK ALogP implementation (or, perhaps, with the way structures are prepared for it).  I will make a posting to the CDK User group: http://sourceforge.net/mailarchive/forum.php?thread_name=a882e48b0903240934u682b0f6qc6039f4f64500f64%40mail.gmail.com&forum_name=cdk-user about this issue and will report back here with responses.
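
A cross-check of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used: it assumes each tool's results have been exported to a dictionary keyed by SMILES, the function name `compare_results` and the tolerances are hypothetical, and the sample numbers are just the rounded and full-precision ALogP values from Paul's table, used as stand-ins for the two tools' outputs.

```python
import math


def compare_results(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Return the keys whose values differ or that occur in only one dict."""
    mismatches = []
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            mismatches.append(key)  # compound missing from one result set
        elif not math.isclose(a[key], b[key], rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(key)  # values disagree beyond the tolerance
    return mismatches


# Stand-in data: rounded vs full-precision ALogP values from the table above.
first = {"CCCCCN": -1.551, "NCC": -0.3972}
second = {"CCCCCN": -1.5510000000000002, "NCC": -0.3971999999999998}

print(compare_results(first, second))
```

Comparing with a small tolerance instead of `==` avoids false mismatches caused purely by floating-point representation.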
 
Best wishes
 
Kevin
 
LICSS uses the CDK

Kevin Lawson

Jun 1, 2012, 4:27:44 AM
to licss...@googlegroups.com
Dear All
Feedback from Nina Jeliazkova on behalf of CDK:

> Kevin,
>
> Better not use ALogP at all.  Back at the list you could find reports
> it is not performing well. I think it was decided to be removed /
> deprecated.

Here is the comparison I did some time ago.  LogKow is an experimental value from an ECOSAR training set.
http://tinyurl.com/c93hee9

XLogP is much better: http://tinyurl.com/d6belhs

That particular report was on OpenTox dev list, not on the CDK list, sorry.

Regards,
Nina

=>I will remove ALogP from LICSS until/unless they recode it.
 
Best wishes, Kevin