QSAR – Descriptor Generation, Classification Models, and Diverse Selection

Andrew Orry

Jun 5, 2025, 3:19:52 PM
to MolSoft ICM Knowledge Base
Q: Can ICM generate numerical descriptors for QSAR modeling?

A: Yes. Go to Chemistry → Calculate Properties. There are several categories of descriptors:

  • Individual descriptors (e.g., MolLogP, MolWeight)

  • Grouped descriptors like Atom Counts, Bond Counts, and Topological Descriptors

Grouped descriptors generate multiple columns in the table. You can mix numerical descriptors with fingerprints when building a model. For example, the MolSoft solubility model (MolLogS) was built using MolLogP + binary ECFP4 fingerprints.
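To illustrate what mixing numerical descriptors with a binary fingerprint means for the model input, here is a minimal Python sketch. It is not ICM code and does not call any ICM function. The compound names, descriptor values, on-bit indices, and the 16-bit fingerprint length are all hypothetical, and `build_feature_row` is a helper invented for this example.

```python
def build_feature_row(descriptors, on_bits, n_bits):
    """Concatenate numeric descriptors with a fixed-length binary fingerprint."""
    bits = [0.0] * n_bits
    for idx in on_bits:
        # Fold each on-bit index into the fixed fingerprint length.
        bits[idx % n_bits] = 1.0
    return list(descriptors) + bits


# Hypothetical compounds: [logP-like value, molecular weight] plus fingerprint on-bits.
compounds = {
    "cpd_A": {"descriptors": [2.1, 310.4], "on_bits": [3, 17, 42]},
    "cpd_B": {"descriptors": [0.4, 180.2], "on_bits": [3, 9]},
}

N_BITS = 16  # assumed length, chosen small for readability

# One row per compound: descriptor columns first, then one column per fingerprint bit.
matrix = {
    name: build_feature_row(c["descriptors"], c["on_bits"], N_BITS)
    for name, c in compounds.items()
}
```

Each row has the same number of columns, so descriptor columns and fingerprint bits can be passed to a learner together.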

Documentation: https://molsoft.com/gui/calculate-properties.html


Q: Does ICM support classification models (e.g., active/inactive)?

A: Yes. Use Random Forest for classification.
Ensure your training column contains integers (e.g., 1 = active, 0 = inactive). The "Learn" dialog will allow you to train a classifier using the selected descriptors.
ICM also supports neural network training from the command line, available in a special Linux package.
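If your activity data is stored as text labels, the training column needs to be converted to integers before you open the "Learn" dialog. The following Python sketch shows one way to prepare such a column outside ICM. The label strings and the `encode_labels` helper are assumptions for illustration only.

```python
def encode_labels(labels, positive="active", negative="inactive"):
    """Map text class labels to the integers 1 (positive) and 0 (negative)."""
    mapping = {positive: 1, negative: 0}
    encoded = []
    for raw in labels:
        key = raw.strip().lower()  # tolerate stray whitespace and capitalization
        if key not in mapping:
            # Fail loudly rather than silently mislabeling a compound.
            raise ValueError(f"unexpected label: {raw!r}")
        encoded.append(mapping[key])
    return encoded


# Hypothetical raw labels as they might appear in a spreadsheet export.
raw_labels = ["Active", "inactive", " active ", "Inactive"]
y = encode_labels(raw_labels)
```

Rejecting unknown labels up front avoids training on a column that mixes integers with unparsed text.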

Documentation: https://molsoft.com/gui/learning.html


Q: How can I select a diverse set of compounds based on activity?

A: Use the clustering tool in the following way:

  1. Select the activity column.

  2. Choose Table → Clustering and check Keep Distance Matrix.

  3. Use the slider to define the number of clusters.

  4. Right-click on the table and choose Select Centers by…

  5. In the dialog, select From Clustering Distance.
    This selects a representative compound from each cluster, giving a diverse subset.
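The idea behind the steps above can be sketched in plain Python: build a distance matrix from the activity column, group compounds into a chosen number of clusters, and pick one central member per cluster. This is not ICM's implementation. Average-linkage clustering and medoid selection are assumptions made for the example, and the activity values are hypothetical.

```python
def distance_matrix(values):
    """Pairwise absolute differences between activity values."""
    return [[abs(a - b) for b in values] for a in values]


def agglomerate(dist, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = len(clusters[a]) * len(clusters[b])
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / pairs
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


def select_centers(dist, clusters):
    """Pick the medoid of each cluster: the member closest to all other members."""
    return [
        min(members, key=lambda i: sum(dist[i][j] for j in members))
        for members in clusters
    ]


# Hypothetical activity values (e.g., pIC50) for six compounds.
activities = [5.0, 5.5, 5.125, 7.75, 8.0, 10.0]

dist = distance_matrix(activities)
clusters = agglomerate(dist, n_clusters=3)  # plays the role of the slider in step 3
centers = select_centers(dist, clusters)    # row indices of the selected compounds
selected = [activities[i] for i in centers]
```

The number of clusters controls how many compounds end up in the diverse set, since one center is taken from each cluster.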

Documentation: 