How to quantify the diversity of a database?

Andrew Orry

Jan 5, 2025, 10:10:55 PM
to MolSoft ICM Knowledge Base
Q. How to quantify the diversity of a database?
A.

To quantify the diversity of a compound database, one effective approach is to cluster the dataset and calculate the ratio of the number of clusters to the total number of compounds. This provides a simple measure of diversity, where a higher ratio indicates greater diversity.

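Calculate the Diversity Ratio:

  • Cluster the compounds and choose a distance threshold at which to split the tree.
  • Count the total number of clusters formed at the chosen threshold.
  • Compute the diversity ratio as:

      diversity ratio = (number of clusters) / (total number of compounds)

A larger ratio suggests that the dataset contains a broader range of unique chemical structures. The ratio ranges from 1/N (all N compounds fall into a single cluster) to 1 (every compound is its own cluster). For example, a hypothetical set of 1,000 compounds that forms 250 clusters has a ratio of 0.25. Because the cluster count depends on the threshold, compare databases only at the same threshold.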

Example (replace Xdict with your table name below):


 make tree Xdict "UPGMA" split="cl" label="%NAME_;" name=""
 Nof( Unique( Sort( Split( Xdict.cluster 0.5 )))) / Real(Nof(Xdict))
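
For readers who want to see the arithmetic outside ICM, here is a minimal Python sketch of the same idea. It is not ICM's implementation. It assumes you already have a symmetric matrix of pairwise distances between compounds (for instance fingerprint distances scaled to 0..1), and the function names and the five-compound matrix are hypothetical, made up for illustration. It merges clusters by average linkage (UPGMA) until the closest pair is farther apart than the threshold, then divides the cluster count by the number of compounds.

```python
from itertools import combinations


def upgma_clusters(dist, threshold):
    """Average-linkage (UPGMA) clustering that stops at a distance threshold.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns a list of clusters, each a list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        # UPGMA distance: mean of all pairwise distances between two clusters
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        if average_distance(a, b) > threshold:
            break  # every remaining pair is farther apart than the threshold
        # Replace the two clusters with their union
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters


def diversity_ratio(dist, threshold):
    """Number of clusters at the threshold divided by number of compounds."""
    if not dist:
        raise ValueError("distance matrix is empty")
    return len(upgma_clusters(dist, threshold)) / len(dist)


# Hypothetical distances for five compounds (illustrative values only):
# compounds 0, 1, 2 are similar to each other; 3 and 4 are distinct.
example_dist = [
    [0.00, 0.20, 0.30, 0.90, 0.80],
    [0.20, 0.00, 0.25, 0.85, 0.90],
    [0.30, 0.25, 0.00, 0.90, 0.95],
    [0.90, 0.85, 0.90, 0.00, 0.70],
    [0.80, 0.90, 0.95, 0.70, 0.00],
]

# At threshold 0.5 this gives 3 clusters for 5 compounds
print(diversity_ratio(example_dist, 0.5))
```

Raising the threshold merges more compounds and lowers the ratio, which is why the threshold should be held fixed when comparing two databases.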
