To quantify the diversity of a compound database, one effective approach is to cluster the dataset and divide the number of clusters by the total number of compounds. For N compounds the ratio ranges from 1/N (everything falls into a single cluster) to 1 (every compound is its own cluster), so a higher ratio indicates greater diversity.
Calculate the Diversity Ratio: diversity ratio = (number of clusters) / (total number of compounds)
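A larger ratio suggests that the dataset contains a broader range of unique chemical structures. The value depends on the clustering method and similarity cutoff, so only compare ratios that were computed with the same settings.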
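As a rough illustration of the calculation, here is a minimal Python sketch. It is not tied to any particular cheminformatics toolkit: the fingerprints are assumed to be plain sets of on-bit indices, the clustering is a simple leader-style pass with a Tanimoto cutoff, and the names `tanimoto`, `leader_cluster`, `diversity_ratio` and the 0.5 threshold are all illustrative choices. Any clustering method that assigns one label per compound could be substituted.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def leader_cluster(fingerprints, threshold=0.5):
    """Assign each fingerprint to the first leader it is similar enough to.

    A compound that matches no existing leader starts a new cluster.
    Returns one integer cluster label per input fingerprint.
    """
    leaders = []
    labels = []
    for fp in fingerprints:
        for idx, leader in enumerate(leaders):
            if tanimoto(fp, leader) >= threshold:
                labels.append(idx)
                break
        else:
            leaders.append(fp)
            labels.append(len(leaders) - 1)
    return labels


def diversity_ratio(labels):
    """Number of distinct clusters divided by number of compounds."""
    if not labels:
        raise ValueError("need at least one compound")
    return len(set(labels)) / len(labels)


# Toy data (made up for illustration): five compounds as on-bit sets.
fingerprints = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {10, 11, 12},
    {10, 11, 12, 13},
    {20, 21},
]

labels = leader_cluster(fingerprints, threshold=0.5)
ratio = diversity_ratio(labels)
print(labels, ratio)
```

With real data, the fingerprints would come from whatever toolkit you already use, and the threshold should be reported alongside the ratio, since a stricter cutoff produces more clusters and therefore a higher value.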
Example: (replace Xdict with your table name below)