How do I select a diverse subset of K compounds from a larger set?


Andrew Orry

Oct 26, 2021, 7:35:28 PM
to MolSoft ICM Knowledge Base
Q: How do I select a diverse subset of K compounds from a larger set?

A: The following ICM script clusters table 't', selects K representatives (cluster centers) and copies them to table 't_subset'.

#
K = 1000    # set to the number of representatives you want to extract
# perform hierarchical (UPGMA) clustering of all rows of t
make tree t full "UPGMA" matrix split="cl"
#
I_out = Split( t.cluster K )  # cut the cluster tree into K clusters
I_out = Index( t.cluster center r_out )   # index of each cluster's center; uses r_out from the line above

t_subset = t[I_out]   # the diverse subset: one center compound per cluster
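For readers who want to see the idea outside ICM, the same approach (average-linkage, i.e. UPGMA, clustering cut at K clusters, then one representative per cluster) can be sketched in plain Python. This is not ICM's implementation. The Tanimoto distance on fingerprints stored as sets of on-bits, the use of the medoid (the member with the smallest total distance to the rest of its cluster) as the cluster "center", and all function names are assumptions made for illustration. The clustering loop is a naive O(n^3) version, so it only suits small inputs; for thousands of compounds use ICM's built-in clustering or an optimized library.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def upgma_clusters(fps, k):
    """Average-linkage (UPGMA) agglomerative clustering, stopped at k clusters.

    Returns the clusters (lists of row indices) and the full distance matrix.
    Naive O(n^3) loop: fine for small illustrative inputs only.
    """
    n = len(fps)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of compounds")
    dist = [[tanimoto_distance(fps[i], fps[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with every compound in its own cluster
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # UPGMA: mean of all pairwise distances between members of the two clusters
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b])
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair (a < b)
        del clusters[b]
    return clusters, dist


def medoid(cluster, dist):
    """Assumed 'center': the member with the smallest summed distance to its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


def diverse_subset(fps, k):
    """Row indices of k representatives, one medoid per UPGMA cluster."""
    clusters, dist = upgma_clusters(fps, k)
    return sorted(medoid(c, dist) for c in clusters)


if __name__ == "__main__":
    # Hypothetical toy fingerprints: rows 0/1 and 2/3 are similar pairs, row 4 is unique.
    fps = [
        frozenset({1, 2, 3}),
        frozenset({1, 2, 4}),
        frozenset({10, 11}),
        frozenset({10, 12}),
        frozenset({20, 21, 22}),
    ]
    print(diverse_subset(fps, 3))  # → [0, 2, 4]
```

On the toy data the two similar pairs each collapse into one cluster, so the subset keeps one compound from each pair plus the unique compound, which is the "one representative per cluster" behaviour the ICM script relies on.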