Oct 26, 2021, 7:35:28 PM
to MolSoft ICM Knowledge Base
Q: How do I select a diverse subset of K compounds from a larger set?
A: The following code selects K representatives (centroids) from
table 't' and copies them to table 't_subset':
K = 1000 # set to number of representatives you want to extract
# perform clustering
make tree t full "UPGMA" matrix split="cl"
I_out = Split( t.cluster K ) # split into K clusters
I_out = Index( t.cluster center r_out ) # uses r_out from the above
t_subset = t[I_out] # your subset
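
For readers who want to see what these steps do conceptually, here is a minimal sketch outside ICM, in plain Python. It is not ICM's implementation. It assumes a precomputed distance matrix, merges clusters by average linkage (UPGMA) until K remain, and takes as the representative of each cluster the member with the smallest total distance to the other members (a medoid). ICM's own definition of a cluster center may differ. The names upgma_clusters, medoid and diverse_subset are hypothetical, and the toy 1-D "compounds" are made up for illustration.

```python
from itertools import combinations


def upgma_clusters(dist, k):
    """Merge clusters by average linkage (UPGMA) until only k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def medoid(cluster, dist):
    """Member with the smallest total distance to its own cluster.

    Assumption: stands in for the 'center' of a cluster.
    """
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


def diverse_subset(dist, k):
    """Row indices of k representatives, one per cluster."""
    return sorted(medoid(c, dist) for c in upgma_clusters(dist, k))


# Toy data: seven 'compounds' described by a single made-up coordinate.
points = [0, 1, 2, 50, 51, 52, 90]
dist = [[abs(p - q) for q in points] for p in points]
subset = diverse_subset(dist, 3)  # one index per cluster
```

This naive version rescans all cluster pairs at every merge, so it is only meant for small examples. For real compound sets, use the ICM commands above.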