1) Fingerprints can be calculated as follows:
read table mol "file1.sdf" name="t1"
# default molsoft similarity fingerprints
add column t1 name="fp_def" Descriptor( t1.mol )
# or ECFP4
add column t1 name="fp_ecfp4" Descriptor( t1.mol Collection("ATMAP" "cd,h" "SIZE" 2048 "BOMAP" "bt" "LEN" 9999 "TYPE" "ecfp" "ECFPITER" 3 "BINARY" yes ) )
# at this point the structures (column 'mol') can be removed and the table saved as ICB for exchange
delete t1.mol
write binary t1 delete "t1.icb"
# repeat the same steps for the other set, 't2', and save it as "t2.icb"
2) To analyze the similarity between two tables with calculated fingerprints:
read binary "t1.icb"
read binary "t2.icb"
M_out = Distance( t1.fp_def t2.fp_def )
# or, with the ECFP4 fingerprints (both tables must use the same fingerprint type)
M_out = Distance( t1.fp_ecfp4 t2.fp_ecfp4 )
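The distance matrix can be sketched outside ICM as follows. This assumes that `Distance` on bit fingerprints returns the Tanimoto distance (1 minus the Tanimoto coefficient), with one row per t1 compound and one column per t2 compound; both points are assumptions, not checked against the ICM manual. Fingerprints are modeled as sets of on-bit indices, and the toy values are hypothetical.

```python
from __future__ import annotations


def tanimoto_distance(a: set[int], b: set[int]) -> float:
    """1 - Tanimoto coefficient for two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treated as identical (a convention choice)
    return 1.0 - len(a & b) / union


def distance_matrix(fps1: list[set[int]], fps2: list[set[int]]) -> list[list[float]]:
    """Rows follow fps1 (table t1), columns follow fps2 (table t2)."""
    return [[tanimoto_distance(a, b) for b in fps2] for a in fps1]


# Hypothetical toy fingerprints (on-bit indices), not real molecules.
t1_fps = [{1, 2, 3, 4}, {10, 11}]
t2_fps = [{1, 2, 3}, {20, 21, 22}, {10, 11, 12}]
M_out = distance_matrix(t1_fps, t2_fps)
```

A distance of 0 means identical bit sets and 1 means no shared bits, which is why a small cutoff such as 0.5 selects "similar" pairs.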
# for each compound, find the minimal distance to any compound in the other set
add column t1 Min( Transpose(M_out) ) name="min_di"
add column t2 Min( M_out ) name="min_di"
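The nearest-neighbour step can be sketched like this, assuming `M_out` has one row per t1 compound and one column per t2 compound, and that ICM's `Min` on a matrix works column-wise (which is what the `Transpose` in the t1 line implies; treat this as an assumption). The matrix values are hypothetical.

```python
# Hypothetical distance matrix: rows = t1 compounds, columns = t2 compounds.
M_out = [
    [0.25, 1.0, 1.0],
    [1.0, 1.0, 1 / 3],
]

# For each t1 compound (row): distance to its nearest neighbour in t2.
t1_min_di = [min(row) for row in M_out]

# For each t2 compound (column): distance to its nearest neighbour in t1.
t2_min_di = [min(col) for col in zip(*M_out)]
```

Each list has one value per compound of its own table, so it can be added as a column of that table.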
# calculate "overlap"
Nof( t1.min_di < 0.5 ) # number of compounds in t1 that have a similar compound (di < 0.5) in t2
Nof( t2.min_di < 0.5 ) # number of compounds in t2 that have a similar compound (di < 0.5) in t1
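The overlap count is a thresholded count over the nearest-neighbour distances. The sketch below mirrors the two `Nof` lines and also reports the count as a fraction of the table; the helper name `overlap` and the input values are hypothetical.

```python
def overlap(min_di: list[float], threshold: float = 0.5) -> tuple[int, float]:
    """Count compounds whose nearest neighbour in the other set is closer than threshold.

    Returns (count, fraction of the table); an empty table gives (0, 0.0).
    """
    n = sum(1 for d in min_di if d < threshold)
    fraction = n / len(min_di) if min_di else 0.0
    return n, fraction


# Hypothetical nearest-neighbour distances for the two tables.
t1_min_di = [0.25, 1 / 3]
t2_min_di = [0.25, 1.0, 1 / 3]
print(overlap(t1_min_di), overlap(t2_min_di))
```

The measure is asymmetric: several t2 compounds can share the same close neighbour in t1, so the two counts generally differ, and both are worth reporting.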