How to create and compare chemical fingerprints.


Andrew Orry

Jun 11, 2021, 7:22:19 PM
to MolSoft ICM Knowledge Base
Q. I want to share some compounds with a collaborator, but because of IP concerns I do not want them to see the 2D structures. Can this be done using chemical fingerprints?
A.

1) Fingerprints can be calculated as follows:


read table mol "file1.sdf" name="t1"

# default molsoft similarity fingerprints
add column t1 name="fp_def" Descriptor( t1.mol )

# or ECFP4
add column t1 name="fp_ecfp4" Descriptor( t1.mol Collection("ATMAP" "cd,h" "SIZE" 2048 "BOMAP" "bt" "LEN" 9999 "TYPE" "ecfp" "ECFPITER" 3 "BINARY" yes  ) )

# after this point the structures (the 'mol' column) can be removed and the table saved as an ICB file for exchange

delete t1.mol
write binary t1 delete "t1.icb"

# repeat the same steps for the second set, 't2'
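A fingerprint table can stand in for the structures because a binary hashed fingerprint such as the ECFP column above ("SIZE" 2048, "BINARY" yes) records only which bit positions are set. The Python sketch below illustrates that generic hashing-and-folding step under stated assumptions; it is not MolSoft's implementation. The feature strings are hypothetical stand-ins for the atom environments an ECFP-style algorithm would enumerate, and SHA-1 is an arbitrary choice of stable hash.

```python
import hashlib


def folded_fingerprint(features, size=2048):
    """Hash each feature identifier to a bit position in [0, size) and
    return the set of 'on' bits (a sparse bit-vector representation)."""
    bits = set()
    for feature in features:
        # Use a stable hash; Python's built-in hash() of str is salted per process.
        digest = hashlib.sha1(feature.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


# Hypothetical environment identifiers (not real ECFP output) for one molecule.
features = ["C(ar)", "C(ar)-C(ar)", "C(ar)-O", "O-H", "C(ar)-O-H"]
fp = folded_fingerprint(features)
print(sorted(fp))
```

What the collaborator receives is a set of bit positions rather than atoms and bonds. Several features may fold onto the same bit, but identical inputs always produce identical bits, which is what makes the similarity comparison in step 2 meaningful.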


2) To analyze the similarity between two tables with calculated fingerprints:

read binary "t1.icb"
read binary "t2.icb" 

M_out = Distance( t1.fp_def t2.fp_def )
# or, if you calculated ECFP4 fingerprints:
M_out = Distance( t1.fp_ecfp4 t2.fp_ecfp4 )

# for each row, find the minimal distance to the other set
add column t1 Min( Transpose(M_out) ) name="min_di"
add column t2 Min( M_out ) name="min_di"

# calculate "overlap"
Nof( t1.min_di < 0.5 )   # number of compounds in t1 that have a similar compound (distance < 0.5) in t2
Nof( t2.min_di < 0.5 )   # number of compounds in t2 that have a similar compound (distance < 0.5) in t1
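The comparison in step 2 can be mirrored in plain Python to make the logic explicit. This is a sketch under assumptions: fingerprints are represented as sets of on-bit indices, the distance is assumed to be the Tanimoto distance (1 minus Tanimoto similarity), which may not match what ICM's Distance computes for every fingerprint type, and the example fingerprints are made up. Rows of the matrix correspond to t1 and columns to t2, so row minima give min_di for t1 and column minima give min_di for t2.

```python
def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of on-bits; 0.0 for two empty sets."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def distance_matrix(set1, set2):
    # Rows follow set1 (like t1), columns follow set2 (like t2).
    return [[tanimoto_distance(a, b) for b in set2] for a in set1]


def overlap(set1, set2, cutoff=0.5):
    m = distance_matrix(set1, set2)
    min_di_1 = [min(row) for row in m]        # per set1 compound: nearest set2 distance
    min_di_2 = [min(col) for col in zip(*m)]  # per set2 compound: nearest set1 distance
    n1 = sum(d < cutoff for d in min_di_1)    # set1 compounds with a close neighbour in set2
    n2 = sum(d < cutoff for d in min_di_2)    # set2 compounds with a close neighbour in set1
    return min_di_1, min_di_2, n1, n2


# Hypothetical fingerprints as sets of on-bit indices.
t1 = [{1, 2, 3, 4}, {10, 11, 12}]
t2 = [{1, 2, 3, 5}, {20, 21}, {30}]
min_di_1, min_di_2, n1, n2 = overlap(t1, t2)
print(min_di_1, min_di_2, n1, n2)
```

With these made-up fingerprints only the first compound in each set has a neighbour closer than 0.5 (3 shared bits out of 5, distance 0.4), so both counts are 1.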

