Using SPLIF with sklearn Algorithms

mate...@gmail.com

Jun 12, 2019, 7:08:30 PM
to Open Drug Discovery Toolkit Community
Hi,

I'm looking to use ODDT's implementation of SPLIF to train an sklearn SVM Classifier. The classifier requires an input numpy array of shape (n_samples, n_features).

The problem I'm having is that ODDT's SPLIF calculation returns an array whose number of rows equals the number of atoms in each molecule/ligand.

Because each ligand has a different number of atoms, the number of rows returned differs from ligand to ligand, so each ligand ends up with a different number of features when converted to a 2D NumPy array to feed the sklearn classifier.

Is there a way to work around this so that each ligand in a given set will have the same number of features (like with PLEC calculation)?

Thanks,
Mateo

Maciek Wójcikowski

Jun 17, 2019, 3:36:18 PM
to mate...@gmail.com, Open Drug Discovery Toolkit Community
Hi Mateo,

As you noticed, `SPLIF` is a complex fingerprint, because it was intended for clustering similar conformers, not for training models across proteins, which is how I understand your task.

There is also `InteractionFingerprint`, which returns 8 bits per residue in a protein (some refer to it as PLIF). Since the number and type of residues differ across structures, you cannot use it for your application. There is also `SimpleInteractionFingerprint`, which does what you ask: it bins amino acid types together regardless of their position in the sequence.
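To illustrate why binning by amino acid type gives a fixed length, here is a minimal NumPy sketch of the idea. It is not ODDT's implementation: the residue ordering, the `OTHER` bucket, and the helper name `bin_by_residue_type` are assumptions made for the example, and the input is a hypothetical per-residue array with 8 bits per row.

```python
import numpy as np

# 20 standard amino acids plus a catch-all bucket (assumed ordering, not ODDT's).
RESIDUE_TYPES = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "OTHER",
]
TYPE_INDEX = {name: i for i, name in enumerate(RESIDUE_TYPES)}
BITS_PER_RESIDUE = 8


def bin_by_residue_type(per_residue_bits, residue_names):
    """Sum 8-bit per-residue rows into one row per amino acid type.

    The output length depends only on the number of residue types,
    not on how many residues the protein has.
    """
    bits = np.asarray(per_residue_bits, dtype=int).reshape(-1, BITS_PER_RESIDUE)
    if len(bits) != len(residue_names):
        raise ValueError("need exactly one residue name per 8-bit row")
    binned = np.zeros((len(RESIDUE_TYPES), BITS_PER_RESIDUE), dtype=int)
    for row, name in zip(bits, residue_names):
        # Unknown residue names (waters, ligands, ...) go to the catch-all bucket.
        binned[TYPE_INDEX.get(name, TYPE_INDEX["OTHER"])] += row
    return binned.ravel()


# Two hypothetical proteins with different residue counts.
small = bin_by_residue_type(np.ones((3, 8)), ["ALA", "GLY", "ALA"])
large = bin_by_residue_type(np.ones((5, 8)), ["SER", "ALA", "HOH", "TRP", "SER"])
print(small.shape, large.shape)  # (168,) (168,)
```

Both vectors have 21 × 8 = 168 entries, so they can be stacked into one `(n_samples, n_features)` array even though the per-residue inputs had 3 and 5 rows.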

This is one of the exact reasons why we developed PLEC, and I hope it serves you well. You can simulate, to some extent, how SPLIF encodes interactions by setting both depths to 1: `PLEC(ligand, protein, depth_ligand=1, depth_protein=1)`. It will not be exactly the same, because SPLIF only has the depth=1 environment and PLEC will contain depth=0 too, but I do not think that makes a big difference.
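A minimal sketch of how the depth-1 PLEC call could feed an sklearn classifier, under a few assumptions: the helper names `plec_depth1` and `build_feature_matrix` are hypothetical, the `size=4096` value is an arbitrary choice, and `sparse=False` is passed so that every ligand yields a dense vector of the same length (check the `PLEC` signature in your ODDT version before relying on these keyword arguments).

```python
import numpy as np


def plec_depth1(ligand, protein, size=4096):
    """Dense PLEC with both depths set to 1, as suggested above."""
    # Imported lazily so the stacking helper below also works without ODDT.
    from oddt.fingerprints import PLEC

    return PLEC(ligand, protein, depth_ligand=1, depth_protein=1,
                size=size, sparse=False)


def build_feature_matrix(ligands, protein, fingerprint=plec_depth1):
    """Stack one fixed-length fingerprint per ligand into (n_samples, n_features)."""
    rows = [np.asarray(fingerprint(ligand, protein)).ravel() for ligand in ligands]
    if not rows:
        raise ValueError("no ligands given")
    lengths = {row.size for row in rows}
    if len(lengths) > 1:
        # This is the ragged case described in the question (e.g. raw SPLIF output).
        raise ValueError(f"fingerprints differ in length: {sorted(lengths)}")
    return np.vstack(rows)
```

With ODDT and scikit-learn installed, `X = build_feature_matrix(ligands, protein)` gives an array that `sklearn.svm.SVC().fit(X, y)` accepts, where `ligands`, `protein`, and the labels `y` come from your own data set.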

----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

