Hi Ricardo,
The original NNScore uses an undefined set of structures from the PDBBind refined set, Binding MOAD, and additional decoys selected from the NCI DTP set, docked to all proteins with low affinity (-4 to 0 kcal/mol, again undefined). I've reached out to Jacob Durrant asking if he'd be willing to share the training set, but I've received no feedback. ODDT's implementation uses the plain refined set, at the latest supported version, which is currently 2016.
The other difference is the AutoDock Vina scores. ODDT uses its own internal implementation of those, which makes the biggest difference in terms of speed and removes the need to convert and save ligands as PDBQT files. I've also received reports that this scoring performed slightly better for some users.
The last difference is subtle, but I'd like to disclose it as well. ODDT uses scikit-learn's neural network implementation plus normalization (https://github.com/oddt/oddt/blob/master/oddt/scoring/models/regressors.py), whereas early versions of ODDT and NNScore used ffnet. I didn't like the extra dependency, and it was at times problematic for users to compile, but I found that scikit-learn's MLPRegressor is extremely efficient and usually performs better than ffnet. The last point here is that the original NNScore caps the training set during normalization (at 15%, if I remember correctly) to extrapolate better, while ODDT has no such restriction and uses StandardScaler instead.
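To make the normalization difference concrete, here is a minimal sketch of the scikit-learn pattern described above: an MLPRegressor fed through a StandardScaler (z-score normalization with no capping). The feature matrix, target values, and hyperparameters below are purely illustrative stand-ins, not ODDT's actual descriptors or settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for protein-ligand descriptors (e.g. NNScore-style features)
X = rng.normal(size=(200, 10))
# Stand-in for binding affinity targets (e.g. pK values)
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=200)

# StandardScaler performs plain z-score normalization, with no
# percentile capping as in the original NNScore's scheme
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=0),
)
model.fit(X, y)
preds = model.predict(X[:3])
```

Wrapping the scaler and regressor in a single pipeline keeps the normalization parameters fitted on the training set only, so they are reapplied consistently at prediction time.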
I've done a comparison between ODDT and the original NNScore on DUD-E while working on RF-Score-VS, and our implementation performed slightly better, although the mean Enrichment Factors for both were around the RF-Score v3 performance; see here:
https://www.nature.com/articles/srep46710/figures/2
I hope that helps. Let me know should you have further questions.