Hi Ricardo,
The original NNScore uses an undefined set of structures from the PDBBind refined set, Binding MOAD, and additional decoys selected from the NCI DTP set, docked to all proteins with low affinity (-4 to 0 kcal/mol, again undefined). I've reached out to Jacob Durrant asking if he'd be willing to share the training set, but I've received no feedback. ODDT's implementation uses the plain refined set, at the latest supported version, which is currently 2016.
The other difference is the AutoDock Vina scores. ODDT uses its own internal implementation of those, which makes the biggest difference in terms of speed and removes the need to convert and save ligands as PDBQT files. I've also received reports that this scoring performed slightly better for some users.
The last difference is subtle, but I'd like to disclose it as well. ODDT uses scikit-learn's neural network implementation plus normalization (https://github.com/oddt/oddt/blob/master/oddt/scoring/models/regressors.py), whereas early versions of ODDT and NNScore used ffnet. I didn't like the extra dependency, and it was at times problematic for users to compile, but I found that scikit-learn's MLPRegressor is extremely efficient and usually performs better than ffnet. The last point here is that the original NNScore caps the training set during normalization (at 15%, if I remember correctly) to extrapolate better, while ODDT has no such restriction and uses StandardScaler instead.
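To make the normalization difference concrete, here is a minimal sketch of the scikit-learn pattern described above: an MLPRegressor fed through a StandardScaler (z-score normalization with no capping). The feature matrix, target values, and hyperparameters below are purely illustrative stand-ins, not ODDT's actual descriptors or settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for protein-ligand descriptors (e.g. NNScore-style features)
X = rng.normal(size=(200, 10))
# Stand-in for binding affinity targets (e.g. pK values)
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=200)

# StandardScaler performs plain z-score normalization, with no
# percentile capping as in the original NNScore's scheme
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=0),
)
model.fit(X, y)
preds = model.predict(X[:3])
```

Wrapping the scaler and regressor in a single pipeline keeps the normalization parameters fitted on the training set only, so they are reapplied consistently at prediction time.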
I've done a comparison between ODDT and the original NNScore on DUD-E while working on RF-Score-VS, and our implementation performed slightly better, although the mean Enrichment Factors for both were around the RF-Score v3 performance; see here:
https://www.nature.com/articles/srep46710/figures/2
I hope that helps. Let me know should you have further questions.