Dear ODDT developers/users,
I am very new to this field. I am trying to use ODDT to process my target protein docked with a series of ligands and then analyse the results with machine learning. I have installed Anaconda and the oddt module, but I am not sure where exactly I need to supply the target protein, the ligands and the docked poses. I am also unsure where I should use the library and the v2007 database that we have to provide ourselves, and where I should use the default database and information shipped with the ODDT toolkit, which appears to be v2016. I could not work this out from the book chapter or from the published papers on this interesting area of computer science. I would appreciate it very much if you could correct my commands below.
P.S. The only command that raises an error is the last one.
Commands in Anaconda/Python:
import oddt
import pandas as pd
data = pd.read_csv(oddt.__path__[0] + "/scoring/functions/RFScore/rfscore_descs_v1.csv")
training_data = data[data['2016_refined'] & ~data['2016_core']]
features = training_data.iloc[:, -36:].values
activity = training_data['act'].values
target = next(oddt.toolkit.readfile('pdb', 'E:/oddt_machineLearning/PDBbind/PDBID/carLDM40.pdb'))
target.target = True
ligands = list(oddt.toolkit.readfile('mol2', 'E:/oddt_machineLearning/PDBbind/PDBID/allLigs_210.mol2'))
activities = pd.read_csv('E:/oddt_machineLearning/PDBbind/PDBID/carLDM40_scores.csv')
from oddt.datasets import pdbbind
dataset = pdbbind('E:/oddt_machineLearning/PDBbind/v2007/', version=2007, default_set='refined')
activity = dataset.activities
from oddt.scoring.functions import rfscore
desc_gen = rfscore(version=1).descriptor_generator
features = desc_gen.build(ligands, target)
from oddt.scoring.models.regressors import randomforest
model = randomforest(n_estimators=500)
model.fit(features, activities)
testing_data = data[data['2016_core']]
testing_features = testing_data.iloc[:, -36:].values
testing_activity = testing_data['act'].values
model.score(testing_features, testing_activity)
from oddt.scoring import scorer
scoring_function = scorer(model, desc_gen, score_title='my_custom_score')
target = next(oddt.toolkit.readfile('pdb', 'E:/oddt_machineLearning/PDBbind/PDBID/carLDM40.pdb'))
docked_poses = list(oddt.toolkit.readfile('sdf', 'E:/oddt_machineLearning/PDBbind/PDBID/carLDM40_dockedligs.sdf'))
scoring_function.set_protein(target)
scores = scoring_function.predict(docked_poses)
scoring_function.save('my_sf.pkl')
oddt_cli –score_file = my_sf.pkl ('sdf', 'E:/oddt_machineLearning/PDBbind/project/carLDM40_dockedligs.sdf') –protein ('pdb', 'E:/oddt_machineLearning/PDBbind/project/carLDM40.pdb') -o ('csv', 'E:/oddt_machineLearning/PDBbind/project/scores.csv')
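My understanding is that oddt_cli is a command-line program rather than a Python function, so the last line above is probably meant to be run from the Anaconda prompt with plain file paths instead of Python tuples. The sketch below only assembles and prints such a command; it does not run anything. The flag names (-i, -o, -O, --receptor, --score_file) are my assumption and should be checked against the output of oddt_cli --help, and build_oddt_cli_args is a hypothetical helper I made up for illustration.

```python
import shlex


def build_oddt_cli_args(ligand_file, receptor_file, score_file, output_file):
    """Hypothetical helper: assemble an oddt_cli argument list.

    Nothing is executed here. The flag names are assumptions and
    should be verified with `oddt_cli --help`.
    """
    return [
        "oddt_cli",
        ligand_file,                   # docked poses to score (input file)
        "-i", "sdf",                   # assumed flag: input format
        "--receptor", receptor_file,   # assumed flag: protein structure
        "--score_file", score_file,    # assumed flag: pickled scoring function
        "-o", "csv",                   # assumed flag: output format
        "-O", output_file,             # assumed flag: output file
    ]


project = "E:/oddt_machineLearning/PDBbind/project"
args = build_oddt_cli_args(
    ligand_file=project + "/carLDM40_dockedligs.sdf",
    receptor_file=project + "/carLDM40.pdb",
    score_file="my_sf.pkl",
    output_file=project + "/scores.csv",
)

# Join the arguments into one line that can be pasted into a shell prompt.
command_line = shlex.join(args)
print(command_line)
```

If the flags turn out to be right, the printed line could be pasted into the Anaconda prompt from the folder that contains my_sf.pkl.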
Best,
Shirin