Hi All,
I am running selene_cli.py on enhancer sequences from a tissue of interest. There are ~36,000 enhancers, and I take 600 bp centered on each peak. I am training with the DeeperDeepSEA architecture using three different interval sets:
1. Custom intervals: all open chromatin regions from the same tissue (the enhancers are a subset of these open chromatin regions), n = 133,931.
2. Random sampler
3. DeepSEA TF intervals.
When I ran selene_cli.py with the custom intervals, the output (selene_sdk.train_model.validation.txt) was:
loss average_precision roc_auc
2.1934749383945018e-05 1.0 NA
1.025205165205989e-05 1.0 NA
6.6757424974639434e-06 1.0 NA
5.006802894058637e-06 1.0 NA
RandomSampler.yml:
loss average_precision roc_auc
0.0259966566034127 0.047024421248050424 0.8781358803654997
0.02558650181721896 0.05514831338163669 0.8890238953359162
0.024295823980821297 0.07944684755956975 0.9072231938391968
0.024937484529567882 0.06972641161781534 0.9126959874902972
0.023475201040739194 0.08703433436639303 0.9105172974661733
DeepSEA TF intervals:
loss average_precision roc_auc
2.1934749383945018e-05 1.0 NA
1.025205165205989e-05 1.0 NA
6.6757424974639434e-06 1.0 NA
Training is still running, but I am wondering why the DeepSEA TF intervals and the custom regions give NA for roc_auc.
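My current guess (please correct me if I'm wrong): Selene computes these metrics with scikit-learn, and sklearn's roc_auc_score is undefined when the validation targets contain only one class, which would also explain the average_precision of exactly 1.0. Since the enhancers are a subset of the open chromatin intervals, every sampled window may be a positive. A minimal reproduction (the all-positive labels here are my assumption about what the sampler produced):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Assumption: every validation example ended up labeled positive.
y_true = np.ones(8)
y_score = np.linspace(0.1, 0.9, 8)  # arbitrary model scores

# With no negatives, precision is 1 at every threshold, so AP == 1.0.
print(average_precision_score(y_true, y_score))  # 1.0

# ROC AUC needs both classes; sklearn refuses to compute it.
try:
    roc_auc_score(y_true, y_score)
except ValueError as e:
    print(e)  # "Only one class present in y_true. ROC AUC score is not defined..."
```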
Also, the average precision is low but the ROC AUC is improved with DeeperDeepSEA. My ultimate aim is to do in silico mutagenesis on eQTLs from the same tissue, so I am wondering what average precision would be good enough to consider the results reliable.
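For context on what counts as "good" average precision: as far as I understand, the chance-level AP equals the positive-label prevalence (while chance-level ROC AUC is 0.5), so under heavy class imbalance an AP like 0.087 may already be well above its baseline. A quick sanity check with a hypothetical 1% prevalence (not my actual class balance):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n, prevalence = 100_000, 0.01               # hypothetical 1% positive rate
y_true = (rng.random(n) < prevalence).astype(int)
y_score = rng.random(n)                     # uninformative random scores

print(average_precision_score(y_true, y_score))  # close to the 0.01 prevalence
print(roc_auc_score(y_true, y_score))            # close to 0.5
```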
Thanks,
Goutham A