Running code on danQ.py

Dat Duong

Dec 18, 2019, 6:57:33 PM
to Selene (sequence-based deep learning package)

Hi, I saw that there is a danQ.py at https://github.com/FunctionLab/selene/blob/master/models/danQ.py.

I would like to run danQ.py on the original DeepSEA data (from 2015) using this newest code. What do you recommend I modify in this newest code?

I see that the paper says we can edit the configuration at https://selene.flatironinstitute.org/overview/cli.html#multiple-file-sampler, so I tried that and (rather luckily) was able to get a configuration to work with selene_cli.py.


Here's my command:

CUDA_VISIBLE_DEVICES=3,4 python3 -u ../../../selene_cli.py ./train_deepsea_mat_danq.yml --lr=0.0001


Here's the key part of the yml script (sorry for not pasting the whole script, but I didn't change much, only the top few lines):


ops: [train, evaluate]
model: {
    path: /local/datdb/selene/models/danQ.py,
    class: DanQ,
    class_args: {  # DanQ.__init__ inputs
        sequence_length: 1000,
        n_genomic_features: 919
    },
    non_strand_specific: mean
}
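
If I understand the docs correctly, this block should make Selene do roughly the following (just my own sketch of the mapping; the NonStrandSpecific import path is my assumption from the selene_sdk documentation):

from selene_sdk.utils import NonStrandSpecific
from danQ import DanQ  # the class loaded from the `path` entry above

# class_args are passed straight through to DanQ.__init__
model = DanQ(sequence_length=1000, n_genomic_features=919)
# non_strand_specific: mean -> average predictions over a sequence
# and its reverse complement
model = NonStrandSpecific(model, mode="mean")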



Unfortunately, my loss values do not make sense; for example, the danQ.py loss does not consistently decrease. In my results folder, I looked at the file selene_sdk.train_model.train.txt and see the values below:


loss
0.4693177044391632
0.8484362363815308
0.482471764087677
0.5153568983078003
0.5148870944976807
0.5938114523887634
0.8780328035354614
loss
loss
loss
0.05610937252640724


I assume that each number is the loss for each epoch (or each iteration of 64,000 steps). As you can see, the loss is not consistently decreasing. Could you help me understand the exact meaning of this output? Should I decrease the learning rate? Should I try different batch sizes?


Thanks.

Kathy Chen

Dec 19, 2019, 10:17:45 AM
to Selene (sequence-based deep learning package)
Hi Dat,

Thanks for your question. 

Yes, the loss is reported after every n_steps * batch_size samples trained (e.g., with a batch size of 64 and stats reported every 1,000 steps, each value would cover 64,000 samples). The loss output appends to the existing file each time, so if you see multiple `loss` headers, it's because you re-ran training a few times and those jobs failed or ended before the first loss report of that run. As for the loss values: the very last run, with a loss of 0.06, looks promising to me. Did you continue that run, and does the loss decrease further?

It is also worth noting that we have seen somewhat unstable loss progression in our own training runs: the loss decreases overall, but it does fluctuate up and down throughout the training period.
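
If it helps, here is a rough sketch (not part of Selene) for reading that log file, skipping the repeated `loss` headers, and smoothing the values so the overall trend is easier to see:

import numpy as np

losses = []
with open("selene_sdk.train_model.train.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line == "loss":
            continue  # header row written at the start of each run
        losses.append(float(line))

# simple moving average; widen the window once you have more reports
window = 3
smoothed = np.convolve(losses, np.ones(window) / window, mode="valid")
print(smoothed)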

You can experiment with the learning rate, increase the batch size, and maybe try a different optimizer (by modifying `danQ.py` directly, e.g. you can try using DeepSEA's optimizer) to see if that helps too.
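
For example, here is a sketch of what that edit could look like, using the `get_optimizer(lr)` convention our model files follow; the SGD momentum and weight decay values below are placeholders, so check the DeepSEA model file for its exact settings:

import torch

# in danQ.py: Selene calls get_optimizer to build the training optimizer
def get_optimizer(lr):
    # replace the existing optimizer with SGD; momentum and weight_decay
    # here are placeholder values, not DeepSEA's published settings
    return (torch.optim.SGD,
            {"lr": lr, "momentum": 0.9, "weight_decay": 1e-6})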

Let me know if I can clarify any of this further for you! 

Dat Duong

Dec 20, 2019, 3:54:11 AM
to Selene (sequence-based deep learning package)
Thanks. I sent you a reply from my Gmail using "reply all", but I guess it didn't show up here.