Integrating Selene-built Model with ExPecto

Nicolas Barbera

Sep 27, 2021, 3:14:16 PM
to Selene (sequence-based deep learning package)
Hello,

I was thinking of training a tissue-specific CNN with Selene and integrating it into the ExPecto analysis pipeline (effectively replacing the DeepSEA model in the current setup with my own CNN, while keeping the downstream spatial feature transformation and expression prediction the same). Is this possible with the current code? Or are the model parameters output by Selene incompatible with what the ExPecto pipeline needs?

Thank you! 

Jian Zhou

Oct 25, 2021, 12:05:07 PM
to Selene (sequence-based deep learning package)
Sorry for the late reply! Using a Selene model with the ExPecto pipeline is possible, but a few steps and modifications are needed. A Selene-trained model should be generally compatible with the ExPecto code, except that the sequence encoding is different (ExPecto uses A: [0,0,0,1], G: [0,0,1,0], C: [0,1,0,0], T: [1,0,0,0], while Selene uses A: [0,0,0,1], C: [0,0,1,0], G: [0,1,0,0], T: [1,0,0,0] by default), so you would need to modify the ExPecto code to use the Selene sequence encoding.

More importantly, the ExPecto models need to be retrained, and the training data need to be regenerated with the new CNN model. The train.py in the ExPecto repo can be used for training, but the 'Xreducedall.2002.npy' file needs to be remade because the CNN has changed. The code for making the training data was not provided in the ExPecto repo, but you should be able to follow the discussion and code in this issue to generate it: https://github.com/FunctionLab/ExPecto/issues/9

Thank you,
Jian

Jian Zhou

Oct 25, 2021, 12:24:07 PM
to Selene (sequence-based deep learning package)
Sorry, a correction to the sequence encodings I mentioned in my last message. They should actually be: ExPecto uses A: [1,0,0,0], G: [0,1,0,0], C: [0,0,1,0], T: [0,0,0,1], while Selene uses A: [1,0,0,0], C: [0,1,0,0], G: [0,0,1,0], T: [0,0,0,1] by default.
Jian
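
For readers following along: the corrected encodings amount to a different channel order (A, G, C, T for ExPecto versus A, C, G, T for Selene), so converting between them is a swap of the middle two channels. The NumPy sketch below is only an illustration of that idea and is not code from either project. The helper names `one_hot` and `reorder_channels` are hypothetical, the (length, 4) array layout is an assumption (pass a different `axis` if your arrays are laid out differently), and encoding unknown bases as all zeros is a simplification made for this example.

```python
import numpy as np

# Channel orders as stated in the corrected message above.
EXPECTO_ORDER = "AGCT"
SELENE_ORDER = "ACGT"


def one_hot(seq, order):
    """Hypothetical helper: one-hot encode `seq` as a (len(seq), 4) array
    whose columns follow the base order given in `order`."""
    index = {base: i for i, base in enumerate(order)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        # Assumption for this sketch: bases outside A/C/G/T stay all-zero.
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def reorder_channels(encoded, src_order, dst_order, axis=-1):
    """Hypothetical helper: permute the channel axis of an encoded array
    from `src_order` to `dst_order`."""
    # For each destination channel, find where that base sits in the source.
    perm = [src_order.index(base) for base in dst_order]
    return np.take(encoded, perm, axis=axis)


expecto_encoded = one_hot("ACGT", EXPECTO_ORDER)
selene_encoded = reorder_channels(expecto_encoded, EXPECTO_ORDER, SELENE_ORDER)
```

Because the permutation only swaps the G and C channels, applying it twice returns the original array, so the same helper works in either direction.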