questions about variant effect prediction


Stephen Wang

Aug 6, 2020, 3:35:08 PM
to Selene (sequence-based deep learning package)

Hi All,

I have some questions about using Selene for variant effect prediction.

1. I ran the same code as in the variant effect prediction part of the tutorial, except that I set "use_cuda=False".

This gave me two files: "25k_example_variants_abs_diffs.tsv" and "25k_example_variants_abs_diffs.NA".

The file "25k_example_variants_abs_diffs.tsv" is quite different from the result provided in the example. It has three additional columns: "strand", "ref_match", and "contains_unk". The scores are also different. My results look like this:

[attached image: error.png]

Do you know what's wrong? 

2. These three additional columns in the tsv file seem to cause an error in the following visualization step. The "load_variant_abs_diff_scores" function reads scores starting from the 5th column, but the score columns now start from the 9th column. Could you please check the code?

3. If I want to use the HeartENN model for variant effect prediction, how do I specify the HeartENN model architecture in the input parameters? And is there a pipeline for calculating HeartENN scores using the pre-trained model provided by the HeartENN paper?

Many thanks,

Stephen


Kathy Chen

Aug 7, 2020, 10:41:53 AM
to Selene (sequence-based deep learning package)
Hi Stephen,

Thanks so much for your post! 

1. The tutorials are only compatible with the version of Selene that was specified at the time of publication. Updates to both Selene and PyTorch have resulted in large differences in the variant effect prediction results (and yes, the output has more columns), so you would need to use an older version of Selene to run and reproduce the tutorials. Apologies for this! We really should get those updated, but our dev team is currently very small and we haven't been able to get around to it.

2. Thanks for pointing this out. That is definitely a bug: we did not update the visualization steps when we updated the output from variant effect prediction. I will create an issue on Selene's GitHub and try to get that resolved. In the meantime, if you'd like to run the visualization on a fork of Selene and modify the code yourself (or email me and we can work on this together), we can make some small modifications to get that working and then possibly merge them into the master branch in a future release.

3. Can you email me directly to discuss running HeartENN with Selene? (kc...@princeton.edu) I am working on releasing some code to run HeartENN with Selene; however, HeartENN models were created with a much older version of Selene and we are still trying to work out some details surrounding that. Since that repo isn't public/finalized yet, I don't want to attach example YAML files here but I can email them to you.
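
For the column offset in point 2, here is a rough sketch of the kind of small modification that might work as a stopgap. It is not Selene's actual fix: `load_scores_from_column` and its `first_score_col` parameter are hypothetical names, and the default of 8 (0-based, i.e. the 9th column) is an assumption taken from the description above, so please check it against the header of your own tsv file.

```python
import csv


def load_scores_from_column(input_path, first_score_col=8):
    """Read a variant effect TSV whose score columns start at a given index.

    Hypothetical helper, not part of Selene. `first_score_col` is 0-based;
    the default of 8 assumes the scores begin at the 9th column.
    """
    labels = []  # per-variant metadata (everything before the scores)
    scores = []  # per-variant list of float scores
    with open(input_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Feature names are the header entries from the first score column on.
        features = header[first_score_col:]
        for row in reader:
            if not row:
                continue  # skip blank lines
            labels.append(tuple(row[:first_score_col]))
            scores.append([float(value) for value in row[first_score_col:]])
    return scores, labels, features
```

Making the offset a parameter rather than a hard-coded index means the same reader could be pointed at either the older or the newer output layout, for example `load_scores_from_column("25k_example_variants_abs_diffs.tsv")`.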

Thanks again!
Kathy