Hi Weichen,
Thank you for your message!
You cannot pass a FASTA file to `vep_cli.py` as-is, but you can modify the script and the corresponding configuration YAML file(s) to output predictions for the sequences in a FASTA file. Here are some changes I would suggest:
```
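from selene_sdk.utils import load_path, parse_configs_and_run

def run_config(config_yml):
    configs = load_path(config_yml, instantiate=False)
    # take a FASTA file as input now instead of a VCF; `arguments` is the parsed
    # command-line dict vep_cli.py already reads (rename its input argument to <fasta>)
    configs["prediction"]["input_path"] = arguments["<fasta>"]
    configs["prediction"]["output_dir"] = arguments["<output-dir>"]
    parse_configs_and_run(configs)

run_config("configs/mouse_fasta.yml")
run_config("configs/human_fasta.yml")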
```
In the `configs` directory, make a copy of each of the Seqweaver .yml files (e.g. `mouse_fasta.yml` and `human_fasta.yml`, the names used above) with one change: remove the `variant_effect_prediction: { ... }` section entirely and use this instead:
```
prediction: {
    output_format: hdf5
}
```
Then you can run this once on a ref FASTA and once on an alt FASTA and take the difference between the two sets of predictions to get the Seqweaver variant effect difference scores (note that DIS would be an additional step).
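A minimal sketch of that difference step, assuming the ref and alt FASTA files list the same variants in the same order, and assuming the HDF5 output stores the predictions matrix in a dataset named `data` (that dataset name is an assumption; check the actual key with `h5py` if it differs). The function names here are hypothetical, and the alt-minus-ref sign convention is my choice; flip it if you prefer.

```python
import numpy as np


def load_predictions(h5_path, dataset="data"):
    """Read a predictions matrix (sequences x features) from an HDF5 file.

    The dataset name "data" is an assumption; list the available keys with
    h5py.File(h5_path, "r").keys() if it differs.
    """
    import h5py  # imported here so diff_scores works without h5py installed

    with h5py.File(h5_path, "r") as f:
        return f[dataset][()]


def diff_scores(ref_preds, alt_preds):
    """Per-sequence, per-feature difference scores (alt minus ref).

    Rows must line up between the two inputs, i.e. the ref and alt FASTA
    files must contain the same variants in the same order.
    """
    ref = np.asarray(ref_preds, dtype=float)
    alt = np.asarray(alt_preds, dtype=float)
    if ref.shape != alt.shape:
        raise ValueError(f"shape mismatch: ref {ref.shape} vs alt {alt.shape}")
    return alt - ref
```

For example, `diff_scores(load_predictions("ref_out/ref_predictions.h5"), load_predictions("alt_out/alt_predictions.h5"))`, where both paths are hypothetical placeholders for wherever your two runs wrote their output.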
For your second question:
Seqweaver makes predictions for the center bin of a 1 kb input sequence, so I would recommend centering each variant in its 1 kb window. If the window extends outside gene regions, you can replace the non-transcribed base pairs with 'N's in the FASTA files you generate.
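A rough sketch of generating those centered, masked sequences, under several assumptions: single-nucleotide variants only, a 0-based variant position, and the variant placed at index `seq_len // 2` of the window (check this against how Seqweaver defines its center bin). `gene_intervals` is a hypothetical list of half-open `(start, end)` transcribed regions that you would derive yourself, e.g. from a gene annotation; bases outside every interval, or past the chromosome ends, become 'N'. The function names are illustrative, not part of Seqweaver or Selene.

```python
def centered_snv_window(chrom_seq, pos, ref, alt, gene_intervals, seq_len=1000):
    """Return (ref_seq, alt_seq) of length seq_len centered on an SNV.

    pos is 0-based; gene_intervals is a list of half-open 0-based
    (start, end) intervals (hypothetical input). Bases outside every
    interval, or beyond the chromosome ends, are replaced with 'N'.
    """
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles single-nucleotide variants")
    if chrom_seq[pos].upper() != ref.upper():
        raise ValueError("reference allele does not match the genome sequence")
    center = seq_len // 2  # assumed index of the variant within the window
    start = pos - center
    bases = []
    for i in range(start, start + seq_len):
        in_bounds = 0 <= i < len(chrom_seq)
        in_gene = any(s <= i < e for s, e in gene_intervals)
        bases.append(chrom_seq[i] if in_bounds and in_gene else "N")
    ref_seq = "".join(bases)
    # swap in the alt allele at the center position only
    alt_seq = ref_seq[:center] + alt + ref_seq[center + 1:]
    return ref_seq, alt_seq


def format_fasta(records):
    """Format (name, sequence) pairs as FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

You would then write the ref sequences and the alt sequences to two separate FASTA files with matching record names and order, so the two prediction outputs line up row by row when you take the difference.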
If you use Selene with an input VCF file instead, it cannot replace those bases with 'N's, so some noise will probably be added. In practice, distant sequence has little effect on the predictions for high-impact variants, so it shouldn't make much of a difference.
Please let me know if there's anything I can clarify further.
Thanks!
Kathy