Hi Jessica,
I have implemented what you requested, although I still have to update the documentation on GitHub.
As in the original inference, the script first searches for homologs of the transcription factor(s) using BLAST+ and then compares the predicted DNA-binding domains (DBDs) of the transcription factor(s) with those of the homologs. Finally, it returns the transcription factor-homolog pairs whose pairwise DBD percentage of sequence identity (DBD %ID) is above a threshold (from this manuscript). To skip the BLAST+ search for homologs and perform the inference as in the previous manuscript, use the "--no-blast" option.
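For reference, the final filtering step can be sketched roughly as follows in Python. This is a minimal sketch and not the code in infer_homolog.py. It makes three assumptions:

- The DBDs have already been extracted and aligned (gaps as "-").
- %ID is computed as identical residues over gap-free alignment columns, which is only one common definition.
- The threshold is a placeholder, not the value from the manuscript.

The names Pair, dbd_identity and filter_pairs are hypothetical, and the sequences are made-up toy fragments.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Pair(NamedTuple):
    """Hypothetical container for one transcription factor-homolog pair."""
    query: str
    target: str
    query_dbd: str   # aligned DBD of the query (gaps as '-')
    target_dbd: str  # aligned DBD of the target (gaps as '-')


def dbd_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical residues over alignment columns without gaps.

    Assumption: one common definition of %ID; the tool may differ.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return identical / len(columns)


def filter_pairs(pairs: Iterable[Pair], threshold: float) -> list[tuple[str, str, float]]:
    """Keep pairs whose DBD identity is strictly above the threshold."""
    kept = []
    for p in pairs:
        identity = dbd_identity(p.query_dbd, p.target_dbd)
        if identity > threshold:
            kept.append((p.query, p.target, round(identity, 3)))
    return kept


if __name__ == "__main__":
    # Toy, made-up aligned DBD fragments (not real JUN/JUND/JUNB domains).
    pairs = [
        Pair("TF1", "TF1", "RKRMRNR", "RKRMRNR"),
        Pair("TF1", "HOM1", "RKRMRNR", "RKRIRNR"),
        Pair("TF1", "HOM2", "RKRMRNR", "AQ-LKNE"),
    ]
    # 0.5 is a placeholder threshold, not the value from the manuscript.
    for row in filter_pairs(pairs, threshold=0.5):
        print(row)
```

With these toy inputs, the first two pairs are kept (identities 1.0 and 0.857) and the third is dropped. Reporting identity as a fraction mirrors how the "DBD %ID" column appears in the output below.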
(JASPAR-profile-inference) oriol@gpurtx-2:~/JASPAR-inference-tool$ ./infer_homolog.py --threads 32 ./examples/human/JUN_HUMAN.fa ./examples/human/human.fa
100%|████████████████████| 1/1 [00:00<00:00, 32.28it/s]
100%|████████████████████| 20601/20601 [00:19<00:00, 1080.40it/s]
100%|████████████████████| 1/1 [00:00<00:00, 1.53it/s]
Query                Target                E-value   Query Start-End  Target Start-End  DBD %ID
sp|P05412|JUN_HUMAN  sp|P05412|JUN_HUMAN   0.0       1-331            1-331             1.0
sp|P05412|JUN_HUMAN  sp|P17535|JUND_HUMAN  2.37e-81  60-331           87-347            0.891
sp|P05412|JUN_HUMAN  sp|P17275|JUNB_HUMAN  6.47e-70  1-331            1-347             0.828
Let me know if you have any questions.