Hello PWMScan team,
I am a young bioinformatician at the Lausanne University Hospital (CHUV) in Switzerland, and I would like to use your tool PWMScan.
More precisely, I want to find all binding-site locations in the mouse genome (mm39) for all transcription factors in the HOCOMOCO V12 database.
I wanted to automate this task for all TFs in the HOCOMOCO database, but I didn't find an API to do it. Is there another way to do it for the whole database? I also wanted to use the latest mouse genome assembly, mm39, but it doesn't appear in your list. I tried to upload the full genome (2.8 GB), but it's too big. Would it be possible to add mm39?
Best regards
Maëlick Brochut
Dear Maëlick,
We plan to add mm39 to the PWMScan server, but I can't promise that this will happen soon. Also, we don't currently have an API.
However, you could use the PWMScan software locally under Linux or MacOS. Download the latest version from:
https://gitlab.sib.swiss/EPD/pwmscan
You will need two programs, matrix_scan and matrix_prob.
matrix_scan scans a sequence library with a single position weight matrix (PWM). The HOCOMOCO V12 CORE PWM library is available from:
https://epd.expasy.org/ftp/pwmlib/hocomocov12_core_matrix_logodds.mat
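As a mental model of what matrix_scan computes, here is a minimal Python sketch of PWM scanning. It is an illustration under stated assumptions, not matrix_scan's implementation: it scans the forward strand only, assumes the matrix rows are positions and the columns are A, C, G, T (consistent with the AHR.H12CORE.0.P.B matrix shown in this message), sums the integer log-odds scores over each window, and reports windows at or above the cut-off. The function name scan_sequence is hypothetical.

```python
# Minimal sketch of PWM scanning, forward strand only.
# Assumption: rows are motif positions, columns are A, C, G, T.
# This is an illustration, not matrix_scan's actual code.

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def scan_sequence(seq, matrix, cutoff):
    """Return (start, score) for every window whose summed score >= cutoff.

    Starts are 0-based. Windows containing a non-ACGT character (e.g. N)
    are skipped. Lowercase (soft-masked) bases are treated as uppercase.
    """
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASE_INDEX for base in window):
            continue
        # Add up the score of the observed base at each motif position.
        score = sum(row[BASE_INDEX[base]] for row, base in zip(matrix, window))
        if score >= cutoff:
            hits.append((start, score))
    return hits


# Toy two-position matrix that rewards the word "AC".
toy = [[10, 0, 0, 0], [0, 10, 0, 0]]
print(scan_sequence("GACNac", toy, 20))  # [(1, 20), (4, 20)]
```

Uppercasing matters for genome FASTA files such as mm39, where repeats are typically soft-masked in lowercase.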
You will need to split this library file into individual files, each containing one matrix. The files will have the following format:
>log-odds matrix AHR.H12CORE.0.P.B: alength= 4 w= 10 n= 0 bayes= 0 E= 0
-51 29 64 -97
-42 53 -72 26
-130 -91 59 64
-109 -209 -146 155
-497 -190 175 -158
-639 198 -797 -427
-480 -697 198 -539
-697 -10000 -10000 200
-797 -10000 200 -10000
-76 158 -497 -137
(If you need help with the splitting, we can do it for you.)
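The splitting can also be scripted. The following Python sketch assumes that each matrix in hocomocov12_core_matrix_logodds.mat starts with a header line beginning with ">" and containing "matrix <ID>:", as in the AHR example above, and it names each output file after the matrix ID (e.g. AHR.H12CORE.0.P.B.mat). The function split_library is hypothetical and not part of PWMScan.

```python
import re
from pathlib import Path

# Assumption: a matrix header looks like
# ">log-odds matrix AHR.H12CORE.0.P.B: alength= 4 w= 10 n= 0 bayes= 0 E= 0"
HEADER_ID = re.compile(r"matrix\s+(\S+?):")


def split_library(library_path, out_dir):
    """Write each matrix of a multi-matrix .mat file to <out_dir>/<ID>.mat.

    Blank lines are dropped. Returns the written paths in library order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    current_id, current_lines = None, []

    def flush():
        # Save the matrix collected so far, if any.
        if current_id is not None:
            path = out_dir / f"{current_id}.mat"
            path.write_text("".join(current_lines))
            written.append(path)

    with open(library_path) as handle:
        for line in handle:
            if line.startswith(">"):
                flush()
                match = HEADER_ID.search(line)
                if match is None:
                    raise ValueError(f"cannot find matrix ID in header: {line!r}")
                current_id, current_lines = match.group(1), [line]
            elif current_id is not None and line.strip():
                current_lines.append(line)
    flush()
    return written
```

Usage: split_library("hocomocov12_core_matrix_logodds.mat", "hocomoco_v12_split"), where the output directory name is only an example.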
The program matrix_scan requires a cut-off value in raw score units. You can determine the raw score corresponding to a p-value of 0.00001 using matrix_prob:
matrix_prob -e 0.00001 -b 0.29,0.21,0.21,0.29 AHR.H12CORE.0.P.B.mat
(returns: SCORE : 1305)
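To show how a p-value maps to a raw score, here is a minimal Python sketch of one standard approach: compute the exact score distribution of a random sequence drawn from the -b background (A, C, G, T frequencies, assumed independent between positions) by dynamic programming over integer scores, then walk down from the best score until the upper tail would exceed the requested p-value. This sketch has not been checked against matrix_prob's algorithm or tail convention, so its cut-off may differ from the 1305 quoted above; matrix_prob remains the reference. The function names are hypothetical.

```python
from collections import defaultdict

# AHR.H12CORE.0.P.B from this message; rows are positions, columns A, C, G, T.
AHR = [
    [-51, 29, 64, -97],
    [-42, 53, -72, 26],
    [-130, -91, 59, 64],
    [-109, -209, -146, 155],
    [-497, -190, 175, -158],
    [-639, 198, -797, -427],
    [-480, -697, 198, -539],
    [-697, -10000, -10000, 200],
    [-797, -10000, 200, -10000],
    [-76, 158, -497, -137],
]
BACKGROUND = [0.29, 0.21, 0.21, 0.29]  # A, C, G, T, as passed to matrix_prob -b


def score_distribution(matrix, background):
    """Exact distribution {score: probability} under an i.i.d. background."""
    dist = {0: 1.0}
    for row in matrix:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for weight, base_prob in zip(row, background):
                nxt[score + weight] += prob * base_prob
        dist = dict(nxt)
    return dist


def score_cutoff(matrix, background, pval):
    """Smallest score s with P(score >= s) <= pval, or None if none exists."""
    dist = score_distribution(matrix, background)
    tail, cutoff = 0.0, None
    for score in sorted(dist, reverse=True):
        tail += dist[score]
        if tail > pval:
            break
        cutoff = score
    return cutoff


print(max(score_distribution(AHR, BACKGROUND)))  # 1465, the best possible score
print(score_cutoff(AHR, BACKGROUND, 1e-5))
```

Because the scores are integers, the distribution has at most one entry per reachable score, which keeps this exact computation cheap even for 10-position matrices.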
You can then run matrix_scan as follows:
matrix_scan -m AHR.H12CORE.0.P.B.mat -c 1305 < mm39.fna > ..
This took 30 seconds on my MacBook Pro.
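Since the original question was how to run this for the whole HOCOMOCO library, here is a minimal Python sketch of a batch driver that repeats the two commands above for every split matrix file. It rests on assumptions that are not part of PWMScan: matrix_prob and matrix_scan are on PATH, the split matrices are *.mat files in one directory, matrix_prob prints a line of the form "SCORE : <n>", and each matrix's hits go to a hypothetical <matrix ID>.out file. It only uses the -e, -b, -m and -c options shown in this thread.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

BACKGROUND = "0.29,0.21,0.21,0.29"  # A,C,G,T, as in the matrix_prob example
PVALUE = "0.00001"
SCORE_LINE = re.compile(r"SCORE\s*:\s*(-?\d+)")


def parse_score(matrix_prob_output):
    """Extract the raw-score cut-off from matrix_prob output ("SCORE : 1305")."""
    match = SCORE_LINE.search(matrix_prob_output)
    if match is None:
        raise ValueError("no 'SCORE : <n>' line in matrix_prob output")
    return int(match.group(1))


def prob_command(matrix_file):
    return ["matrix_prob", "-e", PVALUE, "-b", BACKGROUND, str(matrix_file)]


def scan_command(matrix_file, cutoff):
    return ["matrix_scan", "-m", str(matrix_file), "-c", str(cutoff)]


def scan_one(matrix_file, genome, out_dir):
    """Run matrix_prob, then matrix_scan, for one matrix; return the hits path."""
    matrix_file = Path(matrix_file)
    prob = subprocess.run(prob_command(matrix_file), capture_output=True,
                          text=True, check=True)
    cutoff = parse_score(prob.stdout + prob.stderr)
    out_path = Path(out_dir) / f"{matrix_file.stem}.out"
    # matrix_scan reads the genome on stdin, as in the command above.
    with open(genome) as seq_in, open(out_path, "w") as hits_out:
        subprocess.run(scan_command(matrix_file, cutoff), stdin=seq_in,
                       stdout=hits_out, check=True)
    return out_path


def scan_all(matrix_dir, genome, out_dir, jobs=4):
    """Scan the genome with every *.mat file in matrix_dir, `jobs` at a time."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    matrices = sorted(Path(matrix_dir).glob("*.mat"))
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda m: scan_one(m, genome, out_dir), matrices))


# Example (all names are placeholders):
# scan_all("hocomoco_v12_split", "mm39.fna", "pwmscan_hits", jobs=4)
```

Threads are sufficient here because the real work happens in the external processes; the jobs value is a guess to be tuned to the available cores and disk bandwidth.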
I hope this was helpful. Don't hesitate to contact me again if you run into problems or need clarification on any point.
Good luck,
Philipp
Hello,
As I haven't heard back from you, I wanted to follow up on my previous email regarding the possibility of updating the mouse genome assembly to mm39 and how to automate the process for multiple requests.
Best regards,
Maëlick Brochut
Everything works perfectly with your instructions, and it runs really fast!
Thanks a lot for this great tool and for being responsive to my questions.
Have a nice day.