Hi,
I was trying to run AME with `--method linreg` to find out whether any motif is enriched in a target set of FASTA sequences, but it keeps failing. I have created a couple of test files (test.fa and test.meme, see further below). This is what I have tried so far:
- meme_4.12.0/bin/ame --verbose 4 --method linreg -oc foo_bar test.fa test.meme
ame Option: method - linreg
ame Option: oc - foo_bar
In LR/Spearman mode, x=PWM, y=FASTA score
The output directory 'foo_bar' already exists.
Its contents will be overwritten.
M2: +MA0002.2 - Seq: chr1157283876157285044 Rankings[0] - pwm: 0.00000000 prank: 4 f: 0.80000000 frank: 1
M2: +MA0002.2 - Seq: chr213670229836702881 Rankings[1] - pwm: 0.00000000 prank: 3 f: 0.60000000 frank: 2
M2: +MA0002.2 - Seq: chr213679253536793047 Rankings[2] - pwm: 2.00000000 prank: 1 f: 0.50000000 frank: 3
M2: +MA0002.2 - Seq: chr213685838936858790 Rankings[3] - pwm: 0.00000000 prank: 2 f: 0.20000000 frank: 4
M2: +MA0002.2 - Seq: chr213690970836910399 Rankings[4] - pwm: 0.00000000 prank: 5 f: 0.10000000 frank: 5
FATAL: The bit pattern 10xxxxxx is illegal for the first byte of a UTF-8 multibyte.
AME (Analysis of Motif Enrichment): Compiled on Jul 31 2017 at 15:55:07
------------------------------
Copyright (c) XXXXX, 2009.
Command line
ame --verbose 4 --method linreg -oc foo_bar test.fa test.meme
In partition maximization mode.
Threshold p-value for reporting results: 0.05
1. LinReg MSE of motif � (g) top 42012832 seqs: 1.6304e-322 (m: 2.0760e-316 b: 0.0000e+00)
- Conclusion and next step: it looks like the program doesn't like special characters in the motif names (which are taken from JASPAR_CORE_2016_vertebrates.meme). Remove `.2` from the motif name and run again.
- meme_4.12.0/bin/ame --verbose 4 --method linreg -oc foo_bar test.fa test.CLEANED.meme
ame Option: method - linreg
ame Option: oc - foo_bar
In LR/Spearman mode, x=PWM, y=FASTA score
The output directory 'foo_bar' already exists.
Its contents will be overwritten.
M2: +MA0002 - Seq: chr1157283876157285044 Rankings[0] - pwm: 0.00000000 prank: 4 f: 0.80000000 frank: 1
M2: +MA0002 - Seq: chr213670229836702881 Rankings[1] - pwm: 0.00000000 prank: 3 f: 0.60000000 frank: 2
M2: +MA0002 - Seq: chr213679253536793047 Rankings[2] - pwm: 2.00000000 prank: 1 f: 0.50000000 frank: 3
M2: +MA0002 - Seq: chr213685838936858790 Rankings[3] - pwm: 0.00000000 prank: 2 f: 0.20000000 frank: 4
M2: +MA0002 - Seq: chr213690970836910399 Rankings[4] - pwm: 0.00000000 prank: 5 f: 0.10000000 frank: 5
Segmentation fault (core dumped)
- output ame.txt content: empty
- Conclusion and next step: not sure if the change moved things forward or backwards. Try --method spearman with original motif file
- /data/reddylab/software/meme_4.12.0/bin/ame --verbose 4 --method spearman -oc foo_bar test.fa test.meme
ame Option: method - spearman
ame Option: oc - foo_bar
In LR/Spearman mode, x=PWM, y=FASTA score
The output directory 'foo_bar' already exists.
Its contents will be overwritten.
M2: +MA0002.2 - Seq: chr1157283876157285044 Rankings[0] - pwm: 0.00000000 prank: 4 f: 0.80000000 frank: 1
M2: +MA0002.2 - Seq: chr213670229836702881 Rankings[1] - pwm: 0.00000000 prank: 3 f: 0.60000000 frank: 2
M2: +MA0002.2 - Seq: chr213679253536793047 Rankings[2] - pwm: 2.00000000 prank: 1 f: 0.50000000 frank: 3
M2: +MA0002.2 - Seq: chr213685838936858790 Rankings[3] - pwm: 0.00000000 prank: 2 f: 0.20000000 frank: 4
M2: +MA0002.2 - Seq: chr213690970836910399 Rankings[4] - pwm: 0.00000000 prank: 5 f: 0.10000000 frank: 5
Elapsed wall clock time: 1501599366 seconds
Elapsed CPU time: 0.000000 seconds
AME (Analysis of Motif Enrichment): Compiled on Jul 31 2017 at 15:55:07
------------------------------
Copyright (c) XXXXX, 2009.
Command line
ame --verbose 4 --method spearman -oc foo_bar test.fa test.meme
Not in partition maximization mode. Fixing partition at 5.
Threshold p-value for reporting results: 0.05
Spearman MSE of motif +MA0002.2 top 5 seqs: 0
1. Spearman Rho of motif MA0002.2 RUNX1 (BBYTGTGGTTT) top 5 seqs: 0.0000e+00
- Conclusion: Spearman rank works fine (and so does fisher, BTW). Something seems to be wrong specifically with `linreg`.
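For comparison, here is a minimal sketch of what an ordinary least-squares fit gives on the five (pwm, f) pairs printed in the verbose output above, with x=PWM and y=FASTA score. This is plain Python, not AME's code; the function name `least_squares` is mine, and I am assuming the MSE is the mean of the squared residuals, which may not be exactly how AME normalizes it. The point is only that the fit is well defined for this input and should yield small finite numbers rather than values like 1.6304e-322.

```python
# PWM scores (x) and FASTA scores (y) copied from the verbose output above,
# in the order Rankings[0]..Rankings[4].
pwm_scores = [0.0, 0.0, 2.0, 0.0, 0.0]
fasta_scores = [0.8, 0.6, 0.5, 0.2, 0.1]


def least_squares(xs, ys):
    """Return (slope, intercept, mse) of the ordinary least-squares line y = m*x + b.

    Assumption: mse is the mean squared residual (sum of squares divided by n).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    mse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / n
    return m, b, mse


m, b, mse = least_squares(pwm_scores, fasta_scores)
print(f"m: {m:.4f} b: {b:.4f} mse: {mse:.4f}")
```

Working it through by hand I get a slope of 0.0375 and an intercept of 0.425, so the `m` and `b` values reported by `linreg` look like uninitialized or corrupted memory rather than a real fit.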
Do you have any idea where the error might be? I'm happy to provide more info if needed. According to the 2010 paper, linear regression seems to be the best method overall, so if possible I would like to be able to run it.
Thanks a lot!
Alex
test.fa
>chr1157283876157285044 0.8
ACAGGAGGATAATCCCATCAGAATACTCCAGGATTCCAACAGtatcttagtctatttagctactataataaaataccttagatgtataaatcactgggcaatttgtatataatagaaatttatttctcacagtctggaggctgagaagtccaatatcaaggcaccagtggattcaggatctggtaaggcagattcagggtctcttcatctcaccttcagctccatagatggcatcttattgtgacatcatcacatagcagagggggaaaCACTGCACTCAGTGGTGAAAGGGGAGAACAGTATGTCCTCACATGATGGAACGTGGGAACACCATGTCCTTACATTATGGAAGCAGGCAAcactgcgtcctcacatggtggaaggggcaaTACTGTGTCCTCACATGATGGAAAGCAGGAACACCGTGTCCTCACATGGTGGAAGGGGAAATACTGTGTCCTCACATGATGGAAAGTGGGAAcactgtgtcctcacatggtggaaggggcaaTactgtgtcctcacatggtggaaggggcaaTACTCTGTCATTACATAGTGGAAGCAGGGAACACTATGTCCTTACATAATGGAAGGAGAAACACCATGTCCTCATATGTTGGAAGTGGGGAACACCATGTCCTCATATGGAGAAAGGGGGTAAGACTATGTCCTCACATGGTGGAAGGGAGCAACACTGTGTCCTCACATGGTGAAAGGCAGGAACATCATGTCCTTATATGGTGGAAGGGGGAAACACAATGTCTTCACATGATCCaaggcaggaacactgtgtcctcacatggtggaagTGGGGAACActgtattctcacatggtggaaggaggaacactgtgccctcacatggttgaagtgacagaagggcaaaaggggtgacactgtcctcaagccattttatgagtaccaatcccattcttgggacctaattacctcctaaaacccggctcttcttcatgcttttgcattggggattaagttccaacatgaaacatccaaaccatagcaACGGTCTTCAGTGACCATTGTCATGaggggaaaagataggagctttggaatcatccagtgttggatctaaggctttgctctgccaccaactagttttgAATACTGTTGACCTATACCAACTATAGTAGGTTCTTATAACAATT
>chr213670229836702881 0.6
GGCACTCTGGAAGGGAGCTGTTTGGCCCTAGAGTTTTGGAAAGGGCCCTGAACCTGTTCGGTCCCCCTCGGAAAGGGAAGGGAGCAGTGGCTTAGTCCCTCCCTCCTCCATTCGTGCAATGCCTGGGGTAGGGGTAGACCTGGAGCCGGTGGACTCATATCCTTGGAATTCGTCAGGACAGCTGCTCCGGGGCCTTGGCCCTCAGTCAGTCTGGGGCTGAGGAGTAGGGAAGCTGGGAACTTGGGGCAGAGGAAGAAGATGCGTTTAGAAAGACCTCCATTATGCAAACTGGAGTCCATTTATGCAAACTGGTCACCCTTCCAGTAGCTCCAAAGAGTGGCAGTGGAGTGGCATCTTGATTGATTTAACCTCTTCTCAGGGGACCTGGGTCTGCGAGGGAGGATATGGCTGCGGGGTTGGAATAGGATCTGTCTGAGCTGCCAGGGTCAGGGTGGTGGCCCTAGGGAGGTTTTAGGGCCAGGGTGGTCCCGGGCTGTGGCAGGGGCTCTCAGATCGCCTCGGGCTCTCAGCTGCAAGGTGAAAAATACCATGAGGAATTGATCTGCCAAGGGCGGTCTTGTCT
>chr213679253536793047 0.5
AGCCAATTGTGCTACGTCAGTCCAGAGTGAGCTGCTACTTATGATTTCATTTCCTCCTTAGACTGATGGGAGCAATGACAatccccatttgtagatgaggcacctgctcagagaactgaagttactcatccagggtcacatggctgtcaagaggcatgcccagtacCTATGACAAACATTCATCCCCAAGTAACGAAGCCAACAACCTGTGTCTCACTCCAGAACCACAGAGCTGTTACAACACGTGGTGCCTCCTGAGCCAGCAGCCAGGGGCAAGAACGAGAGGATGCAGGGAGAGGCTCTGGGGAAAGGGCTGTGTCCCCTCAGTCATGTGACCCCCAGGCTTTCTAGAGTGACTCATGGTGAGGACAGCAAAGATCTCAAAATGCTGTCTAGACCTTAACTCTGCCCATCACCCCATCCATTTGGTTTTGTTTCCCTTTCATTTGCATGTGTAGTCACATGCTCAGCATCCTTGGGGGCAGCTGCATCTGTGGTCCTCCAGAACCCCA
>chr213685838936858790 0.2
CCAAGAGATGCCATGACTGATAAAACTGAGGTTGGAAAGATCAATGTCTTACAGAAAACAGAAGATTGGGGGGAAACGATTGAAACAGCACTGAATATGTTTATGAAATGCatccactcactcaacaaatatttaccgtgcctctgtgtcatccaagttccagggatacagcagtgagcaaagttcctgactttgtaaggtttaagtgatagagacagagacagatgttaaataacgcagtacatcaagggctgaggaggccttgaagaatagtggatggagggtgcctgcagtgaggtggtggctgctgttgtcaggctgggagggagcccgccctgccatgaggacctctgggcatctgcaggagacagggagcaggccatgcagctgtgcggggcaag
>chr213690970836910399 0.1
cagactcccaaagtgctgggtttacaggtgtgggccacATCACTTTAGAGCGGTTCCACCCCTGTTACTTGTCTTATCTGCTAAATTTACCCAGCTAACAAGATTCAGTGTTGAATTCAAAATATTTCTTATGATCTCCATGTATGTTTTGGAGCTTCTTTTCCCAAGAAACACCAAATAATTGATCTTAACCCATAAAGGTTTTCTATCTCAGAGACACATTTGCCTCATTGCAAATAATAAGATTAGTTGTGCTTCAGTCATAACATCGTAAGTTGCTAGGCAGCCAGAAGTGAGTTAATCAGAAAATGAATCATCAAAGACATGGTACGCAATCTCCTTTTATGGGTGTCTCATTCTTAAAACACTTTCAAGGAAGCAGAAGCAAATATTTGTCTACTCGTATGGAATTTATATGACGCAATCCAATTCTTACTATTGGTATTTGACACGCAAAATACTCACACGCAATTTCATGTTCGCTTGGTGTCTTCACTCACAGGACAAAGCCAGATGGCACAGCAAATCTTTATAATCAAAAGATGGCACTCCTGggccgggcgcagtggctaatgcctgtaatcccggcactttgggaggctgaggtgggtggatcacctgaggtcaggagtttgagaccagcctggccaacatggtgaaaccccatctctactaaaaaatacaaaaatta
test.meme
MEME version 4
ALPHABET= ACGT
strands: + -
Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000
MOTIF MA0002.2 RUNX1
letter-probability matrix: alength= 4 w= 11 nsites= 2000 E= 0
0.143500 0.248000 0.348000 0.260500
0.117000 0.242500 0.233500 0.407000
0.061500 0.536000 0.074500 0.328000
0.028500 0.000000 0.003500 0.968000
0.000000 0.037500 0.936000 0.026500
0.043500 0.063500 0.035000 0.858000
0.000000 0.000000 0.993500 0.006500
0.008500 0.021000 0.924000 0.046500
0.005000 0.200000 0.125500 0.669500
0.065500 0.231500 0.040500 0.662500
0.250000 0.079000 0.144500 0.526500
test.CLEANED.meme
MEME version 4
ALPHABET= ACGT
strands: + -
Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000
MOTIF MA0002 RUNX1
letter-probability matrix: alength= 4 w= 11 nsites= 2000 E= 0
0.143500 0.248000 0.348000 0.260500
0.117000 0.242500 0.233500 0.407000
0.061500 0.536000 0.074500 0.328000
0.028500 0.000000 0.003500 0.968000
0.000000 0.037500 0.936000 0.026500
0.043500 0.063500 0.035000 0.858000
0.000000 0.000000 0.993500 0.006500
0.008500 0.021000 0.924000 0.046500
0.005000 0.200000 0.125500 0.669500
0.065500 0.231500 0.040500 0.662500
0.250000 0.079000 0.144500 0.526500
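The only difference between test.meme and test.CLEANED.meme is the `.2` version suffix on the MOTIF line. In case it helps to reproduce this on the full JASPAR file, here is a minimal sketch of how that edit could be scripted. The function name `strip_motif_version` is just illustrative, and the regex assumes that every identifier ends in `.<digits>` the way JASPAR IDs do; other naming schemes would need a different pattern.

```python
import re

# Matches "MOTIF <id>.<digits>" at the start of a line and captures everything
# up to (but not including) the version suffix.
MOTIF_VERSION = re.compile(r"^(MOTIF[ \t]+\S+?)\.\d+(?=\s|$)", flags=re.MULTILINE)


def strip_motif_version(meme_text):
    """Drop a trailing '.<digits>' from the identifier on each MOTIF line.

    Lines that do not start with 'MOTIF' (for example the matrix rows) are
    left untouched.
    """
    return MOTIF_VERSION.sub(r"\1", meme_text)


example = (
    "MOTIF MA0002.2 RUNX1\n"
    "letter-probability matrix: alength= 4 w= 11 nsites= 2000 E= 0\n"
    "0.143500 0.248000 0.348000 0.260500\n"
)
print(strip_motif_version(example))
```

To apply it to a file, read test.meme into a string, pass it through `strip_motif_version`, and write the result to test.CLEANED.meme.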