AME --method linreg error(s)


Alejandro Barrera

Aug 1, 2017, 11:08:02 AM
to MEME Suite Q&A
Hi,

I am trying to run AME with `--method linreg` to find out whether any motif is enriched in a target set of FASTA sequences, but it keeps failing. I have created a couple of small test files (test.fa and test.meme, both included at the end of this message). This is what I have tried so far:

  • meme_4.12.0/bin/ame --verbose 4 --method linreg -oc foo_bar test.fa test.meme
    • trace log:
ame Option: method - linreg
ame Option: oc - foo_bar
In LR/Spearman mode, x=PWM, y=FASTA score
The output directory 'foo_bar' already exists.
Its contents will be overwritten.

M2: +MA0002.2 - Seq: chr1157283876157285044 Rankings[0] -       pwm: 0.00000000 prank: 4        f: 0.80000000   frank: 1
M2: +MA0002.2 - Seq: chr213670229836702881 Rankings[1] -        pwm: 0.00000000 prank: 3        f: 0.60000000   frank: 2
M2: +MA0002.2 - Seq: chr213679253536793047 Rankings[2] -        pwm: 2.00000000 prank: 1        f: 0.50000000   frank: 3
M2: +MA0002.2 - Seq: chr213685838936858790 Rankings[3] -        pwm: 0.00000000 prank: 2        f: 0.20000000   frank: 4
M2: +MA0002.2 - Seq: chr213690970836910399 Rankings[4] -        pwm: 0.00000000 prank: 5        f: 0.10000000   frank: 5
FATAL: The bit pattern 10xxxxxx is illegal for the first byte of a UTF-8 multibyte.
    • output ame.txt content:
AME (Analysis of Motif Enrichment): Compiled on Jul 31 2017 at 15:55:07
------------------------------
Copyright (c) XXXXX, 2009.

Command line
ame --verbose 4 --method linreg -oc foo_bar test.fa test.meme

In partition maximization mode.

Threshold p-value for reporting results: 0.05
1. LinReg MSE of motif  � (g) top 42012832 seqs: 1.6304e-322 (m: 2.0760e-316 b: 0.0000e+00)
    • Conclusion and next step: it looks like the program doesn't like special characters in the motif names (which are taken from JASPAR_CORE_2016_vertebrates.meme). Remove the `.2` suffix from the motif name and run again
  •  meme_4.12.0/bin/ame --verbose 4 --method linreg -oc foo_bar test.fa test.CLEANED.meme
    • trace log:
ame Option: method - linreg
ame Option: oc - foo_bar
In LR/Spearman mode, x=PWM, y=FASTA score
The output directory 'foo_bar' already exists.
Its contents will be overwritten.

M2: +MA0002 - Seq: chr1157283876157285044 Rankings[0] - pwm: 0.00000000 prank: 4        f: 0.80000000   frank: 1
M2: +MA0002 - Seq: chr213670229836702881 Rankings[1] -  pwm: 0.00000000 prank: 3        f: 0.60000000   frank: 2
M2: +MA0002 - Seq: chr213679253536793047 Rankings[2] -  pwm: 2.00000000 prank: 1        f: 0.50000000   frank: 3
M2: +MA0002 - Seq: chr213685838936858790 Rankings[3] -  pwm: 0.00000000 prank: 2        f: 0.20000000   frank: 4
M2: +MA0002 - Seq: chr213690970836910399 Rankings[4] -  pwm: 0.00000000 prank: 5        f: 0.10000000   frank: 5
Segmentation fault (core dumped)
    • output ame.txt content: empty 
    • Conclusion and next step: not sure if the change moved things forward or backwards. Try --method spearman with original motif file
  • /data/reddylab/software/meme_4.12.0/bin/ame --verbose 4 --method spearman -oc foo_bar test.fa test.meme
    • trace log:
ame Option: method - spearman
ame Option: oc - foo_bar
In LR/Spearman mode, x=PWM, y=FASTA score
The output directory 'foo_bar' already exists.
Its contents will be overwritten.

M2: +MA0002.2 - Seq: chr1157283876157285044 Rankings[0] -       pwm: 0.00000000 prank: 4        f: 0.80000000   frank: 1
M2: +MA0002.2 - Seq: chr213670229836702881 Rankings[1] -        pwm: 0.00000000 prank: 3        f: 0.60000000   frank: 2
M2: +MA0002.2 - Seq: chr213679253536793047 Rankings[2] -        pwm: 2.00000000 prank: 1        f: 0.50000000   frank: 3
M2: +MA0002.2 - Seq: chr213685838936858790 Rankings[3] -        pwm: 0.00000000 prank: 2        f: 0.20000000   frank: 4
M2: +MA0002.2 - Seq: chr213690970836910399 Rankings[4] -        pwm: 0.00000000 prank: 5        f: 0.10000000   frank: 5
Elapsed wall clock time: 1501599366 seconds
Elapsed CPU time:        0.000000 seconds
    •  output ame.txt content:
 AME (Analysis of Motif Enrichment): Compiled on Jul 31 2017 at 15:55:07
------------------------------
Copyright (c) XXXXX, 2009.

Command line
ame --verbose 4 --method spearman -oc foo_bar test.fa test.meme

Not in partition maximization mode. Fixing partition at 5.

Threshold p-value for reporting results: 0.05
Spearman MSE of motif +MA0002.2 top 5 seqs: 0
1. Spearman Rho of motif MA0002.2 RUNX1 (BBYTGTGGTTT) top 5 seqs: 0.0000e+00 
    • Conclusion: Spearman rank works fine (and so does fisher, BTW). Something wrong with `linreg`.
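
As a sanity check on what a regression over this toy input should look like, here is a plain least-squares fit of the five (pwm, f) pairs from the trace log, with x = PWM score and y = FASTA score as the log states. This is only a reference calculation under the assumption that an ordinary fit and a mean squared error are the quantities of interest; it is not AME's code, and the function name `least_squares` is illustrative.

```python
# Reference least-squares fit on the (pwm, f) pairs printed in the trace log.
# NOT AME's implementation: just an ordinary fit of y = m*x + b.
pwm = [0.0, 0.0, 2.0, 0.0, 0.0]  # x = PWM score, in trace-log order
f = [0.8, 0.6, 0.5, 0.2, 0.1]    # y = FASTA score, in trace-log order


def least_squares(x, y):
    """Return slope m, intercept b, and mean squared error of the fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    mse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y)) / n
    return m, b, mse


m, b, mse = least_squares(pwm, f)
print(f"m={m:.4f} b={b:.4f} mse={mse:.4f}")
```

For these five points the fit gives m = 0.0375, b = 0.425 and an MSE of 0.0655. The values in ame.txt (MSE 1.6304e-322, m 2.0760e-316, "top 42012832 seqs" for a five-sequence input) are nowhere near that and look more like uninitialized memory than a fit result, although that is only a guess.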

Do you have any idea where the error might be? I'm happy to provide more info if needed. According to the 2010 paper, linear regression seems to be the best method overall, so I would like to be able to run it if possible.

Thanks a lot!
Alex

test.fa
>chr1157283876157285044 0.8
ACAGGAGGATAATCCCATCAGAATACTCCAGGATTCCAACAGtatcttagtctatttagctactataataaaataccttagatgtataaatcactgggcaatttgtatataatagaaatttatttctcacagtctggaggctgagaagtccaatatcaaggcaccagtggattcaggatctggtaaggcagattcagggtctcttcatctcaccttcagctccatagatggcatcttattgtgacatcatcacatagcagagggggaaaCACTGCACTCAGTGGTGAAAGGGGAGAACAGTATGTCCTCACATGATGGAACGTGGGAACACCATGTCCTTACATTATGGAAGCAGGCAAcactgcgtcctcacatggtggaaggggcaaTACTGTGTCCTCACATGATGGAAAGCAGGAACACCGTGTCCTCACATGGTGGAAGGGGAAATACTGTGTCCTCACATGATGGAAAGTGGGAAcactgtgtcctcacatggtggaaggggcaaTactgtgtcctcacatggtggaaggggcaaTACTCTGTCATTACATAGTGGAAGCAGGGAACACTATGTCCTTACATAATGGAAGGAGAAACACCATGTCCTCATATGTTGGAAGTGGGGAACACCATGTCCTCATATGGAGAAAGGGGGTAAGACTATGTCCTCACATGGTGGAAGGGAGCAACACTGTGTCCTCACATGGTGAAAGGCAGGAACATCATGTCCTTATATGGTGGAAGGGGGAAACACAATGTCTTCACATGATCCaaggcaggaacactgtgtcctcacatggtggaagTGGGGAACActgtattctcacatggtggaaggaggaacactgtgccctcacatggttgaagtgacagaagggcaaaaggggtgacactgtcctcaagccattttatgagtaccaatcccattcttgggacctaattacctcctaaaacccggctcttcttcatgcttttgcattggggattaagttccaacatgaaacatccaaaccatagcaACGGTCTTCAGTGACCATTGTCATGaggggaaaagataggagctttggaatcatccagtgttggatctaaggctttgctctgccaccaactagttttgAATACTGTTGACCTATACCAACTATAGTAGGTTCTTATAACAATT
>chr213670229836702881  0.6
GGCACTCTGGAAGGGAGCTGTTTGGCCCTAGAGTTTTGGAAAGGGCCCTGAACCTGTTCGGTCCCCCTCGGAAAGGGAAGGGAGCAGTGGCTTAGTCCCTCCCTCCTCCATTCGTGCAATGCCTGGGGTAGGGGTAGACCTGGAGCCGGTGGACTCATATCCTTGGAATTCGTCAGGACAGCTGCTCCGGGGCCTTGGCCCTCAGTCAGTCTGGGGCTGAGGAGTAGGGAAGCTGGGAACTTGGGGCAGAGGAAGAAGATGCGTTTAGAAAGACCTCCATTATGCAAACTGGAGTCCATTTATGCAAACTGGTCACCCTTCCAGTAGCTCCAAAGAGTGGCAGTGGAGTGGCATCTTGATTGATTTAACCTCTTCTCAGGGGACCTGGGTCTGCGAGGGAGGATATGGCTGCGGGGTTGGAATAGGATCTGTCTGAGCTGCCAGGGTCAGGGTGGTGGCCCTAGGGAGGTTTTAGGGCCAGGGTGGTCCCGGGCTGTGGCAGGGGCTCTCAGATCGCCTCGGGCTCTCAGCTGCAAGGTGAAAAATACCATGAGGAATTGATCTGCCAAGGGCGGTCTTGTCT
>chr213679253536793047  0.5
AGCCAATTGTGCTACGTCAGTCCAGAGTGAGCTGCTACTTATGATTTCATTTCCTCCTTAGACTGATGGGAGCAATGACAatccccatttgtagatgaggcacctgctcagagaactgaagttactcatccagggtcacatggctgtcaagaggcatgcccagtacCTATGACAAACATTCATCCCCAAGTAACGAAGCCAACAACCTGTGTCTCACTCCAGAACCACAGAGCTGTTACAACACGTGGTGCCTCCTGAGCCAGCAGCCAGGGGCAAGAACGAGAGGATGCAGGGAGAGGCTCTGGGGAAAGGGCTGTGTCCCCTCAGTCATGTGACCCCCAGGCTTTCTAGAGTGACTCATGGTGAGGACAGCAAAGATCTCAAAATGCTGTCTAGACCTTAACTCTGCCCATCACCCCATCCATTTGGTTTTGTTTCCCTTTCATTTGCATGTGTAGTCACATGCTCAGCATCCTTGGGGGCAGCTGCATCTGTGGTCCTCCAGAACCCCA
>chr213685838936858790  0.2
CCAAGAGATGCCATGACTGATAAAACTGAGGTTGGAAAGATCAATGTCTTACAGAAAACAGAAGATTGGGGGGAAACGATTGAAACAGCACTGAATATGTTTATGAAATGCatccactcactcaacaaatatttaccgtgcctctgtgtcatccaagttccagggatacagcagtgagcaaagttcctgactttgtaaggtttaagtgatagagacagagacagatgttaaataacgcagtacatcaagggctgaggaggccttgaagaatagtggatggagggtgcctgcagtgaggtggtggctgctgttgtcaggctgggagggagcccgccctgccatgaggacctctgggcatctgcaggagacagggagcaggccatgcagctgtgcggggcaag
>chr213690970836910399  0.1
cagactcccaaagtgctgggtttacaggtgtgggccacATCACTTTAGAGCGGTTCCACCCCTGTTACTTGTCTTATCTGCTAAATTTACCCAGCTAACAAGATTCAGTGTTGAATTCAAAATATTTCTTATGATCTCCATGTATGTTTTGGAGCTTCTTTTCCCAAGAAACACCAAATAATTGATCTTAACCCATAAAGGTTTTCTATCTCAGAGACACATTTGCCTCATTGCAAATAATAAGATTAGTTGTGCTTCAGTCATAACATCGTAAGTTGCTAGGCAGCCAGAAGTGAGTTAATCAGAAAATGAATCATCAAAGACATGGTACGCAATCTCCTTTTATGGGTGTCTCATTCTTAAAACACTTTCAAGGAAGCAGAAGCAAATATTTGTCTACTCGTATGGAATTTATATGACGCAATCCAATTCTTACTATTGGTATTTGACACGCAAAATACTCACACGCAATTTCATGTTCGCTTGGTGTCTTCACTCACAGGACAAAGCCAGATGGCACAGCAAATCTTTATAATCAAAAGATGGCACTCCTGggccgggcgcagtggctaatgcctgtaatcccggcactttgggaggctgaggtgggtggatcacctgaggtcaggagtttgagaccagcctggccaacatggtgaaaccccatctctactaaaaaatacaaaaatta
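
To rule out the input as the cause, the header scores can be checked independently of AME. The sketch below assumes the score is the second whitespace-separated field of each FASTA header, which matches the `f:` values in the trace log; the function name `header_scores` is illustrative.

```python
def header_scores(fasta_text):
    """Map each sequence ID to the numeric score found in its FASTA header.

    Assumes headers look like '>ID SCORE', separated by any whitespace.
    Raises ValueError or IndexError if a header has no parsable score.
    """
    scores = {}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            scores[fields[0]] = float(fields[1])
    return scores


example = ">chr1157283876157285044 0.8\nACAGG\n>chr213670229836702881  0.6\nGGCAC\n"
print(header_scores(example))
```

Since `split()` accepts any run of whitespace, the mixed single and double spacing in the headers above parses the same way here.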

test.meme
MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000 

MOTIF MA0002.2 RUNX1

letter-probability matrix: alength= 4 w= 11 nsites= 2000 E= 0
  0.143500        0.248000        0.348000        0.260500
  0.117000        0.242500        0.233500        0.407000
  0.061500        0.536000        0.074500        0.328000
  0.028500        0.000000        0.003500        0.968000
  0.000000        0.037500        0.936000        0.026500
  0.043500        0.063500        0.035000        0.858000
  0.000000        0.000000        0.993500        0.006500
  0.008500        0.021000        0.924000        0.046500
  0.005000        0.200000        0.125500        0.669500
  0.065500        0.231500        0.040500        0.662500
  0.250000        0.079000        0.144500        0.526500


test.CLEANED.meme
MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000 

MOTIF MA0002 RUNX1

letter-probability matrix: alength= 4 w= 11 nsites= 2000 E= 0
  0.143500        0.248000        0.348000        0.260500
  0.117000        0.242500        0.233500        0.407000
  0.061500        0.536000        0.074500        0.328000
  0.028500        0.000000        0.003500        0.968000
  0.000000        0.037500        0.936000        0.026500
  0.043500        0.063500        0.035000        0.858000
  0.000000        0.000000        0.993500        0.006500
  0.008500        0.021000        0.924000        0.046500
  0.005000        0.200000        0.125500        0.669500
  0.065500        0.231500        0.040500        0.662500
  0.250000        0.079000        0.144500        0.526500
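
For a full motif database, the same renaming can be scripted rather than done by hand. The following is a minimal sketch, assuming the version suffix is always a dot followed by digits at the end of the ID on each `MOTIF` line; the function name `strip_motif_versions` is hypothetical, and, as the segfault above shows, renaming alone did not make `linreg` work.

```python
import re


def strip_motif_versions(meme_text):
    """Drop a trailing '.<digits>' from the motif ID on every MOTIF line.

    All other lines (header, background, matrices) are passed through unchanged.
    """
    out = []
    for line in meme_text.splitlines(keepends=True):
        if line.startswith("MOTIF "):
            # 'MOTIF MA0002.2 RUNX1' -> 'MOTIF MA0002 RUNX1'
            line = re.sub(r"^(MOTIF\s+\S+?)\.\d+(?=\s|$)", r"\1", line)
        out.append(line)
    return "".join(out)


print(strip_motif_versions("MOTIF MA0002.2 RUNX1\n"), end="")
```

Reading test.meme, passing its contents through this function, and writing the result out would reproduce test.CLEANED.meme as shown above.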

