Question about mapping motif binding sites from FIMO output to genome locations

kanw...@gmail.com

Oct 14, 2017, 11:55:58 AM

to MEME Suite Q&A

I used RSAT to retrieve the 2000 bp upstream of the TSS for 5 genes. I ran FIMO on this FASTA file with a binding motif (identified in my experiment) to see where the binding sites of the motif are. I know that the protein of interest binds very close to the TSS. I get the following results in the FIMO output:

(The last four columns are the coordinates used to get the -2000 bp upstream TSS sequence.)

| # motif_id | motif_alt_id | sequence_name | start | stop | strand | score | p-value | q-value | matched_sequence | chr | start | end | strand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | D20_ENSG00000130164-A-ENST00000557958 | 80 | 108 | | 41.9286 | 1.14E-14 | 6.18E-10 | CTCTGCCACCCAGGCTGGAGTGCAATGGC | chr19 | 11102268 | 11104267 | |
| | | D102_ENSG00000161048-B-ENST00000425379 | 106 | 134 | | 37.6735 | 4.50E-13 | 1.15E-08 | CTCTGTCACCCAGGCTGGAATACAGTGGC | chr7 | 103128761 | 103130760 | |
| | | D17_ENSG00000130164-D-ENST00000558518 | 309 | 337 | | 32.8367 | 1.19E-11 | 1.10E-07 | CTCTGTCACCCAGGCTGGAGCGCAGTGAC | chr19 | 11130163 | 11132162 | |

In the table above, column 3 has the gene/transcript name (trimmed from the default names because FIMO does not handle long names). Columns 4-6 show the start, end, and strand of the motif binding region. The last four columns are the chromosome, start, end, and strand of the -2000 bp upstream FASTA sequence that was used as input to FIMO. The problem is that I could not figure out how to map the locations in columns 4 and 5 (start and end of the motif binding region) onto columns 12 and 13, which represent the original FASTA coordinates.

In row 1, the original FASTA sequence (column 14) and the motif binding site (column 6) are both on the forward strand. To locate columns 4 and 5 within columns 12 and 13, should I simply compute 11102268 + 80 and 11104267 - 108? That does not give me the matched sequence, and the resulting interval is also longer than 29 bp. Similarly, if in row 2 both the binding motif and the FASTA sequence are on the reverse strand, how should I map the match to the coordinates in the FASTA file?

Thanks

cegrant

Oct 19, 2017, 8:20:21 PM

to MEME Suite Q&A

By default, FIMO measures coordinates within each sequence starting at 1, at the first letter of the sequence. To translate FIMO's coordinates to genomic coordinates, just add the FIMO coordinate to the sequence's starting coordinate from RSAT and subtract 1. This applies to both the start and stop positions.

In your example, you know from RSAT that the genomic coordinate for the start of the first sequence is chr19 11102268. FIMO's coordinate for the starting position of the first match is 80, so the genomic coordinate for the start of FIMO's first match is

11102268 + 80 - 1 = 11102347

FIMO's coordinate for the stop position of the first match is 108, so the genomic coordinate for the end of FIMO's first match is

11102268 + 108 - 1 = 11102375

The reverse strand uses the same coordinate system, measured from the first position of the sequence.
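The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "add and subtract 1" rule: the function name `fimo_to_genomic` and the `rows` list are made up for this example, with values copied from the table in the question, and it assumes the RSAT start coordinate is the genomic position of the first letter of each FASTA sequence.

```python
def fimo_to_genomic(seq_genomic_start: int, fimo_pos: int) -> int:
    """Convert a 1-based FIMO position into a genomic coordinate.

    seq_genomic_start: genomic coordinate of the first letter of the
    input sequence (assumed here to be the RSAT start column).
    """
    return seq_genomic_start + fimo_pos - 1


# (chromosome, RSAT start, FIMO start, FIMO stop), taken from the question
rows = [
    ("chr19", 11102268, 80, 108),
    ("chr7", 103128761, 106, 134),
    ("chr19", 11130163, 309, 337),
]

for chrom, seq_start, start, stop in rows:
    g_start = fimo_to_genomic(seq_start, start)
    g_stop = fimo_to_genomic(seq_start, stop)
    # The span should equal the motif width (29 for these matches)
    print(f"{chrom}:{g_start}-{g_stop} (length {g_stop - g_start + 1})")
```

For the first row this gives chr19:11102347-11102375, matching the two sums worked out above.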
