Question about mapping motif binding sites from FIMO output to genome locations

kanw...@gmail.com

Oct 14, 2017, 11:55:58 AM
to MEME Suite Q&A
I used RSAT to retrieve the 2000 bp upstream of the TSS for 5 genes. I ran FIMO on this FASTA file with a binding motif (identified from my experiment) to see where the motif binds. I know that the protein of interest binds very close to the TSS. I get the following results in the FIMO output:
# motif_id motif_alt_id sequence_name start stop strand score p-value q-value matched_sequence chr seq_start seq_end seq_strand
2 D20_ENSG00000130164-A-ENST00000557958 80 108 + 41.9286 1.14E-14 6.18E-10 CTCTGCCACCCAGGCTGGAGTGCAATGGC chr19 11102268 11104267 D
2 D102_ENSG00000161048-B-ENST00000425379 106 134 - 37.6735 4.50E-13 1.15E-08 CTCTGTCACCCAGGCTGGAATACAGTGGC chr7 103128761 103130760 R
2 D17_ENSG00000130164-D-ENST00000558518 309 337 - 32.8367 1.19E-11 1.10E-07 CTCTGTCACCCAGGCTGGAGCGCAGTGAC chr19 11130163 11132162 D


In the table above, column 3 has the gene/transcript name (trimmed from the default names because FIMO does not accept long names). Columns 4-6 show the start, end and strand of the motif binding region. The last four columns are the chromosome, start, end and strand of the -2000 bp upstream FASTA sequence that was used as input to FIMO. The problem is that I could not figure out how to map columns 4 and 5 (start and end of the motif binding region) onto columns 12 and 13, which hold the original FASTA coordinates.

In row 1, the original FASTA sequence (column 14) and the motif match (column 6) are both on the forward strand. To locate columns 4 and 5 within columns 12 and 13, should I simply compute 11102268 + 80 and 11104267 – 108? That does not give me the matched sequence, and the resulting interval is also longer than 29 bp. Similarly, in row 2 both the motif match and the FASTA sequence are on the reverse strand; how should I map to the coordinates in the FASTA file there?

 

Thanks

cegrant

Oct 19, 2017, 8:20:21 PM
to MEME Suite Q&A
By default, FIMO measures coordinates along the sequence starting at 1 for the first letter in the sequence. To translate FIMO's coordinates to genomic coordinates, just add the FIMO coordinate to the starting sequence coordinate from RSAT and subtract 1. This applies to both the start and stop positions.

In your example you know from RSAT that the genomic coordinate for the start of the first sequence is chr19 11102268. FIMO's coordinate for the start position of the first match is 80, so the genomic coordinate for the start of FIMO's first match is

11102268 + 80 - 1 = 11102347  

FIMO's coordinate for the stop position of the first match is 108, so the genomic coordinate for the end of FIMO's first match is

11102268 + 108 - 1 = 11102375

Matches on the reverse strand use the same coordinate system, measured from the first position of the sequence.
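
The arithmetic above can be written as a small helper. This is only a sketch of the rule described in this reply, not part of FIMO or RSAT: the function name `fimo_to_genomic` and the `seq_reversed` flag are hypothetical. The `seq_reversed` branch rests on an assumption the thread does not confirm, namely that an input sequence marked R was written to the FASTA file as the reverse complement of the genomic region, so that its first letter sits at the genomic end coordinate.

```python
def fimo_to_genomic(seq_start, seq_end, fimo_start, fimo_stop, seq_reversed=False):
    """Translate 1-based, inclusive FIMO match coordinates into genomic ones.

    seq_start, seq_end: genomic coordinates of the input sequence (from RSAT).
    fimo_start, fimo_stop: match coordinates from FIMO (first letter = 1).
    seq_reversed: hypothetical flag, see the assumption in the branch below.
    """
    seq_length = seq_end - seq_start + 1
    if not 1 <= fimo_start <= fimo_stop <= seq_length:
        raise ValueError("FIMO coordinates fall outside the input sequence")
    if not seq_reversed:
        # Rule from the reply: genomic = sequence start + FIMO coordinate - 1,
        # applied to both the start and the stop position.
        return seq_start + fimo_start - 1, seq_start + fimo_stop - 1
    # ASSUMPTION (not stated in the thread): the FASTA sequence is the reverse
    # complement of the genomic region, so FIMO position 1 is at seq_end and
    # positions count downwards along the genome.
    return seq_end - fimo_stop + 1, seq_end - fimo_start + 1


# Row 1 of the table: chr19 11102268-11104267, FIMO match at 80-108.
start, end = fimo_to_genomic(11102268, 11104267, 80, 108)
print(start, end, end - start + 1)  # 11102347 11102375 29
```

The strand that FIMO reports for the match does not enter the calculation, because FIMO counts from the first letter of the input sequence for both strands. If the assumption about R sequences holds, row 2 would be handled with `fimo_to_genomic(103128761, 103130760, 106, 134, seq_reversed=True)`; it is worth checking the result against the genome before relying on it.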