to get -2000 upstream TSS seq
# motif_id | motif_alt_id | sequence_name | start | stop | strand | score | p-value | q-value | matched_sequence | chr | start | end | strand |
2 | | D20_ENSG00000130164-A-ENST00000557958 | 80 | 108 | + | 41.9286 | 1.14E-14 | 6.18E-10 | CTCTGCCACCCAGGCTGGAGTGCAATGGC | chr19 | 11102268 | 11104267 | D |
2 | | D102_ENSG00000161048-B-ENST00000425379 | 106 | 134 | - | 37.6735 | 4.50E-13 | 1.15E-08 | CTCTGTCACCCAGGCTGGAATACAGTGGC | chr7 | 103128761 | 103130760 | R |
2 | | D17_ENSG00000130164-D-ENST00000558518 | 309 | 337 | - | 32.8367 | 1.19E-11 | 1.10E-07 | CTCTGTCACCCAGGCTGGAGCGCAGTGAC | chr19 | 11130163 | 11132162 | D |
In the table above, column 3 holds the gene/transcript name (trimmed from the default names because FIMO does not expect long names), and columns 4-6 give the start, end, and strand of the motif binding region. The last four columns are the chromosome, start, end, and strand of the -2000 bp upstream FASTA sequence that was used as input to FIMO. The problem is that I cannot figure out how to map the locations in columns 4 and 5 (start and end of the motif binding region) onto columns 12 and 13, which hold the original FASTA coordinates.
In row 1, the original FASTA (column 14) and the motif binding site (column 6) are both on the forward strand. To place columns 4 and 5 within columns 12 and 13, should I simply compute 11102268 + 80 and 11104267 - 108? That does not give me the matched sequence, and the resulting interval is also longer than 29 bp. Similarly, in row 2 both the binding motif and the FASTA sequence are on the reverse strand; how should I map the motif to the coordinates in the FASTA file in that case?
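For reference, here is a minimal sketch of the offset arithmetic I would expect to apply. It rests on several assumptions that are not confirmed above: that columns 12 and 13 are 1-based inclusive genomic coordinates (11104267 - 11102268 + 1 = 2000 is consistent with that), that the FIMO start/stop are 1-based inclusive positions within the input sequence, that `D`/`R` in column 14 mean direct/reverse, and that `R` sequences were reverse-complemented when the FASTA was extracted. The function name `motif_to_genome` is hypothetical.

```python
def motif_to_genome(fasta_start, fasta_end, fasta_strand,
                    motif_start, motif_stop, motif_strand):
    """Map a FIMO hit (1-based, inclusive, relative to the input FASTA)
    onto genomic coordinates (assumed 1-based, inclusive).

    Assumptions: fasta_strand is "D" (direct) or "R" (reverse), and an "R"
    sequence was reverse-complemented before being passed to FIMO.
    """
    if fasta_strand == "D":
        # Position 1 of the FASTA is fasta_start on the genome.
        g_start = fasta_start + motif_start - 1
        g_end = fasta_start + motif_stop - 1
        g_strand = motif_strand
    elif fasta_strand == "R":
        # Position 1 of the FASTA is fasta_end on the genome, so the
        # offsets count backwards and the hit's strand flips.
        g_start = fasta_end - motif_stop + 1
        g_end = fasta_end - motif_start + 1
        g_strand = "+" if motif_strand == "-" else "-"
    else:
        raise ValueError(f"unexpected FASTA strand: {fasta_strand!r}")
    return g_start, g_end, g_strand


# Row 1 of the table: FASTA on D, motif on +
print(motif_to_genome(11102268, 11104267, "D", 80, 108, "+"))
# Row 2 of the table: FASTA on R, motif on -
print(motif_to_genome(103128761, 103130760, "R", 106, 134, "-"))
```

Under these assumptions both stops are offset from the same anchor (the FASTA start for `D`, the FASTA end for `R`), so the mapped interval keeps the 29 bp motif width; whether the result matches the reported matched sequence depends on the assumptions holding for this FASTA.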
Thanks