Understanding the CentriMo site_counts.txt file


DJEKIDEL MOHAMED NADHIR

Jul 5, 2013, 11:19:38 PM
to meme-...@googlegroups.com
Hello everybody,
I am running CentriMo on a set of sequences and I want to know the position of each motif in each sequence. This seems to be given in site_counts.txt, but I couldn't understand the format.
Reading through the code and the index.html generated by CentriMo, I can see that it gives only the sequences that have the motif and the center position.
However, in site_counts.txt we cannot figure out which sequence each entry refers to.

I want to parse the file, so it would be convenient to use the text file directly.

Any explanation would be appreciated.

Thanks in advance

James Johnson

Jul 8, 2013, 4:59:25 PM
to meme-...@googlegroups.com
CentriMo does not output the position of each motif in each sequence. To find that information you will need to use FIMO.

For each motif you give CentriMo, it scans the motif against all the sequences, but it does not store the results of that scan in a way that identifies the sequence. Instead it keeps a list of counts, one for every position at which the motif can align against a sequence. Those counts start at zero, and each time CentriMo scans a sequence it adds one count at the position of the best match to the motif in that sequence. If more than one position in the sequence ties for the best-scoring match, that single count is divided among all those positions. That list of counts is what CentriMo outputs as the site_counts.txt file.
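The counting scheme described above can be sketched in Python. This is only an illustration, not CentriMo's actual code: the function name `accumulate_site_counts` and its input format (one list of tied best-match positions per sequence) are hypothetical, and the sketch assumes that a tie splits the single count equally and that a sequence with no reported match contributes nothing.

```python
from collections import defaultdict


def accumulate_site_counts(best_positions_per_sequence):
    """Sum per-position counts over sequences (hypothetical helper).

    Each element of the input is the list of positions that tie for the
    best motif match in one sequence. Every sequence contributes a total
    of one count, split equally among its tied positions (an assumption).
    """
    counts = defaultdict(float)  # every position starts at zero
    for positions in best_positions_per_sequence:
        if not positions:
            continue  # assumption: no match means no count is added
        share = 1.0 / len(positions)
        for pos in positions:
            counts[pos] += share
    return dict(counts)


# Three sequences: a single best site at 0, a tie between 0 and 2,
# and a single best site at -1.5.
example = accumulate_site_counts([[0.0], [0.0, 2.0], [-1.5]])
```

Because the counts are summed over all sequences, the result no longer records which sequence contributed to which position, which is why the file cannot be mapped back to individual sequences.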

Note that the sequence positions in site_counts.txt use the center of the sequence as the origin. Site counts are added at the center of a motif's position, so if a motif of length 1 matched the 3rd base of a sequence of length 5, it would be considered to be at position 0. If a motif of length 2 matched the 3rd and 4th bases of a sequence of length 5, it would be considered to be at position 0.5, because it is impossible for it to align perfectly with the center of the sequence in that case.
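The two worked cases above follow from a simple center-to-center subtraction. A minimal sketch, assuming 1-based base numbering as in the examples; the function name `centered_site_position` is made up for illustration and is not part of CentriMo:

```python
def centered_site_position(seq_length, motif_start, motif_length):
    """Return the site position relative to the sequence center.

    motif_start is the 1-based index of the first base covered by the motif.
    """
    seq_center = (seq_length + 1) / 2                    # length 5 -> base 3
    site_center = motif_start + (motif_length - 1) / 2   # middle of the match
    return site_center - seq_center


# Motif of length 1 on the 3rd base of a length-5 sequence.
single_base = centered_site_position(5, 3, 1)
# Motif of length 2 on the 3rd and 4th bases of a length-5 sequence.
two_bases = centered_site_position(5, 3, 2)
```

Here `single_base` evaluates to 0.0 and `two_bases` to 0.5, matching the positions given in the text.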

So the site_counts file contains:
DB db_# MOTIF motif_name motif_alternate_name
sequence_position_1 count_1
sequence_position_2 count_2
...
sequence_position_n count_n
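Since the original question was about parsing this file, here is a rough Python sketch based only on the layout shown above. It assumes whitespace-separated fields, that each motif block starts with a line beginning with `DB`, and that any other line that is not a `position count` pair can be skipped. The function name `parse_site_counts` and the sample values are hypothetical, so check them against a real site_counts.txt before relying on this.

```python
def parse_site_counts(lines):
    """Parse site_counts.txt-style lines into one dict per motif block."""
    blocks = []
    current = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "DB":
            # Header: DB db_# MOTIF motif_name motif_alternate_name
            current = {
                "db": fields[1] if len(fields) > 1 else None,
                "motif": fields[3] if len(fields) > 3 else None,
                "alt_name": fields[4] if len(fields) > 4 else None,
                "counts": [],  # list of (position, count) pairs
            }
            blocks.append(current)
        elif current is not None and len(fields) >= 2:
            try:
                current["counts"].append((float(fields[0]), float(fields[1])))
            except ValueError:
                continue  # assumption: non-numeric lines carry no counts
    return blocks


# Made-up sample content, for illustration only.
sample = [
    "DB 1 MOTIF motif_A alt_A",
    "-0.5 2",
    "0.5 1.5",
    "DB 1 MOTIF motif_B alt_B",
    "0 3",
]
parsed = parse_site_counts(sample)
```

To read a real file, something like `with open("site_counts.txt") as fh: parsed = parse_site_counts(fh)` should work under the same assumptions. Keep in mind that the counts are aggregated over all sequences, so per-sequence motif positions still have to come from FIMO.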
