regarding identical sequences


Jiaqiao Zhou

Sep 27, 2022, 6:49:15 PM
to MEME Suite Q&A

Hello, I am trying to use MEME to find motifs for a group of proteins.

The problem is that there are several sequences that are identical to each other.

I wonder whether I should remove the identical ones first, and then choose between MEME and STREME according to the number of sequences that remain.


My other question: if the similarity among the sequences is very high, is it OK to set a larger maximum motif width?

Thanks in advance.

cegrant

Oct 6, 2022, 11:23:23 PM
to MEME Suite Q&A

MEME provides a facility for weighting sequences in cases like this. In the MEME Suite FASTA documentation, look at the section on “Weights”. Each sequence in the MEME input sequence file can be assigned a numerical weight, so sequences that are identical or nearly identical can be down-weighted. The example there has three sequences, two of which are nearly identical. Those two sequences are each assigned a weight of 0.5, and the remaining dissimilar sequence is assigned a weight of 1.0.

This facility is only available in MEME though. It is not available in STREME.
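For exact duplicates, the weighting described above could be generated automatically. The following is only a rough sketch: it gives each copy of a sequence a weight of 1/(number of copies) and writes the weights on a `>WEIGHTS` line ahead of the sequences. That line syntax is my reading of the “Weights” section and should be checked against the FASTA documentation for your MEME version. The function names `parse_fasta` and `weighted_fasta` are made up for this example, and nearly identical (rather than identical) sequences would need a clustering step that is not shown.

```python
from collections import Counter


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def weighted_fasta(records):
    """Build FASTA text in which each copy of a sequence gets weight 1/copies."""
    counts = Counter(seq.upper() for _, seq in records)
    weights = [1.0 / counts[seq.upper()] for _, seq in records]
    # ASSUMPTION: weights are listed on one ">WEIGHTS" line, in sequence order.
    lines = [">WEIGHTS " + " ".join(f"{w:.3g}" for w in weights)]
    for header, seq in records:
        lines.append(f">{header}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = ">a\nMKVLA\n>b\nMKVLA\n>c\nMKTLS\n"
    print(weighted_fasta(parse_fasta(demo)), end="")
```

With two identical sequences and one distinct sequence, this reproduces the 0.5 / 0.5 / 1.0 pattern from the documentation example.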


Another question is that if the similarity among sequences is very high,

is it ok to set a larger maximum width of motifs?


The underlying assumption of MEME and STREME is that all of the input sequences are independent. If your input file includes sequences that are really just duplicates of each other, the estimates of statistical significance are going to be unreliable. To take an extreme case, if I have a single sequence and simply copy and paste it a couple of dozen times, MEME/STREME are going to report a very significant E-value. However, that is purely an artifact caused by my duplicating the sequence.

The trouble is that in biology, whether sequences are truly independent of each other can be ambiguous. In the end it has to be a judgment call based on your experiment and your understanding of the biology you are studying.
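If the judgment is that exact copies are not independent observations, they can simply be dropped before running MEME or STREME. A minimal sketch, assuming the sequences have already been read into (header, sequence) pairs; `drop_exact_duplicates` is a hypothetical helper, not part of the MEME Suite, and it only catches sequences that match exactly (ignoring case):

```python
def drop_exact_duplicates(records):
    """Keep the first record for each distinct sequence, preserving input order."""
    first_seen = {}
    for header, seq in records:
        key = seq.upper()  # treat lower- and upper-case residues as the same
        if key not in first_seen:
            first_seen[key] = (header, seq)
    return list(first_seen.values())


if __name__ == "__main__":
    records = [("p1", "MKVLA"), ("p2", "MKVLA"), ("p3", "MKTLS"), ("p4", "mkvla")]
    unique = drop_exact_duplicates(records)
    print(f"{len(records)} sequences in, {len(unique)} unique sequences out")
    for header, seq in unique:
        print(f">{header}\n{seq}")
```

The number of unique sequences that remain is the count to use when deciding between MEME and STREME.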