bad sequence identifier error: meme-chip in discriminative mode


Kaustav Mukherjee

Mar 19, 2025, 12:06:03 AM
to MEME Suite Q&A
While uploading the "background" FASTA file set generated by HOMER for the hg38 build for a discriminative MEME-ChIP search, I receive an error: "bad sequence identifier: Found X sequences with identifiers longer than 50 characters on lines ..."

Is there an easy way around this, other than fixing many, many lines in the FASTA file?


cegrant

Mar 19, 2025, 12:19:04 AM
to MEME Suite Q&A
Each sequence in the sequence database needs to have a unique name, but because there may be thousands of sequences, the names have to be limited in size to save memory. We chose to set the limit at 50 characters. The tools in the MEME Suite generally only consider the characters up to the first whitespace character when setting the name.
If you have names that are longer than 50 characters, then yes, you will have to fix all of those sequence names. It would be fairly straightforward to write a script that simply truncates the names to no more than 50 characters, but depending on what information is being conveyed in the name, the truncated names might not be unique. You'll have to evaluate that for your sequences.
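
As a rough illustration of the kind of script described above, here is a minimal Python sketch. It is not part of the MEME Suite. The function name `shorten_ids`, the `_1`, `_2`, ... suffix scheme for resolving collisions, and the choice to treat exactly 50 characters as acceptable are all assumptions made for this example. It takes the name to be everything after `>` up to the first whitespace character, truncates it, and appends a numeric suffix whenever the truncated name has already been used. Whether the renamed sequences still carry the information you need is something you would have to check for your own data.

```python
import sys

MAX_LEN = 50  # assumed limit, taken from the error message quoted above


def shorten_ids(lines, max_len=MAX_LEN):
    """Yield FASTA lines with sequence names truncated to max_len and kept unique."""
    seen = set()
    for line in lines:
        if not line.startswith(">"):
            # Sequence data lines pass through unchanged.
            yield line
            continue
        header = line[1:].rstrip("\n")
        # The name is everything up to the first whitespace character;
        # anything after it is kept as a free-text description.
        parts = header.split(None, 1)
        name = parts[0] if parts else ""
        rest = parts[1] if len(parts) > 1 else ""
        new_name = name[:max_len]
        counter = 1
        # If the name is already taken, shorten it further and append a
        # numeric suffix (hypothetical scheme) until it is unique.
        while new_name in seen:
            suffix = f"_{counter}"
            new_name = name[:max_len - len(suffix)] + suffix
            counter += 1
        seen.add(new_name)
        yield ">" + new_name + (" " + rest if rest else "") + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python shorten_ids.py input.fa output.fa
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(shorten_ids(src))
```

Saved under a hypothetical name such as `shorten_ids.py`, it could be run as `python shorten_ids.py background.fa background.short.fa`, where both file names are placeholders. Because names that collide are renamed, keep the original file if you need to map the results back to the original identifiers.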
