Bad format "Invalid BED file" Line 1: sequence chrI not found in genome database.

zealot

Dec 20, 2022, 11:56:31 PM
to MEME Suite Q&A
Hi,

I keep getting this error message while using MEME: Bad format "Invalid BED file" Line 1: sequence chrI not found in genome database.

I can't figure out what I did wrong.
The settings are:
Image1.png

The bed file content:
Image2.png

Charles E Grant

Dec 21, 2022, 12:27:44 AM
to dcz60...@gmail.com, MEME Suite Q&A
Hi zealot,

The name given in column 1 of the BED file has to exactly match the name of a sequence in the FASTA file. GenBank is probably not a great choice for this, as they include a lot of metadata in their sequence names. For example, the first sequence in the GenBank FASTA file for Sac. cer. uid 128 is:

>gi|329136676|tpg|BK006934.2| TPA_inf: Saccharomyces cerevisiae S288c chromosome VIII, complete sequence

which clearly doesn’t match any of the sequence names in your BED file.

You could try using the Ensembl Fungi database for Saccharomyces cerevisiae instead. But beware! Those sequences are simply named with the number of the chromosome. They aren’t prefixed with ‘chr’. So a typical sequence name would look like:

>I dna:chromosome chromosome:Sc_YJM1248_v1:I:1:198792:1 REF

The sequence name in the BED file has to match the string between the ‘>’ and the first whitespace, so your BED file would have to have sequence names I, II, III, etc., not chrI, chrII, chrIII.
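
To check this before uploading, the matching rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not how MEME itself does the check; the function names `fasta_names` and `unmatched_bed_names` are made up for this example.

```python
def fasta_names(fasta_lines):
    """Collect sequence names: the text between '>' and the first whitespace."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            if fields:
                names.add(fields[0])
    return names


def unmatched_bed_names(bed_lines, names):
    """Return BED column-1 names (in order of first appearance) not in `names`."""
    missing = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split()[0]
        if chrom not in names and chrom not in missing:
            missing.append(chrom)
    return missing


# The Ensembl-style header quoted above, plus two made-up BED lines.
names = fasta_names([
    ">I dna:chromosome chromosome:Sc_YJM1248_v1:I:1:198792:1 REF",
    "ACGT",
])
print(unmatched_bed_names(["chrI\t100\t200", "I\t5\t50"], names))
```

Any name this prints is one that would trigger the "not found in genome database" error for that FASTA file.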

We also have the UCSC version online in the ‘UCSC Other collection’ as sacCer1. They name their sequences chr1, chr2, chr3, etc. That almost matches your BED file, but with Arabic rather than Roman numerals.
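
If you go the UCSC route, the BED names would need converting from Roman to Arabic numerals. A minimal sketch, assuming a tab-delimited BED file and the 16 nuclear chromosomes of S. cerevisiae; the helper name `rename_bed_line` is hypothetical, and names outside chrI to chrXVI (chrM, for instance) are left untouched:

```python
# Roman numerals for the 16 nuclear chromosomes of S. cerevisiae.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]
ROMAN_TO_ARABIC = {f"chr{r}": f"chr{i}" for i, r in enumerate(ROMAN, start=1)}


def rename_bed_line(line):
    """Rewrite column 1 of a tab-delimited BED line, e.g. chrIV -> chr4."""
    fields = line.rstrip("\n").split("\t")
    # Names not in the table are returned unchanged.
    fields[0] = ROMAN_TO_ARABIC.get(fields[0], fields[0])
    return "\t".join(fields)


print(rename_bed_line("chrIV\t100\t200"))
```

An explicit lookup table is used rather than a general Roman numeral parser so that a name like chrM is not mistaken for a numeral.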


Charles

zealot

Dec 21, 2022, 12:50:59 AM
to MEME Suite Q&A
It works now! Thanks for your help!

Aastha Pal

Mar 10, 2025, 9:15:30 PM
to MEME Suite Q&A
Hi,

I had the same error and fixed it by adding chr in front of the chromosome number. However, I now get the following error:

# Warning: Ignoring sequence '>chr9:112224825-112224827(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:112226224-112226226(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:112234299-112234301(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr10:78284641-78284643(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:113441825-113441827(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:113645463-113645465(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr10:78287196-78287198(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:113695378-113695380(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:113699659-113699661(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:113705910-113705912(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:113707633-113707635(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:113719007-113719009(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:113726813-113726815(+)' because it is too short (2 < 8).

# Warning: Ignoring sequence '>chr9:113731515-113731517(+)' because it is too short (2 < 8).

ERROR: No (valid) sequences in multiple FASTA file.

ERROR: There was a problem reading the primary sequence file 'output.bed.fa'.

Thanks,
Aastha

Aastha Pal

Jun 5, 2025, 5:31:21 PM
to MEME Suite Q&A
Hi,

Following up on this. My BED file looks like this:


Can you help?

Thank you,
Aastha 

cegrant

Jun 8, 2025, 12:23:22 AM
to MEME Suite Q&A
Note that each sequence specified by your BED file is only two positions long (112224825-112224827, for example). You can't compare a sequence with only two nucleotides to a motif that is 8 nucleotides long.
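
One possible way to deal with this, sketched below, is to widen each short interval around its midpoint before extracting the sequences. This is only an illustration: `widen_interval` is a hypothetical helper, the minimum of 8 is taken from the "(2 < 8)" in the warnings, the code does not check against the end of the chromosome, and whether widening makes sense at all depends on what the intervals represent.

```python
def widen_interval(start, end, min_width):
    """Return (start, end) widened around the midpoint to at least min_width.

    BED coordinates are 0-based and half-open, so the length is end - start.
    Assumption: the chromosome is long enough; no upper bound is checked.
    """
    if end - start >= min_width:
        return start, end
    mid = (start + end) // 2
    new_start = max(0, mid - min_width // 2)
    return new_start, new_start + min_width


# First interval from the warnings above: only 2 positions long.
start, end = widen_interval(112224825, 112224827, 8)
print(start, end, end - start)
```

In practice a window much larger than 8 is probably wanted for motif discovery; the value here just shows the arithmetic.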

Charles
