"Make sure to separate the sequence from its name by 2 or more spaces."

John Deszyck

Oct 16, 2021, 3:21:06 PM
to PAML discussion group
Hi all.  I'm new here and I'm getting an error message that doesn't make a lot of sense.

I have a file in fasta format with some sequences. It has a bunch of data items that look like this:

>GCF_000186725.1_ASM18672v1_genomic.prot&NC_017265.1_270_710  
ATGGCTGACATAACGTTGATAAGTGGCAGTACGCTTGGTAGTGCTGAATATGTTGCTG
CATTTAGCGGATAAATTAGAAGAAGCTGGGTTTTCTACAGAAATACTTCATGGCCCAGA
TTGGACGAACTTACGCTGAATGGCCTGTGGTTAATCGTGACATCCACTCATGGTGCCG
GATCTACCTGATAACTTGCAGCCATTATTAGAACAGATCGAACAACAAAAGCCTGATTT
TCCCAAGTACGCTTTGGGGCGGTTGGTTTAGGCAGCTCAGAATATGACACTTTCTGCG
GCAATCATAAAACTGGATCAACAATTGATCGCACAAGGTGCTCAACGGTTGGGTGAA
TTAGAAATTGACGTCATCCAACATGAAATACCAGAGGATCCAGCAGAGATTTGGGTCA
GATTGGATTAATTTACTC

That is, a ">", a filename, and some sequence data.  The file contains 37 of these.  I run yn00 on this file using the following control file:

      seqfile = /overflow/bobaylab/jd/paml.test.out/fam1.fa
      outfile = /overflow/bobaylab/jd/paml.test.out/fam1.out
      icode = 0
      weighting = 1

And I get output.  yn00 reads the 37 sequences, then prints this:

        Reading sequences, sequential format..

        Error in sequence data file: O at 1 seq 1.
       Make sure to separate the sequence from its name by 2 or more spaces.


Which doesn't make any sense.  Why add two spaces at the end of a line, where they'll be invisible?  But just in case, I wrote a program to add two spaces to the end of every line, like this:

>GCF_000186725.1_ASM18672v1_genomic.prot&NC_017265.1_270_710  

But when I run yn00 on the updated files I still get the same error.

I know I must be missing some basic knowledge here.  Any help would be greatly appreciated.

Nicholas Bailey

Oct 21, 2021, 7:07:27 PM
to PAML discussion group
I don't know what causes that specific error, though I believe PAML will not generally accept FASTA as a seqfile format. You may need to convert it to a PHYLIP file. This is a relatively simple conversion, and there are probably plenty of scripts available online to do it.



I have personally used the Perl script in the second link and it was fine, but I noticed it put the sequences on the same line as the names, which PAML still didn't like. I scripted something simple to make sure the sequences were on the lines after the name, as in a FASTA file.
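
For anyone who wants to try the same thing, here is a minimal sketch of that kind of conversion in Python. It is not the Perl script mentioned above. It assumes PAML will read a PHYLIP-style file whose first line gives the number of sequences and the sequence length, with each name on its own line and the sequence on the lines that follow. The function names `read_fasta` and `to_phylip` are made up for illustration, and any limit PAML places on name length is not handled here.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()  # also drops trailing spaces after a name
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            if name is not None:
                chunks.append(line)
            else:
                raise ValueError("sequence data found before the first '>' header")
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records, width=60):
    """Format (name, sequence) pairs as sequential PHYLIP-style text.

    Each name goes on its own line and the sequence follows on the
    next lines, wrapped at `width` characters.
    """
    if not records:
        raise ValueError("no sequences to write")
    length = len(records[0][1])
    for name, seq in records:
        # The header line states one length, so every sequence must match it.
        if len(seq) != length:
            raise ValueError(
                f"{name} has length {len(seq)}, expected {length}; "
                "the sequences must be aligned to the same length"
            )
    lines = [f"{len(records)} {length}"]
    for name, seq in records:
        lines.append(name)
        for start in range(0, length, width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"
```

To use it, read the FASTA file (for example `fam1.fa`) into a string, pass it through `to_phylip(read_fasta(text))`, write the result to a new file, and point `seqfile` at that file. The length check is there because the header line declares a single sequence length, so unaligned input is reported instead of being written out silently.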