"Error: EOF?" at line 170 of 194 when reading DNA sequence data in baseml


Kurt Wollenberg

Jul 24, 2014, 11:24:57 AM
to pamlso...@googlegroups.com
Hello:

I am trying to run an ancestral sequence reconstruction in baseml and am having a problem getting the data read into the program. The error I am getting is:

Reading seq #170: SequenceID
Error: EOF?

where SequenceID is the sequence identifier in my data. It doesn't seem to matter which sequence is at line 170: I've swapped a few around and keep getting this error at that line. Checking the EOL characters doesn't turn up anything, as they are all the same. Because it doesn't matter which sequence from the data file sits at line 170, this doesn't appear to be caused by a hidden character in my file. I have tried this with v4.8 and v4.7a (running on a Mac Xeon desktop under OS X 10.8.5) and get the same error. Given how aggravating this is, I'm betting it's a simple typo somewhere in my sequence data file, but for the life of me I can't see it, and the error message isn't giving me any clues. Help?

genn_2014

Aug 1, 2014, 3:51:49 PM
to pamlso...@googlegroups.com
Hello Kurt Wollenberg,
Did you figure out the error? If so, could you please share the solution? I am getting the same error with my files. I am sure it is due to a simple formatting issue, but I have tried reformatting the input files with no luck so far.

Running baseml for Seq1.fa
BASEML in paml version 4.4, January 2010

ns = 2   ls = 1379
Reading sequences, interlaved format..
Reading seq # 2: Seq id ---------- ---------T TGGTAATTAA TGACAC     
err reading site 1344, seq 1 group 29
sites read in each seq:
Error: EOF?.

Ziheng

Aug 3, 2014, 3:39:54 PM
to pamlso...@googlegroups.com
In both those posts, it is almost certain that something is wrong with the sequence data file, especially since you are getting the same error from 4.7 and 4.8. It is much less likely to be a program bug. The precise reason is less clear.
EOF stands for end of file.
One thing to check is the CR (carriage return) and LF (line feed) characters. You can open the file in MS Word, for example, insert a space and then delete it, and then save the file as a text file. If you have the option of choosing "text file with line breaks", choose it. Then try again. If you have another editor, you can try saving the file with it as well. Also, if you are using the command line, you can use more, cat, or type to show the file content on the monitor and see whether anything is obviously wrong.
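The CR/LF check can also be done without an editor by looking at the raw bytes. The sketch below uses a hypothetical helper, `scan_bytes` (not part of PAML), that counts Windows (CRLF), old Mac (lone CR) and Unix (LF) line endings and lists every byte outside printable ASCII, which is where invisible editor characters usually hide. `normalize` is an equally hypothetical helper that rewrites all line endings as plain LF.

```python
import sys


def scan_bytes(data: bytes) -> dict:
    """Count line-ending styles and flag bytes outside printable ASCII.

    Hypothetical helper for inspecting a sequence file; not part of PAML.
    """
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf   # old Mac OS line endings
    lone_lf = data.count(b"\n") - crlf   # Unix line endings
    odd = []                             # (line number, byte value) pairs
    line = 1                             # lines are counted by LF only
    for b in data:
        if b == 0x0A:
            line += 1
        elif b not in (0x09, 0x0D) and not (0x20 <= b <= 0x7E):
            odd.append((line, b))        # tab and CR are allowed, others flagged
    return {"crlf": crlf, "lone_cr": lone_cr, "lone_lf": lone_lf, "odd_bytes": odd}


def normalize(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to plain LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(scan_bytes(fh.read()))
```

Run as `python scan_bytes.py seqfile.txt` (file names are placeholders). A nonzero `lone_cr` count, or any entry in `odd_bytes`, is worth fixing before rerunning baseml.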

The error message from the second post suggests that you are using the interleaved format rather than the sequential format. Check that this is what you intend. See the documentation for the file formats.
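Which format a file declares can be read off its first line. The sketch below assumes the convention described in the PAML documentation, where an `I` token after the two numbers (ns and ls) on the header line marks interleaved format and its absence means sequential; confirm that against the manual for your version. `parse_header` is a hypothetical helper.

```python
def parse_header(line: str) -> tuple:
    """Return (ns, ls, interleaved) from a header line such as '6 153'.

    Assumption: an 'I' token after ns and ls marks interleaved format;
    without it the file is treated as sequential.
    """
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(f"header needs ns and ls, got {line!r}")
    ns, ls = int(tokens[0]), int(tokens[1])
    interleaved = any(tok.upper() == "I" for tok in tokens[2:])
    return ns, ls, interleaved


print(parse_header("6   153    "))  # (6, 153, False): header from this thread
print(parse_header("2 1379 I"))     # (2, 1379, True): hypothetical interleaved header
```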
Ziheng

Hugo Rody

Aug 10, 2014, 8:19:36 PM
to pamlso...@googlegroups.com
hey buddy, did you solve your problem?

Brittany Rife

Sep 20, 2014, 3:04:32 PM
to pamlso...@googlegroups.com
I am actually having the same problem. I opened my file in TextWrangler and made sure that it was saved in Unix format, but I am still getting the same error.

Ziheng

Oct 16, 2014, 4:57:36 PM
to pamlso...@googlegroups.com
Reading seq # 2: Seq id ---------- ---------T TGGTAATTAA TGACAC


In this case, you should add at least two spaces between the sequence name and the sequence.
Normally, the program prints only the sequence name here, not part of the sequence. The output above indicates that the program got confused and treated all of the following as your sequence name:


" Seq id ---------- ---------T TGGTAATTAA TGACAC "

Also use Unix and DOS commands such as cat and more to confirm that your sequence file is a plain text file. Some text editors add strange invisible characters without the user's knowledge.
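The two-space rule can be checked line by line. In this sketch, the hypothetical helper `split_name` treats everything up to the first run of two or more spaces as the name, which mirrors the behavior described above, and returns None for a line with no such gap, as the "Seq id ---------- ..." line would be. Tabs are deliberately not counted as separators here.

```python
import re

# Name: starts with a non-space and ends at the first run of 2+ spaces.
# Sequence: everything after that gap, with single spaces removed.
NAME_GAP = re.compile(r"^(\S.*?) {2,}(\S.*)$")


def split_name(line: str):
    """Return (name, sequence), or None if no 2+ space gap follows the name."""
    m = NAME_GAP.match(line.rstrip())
    if m is None:
        return None  # whole line would be swallowed into the name
    return m.group(1), m.group(2).replace(" ", "")


for text in ("Seq id ---------- ---------T TGGTAATTAA TGACAC",   # flagged
             "Seq id  ---------- ---------T TGGTAATTAA TGACAC"):  # OK
    print(split_name(text))
```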

Ziheng

Ruoqian Xiong

Nov 18, 2015, 1:51:56 PM
to PAML discussion group
Could somebody help me with this error as shown in the attached screenshot? Greatly appreciated!
Screenshot 2015-11-18 at 1.48.54 PM.png

Ziheng

Feb 19, 2016, 4:08:41 PM
to PAML discussion group
Add spaces after "orangutan".
Right now, part of the sequence is read as part of the name.
Look at the line for seq #10.
ziheng

SK

Dec 15, 2016, 12:21:56 AM
to PAML discussion group
I have been having similar issues with the sequences not being read properly. I have been trying the different solutions posted on this thread but nothing seems to be working. Here is the error I keep receiving: 

BASEML in paml version 4.9, March 2015


ns = 6 ls = 153

Reading sequences, sequential format..

Reading seq # 1: Px


Error in sequence data file: J at 136 seq 1.

Make sure to separate the sequence from its name by 2 or more spaces.


C:\Users\Samiksha\Downloads\paml4.9c\bin\baseml.exe finished


And this is the latest format I have been using: 

6   153    

Px       ATGCTAATGAACGTTCCATCGGAAGCTTCAGGCCAAACACGTGTACTTTCTGCAAATGCACTGGAAACTACGATGAAAACGATACCGATTCGTGATGTGTGTGCTAATCAACTGGATAAAGGAGTCGTGGAAGAG            

 

ju                     ATGCTAATGAACGTTCCATCGGAAGCTTCAGGCCAAACACGTGTACTTTCTGCAAATGCACTGGAAACTACGATGAAAACGATACCGATTCGTGATGTGTGTGCTAATCAACTGGATAAAGGAGTCGTGGAAGAG            

 

cb                    ATGCTTATGAACGTCCCATCGGTGGCTTCAGGCCAAACACGTGTACTTTCTGCAAATGCACTCGAAACTACGATGAAAACGATACCGATTCGTGATGTGTGTGCTAATCAACTGGATAAAGGAGTCGTGGAAGAG              

 

n2                  ATGCTTATGAACGTCCCATCGGAGGCTTCAGGCCAAACACGTGTACTTTCTGCAAATGCACTCGAAACTACGATGAAAACGATACCGATTCGTGATGTGTGTGCTAATCAACTGGATAAAGGAGTCGTGGAAGAG              

 

eg7                                  ATGCTTATGAACGTCCCATCGGAGGCTTCAGGCCAAACACGTGTACTTTCTGCAAATGCACTCGAAACTACGATGAAAACGATACCGATTCGTGATGTGTGTGCTAATCAACTGGATAAAGGAGTCGTGGAAGAG                   


eg9                    ATGCTTATGAACGTCCCATCGGAGGCTTCAGGCCAAACACGTGTACTTTCTGCAAATGCACTCGAAACTACGATGAAAACGATACCGATTCGTGATGTGTGTGCTAATCAACTGGATAAAGGAGTCGTGGAAGAG      


I have tried changing the sequence names so the J will not be read, but when I do that it still reads that position as a continuation of the first sequence. I am not sure what I am doing wrong. Any help would be greatly appreciated!! Thank you very much!

cajawe

Dec 15, 2016, 10:30:15 AM
to PAML discussion group
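Your alignment header says your sequences are 153 bp long, but they aren't: they're 135 bp long. Because baseml expects 153 sites, it keeps reading past the end of the first sequence into the next line, and the "J" it complains about at site 136 is the "j" of the next name, "ju".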
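A mismatch like this is easy to miss by eye. Below is a minimal sketch of a check, assuming a sequential file with each sequence on a single line and names without spaces (as in the file above); `check_sequential` is a hypothetical helper, not a PAML tool. It compares the ns and ls values on the header line against what is actually in the file.

```python
import sys


def check_sequential(text: str) -> list:
    """List mismatches between the header (ns, ls) and the sequences.

    Assumes one sequence per line: a name without spaces, then the sequence.
    Blank or whitespace-only lines are ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ns, ls = (int(tok) for tok in lines[0].split()[:2])
    records = lines[1:]
    problems = []
    if len(records) != ns:
        problems.append(f"header says ns={ns}, found {len(records)} sequences")
    for ln in records:
        name, *chunks = ln.split()  # name first, then sequence chunk(s)
        length = len("".join(chunks))
        if length != ls:
            problems.append(f"{name}: {length} sites, header says ls={ls}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for msg in check_sequential(fh.read()):
            print(msg)
```

If all six sequences in the file above are 135 bp, each would be reported against ls=153, and the header line should read `6   135`.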