Hi,
the problem is that you are using tesseractTrainer.py, which dates from 2007,
and not pyTesseractTrainer.py (2010), which fixed this issue. I would suggest
using
http://code.google.com/p/pytesseracttrainer/downloads/detail?name=pyTesseractTrainer-1.01.py
or (if you are brave enough) the devel version:
http://pytesseracttrainer.googlecode.com/svn/trunk/pyTesseractTrainer.py
That way you do not need to solve problems that have already been
solved.
Anyway, please post issues regarding tesseractTrainer.py/pyTesseractTrainer.py to
http://code.google.com/p/pytesseracttrainer/issues/list or
pytesseractt...@googlegroups.com
BR,
Zd.
On Sat, Aug 21, 2010 at 10:39 AM, tt
<yury.tar...@gmail.com> wrote:
This Trainer variant won't open a v3 box file:
Traceback (most recent call last):
File "/home/ty/files/tesseractTrainer.py", line 546, in doFileOpen
self.loadImageAndBoxes(fileName, chooser)
File "/home/ty/files/tesseractTrainer.py", line 471, in
loadImageAndBoxes
self.boxes = loadBoxData(boxName, height)
File "/home/ty/files/tesseractTrainer.py", line 129, in loadBoxData
(text, left, bottom, right, top) = line.split()
ValueError: too many values to unpack
It needs something like this diff to proceed (I made this recently for
my own use, and I haven't looked into the semantics of the 6th field yet):
--- tesseractTrainer.py.prev___^2009-04-07 12:18:08.000000000 +0300
+++ tesseractTrainer.py^2010-08-17 12:05:31.000000000 +0300
@@ -60,6 +60,7 @@
right = 0
top = 0
bottom = 0
+ something = 0
bold = False
italic = False
underline = False
@@ -126,7 +127,8 @@
prevRight = -1
.
for line in f:
- (text, left, bottom, right, top) = line.split()
+ #print "%s\n" % (line)
+ (text, left, bottom, right, top, something) = line.split()
s = Symbol()
.
if (text.startswith('@')):
@@ -589,9 +596,9 @@
if s.bold:
text = '@' + text
#endif
- f.write('%s %d %d %d %d\n' %
+ f.write('%s %d %d %d %d %d\n' %
(text, s.left, height - s.bottom, s.right,
- height - s.top))
+ height - s.top, s.something))
#endfor
#endfor
f.close()
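
The diff above makes the loader require exactly six fields, so it would then fail on older five-field box files instead. A minimal sketch of a parser that accepts both layouts could look like the following. The function name parse_box_line is hypothetical and not part of tesseractTrainer.py, and treating the sixth field as a page number (defaulting to 0 when absent) is an assumption about the v3 format rather than something checked here.

```python
def parse_box_line(line):
    """Parse one box-file line into (text, left, bottom, right, top, page).

    Hypothetical helper: accepts the five-field layout and the six-field
    v3 layout. The sixth field is assumed to be a page number.
    """
    fields = line.split()
    if len(fields) == 5:
        # Older layout: no trailing field, so default it to 0.
        page = 0
    elif len(fields) == 6:
        # v3 layout: keep the extra field so it can be written back unchanged.
        page = int(fields[5])
    else:
        raise ValueError("unexpected box line: %r" % line)
    text = fields[0]
    left, bottom, right, top = [int(c) for c in fields[1:5]]
    return (text, left, bottom, right, top, page)
```

Inside loadBoxData, the unpacking line could then become something like `(text, left, bottom, right, top, something) = parse_box_line(line)`, with the coordinates already converted to integers.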