Re: Announcement: new version of pyTesseractTrainer available

zdenko podobny

Aug 21, 2010, 4:51:23 AM
to tesser...@googlegroups.com, pytesseractt...@googlegroups.com
Hi,

Your problem is that you are using tesseractTrainer.py, which was written in 2007, rather than pyTesseractTrainer.py (2010), which fixes this issue. I suggest using http://code.google.com/p/pytesseracttrainer/downloads/detail?name=pyTesseractTrainer-1.01.py or, if you are brave enough, the development version (http://pytesseracttrainer.googlecode.com/svn/trunk/pyTesseractTrainer.py). That way you do not need to solve problems that have already been solved.

Anyway, please post issues regarding tesseractTrainer.py/pyTesseractTrainer.py to http://code.google.com/p/pytesseracttrainer/issues/list or pytesseractt...@googlegroups.com

BR,

Zd.

On Sat, Aug 21, 2010 at 10:39 AM, tt <yury.tar...@gmail.com> wrote:
This Trainer variant won't open a v3 box file:
Traceback (most recent call last):
 File "/home/ty/files/tesseractTrainer.py", line 546, in doFileOpen
   self.loadImageAndBoxes(fileName, chooser)
 File "/home/ty/files/tesseractTrainer.py", line 471, in loadImageAndBoxes
   self.boxes = loadBoxData(boxName, height)
 File "/home/ty/files/tesseractTrainer.py", line 129, in loadBoxData
   (text, left, bottom, right, top) = line.split()
ValueError: too many values to unpack

It needs something like this diff to proceed (I made this recently for
my own use, and I haven't looked into the semantics of the 6th field yet):

--- tesseractTrainer.py.prev___	2009-04-07 12:18:08.000000000 +0300
+++ tesseractTrainer.py	2010-08-17 12:05:31.000000000 +0300
@@ -60,6 +60,7 @@
    right = 0
    top = 0
    bottom = 0
+    something = 0
    bold = False
    italic = False
    underline = False
@@ -126,7 +127,8 @@
    prevRight = -1
.
    for line in f:
-        (text, left, bottom, right, top) = line.split()
+        #print "%s\n" % (line)
+        (text, left, bottom, right, top, something) = line.split()
        s = Symbol()
.
        if (text.startswith('@')):
@@ -589,9 +596,9 @@
                if s.bold:
                    text = '@' + text
                #endif
-                f.write('%s %d %d %d %d\n' %
+                f.write('%s %d %d %d %d %d\n' %
                        (text, s.left, height - s.bottom, s.right,
-                         height - s.top))
+                         height - s.top, s.something))
            #endfor
        #endfor
        f.close()
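
For reference, the patch above makes the loader require exactly six fields, so it would then fail on older five-field box files instead. Below is a minimal sketch of a parser that accepts both layouts. It is not code from either trainer script: the names Box, parse_box_line and format_box_line are hypothetical, and it assumes the sixth field is an integer page number, which the diff above does not confirm.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One line of a box file (hypothetical helper type)."""
    text: str
    left: int
    bottom: int
    right: int
    top: int
    page: int  # ASSUMPTION: the sixth field is a page number


def parse_box_line(line: str) -> Box:
    """Parse a five-field (older) or six-field (v3) box line."""
    fields = line.split()
    if len(fields) == 5:
        page = 0  # older files carry no sixth field; default to 0
    elif len(fields) == 6:
        page = int(fields[5])
    else:
        raise ValueError(
            "expected 5 or 6 fields, got %d: %r" % (len(fields), line))
    left, bottom, right, top = (int(value) for value in fields[1:5])
    return Box(fields[0], left, bottom, right, top, page)


def format_box_line(box: Box, with_page: bool = True) -> str:
    """Write a box back out, with or without the sixth field."""
    base = "%s %d %d %d %d" % (
        box.text, box.left, box.bottom, box.right, box.top)
    if with_page:
        return "%s %d\n" % (base, box.page)
    return base + "\n"
```

Branching on the field count keeps the unpacking explicit, so a malformed line still raises a ValueError that reports the offending text rather than a bare "too many values to unpack".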



