a 32 692 165 958 0
b 221 734 354 958 0
c 32 446 165 628 0
d 221 488 354 628 0
e 32 275 165 373 0
f 221 317 277 373 0
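For reference, each line of a Tesseract 3.x box file has the form `<symbol> <left> <bottom> <right> <top> <page>`, with pixel coordinates measured from the bottom-left corner of the image. The sketch below parses one such line and rejects empty or inverted rectangles. The names `BoxEntry` and `parseBoxLine` are hypothetical helpers invented for this illustration and are not part of Tesseract.

```cpp
#include <sstream>
#include <string>

// One box-file entry: "<symbol> <left> <bottom> <right> <top> <page>".
// Coordinates are pixels with the origin at the bottom-left of the image.
// (BoxEntry is a hypothetical type for this sketch, not a Tesseract type.)
struct BoxEntry {
    std::string symbol;
    int left, bottom, right, top, page;
};

// Parses a single box-file line into `out`.
// Returns false (leaving `out` untouched) if a field is missing or the
// rectangle is empty, inverted, or has negative coordinates.
bool parseBoxLine(const std::string& line, BoxEntry& out) {
    std::istringstream in(line);
    BoxEntry e;
    if (!(in >> e.symbol >> e.left >> e.bottom >> e.right >> e.top >> e.page))
        return false;
    if (e.left < 0 || e.bottom < 0 || e.right <= e.left || e.top <= e.bottom)
        return false;
    out = e;
    return true;
}
```

Applied to the first line above, this gives a box 133 pixels wide and 266 pixels tall for the symbol `a`.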
$ tesseract bil.pat.exp0.tif bil.pat.exp0 box.train
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Page 1
APPLY_BOXES:
Boxes read from boxfile: 6
APPLY_BOXES: Unlabelled word at :Bounding box=(-958,221)->(-734,277)
APPLY_BOXES: Unlabelled word at :Bounding box=(-628,221)->(-488,277)
APPLY_BOXES: Unlabelled word at :Bounding box=(-958,32)->(-734,88)
APPLY_BOXES: Unlabelled word at :Bounding box=(-628,32)->(-488,88)
APPLY_BOXES: Unlabelled word at :Bounding box=(-373,32)->(-317,88)
Found 6 good blobs.
5 remaining unlabelled words deleted.
Generated training data for 6 words
bil.pat.box 0 0 1 0 0
a
b
c
d
e
f
$ unicharset_extractor bil.pat.exp0.box
Extracting unicharset from bil.pat.exp0.box
Wrote unicharset file ./unicharset.
9
NULL 0 NULL 0
Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Joined [4a 6f 69 6e 65 64 ]
|Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Broken
a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # a [61 ]
b 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # b [62 ]
c 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # c [63 ]
d 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # d [64 ]
e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # e [65 ]
f 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # f [66 ]
$ mftraining -F font_properties -U unicharset -O bil.unicharset bil.pat.exp0.tr
Read shape table shapetable of 0 shapes
Reading bil.pat.exp0.tr ...
Bad properties for index 3, char a: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char b: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char c: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char d: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char e: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char f: 0,255 0,255 0,0 0,0 0,0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Warning: no protos/configs for a in CreateIntTemplates()
Warning: no protos/configs for b in CreateIntTemplates()
Warning: no protos/configs for c in CreateIntTemplates()
Warning: no protos/configs for d in CreateIntTemplates()
Warning: no protos/configs for e in CreateIntTemplates()
Warning: no protos/configs for f in CreateIntTemplates()
Done!
tesseract 3.04.00
leptonica-1.72
libgif 4.1.6(?) : libjpeg 6b (libjpeg-turbo 1.4.0) : libpng 1.2.50 : libtiff 4.0.5 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/a619104a-79d5-40ec-8a08-a6a9941ec292%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi Juan Pablo,
This problem cannot be solved by Tesseract as is. Even given images as clean as the ones you've shown, Tesseract would fail, since your "characters" are too disjointed, have no meaningful baseline, and only occur as singletons.
However, a simple and robust recognizer can be implemented without Tesseract using common sense and a bit of programming. Of the image processing operations, you would only need trivial thresholding, although some more involved preprocessing is required to bring the camera image close to the form of your sample images.
That preprocessing would be needed anyway, even if Tesseract worked for your "characters". Tell me what you have already done in this direction so I can share more details about the above method, if you wish.
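The message does not spell out the method, so the following is only a minimal sketch of the thresholding step it mentions, not the recognizer itself. It assumes an 8-bit grayscale image stored row by row with dark marks on a light background; the function name `binarize` and any particular threshold value are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-threshold binarization of an 8-bit grayscale image.
// Assumption: dark marks on a light background, so pixels strictly darker
// than `threshold` are treated as ink (1) and all others as background (0).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray,
                                   std::uint8_t threshold) {
    std::vector<std::uint8_t> mask(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i)
        mask[i] = (gray[i] < threshold) ? 1 : 0;
    return mask;
}
```

With a camera image, the threshold would normally have to be chosen per frame or per lighting setup; a fixed value is used here only to keep the sketch short.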
-Dmitri
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <iostream>

using namespace cv;

int main(int, char**)
{
    VideoCapture cap(0);   // open the default camera
    if (!cap.isOpened())   // check if we succeeded
        return -1;

    int c;
    Mat gray;
    namedWindow("gray", 1);

    tesseract::TessBaseAPI tess;
    // Init() returns 0 on success
    if (tess.Init("/usr/share/tesseract-ocr/tessdata/", "bil", tesseract::OEM_DEFAULT) != 0)
        return -1;
    tess.SetPageSegMode(tesseract::PSM_SINGLE_WORD);

    for (;;)
    {
        Mat frame;
        cap >> frame;
        cvtColor(frame, gray, CV_BGR2GRAY);

        c = waitKey(30);

        if (c == 27)       // Esc quits
            break;
        else if (c > 0)    // any other key runs OCR on the current frame
        {
            // 1 byte per pixel; gray.step is the row stride in bytes
            tess.SetImage((uchar*)gray.data, gray.cols, gray.rows, 1, (int)gray.step);
            Boxa* boxes = tess.GetComponentImages(tesseract::RIL_WORD, true, NULL, NULL);
            if (boxes != NULL)
            {
                for (int i = 0; i < boxes->n; i++)
                {
                    BOX* box = boxaGetBox(boxes, i, L_CLONE);
                    rectangle(gray, Point(box->x, box->y), Point(box->x + box->w, box->y + box->h), Scalar(255, 0, 0));
                    boxDestroy(&box);   // release the clone
                }
                boxaDestroy(&boxes);
            }
            char* out = tess.GetUTF8Text();
            if (out != NULL)
            {
                std::cout << out << std::endl;
                delete[] out;           // the caller owns the returned string
            }
            imshow("gray", gray);
            waitKey(4000);
        }
        else
            imshow("gray", gray);
    }
    tess.End();   // release engine resources; the destructor runs on its own at scope exit
    // the camera will be deinitialized automatically in VideoCapture destructor
    return 0;
}
Sir, I am working on a handwritten character recognition project based on Google Tesseract, but I cannot find a suitable tool for training it on handwriting. After some research on the internet, I understand that I first have to build a box file for the image to be learned and then correct it with a box file editor, but I am not able to make the box file for the image. Can you please tell me how to make a box file?
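As background to this question: Tesseract 3.x can write a first-draft box file itself (`tesseract <image> <base> batch.nochop makebox`), which is then corrected by hand. If the bounding rectangles come from your own code instead, the main pitfall is the coordinate system, because box files count `y` upward from the bottom of the image while most image libraries count downward from the top. The sketch below converts one top-left-origin rectangle into a box-file line. `makeBoxLine` is a hypothetical helper, and the image height used in the usage note is an assumed value.

```cpp
#include <sstream>
#include <string>

// Builds one box-file line "<symbol> <left> <bottom> <right> <top> <page>"
// from a rectangle given in the usual image convention: (x, y) is the
// top-left corner and y grows downward. The box format measures y from the
// bottom of the image, so both vertical coordinates are flipped.
// (makeBoxLine is a hypothetical helper for this sketch.)
std::string makeBoxLine(const std::string& symbol,
                        int x, int y, int w, int h,
                        int imageHeight, int page = 0) {
    const int left   = x;
    const int right  = x + w;
    const int top    = imageHeight - y;
    const int bottom = imageHeight - (y + h);
    std::ostringstream out;
    out << symbol << ' ' << left << ' ' << bottom << ' '
        << right << ' ' << top << ' ' << page;
    return out.str();
}
```

For example, assuming an image 1000 pixels high, `makeBoxLine("a", 32, 42, 133, 266, 1000)` yields `a 32 692 165 958 0`.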
...