jTessBoxEditor 0.6 Beta release


Quan Nguyen

Oct 2, 2011, 11:50:00 PM
to tesseract-ocr
A box editor for Tesseract OCR data. This release includes the
following fixes and enhancements:

- Add a utility function that creates a TIFF/Box pair suitable for
training with Tesseract
- Fix a bug which may clear out a modified box file when loading
another image
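
A Tesseract 3.x box file, the second half of such a TIFF/Box pair, is plain UTF-8 text with one line per glyph: "<char> <left> <bottom> <right> <top> <page>". Coordinates are in pixels measured from the bottom-left corner of the image, and the page field is a 0-based index into a multipage TIFF. As a minimal sketch (the Box type and the parse_box_line/parse_box_text helpers are hypothetical, not jTessBoxEditor code), such a file could be read in Python like this:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """One glyph entry from a Tesseract 3.x box file."""
    char: str      # the character; may be several code points (e.g. a conjunct)
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0  # 0-based page index; absent in older 5-field box files


def parse_box_line(line: str) -> Box:
    """Parse '<char> <left> <bottom> <right> <top> [<page>]'."""
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected box line: {line!r}")
    coords = [int(f) for f in fields[1:]]  # 4 coordinates, plus page if present
    return Box(fields[0], *coords)


def parse_box_text(text: str) -> List[Box]:
    """Parse a whole box file, skipping blank lines."""
    return [parse_box_line(ln) for ln in text.splitlines() if ln.strip()]
```

For example, parse_box_text("T 36 92 53 116 0\nh 55 92 67 116 0\n") yields two Box records, and a five-field line gets page 0.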

Please help test and post your comments/suggestions here. Thanks.

http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/
http://vietocr.sourceforge.net/training.html

Quan Nguyen

Oct 18, 2011, 7:20:36 PM
to tesseract-ocr
jTessBoxEditor v0.6 Release

- Add a utility function that creates a TIFF/Box pair suitable for
training with Tesseract
- Fix a bug which may clear out a modified box file when loading
another image
- Enhance box search operations
- Fix font issues in various visual components
- A merged box will have a character value composed of all the
characters of the merged boxes
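
The merge rule in the last item can be pictured with a short Python sketch. Concatenating the characters comes from the release note; taking the union of the rectangles is an assumption about the geometry. Box and merge_boxes are hypothetical names, and coordinates follow the box-file order (left, bottom, right, top).

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0


def merge_boxes(boxes: Sequence[Box]) -> Box:
    """Merge boxes from one page into a single box (hypothetical helper)."""
    if not boxes:
        raise ValueError("nothing to merge")
    if len({b.page for b in boxes}) != 1:
        raise ValueError("cannot merge boxes from different pages")
    return Box(
        char="".join(b.char for b in boxes),   # all characters, in order
        left=min(b.left for b in boxes),       # assumed: union of the rectangles
        bottom=min(b.bottom for b in boxes),
        right=max(b.right for b in boxes),
        top=max(b.top for b in boxes),
        page=boxes[0].page,
    )
```

Merging an "r" box and an adjacent "n" box this way gives a single "rn" box covering both.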

http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/

Quan Nguyen

Jun 17, 2012, 2:32:30 AM
to tesser...@googlegroups.com
jTessBoxEditor v0.7 has been released with the following enhancements:
  • Increase line spacing
  • Fix an issue with opening Help file on OS X
  • For TIFF/Box generation:
    • abbreviate bold/italic font style to b/i for filename
    • add a Prefix (Language Code) textbox
    • add support for text anti-aliasing
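
Tesseract 3 training expects TIFF/Box pairs named [lang].[fontname].exp[num].tif and .box; examples later in this thread are eng.timesitalic.exp0.tif and ara.arabictypesetting.exp0.box. The sketch below builds such a base name from the Prefix (language code), font family and style. How jTessBoxEditor actually strips spaces and where it places the b/i abbreviation are assumptions here, and training_basename is a hypothetical helper, so compare against the files the tool really writes.

```python
def training_basename(lang: str, font_family: str, bold: bool = False,
                      italic: bool = False, exp: int = 0) -> str:
    """Build '[lang].[fontname].exp[num]' (hypothetical helper).

    Assumptions: spaces are removed, the family is lower-cased, and
    bold/italic are appended to the font name as 'b'/'i'.
    """
    font = font_family.replace(" ", "").lower()
    font += ("b" if bold else "") + ("i" if italic else "")
    if "." in font:
        # dots separate the name fields, so they cannot appear in the font part
        raise ValueError("font name must not contain '.'")
    return f"{lang}.{font}.exp{exp}"
```

Appending ".tif" and ".box" to the returned base gives the pair, e.g. "ara.arabictypesetting.exp0.tif" for the Arabic Typesetting font.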

http://vietocr.sourceforge.net/training.html

Also, the PowerShell script train.ps1 has been updated to automate training with Tesseract 3.02 on the Windows platform.

http://vietocr.svn.sourceforge.net/viewvc/vietocr/jTessBoxEditor/trunk/tools/

Quan Nguyen

Apr 17, 2013, 11:19:08 PM
to tesser...@googlegroups.com
Version 0.8 has been released with the following enhancements:

- Add row number header
- Char cell now editable
- Convert Unicode escape sequences where possible
- Find box now displays Unicode characters and allows search using Unicode escape sequences
- Improve Generate TIFF/Box functionality:
  * automatically combine boxes that have the same coordinates or where one completely encloses the other
  * automatically combine boxes of combining symbols, specified in an external file, with the main (base) character
  * retain last-modified exp number in filename
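
The first Generate TIFF/Box item (combining boxes that share coordinates or where one encloses the other) can be sketched as follows. This is an illustrative reimplementation, not jTessBoxEditor's code: it assumes box-file coordinates (left, bottom, right, top), keeps the outer rectangle, and joins the characters in their original order. Box, encloses and combine_enclosed are hypothetical names.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def encloses(outer: Box, inner: Box) -> bool:
    """True if inner lies completely inside outer (identical boxes count)."""
    return (outer.left <= inner.left and outer.bottom <= inner.bottom
            and outer.right >= inner.right and outer.top >= inner.top)


def combine_enclosed(boxes: List[Box]) -> List[Box]:
    """Fold each box into an earlier box it coincides with, sits inside,
    or completely surrounds; other boxes are kept as they are."""
    result: List[Box] = []
    for box in boxes:
        for i, prev in enumerate(result):
            if encloses(prev, box) or encloses(box, prev):
                outer = prev if encloses(prev, box) else box
                # keep the larger rectangle, concatenate the characters
                result[i] = outer._replace(char=prev.char + box.char)
                break
        else:  # no overlap of this kind: keep the box as a separate entry
            result.append(box)
    return result
```

For instance, a small accent box lying inside the box of its base letter is folded into that letter's box, while a neighbouring letter stays separate.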

http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/

Quan Nguyen

Apr 30, 2013, 5:58:12 PM
to tesser...@googlegroups.com
Yes, it runs on Ubuntu. Just unzip it and execute the run script. Be sure to have Java installed first.

On Tuesday, April 23, 2013 12:17:21 AM UTC-5, mama wrote:
Sir,
Does it work on Ubuntu?
I didn't find jTessBoxEditor for Ubuntu.
Thanks,
mama

Quan Nguyen

Apr 30, 2013, 6:02:00 PM
to tesser...@googlegroups.com
Version 0.9 Release:

- Enhance Generate TIFF/Box functionality to allow combining prepending symbols in addition to appending ones
- Fix a bug that failed to persist changes to table in edit mode
- Find function now supports partial matches
- Fix a problem with table not scrolling along when row header has focus and scrolling
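
The Find behavior described in 0.8 and 0.9 (Unicode escape sequences in the search string, plus partial matches) can be approximated with the Python sketch below. It decodes only \uXXXX escapes and matches case-sensitively; both are assumptions, and unescape/find_rows are hypothetical names rather than jTessBoxEditor's API.

```python
import re
from typing import List

# A literal backslash, 'u', then exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape(text: str) -> str:
    """Replace each \\uXXXX escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def find_rows(chars: List[str], query: str, partial: bool = True) -> List[int]:
    """Return indices of box characters matching the (unescaped) query."""
    needle = unescape(query)
    if partial:
        return [i for i, c in enumerate(chars) if needle in c]  # substring match
    return [i for i, c in enumerate(chars) if c == needle]      # whole-cell match
```

With partial matching, searching for a combining mark such as \u0301 finds every cell that contains it, not only cells consisting of that mark alone.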

http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/

mamata nayak

May 4, 2013, 9:04:25 AM
to tesser...@googlegroups.com
Sir,
As per your reply, I have successfully installed jTessBoxEditor, but I am not able to open the input box file.
Could you please tell me the steps to follow?
Thank you

--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
 
---
You received this message because you are subscribed to a topic in the Google Groups "tesseract-ocr" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/tesseract-ocr/QQ8wC59YKUI/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to tesseract-oc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

mamata nayak

May 4, 2013, 9:10:33 AM
to tesser...@googlegroups.com
Sir,
After running this command at the command prompt, the output is as follows:
java -Xms128m -Xmx512m -jar jTessBoxEditor.jar
4 May, 2013 6:21:23 PM java.util.prefs.FileSystemPreferences$2 run
INFO: Created user preferences directory.
Exception in thread "AWT-EventQueue-0" java.awt.HeadlessException
    at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:173)
    at java.awt.Window.<init>(Window.java:546)
    at java.awt.Frame.<init>(Frame.java:419)
    at java.awt.Frame.<init>(Frame.java:384)
    at javax.swing.JFrame.<init>(JFrame.java:174)
    at net.sourceforge.tessboxeditor.Gui.<init>(Unknown Source)
    at net.sourceforge.tessboxeditor.GuiWithMRU.<init>(Unknown Source)
    at net.sourceforge.tessboxeditor.GuiWithEdit.<init>(Unknown Source)
    at net.sourceforge.tessboxeditor.GuiWithSpinner.<init>(Unknown Source)
    at net.sourceforge.tessboxeditor.GuiWithFont.<init>(Unknown Source)
    at net.sourceforge.tessboxeditor.GuiWithLaF.<init>(Unknown Source)
    at net.sourceforge.tessboxeditor.GuiWithTools.<init>(Unknown Source)
    at net.sourceforge.tessboxeditor.GuiWithTools$2.run(Unknown Source)
    at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:226)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:673)
    at java.awt.EventQueue.access$300(EventQueue.java:96)
    at java.awt.EventQueue$2.run(EventQueue.java:634)
    at java.awt.EventQueue$2.run(EventQueue.java:632)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:643)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:275)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:200)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:190)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:185)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:177)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:138)

However, I could not figure out how to open the window (jTessBoxEditor Swing UI, Box View).

Please reply.
Thank you

Quan Nguyen

May 4, 2013, 10:08:32 AM
to tesser...@googlegroups.com
What Ubuntu and Java versions are installed on your machine? You probably have a headless Java, i.e., one without graphics libraries. Can you use Oracle Java 7, which is the version I tested with? Thanks.

http://askubuntu.com/questions/55848/how-do-i-install-oracle-java-jdk-7

mamata nayak

May 5, 2013, 1:35:07 PM
to tesser...@googlegroups.com
Sir,
I have installed the JDK following the steps given at http://askubuntu.com/questions/55848/how-do-i-install-oracle-java-jdk-7
Now my java -version output is:
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) Server VM (build 23.21-b01, mixed mode)

Please tell me: is it required to enable the Mozilla Firefox plugin?

Also, is it necessary to install Oracle Java JDK 7 as described on this site: http://www.webupd8.org/2011/09/how-to-install-oracle-java-7-jdk-in.html

What's the difference?

Please reply.
Thank you

mamata nayak

May 6, 2013, 2:49:33 AM
to tesser...@googlegroups.com
Sir,
I have attached a screenshot of jTessBoxEditor.
It can open the image file; please tell me how to open the box file using this editor,
or is it necessary to open the TIF file and convert it into a box file?
Please advise.
Thanking you


Screenshot at 2013-05-06 12:13:47.png

Sven Pedersen

May 6, 2013, 6:09:57 AM
to tesser...@googlegroups.com
JDK is only needed for development, so you should not need it for the box editor. And the Mozilla plugin is needed only for the web browser.
Sven

Shree Devi Kumar

May 6, 2013, 4:12:18 AM
to tesser...@googlegroups.com
Mamata,

It all depends on what you want to do.

If you want to open an image file and have Tesseract create a box file for it, then you should try QTBOXeditor. I have found it to work well with single-page .png files, but it does not work for me with either single-page or multipage TIFs. These box files need to be edited to correct the characters Tesseract assigns to the boxes.

If you have a text file for which you want to generate a matching TIFF and box file from the provided text (without using Tesseract), please use jTessBoxEditor's menu item Tools > Generate TIFF/Box.

Later you can modify the box files through the editor.

Please read the program documentation / help file for more details.

Shree

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com



mamata nayak

May 8, 2013, 1:23:40 AM
to tesser...@googlegroups.com
Good Morning Sir,
Thanks for your reply.
Now my problem is that, for some sets of characters, jTessBoxEditor can open the corresponding TIF files, but for others it cannot generate the box coordinates.

mamata nayak

May 8, 2013, 1:15:06 AM
to tesser...@googlegroups.com
Thank you, sir, for your suggestion.
I followed these steps to install Qtboxeditor:
1. download zdenop-qt-box-editor-v1.10-11-gcdb923a.tar.gz
2. $tar -xvf zdenop-qt-box-editor-v1.10-11-gcdb923a.tar.gz
3. $cd zdenop-qt-box-editor-cdb923a
4. $sudo apt-get install libqt4-dev
5. $qmake
6. $make
but I do not know how to load a TIF or box file.
Please help me.
Thank you

mamata nayak

May 8, 2013, 1:29:43 AM
to tesser...@googlegroups.com
Good Morning Sir,
Thanks for your reply.
Now my problem is that, for some sets of characters of my language, jTessBoxEditor can open the corresponding TIF file and generate its box file, but for others it cannot generate the box coordinates. I have attached the file.

Screenshot at 2013-05-08 10:57:16.png

Quan Nguyen

May 8, 2013, 8:17:20 AM
to tesser...@googlegroups.com
You would need to run the tesseract command to generate the box file for your image, e.g.:

tesseract eng.timesitalic.exp0.tif eng.timesitalic.exp0 batch.nochop makebox

Check the Tesseract Training Wiki for more details.

http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

Once you have the TIFF/Box pair, you can open it in jTessBoxEditor.
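
When there are many images, the same makebox invocation can be scripted. The Python sketch below assumes only the command form shown above (tesseract <image> <outputbase> batch.nochop makebox) and that tesseract is on the PATH; makebox_command and make_boxes are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import List


def makebox_command(tif: Path) -> List[str]:
    """Build the argv that writes <name>.box next to <name>.tif."""
    output_base = tif.with_suffix("")  # tesseract appends '.box' itself
    return ["tesseract", str(tif), str(output_base), "batch.nochop", "makebox"]


def make_boxes(folder: Path) -> None:
    """Run makebox for every .tif in folder (requires tesseract on PATH)."""
    for tif in sorted(folder.glob("*.tif")):
        subprocess.run(makebox_command(tif), check=True)  # stop on first failure
```

The generated box files still need to be checked and corrected in jTessBoxEditor before training.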

mamata nayak

May 11, 2013, 11:31:57 AM
to tesser...@googlegroups.com
Thank you, sir.
I was able to detect a set of characters of my language.
However, one character among them, ଫୀ, which occurs 5 times in the training image, is recognized as a different character pair at each place: କ୍ଷୀଛୀ, ନୀନୀ, ଯୀଛୀ, ପୀଛୀ, ବୀନୀ.
Then I used a unicharambigs file with the following contents:
v1
2    କ୍ଷୀ ଛୀ    1    ଫୀ    1   
2    ନୀ ନୀ    1    ଫୀ    1
2    ଯୀ ଛୀ    1    ଫୀ    1
2    ପୀ ଛୀ    1    ଫୀ    1
2    ବୀ ନୀ    1    ଫୀ    1
But the problem is that whenever these character pairs are recognized, they are replaced with ଫୀ.
Please look into my problem and give a suggestion.
Thanking you
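
For reference, after the "v1" header each line of a Tesseract 3 unicharambigs file has five fields: the number of unichars in the ambiguous sequence, those unichars separated by spaces, the number of unichars in the replacement, the replacement unichars, and a type flag (1 marks a mandatory replacement that is always applied, 0 a non-mandatory one). The fields are conventionally separated by tabs. This small Python sketch writes entries in that layout so the separators stay consistent; ambig_line and write_ambigs are hypothetical helpers, and whether a sequence such as କ୍ଷୀ counts as one unichar depends on your unicharset.

```python
from typing import Iterable, Sequence, Tuple


def ambig_line(source: Sequence[str], target: Sequence[str],
               mandatory: bool) -> str:
    """Format one v1 unicharambigs entry: tab-separated fields,
    space-separated unichars inside a field."""
    return "\t".join([
        str(len(source)), " ".join(source),
        str(len(target)), " ".join(target),
        "1" if mandatory else "0",
    ])


def write_ambigs(entries: Iterable[Tuple[Sequence[str], Sequence[str], bool]]) -> str:
    """Return the whole file text, starting with the 'v1' header line."""
    lines = ["v1"] + [ambig_line(s, t, m) for s, t, m in entries]
    return "\n".join(lines) + "\n"
```

With the type flag set to 1, as in every entry above, the replacement is applied each time the source sequence is recognized.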

Shree Devi Kumar

May 12, 2013, 2:53:29 AM
to tesser...@googlegroups.com


mamata nayak

May 21, 2013, 4:56:16 AM
to tesser...@googlegroups.com
Sir,
Can you please tell me the current list of Indian languages for which the tesseract-ocr engine has been trained?

Thank you

Shree Devi Kumar

May 21, 2013, 6:08:16 AM
to tesser...@googlegroups.com
Mamata,
Please see https://code.google.com/p/tesseract-ocr/downloads/list for the available language data files for Tesseract 3.02. If Odia is similar to Bangla, you can use the Bengali traineddata to bootstrap training for Odia.

Shree



mamata nayak

Jun 1, 2013, 2:38:11 AM
to tesser...@googlegroups.com
Sir,
When I extract the existing eng.traineddata, the following error appears:
$ combine_tessdata -u tesseract-ocr/tessdata/eng.traineddata  /home/temp/eng.
Extracting tessdata components from tesseract-ocr/tessdata/eng.traineddata
Error openning /home/temp/eng.unicharset

mamata nayak

Jun 1, 2013, 2:05:45 AM
to tesser...@googlegroups.com
Sir,
Please help me.
The character set of my language consists of about 500 characters.
I have divided these into subsets, i.e. about 10 .tif files, generated a box file for each, and edited them separately using the Qt editor. Then I used the following command:

$ cat >> LohitOriya.tr C.e0.tr

to append each new .tr file to the previously generated LohitOriya.tr file, and

$ unicharset_extractor A.3.box B.e0.box C.e0.box

to generate the unicharset file.

Please respond as early as possible.

Eagerly waiting

mama

Sep 2, 2013, 4:56:55 AM
to tesser...@googlegroups.com
Sir, please help me.

During installation of tesseract-3.01, I got this error during make:

svutil.cpp: In static member function 'static void SVSync::StartProcess(const char*, const char*)':
svutil.cpp:89:18: error: 'fork' was not declared in this scope
svutil.cpp:119:28: error: 'execvp' was not declared in this scope
svutil.cpp: In member function 'void SVNetwork::Close()':
svutil.cpp:262:16: error: 'close' was not declared in this scope
svutil.cpp: In constructor 'SVNetwork::SVNetwork(const char*, int)':
svutil.cpp:417:14: error: 'sleep' was not declared in this scope
make[3]: *** [svutil.lo] Error 1
make[3]: Leaving directory `/home/mamata/tesseract-3.01/viewer'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/mamata/tesseract-3.01/viewer'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/mamata/tesseract-3.01'
make: *** [all] Error 2

This happened after I upgraded Ubuntu to 13.04.

iram akbar

Oct 28, 2014, 9:31:25 AM
to tesser...@googlegroups.com
Can anyone tell me about a training tool for .NET? jTessBoxEditor is Java-based.

Quan Nguyen

Oct 30, 2014, 7:18:00 PM
to tesser...@googlegroups.com
You only need JRE to run jTessBoxEditor.

The AddOns page lists a few .NET tools.

iram akbar

Nov 5, 2014, 5:38:13 AM
to tesser...@googlegroups.com
Thank you, Quan. Does jTessBoxEditor support the Arabic language?

Quan Nguyen

Nov 5, 2014, 8:43:54 AM
to tesser...@googlegroups.com
Yes. The latest version supports RTL languages.

iram akbar

Nov 6, 2014, 4:19:02 AM
to tesser...@googlegroups.com
I have downloaded the latest version (1.1) of jTessBoxEditor, but I cannot find an Arabic language option in the list; that's why, as you can see, my text is not appearing in Arabic.

Question: How can I get the Arabic text displayed so that I can generate a TIFF from it? Moreover, I have tried the Coptic language but got no result.
issue.png

ShreeDevi Kumar

Nov 6, 2014, 5:07:27 AM
to tesser...@googlegroups.com
Please also change the font under the Trainer tab to Arabic.

ShreeDevi

arabic.png

iram akbar

Nov 6, 2014, 5:30:31 AM
to tesser...@googlegroups.com
Thanks for your reply. I am able to type Arabic in jTessBoxEditor and have also entered the Arabic language in the text box under the Trainer tab, but when I load the attached (Arabic) input file, the text is not displayed in Arabic (please see my previous attachment).

Question: How can I get the text displayed in Arabic in jTessBoxEditor so that I can generate a TIFF from the attached input file containing Arabic text?
test.txt

ShreeDevi Kumar

Nov 6, 2014, 6:38:25 AM
to tesser...@googlegroups.com
Click the 'Generate' button. With some Devanagari fonts I have found that the text does not display, but the TIFF/box files are still generated. Maybe the same is true for the Arabic font you are using. Give it a try.

You can also try copying and pasting the text; sometimes that works.

iram akbar

Nov 6, 2014, 7:19:53 AM
to tesser...@googlegroups.com
Thank you for your help, but my issue still exists. If I need to generate a TIFF from an image of text, I cannot, because the tool only asks me to load a text file, not an image file. Second, if I have a lot of documents, I would need to copy and paste each one before generating the TIFF. Can anyone else help me with this?
Question: How can I input an image of Arabic text into jTessBoxEditor to generate a TIFF (as attached)?
Capture.JPG

ShreeDevi Kumar

Nov 6, 2014, 8:37:17 AM
to tesser...@googlegroups.com
I think you are using the wrong tools...

If you need to convert a JPG to TIFF, use an image editor such as ImageMagick or IrfanView.

If you need to OCR the image, Tesseract accepts JPG as input as well as TIFF.

There already is arabic traineddata for tesseract - see https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata

A newer version of traineddata should be available with the release of 3.04 which should be soon.

Regarding creating box/TIFF files:

I was able to use jTessBoxEditor to create Arabic box/TIFF files. I just copied some text from Wikipedia and pasted it into jTessBoxEditor.

See attached.


ShreeDevi
ara.arabictypesetting.exp0.box
ara.arabictypesetting.exp0.tif

Quan Nguyen

Nov 7, 2014, 9:42:37 AM
to tesser...@googlegroups.com
Look in the samples folder for a working example. You can start from a UTF-8 text file about two pages long, generate TIFF/Box files from it, and prepare the other input files needed for training. You can do the entire training in jTessBoxEditor.

iram akbar

Nov 10, 2014, 2:41:34 AM
to tesser...@googlegroups.com
Quan,
I am able to generate some files with jTessBoxEditor, but I have an issue: when I select "Train with existing box" or "Train from Scratch" under the Trainer tab, I get the attached message.
Question: How can I generate the Arabic.font_properties, Arabic.frequent_word_list and Arabic.words_list files using jTessBoxEditor?
traindata.PNG

ShreeDevi Kumar

Nov 10, 2014, 5:30:21 AM
to tesser...@googlegroups.com
Look in the jtessboxeditor/samples/vie folder and create similar files for your language.
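
As a rough starting point for the two word-list files, the Python sketch below derives a words list and a frequent-words list, one word per line, from a UTF-8 sample text. Whitespace tokenization, the punctuation set and the top_n cutoff are assumptions, build_word_lists is a hypothetical helper, and the vie samples remain the reference for the exact file names and contents jTessBoxEditor expects.

```python
from collections import Counter
from typing import Tuple

# Punctuation stripped from token edges (Latin plus Arabic comma, semicolon, question mark).
PUNCTUATION = ".,;:!?\"'()[]«»،؛؟"


def build_word_lists(text: str, top_n: int = 100) -> Tuple[str, str]:
    """Return (words_list, frequent_words_list) contents, one word per line.

    Assumptions: words are whitespace-separated tokens with edge punctuation
    removed; the frequent list keeps the top_n most common words.
    """
    counts: Counter = Counter()
    for token in text.split():
        word = token.strip(PUNCTUATION)
        if word:
            counts[word] += 1
    all_words = sorted(counts)                             # every distinct word
    frequent = [w for w, _ in counts.most_common(top_n)]   # most common first
    return ("".join(w + "\n" for w in all_words),
            "".join(w + "\n" for w in frequent))
```

The sample text can be read with open(path, encoding="utf-8").read(), and each returned string written to its own file named after the pattern used in the vie folder.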

ShreeDevi

iram akbar

Nov 10, 2014, 7:31:40 AM
to tesser...@googlegroups.com
Thank you for your help regarding the training data. One more thing: there is an icon near the character box (see attachment). It does not work for me. I expected that clicking the icon would give a correction option.
Question: Is it active or not?
options.PNG

Quan Nguyen

Nov 10, 2014, 5:48:36 PM
to tesser...@googlegroups.com
You can edit the letters by manually typing in the Character textbox or in the Char table cells.

iram akbar

Nov 12, 2014, 5:46:57 AM
to tesser...@googlegroups.com
Hi,

I am able to generate the required files with jTessBoxEditor. I want to use Serak for training, but I am getting the attached error, and debugging gave me no solution. As far as I know, you don't need to generate files like the "frequent words file" in Serak; you just train on the image, then combine the tessdata, and you get the required output file.
Note: I am training Arabic.

Question: Why am I getting the attached error even though I am training a simple one-line Arabic sentence? Please share the solution.
issue.PNG

iram akbar

Nov 13, 2014, 1:42:20 AM
to tesser...@googlegroups.com
FYI: I have found that using different fonts in jTessBoxEditor produces the above error, so now I am creating the TIFF with the "Monospaced" font, which is jTessBoxEditor's default setting.

iram akbar

Nov 20, 2014, 5:56:17 AM
to tesser...@googlegroups.com
Hello Shree,

I am having an issue while training Arabic in Serak (for box file generation I am using jTessBoxEditor). I am doing some testing: I have assigned English letters to a single Arabic word and created the box file as attached (jtess box file). After following the whole training process in Serak, I got the OCR result as attached (serak results). As you can see, the box file contains 4 letters, "A, B, C, D", so I expected the OCR result to be ABCD, but the result is BDBBAABBBBA.
Question: Why am I getting this result? Is something wrong with how I made the box file in jTessBoxEditor, or with the training in Serak?

jtess box file.JPG
serak results.JPG

ShreeDevi Kumar

Nov 20, 2014, 7:33:16 AM
to tesser...@googlegroups.com
I have not used Serak, but its issues page indicates problems with RTL languages: see https://code.google.com/p/serak-tesseract-trainer/issues/detail?id=6

Why aren't you using jTessBoxEditor's trainer or the command-line programs? I think the binaries are bundled with jTessBoxEditor.



ShreeDevi

iram akbar

Nov 20, 2014, 9:07:35 AM
to tesser...@googlegroups.com
It seems to be a known issue with Serak. I have created the "ara" folder with files like those in jTessBoxEditor's "vie" folder, as you can see in the attachment. After that, I set the "Tesseract executable" and "Training data" paths in jTessBoxEditor to the ara box file, as attached. When I click the "Run" button, I get the attached error. I don't know what is going wrong here.
Question: Am I giving the wrong file (i.e. the ara box file) for the "Tesseract executable" and "Training data" paths, or is something else wrong?
Note: I have not put any data in the words_list, frequent_words and font_properties files.

ara folder.JPG
jtess error.JPG

Quan Nguyen

Nov 20, 2014, 7:30:34 PM
to tesser...@googlegroups.com
No need to change "Tesseract executable" setting. You need an entry in .font_properties file for arialunicodems font.
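
For reference, each font_properties line in Tesseract 3.x is a font name followed by five 0/1 flags: italic, bold, fixed-pitch, serif, fraktur. The font name has to match the font part of the training file names (arialunicodems here). A tiny Python sketch for producing such an entry; font_properties_line is a hypothetical helper, and the all-zero flags in the example are an assumption about that font.

```python
def font_properties_line(font: str, italic: bool = False, bold: bool = False,
                         fixed: bool = False, serif: bool = False,
                         fraktur: bool = False) -> str:
    """Format '<fontname> <italic> <bold> <fixed> <serif> <fraktur>'."""
    if not font or any(ch.isspace() for ch in font):
        raise ValueError("font name must be a single non-empty token")
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([font] + [str(int(f)) for f in flags])
```

font_properties_line("arialunicodems") returns "arialunicodems 0 0 0 0 0", which would go on its own line in the language's .font_properties file.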

I strongly suggest you re-read the training wiki before continuing on.

iram akbar

Nov 21, 2014, 1:09:11 AM
to tesser...@googlegroups.com
Thank you for your help. Can you please share the Tesseract 3.04 release date?
