Problem training on characters such as Chinese characters

Tran Minh Tuan

Jul 23, 2008, 1:46:33 AM

to tesseract-ocr, thera...@gmail.com

I used Tesseract 2.03 and trained it with some TIFF images of Chinese
characters, but I ran into this problem:
For example, my training image has 20 characters, but my .box file
has only a few lines (5 lines), so it does not correspond to the
training image and I can't assign the right character to each box in
the box file (image: 640x400, 300 dpi, greyscale).

Can someone help me with this? Thanks
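A quick way to confirm the mismatch described above is to count the entries in the .box file and compare that with the number of characters in the image. The sketch below assumes the 5-field box layout `char left bottom right top` used by Tesseract 2.x (later versions append a page number, which is simply ignored here); the function and class names are hypothetical, not part of Tesseract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_text(text: str) -> List[Box]:
    """Parse box-file text, assuming one 'char left bottom right top' entry per line."""
    boxes = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        # Only the first five fields are used; an extra page field (if any) is ignored.
        left, bottom, right, top = (int(v) for v in fields[1:5])
        boxes.append(Box(fields[0], left, bottom, right, top))
    return boxes


def check_box_count(text: str, expected: int) -> str:
    """Report whether the number of boxes matches the expected character count."""
    n = len(parse_box_text(text))
    if n == expected:
        return f"OK: {n} boxes"
    return f"MISMATCH: {n} boxes for {expected} characters"
```

With the numbers from the question, a box file of 5 lines checked against 20 characters would report `MISMATCH: 5 boxes for 20 characters`, which usually means several characters were merged into one box.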

Wenjing Jia

Jul 23, 2008, 10:09:51 PM

to tesseract-ocr

Hi, I'm not exactly sure what the problem is in your case. What follows
is based only on my personal experience.

The BOX file you got contains just the bounding boxes marked
automatically by Tesseract, which simply saves us the labour of
obtaining the bounding boxes manually. This bounding-box information
can be inaccurate, so we need to check the boxes one by one and correct
any errors. Otherwise, the subsequent training steps won't work
correctly.
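One way to speed up the one-by-one check described above is to flag boxes that are too wide. Chinese characters are roughly square, so a box much wider than the typical character height probably covers several merged characters. This is a hypothetical heuristic, not something Tesseract does, and the 1.5 threshold is an arbitrary assumption.

```python
from statistics import median
from typing import List, Tuple

# (left, bottom, right, top), i.e. a box-file entry without its character field
Coords = Tuple[int, int, int, int]


def suspicious_boxes(boxes: List[Coords], ratio: float = 1.5) -> List[Tuple[int, int]]:
    """Return (index, estimated character count) for boxes that look like merged characters.

    The median box height is used as the expected width of one (roughly square)
    character; any box wider than `ratio` times that value is flagged.
    """
    if not boxes:
        return []
    heights = [top - bottom for (_, bottom, _, top) in boxes]
    ref = max(median(heights), 1)  # guard against zero-height boxes
    flagged = []
    for i, (left, _, right, _) in enumerate(boxes):
        width = right - left
        if width > ratio * ref:
            # A flagged box is assumed to hold at least two characters.
            flagged.append((i, max(2, round(width / ref))))
    return flagged
```

Using the height rather than the width as the reference keeps the estimate usable even when most boxes in the file are merged.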

Since you are training on Chinese characters, it is no surprise that
the initial bounding-box information obtained by Tesseract contains
some errors, because I suppose Tesseract currently does not recognise
Chinese characters well enough.

Maybe you will need to mark the bounding box of each character in your
sample image manually?
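If the boxes are marked by hand, for example by reading rectangles off in an image editor, those coordinates usually have their origin at the top-left corner, whereas box-file coordinates are, as far as I know, measured from the bottom-left corner. A minimal sketch of the conversion, assuming hypothetical hand-measured `(x, y, width, height)` rectangles and the 640x400 image from the question:

```python
from typing import Iterable, Tuple

# (char, x, y, width, height) with (0, 0) at the TOP-left corner, as most image editors report
Rect = Tuple[str, int, int, int, int]


def to_box_line(char: str, x: int, y: int, w: int, h: int, image_height: int) -> str:
    """Convert one top-left-origin rectangle into an assumed 'char left bottom right top' line."""
    left, right = x, x + w
    # Flip the vertical axis: box coordinates are assumed to start at the bottom-left corner.
    top = image_height - y
    bottom = image_height - (y + h)
    return f"{char} {left} {bottom} {right} {top}"


def build_box_text(rects: Iterable[Rect], image_height: int) -> str:
    """Emit one box line per character, in the order given, each ending with a newline."""
    return "".join(to_box_line(*r, image_height) + "\n" for r in rects)
```

For example, `build_box_text([("中", 10, 30, 40, 40)], 400)` yields the line `中 10 330 50 370`. Writing the characters in reading order keeps the file aligned with the sequence of characters in the image.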

Tran Minh Tuan

Jul 23, 2008, 11:39:44 PM

to tesseract-ocr

Hi, thank you for your help. So in this step, Tesseract segments each
character to determine its bounding-box coordinates? My input image has
about 20 characters, but the box file has only about 6 lines of
bounding boxes. Can I use the bbTesseract tool to correct this (so that
each character gets its own bounding box)?
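When one box covers several characters, the fix (whether done in bbTesseract or by editing the .box file directly) is to split it into one box per character. As a rough sketch, using a hypothetical helper that is not bbTesseract's own code, a box can be cut into `n` equal-width pieces; real gaps between characters are uneven, so each piece still needs checking by eye.

```python
from typing import List, Tuple

# (left, bottom, right, top), as in a box-file entry
Coords = Tuple[int, int, int, int]


def split_box(left: int, bottom: int, right: int, top: int, n: int) -> List[Coords]:
    """Split one box horizontally into n boxes of (nearly) equal integer width."""
    if n < 1:
        raise ValueError("n must be at least 1")
    width = right - left
    # Integer edges; any remainder pixels end up in the later pieces.
    edges = [left + (width * k) // n for k in range(n + 1)]
    return [(edges[k], bottom, edges[k + 1], top) for k in range(n)]
```

Keeping the bottom and top edges unchanged assumes the merged characters sit on one text line, which matches a horizontally merged box.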
By the way, when I use bbTesseract, this error occurs while I'm
loading a TIFF image. It seems it could not load "MagickNet.dll".
Can you help me with this?
Thanks

See the end of this message for details on invoking
just-in-time (JIT) debugging instead of this dialog box.

************** Exception Text **************
System.IO.FileLoadException: Could not load file or assembly 'MagickNet, Version=1.0.0.3, Culture=neutral, PublicKeyToken=null' or one of its dependencies. This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem. (Exception from HRESULT: 0x800736B1)
File name: 'MagickNet, Version=1.0.0.3, Culture=neutral, PublicKeyToken=null' ---> System.Runtime.InteropServices.COMException (0x800736B1): This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem. (Exception from HRESULT: 0x800736B1)
at bbTesseract.IO.IMSaveNew(String SourceFileName, String Format, String TargetDir)
at bbTesseract.f_bbT.OpenImage(Object sender, EventArgs e)
at System.Windows.Forms.ToolStripItem.RaiseEvent(Object key, EventArgs e)
at System.Windows.Forms.ToolStripMenuItem.OnClick(EventArgs e)
at System.Windows.Forms.ToolStripItem.HandleClick(EventArgs e)
at System.Windows.Forms.ToolStripItem.HandleMouseUp(MouseEventArgs e)
at System.Windows.Forms.ToolStripItem.FireEventInteractive(EventArgs e, ToolStripItemEventType met)
at System.Windows.Forms.ToolStripItem.FireEvent(EventArgs e, ToolStripItemEventType met)
at System.Windows.Forms.ToolStrip.OnMouseUp(MouseEventArgs mea)
at System.Windows.Forms.ToolStripDropDown.OnMouseUp(MouseEventArgs mea)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ScrollableControl.WndProc(Message& m)
at System.Windows.Forms.ToolStrip.WndProc(Message& m)
at System.Windows.Forms.ToolStripDropDown.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)

************** Loaded Assemblies **************
mscorlib
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.832 (QFE.050727-8300)
CodeBase: file:///C:/WINDOWS/Microsoft.NET/Framework/v2.0.50727/mscorlib.dll
----------------------------------------
bbTesseract
Assembly Version: 1.0.0.0
Win32 Version: 1.0.0.0
CodeBase: file:///C:/Tuantm/Master%20Project/tesseract-ocr/bbT_exe_00_05_38/bbTesseract.exe
----------------------------------------
Microsoft.VisualBasic
Assembly Version: 8.0.0.0
Win32 Version: 8.0.50727.42 (RTM.050727-4200)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/Microsoft.VisualBasic/8.0.0.0__b03f5f7f11d50a3a/Microsoft.VisualBasic.dll
----------------------------------------
System
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.832 (QFE.050727-8300)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System/2.0.0.0__b77a5c561934e089/System.dll
----------------------------------------
System.Windows.Forms
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.832 (QFE.050727-8300)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Windows.Forms/2.0.0.0__b77a5c561934e089/System.Windows.Forms.dll
----------------------------------------
System.Drawing
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.832 (QFE.050727-8300)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Drawing/2.0.0.0__b03f5f7f11d50a3a/System.Drawing.dll
----------------------------------------
System.Configuration
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.832 (QFE.050727-8300)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Configuration/2.0.0.0__b03f5f7f11d50a3a/System.Configuration.dll
----------------------------------------
System.Xml
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.832 (QFE.050727-8300)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Xml/2.0.0.0__b77a5c561934e089/System.Xml.dll
----------------------------------------
System.Runtime.Remoting
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.832 (QFE.050727-8300)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Runtime.Remoting/2.0.0.0__b77a5c561934e089/System.Runtime.Remoting.dll
----------------------------------------

************** JIT Debugging **************
To enable just-in-time (JIT) debugging, the .config file for this
application or computer (machine.config) must have the
jitDebugging value set in the system.windows.forms section.
The application must also be compiled with debugging
enabled.

For example:

<configuration>
<system.windows.forms jitDebugging="true" />
</configuration>

When JIT debugging is enabled, any unhandled exception
will be sent to the JIT debugger registered on the computer
rather than be handled by this dialog box.

Tran Minh Tuan

Jul 23, 2008, 11:46:08 PM

to tesseract-ocr

I used bbTesseract version 0.5.38 alpha.