Language file for MICR font

Hunter

May 26, 2011, 3:04:21 AM
to tesseract-ocr
Does anyone have a MICR language file they are willing to share?

I need to use Tesseract 2 (via TessNet2) to read cheque details.
Tesseract has a lot of difficulty reading the MICR font on the bottom
of the cheque, so it will need to be trained. Rather than spending a day
attempting to do this, it would be very cool if someone has already done
it. Even the box file would be a huge help.

Thank you in advance!

Dmitri Silaev

May 26, 2011, 8:38:01 AM
to tesser...@googlegroups.com, hbea...@gmail.com
Well, I can do that for you, provided that you give me 10-20
sample image files. One thing I can't do at the moment is generate the
final language files, since I abandoned Tesseract 2 a long time ago. So
these could only be box/tiff pairs.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesser...@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-oc...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

Hunter

May 27, 2011, 5:11:21 AM
to tesseract-ocr
I bit the bullet and did my own training - it wasn't as bad as I
thought. I didn't include all of the control chars in the spec - just
the ones I found on my cheque samples. It seems to detect all my cheques
perfectly - so far. If anyone wants my training files or tessdata (v2)
files, let me know.

Dmitri Silaev

May 27, 2011, 8:08:22 AM
to tesser...@googlegroups.com, hbea...@gmail.com
I'd appreciate having box/tiff pairs.
Thanks in advance!

--
Dmitri

Sven Pedersen

May 27, 2011, 1:32:46 PM
to tesser...@googlegroups.com
Me too. I've worked with MICR fonts before, and I'd like to see high
quality support for them in tesseract. I could spearhead a movement to
get them working properly. I believe most people have wanted to do it
commercially and have not shared their info, but if we did it as a
community it could yield much better recognition quality.
Thanks,
Sven

--
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

Sriranga(78yrsold)

May 27, 2011, 1:47:12 PM
to tesser...@googlegroups.com
Yes, as a community we could achieve much better recognition quality for all
languages, including MICR. I appreciate your offer to spearhead quality support
in tesseract through patches and suggestions as you see fit, so that Ray can
release a final stable version of tesseract-ocr as early as possible.
With regards,
-sriranga(78yrs)

Dmitri Silaev

May 27, 2011, 5:08:52 PM
to tesser...@googlegroups.com, sven.p...@gmail.com
Agreed. We can prepare version 3.0x traineddata files from the box/tiff
pairs the community provides, crediting Hunter for getting this started.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

Hunter

May 31, 2011, 8:31:26 PM
to tesseract-ocr
[I keep posting this, but it doesn't show up after 1 day. Attempt #4]

I only have a limited number of samples, but that seems to be enough
for now (100% detection rate for everything I have). If I can collate
a few more then I shall retrain and post the updated language files
and maybe a T3 compiled file.

I have posted the source files and the compiled T2 language files on my
site: http://beanland.net.au/Programming/dotnet/TesseractMICR.zip

If you have any feedback, more samples etc., please send them to me (my
email is in the readme.htm file of the zip).


Hunter

Jul 5, 2011, 2:37:24 AM
to tesseract-ocr
I have since found the TesseractDotNet project and have built a new MICR
language file for Tesseract 3. I have included the training files and
compiled files for Tesseract 2 and 3. The T3 file was trained on more
samples and uses standard symbols for the 4 control glyphs.
http://beanland.net.au/Programming/dotnet/TesseractMICR.zip


Kalaivani Subramaniyam

Aug 22, 2013, 5:37:38 AM
to tesser...@googlegroups.com
Hi Hunter,

Can you please share a code snippet showing how to read MICR code on iOS? I have downloaded your TesseractMICR tessdata and used mcr.traineddata, but I get this error:

other_case < unicharset_size:Error:Assert failed:in file unicharset.cpp, line 737

Kindly help me to fix this.

Thanks


Brett Allred

Oct 8, 2013, 2:49:20 PM
to tesser...@googlegroups.com
Kalaivani,

Any luck on this? I am getting the same error.


Brett

Arturo Pruneda

Nov 25, 2013, 2:34:24 PM
to tesser...@googlegroups.com
Brett, Kalaivani,

Did you succeed in adding this file? I'm getting the same error.

Hemantha S S

Jan 2, 2014, 9:11:32 AM
to tesser...@googlegroups.com

Hi,

This is Hemanth,

I am using Tesseract 3.0 for MICR recognition. The engine works as is for English and other languages, and the accuracy is good. But when I tried
MICR on Android, the application crashes while loading mcr.traineddata. The error I am getting is "Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)".
We don't know what exactly is causing the problem. We tried debugging the native code (C++), but to no avail. Can anyone help me out?

Thanks in advance.

Regards,
Hemanth

Nick White

Jan 3, 2014, 9:18:45 AM
to tesser...@googlegroups.com
Hi,

Can you upload somewhere the training file that crashes Tesseract,
please? That will help us see what's up.

Thanks,

Nick

Hunter Beanland

Jan 3, 2014, 7:08:12 PM
to tesser...@googlegroups.com

I thought all of the files were on my site. Maybe in a zip file in the main zip file. Will check out next week when I return from holidays.


Sridhar Krishnamoorthy

Jan 6, 2014, 1:13:05 PM
to tesser...@googlegroups.com
Hi Hemanth,

I have the same issue. Please let me know if you have any update.

Thanks,
Sridhar

Hemantha S S

Jan 7, 2014, 2:20:00 AM
to tesser...@googlegroups.com
Hi Nick,

Please find attached the MICR traineddata file. If you have an mcr.traineddata file for Tesseract 3.0 on Android, could you please share it? It would be a great help.

Regards,
Hemanth
mcr.traineddata

Hemantha S S

Jan 7, 2014, 2:24:54 AM
to tesser...@googlegroups.com

 Hi Hunter,

 We are using the mcr.traineddata downloaded from the following link.


Is there any other mcr.traineddata file to use with Tesseract 3.0+ on Android?

Regards,
Hemanth

GOPI NATH

Feb 1, 2014, 2:59:38 AM
to tesser...@googlegroups.com
Hi All,
I need to train Tesseract version 3.02 for Reading MICR codes in Bottom of  Cheque(i.e for MICR Font),I read through Training Tesseract 3 page and i followed  the steps and generated Trained data from micr tif image
and i placed the mcr.trained data in tessdata folder of tesseract 3.02 console application.When i try executing this command tesseract console application>mcr.mcr_font.exp0.tif output -l mcr
I get this error other_case < unicharset_size:Error:Assert failed:in file unicharset.cpp, line 737 .

Even i downloaded tam.traineddata from downloads and placed it in tessdata,and i tried tam.tif output -l tam
i get this same error other_case < unicharset_size:Error:Assert failed:in file unicharset.cpp, line 737 .
One thing i want get clear is that> After generating lang.traineddata,,what are the configurations and changes to be done in Tesseract to get the expected output of that language.
It would be a great Help If someone give solution.
Thanks


Quan Nguyen

Feb 1, 2014, 12:23:07 PM
to tesser...@googlegroups.com
I edited (adding 2 more boxes at the end), renamed some files, and retrained with Hunter's source files using jTessBoxEditor.

Quan
TesseractMicr-3.zip

GOPI NATH

Feb 2, 2014, 2:32:43 AM
to tesser...@googlegroups.com
Hi Quan,
I am a newbie to Tesseract OCR and need to clarify some doubts, which is why I am asking this question.
For any traineddata language available at downloads, once it is downloaded and placed in the tessdata folder, what steps are needed to get the expected output for that language? Is there any configuration to be done, or is it enough to place the downloaded traineddata in the tessdata folder and then run this command in the OCR console application?
   >tesseract imagename.tif output -l lang
It would be a great help if someone could give me a clear idea.
Thanks



Quan Nguyen

Feb 2, 2014, 11:14:03 AM
to tesser...@googlegroups.com
Hi Gopi,

AFAIK, once having the .traineddata files in tessdata folder, you're set to go.

Quan

GOPI NATH

Feb 3, 2014, 10:44:22 AM
to tesser...@googlegroups.com
Hi Quan,
Thanks for your help.
1) I downloaded your mcr.traineddata file and placed it in the tessdata folder.
2) I opened the console application and entered this command:
   >tesseract mcr.tif output -l mcr
I get this error: other_case < unicharset_size:Error:Assert failed:in file unicharset.cpp, line 737
Kindly help me to fix this.

Another doubt I have is whether to place both the files from the source folder and mcr.traineddata in the tessdata folder, or only mcr.traineddata.
Please don't mistake me; it would be a great help if you could provide your entire Tesseract 3.02 project folder setup.

Thanks

Quan Nguyen

Feb 3, 2014, 6:58:45 PM
to tesser...@googlegroups.com
You only need to copy mcr.traineddata into tessdata. I have a feeling that your app is using another mcr.traineddata.
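
A quick way to test that suspicion is to list every copy of the file that Tesseract might pick up. The following is only a minimal Java sketch: the class and method names are made up for illustration, and it assumes the Tesseract 3.0x convention that TESSDATA_PREFIX names the parent directory of tessdata. Pass any other install directories you want checked as arguments.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists every <root>/tessdata/<lang>.traineddata that exists. */
final class TrainedDataFinder {

    static List<File> findTrainedData(List<File> roots, String lang) {
        List<File> found = new ArrayList<>();
        for (File root : roots) {
            File candidate = new File(new File(root, "tessdata"), lang + ".traineddata");
            if (candidate.isFile()) {
                found.add(candidate);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed search roots: TESSDATA_PREFIX (if set) plus any directories
        // given on the command line.
        List<File> roots = new ArrayList<>();
        String prefix = System.getenv("TESSDATA_PREFIX");
        if (prefix != null && !prefix.isEmpty()) {
            roots.add(new File(prefix));
        }
        for (String arg : args) {
            roots.add(new File(arg));
        }
        // Print path and size so that differing copies are easy to spot.
        for (File f : findTrainedData(roots, "mcr")) {
            System.out.println(f.getAbsolutePath() + "  " + f.length() + " bytes");
        }
    }
}
```

If this prints more than one path, or a file whose size differs from the one you downloaded, the application may well be loading a different mcr.traineddata.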

GOPI NATH

Feb 4, 2014, 6:35:14 AM
to tesser...@googlegroups.com
Hi Quan,
I am using Tesseract 3.02.
I am using your mcr.traineddata and have placed it in the tessdata folder.
When I execute this command in the tesseract console application:
   >tesseract mcr.tif output -l mcr

I get this error: other_case < unicharset_size:Error:Assert failed:in file unicharset.cpp, line 737

I even tried other languages such as tam.traineddata, and I get the same error.

Everyone is getting this same error, as you can see from the previous messages.
Kindly help me to fix this.
It would be great if anyone could provide a solution and post it here.

Anurag Kalra

May 29, 2014, 11:55:28 AM
to tesser...@googlegroups.com
Hi,

Has anybody been able to solve this issue? I am getting the same problem:
>>
tesseract.exe check3.jpg out10 -l mcr
other_case < unicharset_size:Error:Assert failed:in file ..\..\ccutil\unicharset.cpp, line 737
<<
I copied the  mcr.traineddata in 'tessdata' folder
I am trying to read the MICR information from a check. Is there any other way to get that information if this training data doesn't work?
Thanks

Anurag Kalra

Jun 9, 2014, 3:32:59 PM
to tesser...@googlegroups.com
Ok, the MICR training data shared by Quan is now working for me.

Mi Zhang

Jul 29, 2014, 2:40:20 AM
to tesser...@googlegroups.com
I am having the exact same problem. Can you explain how you solved it? I'd really appreciate it.

Juned Khan

Aug 14, 2014, 5:31:04 AM
to tesser...@googlegroups.com
Hi All,
I want to integrate this in my Android app. I have put the
mcr.traineddata file under
 /mnt/sdcard/tesseract/tessdata

but it's still not recognizing MICR codes on cheques.

Is there anything else that needs to be done to make this work?

Can anyone please help me with this?

Regards
Juned Khan

Juned Khan

Aug 14, 2014, 5:36:00 AM
to tesser...@googlegroups.com
Hi Anurag,

What else did you do to make this work?
I have copied the mcr.traineddata shared by Quan into the appropriate directory, but my application is still not recognizing MICR codes.

Can you share your thoughts on this?

Regards
Juned Khan

Quan Nguyen

Aug 14, 2014, 9:17:31 PM
to tesser...@googlegroups.com
I simply copied the mcr.traineddata file into tessdata and specified -l mcr at the command line.
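
For anyone driving the console application from code, the same invocation can be sketched in Java as below. This is an illustration only, not part of anyone's setup in this thread: it assumes tesseract is on the PATH, the class name is made up, and cheque.tif and out are hypothetical file names.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper around the command line used in this thread. */
final class TesseractCli {

    /** Builds: tesseract <image> <outputBase> -l <lang> */
    static List<String> buildCommand(String image, String outputBase, String lang) {
        return Arrays.asList("tesseract", image, outputBase, "-l", lang);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical file names; tesseract writes the recognized text to out.txt.
        List<String> command = buildCommand("cheque.tif", "out", "mcr");
        Process process = new ProcessBuilder(command)
                .inheritIO() // pass through tesseract's own messages, e.g. assert failures
                .start();
        int exitCode = process.waitFor();
        System.out.println("tesseract exited with code " + exitCode);
    }
}
```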

Juned Khan

Aug 20, 2014, 9:30:26 AM
to tesser...@googlegroups.com
Thank you guys, I got this working for Android too.

Santhosh Kumar

Aug 20, 2014, 10:01:53 AM
to tesser...@googlegroups.com
Hi Juned,

Could you please post a code snippet for MICR recognition on Android?

I have downloaded 'mcr.traineddata' and saved it in assets.

At runtime I copy the file to the sdcard (OCR_DATA_PATH) and initialise the Tesseract OCR engine with

baseApi.init(OCR_DATA_PATH, "mcr", TessBaseAPI.OEM_TESSERACT_ONLY);

But I am getting a Fatal signal 11 error.

Please help me with this.

Thanks

Santhosh Kumar.K


Juned Khan

Aug 21, 2014, 8:56:07 AM
to tesser...@googlegroups.com
Hi Santhosh,

Here is the updated code for initialization:

        TessBaseAPI baseApi = new TessBaseAPI();
        baseApi.setDebug(true);
        baseApi.init(DATA_PATH, "mcr");
        baseApi.setImage(bitmap);

        String recognizedText = baseApi.getUTF8Text();

        baseApi.end();

Thanks
Juned Khan
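
As far as I know, the data path passed to init is the directory that contains the tessdata folder, not the tessdata folder itself (so for the /mnt/sdcard/tesseract/tessdata location mentioned earlier, the data path would be /mnt/sdcard/tesseract). A minimal sketch of putting the file in place follows, in plain Java so it also runs off-device. The class name is hypothetical, and on Android the input stream would typically come from the app's assets.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies a traineddata stream to <dataPath>/tessdata/<lang>.traineddata. */
final class TrainedDataInstaller {

    static File install(InputStream source, File dataPath, String lang) throws IOException {
        File tessdata = new File(dataPath, "tessdata");
        if (!tessdata.isDirectory() && !tessdata.mkdirs()) {
            throw new IOException("Cannot create " + tessdata);
        }
        File target = new File(tessdata, lang + ".traineddata");
        try (OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes for the demo; a real run would stream the actual mcr.traineddata.
        InputStream fake = new ByteArrayInputStream(new byte[] {1, 2, 3});
        File dataPath = new File(System.getProperty("java.io.tmpdir"), "tesseract-demo");
        File installed = install(fake, dataPath, "mcr");
        System.out.println("Installed " + installed + " (" + installed.length() + " bytes)");
    }
}
```

The directory handed to install is then the same value to pass as the data path when initializing the engine.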

Andrew Litvinov

Oct 30, 2014, 8:52:13 AM
to tesser...@googlegroups.com
For me too. The one shared by Hunter doesn't work (Ubuntu 14.04, tesseract version 3.03).

Abu Balanandan

Feb 2, 2015, 4:19:18 PM
to tesser...@googlegroups.com
Hi Anurag,

Can you tell us what you did to solve the

other_case < unicharset_size:Error:Assert failed:in file unicharset.cpp, line 737

error? I am getting this error whenever I try adding a new .traineddata file!

Siva Kumar

Aug 28, 2015, 10:12:00 AM
to tesseract-ocr
Hi Anurag,
  I can't get the line ocr.Init(TessdataPath, "mcr", true); to execute in my VS2010 application ("eng" works fine). I have downloaded the MICR language pack from Hunter's link. Could you help me get this to work?

Jyotirmay

Aug 13, 2019, 10:57:47 AM
to tesseract-ocr
Hello Juned,

I read the conversation above and implemented this in my Android application using Kotlin.
It runs, but it returns a junk value instead of the proper MICR number.

val baseApi = TessBaseAPI()
baseApi.setDebug(true)
baseApi.init(datapath, "mcr")
baseApi.setImage(bitmap)

val recognizedText:String = baseApi.utF8Text
Print.d("MICR: $recognizedText")

Can you tell me if I am missing something?

Thanks,
Jyotirmay