Version 3.02 in alpha


Ray Smith

Feb 2, 2012, 1:55:57 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Tesseract 3.02 is now available in svn for preliminary testing, currently Linux-only.

There are now 65 languages and some big improvements in layout analysis and character accuracy.
This version will with luck make it into Ubuntu LTS Precise Pangolin, so please test to see if your favorite issue is resolved.

Thanks and enjoy!

Ray.

Wil Hadden

Feb 3, 2012, 6:24:52 AM
to tesseract-ocr
Hi Ray,

Any idea of the timescale for a 3.02 package appearing on the
downloads page of googlecode?

Or are there any release notes covering the changes between 3.01 and 3.02? I'm just a bit
wary of being bleeding edge :)

Wil

Sriranga(78yrsold)

Feb 3, 2012, 7:06:28 AM
to tesser...@googlegroups.com
I tried to generate the exe files using VS2008, but the build failed. Where will the exe files be stored: in the bin, bin.dbg, or training folder?


Sriranga(78yrsold)

Feb 3, 2012, 7:32:27 AM
to tesser...@googlegroups.com
Attached are the release notes for 3.02. The download is available from svn on the project site: tesseract-ocr - Project Hosting on Google Code <http://code.google.com/p/tesseract-ocr/>
cheers,
-sriranga(79yrs)

On Fri, Feb 3, 2012 at 4:54 PM, Wil Hadden <wilh...@gmail.com> wrote:
ReleaseNotes

zdenko podobny

Feb 3, 2012, 7:33:39 AM
to tesser...@googlegroups.com
Do you have VS2008 for Linux ;-) (as Ray wrote, "currently Linux-only")?

PS: I am working on patches for VS2008, but there are some problems... I need to run some additional tests...

Zdenko

Sriranga(78yrsold)

Feb 3, 2012, 7:44:42 AM
to tesser...@googlegroups.com
Zdenko,
Thanks for the information. I don't have VS2008 on Linux, only on WinXP (SP3) :-). Actually, I downloaded from svn onto Ubuntu 11.10 and then copied the files to WinXP. Since there was a tesseract.sln file in the "VS2008" folder, I tried building it, but only 24 succeeded. Now I shall wait until the patches for VS2008 are uploaded.
With Warmest Regards,
-sriranga(79yrs)

Sriranga(78yrsold)

Feb 3, 2012, 11:29:10 AM
to tesser...@googlegroups.com
zdenko,
I tried on Ubuntu 11.10, but the install failed even after following the guidelines in the wiki. I have attached the typescript for your perusal and valuable guidance. Kindly let me know where I made a mistake.
With Warmest Regards,
-sriranga(79yrs)
typescript

Derek Dohler

Feb 3, 2012, 7:56:37 AM
to tesser...@googlegroups.com
I'm excited by this:
Added simultaneous multi-language capability.

Can you provide any info on how this works?

Cheers,
Derek 

Ray Smith

Feb 3, 2012, 11:59:00 AM
to tesser...@googlegroups.com
Try using eng+hin as the language code...
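
For anyone scripting this, here is a minimal sketch of how the combined language code might be passed on the command line. The helper name `build_ocr_command` and the file names are made up for illustration; only the `-l eng+hin` form comes from the suggestion above, and the command is built here, not run.

```python
def build_ocr_command(image, output_base, languages):
    """Return the argv list for one tesseract run.

    `languages` is a list such as ["eng", "hin"]; the codes are joined
    with "+" to form the combined language code suggested above.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    lang_code = "+".join(languages)
    return ["tesseract", image, output_base, "-l", lang_code]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_ocr_command("page.tif", "page", ["eng", "hin"])))
```

The resulting list can be handed to `subprocess.call` once the language data files for every listed code are installed.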

zdenko podobny

Feb 3, 2012, 12:10:21 PM
to tesser...@googlegroups.com
On Fri, Feb 3, 2012 at 5:29 PM, Sriranga(78yrsold) <withbl...@gmail.com> wrote:
zdenko,
Tried in ubuntu 11.10 - failed to install even after following the guidelines in wiki.

No, you did not follow the guidelines in the wiki [1]. Try to read it first ;-)

Speedy

Feb 3, 2012, 12:44:37 PM
to tesseract-ocr
I'd be very interested in this as well. How does it work?

I mean, if I have a font in one language and another in the other
language, does it make sure that no characters from different
languages are intermingled in the same word? How about in the same
line? Is there a way to influence this? Does the result contain
information about which language matched?

Two weeks ago I asked a question about combining fonts. This feature
could be a perfect answer to my question.

Best regards,
Marcus


Speedy

Feb 3, 2012, 12:50:32 PM
to tesseract-ocr
Another feature that sounds very promising is the bigrams. Is this a
feature that works on a word level? Does this include a probability
for the first word? I.e., is position 0 a valid context for a bigram?
So for example, if I wanted to recognize license plates and I know
that the first one or two characters always encode the city, the
bigram probability to go from position 0 to one of the cities should
be much higher than to go to any other character combination. And if I
know that after that there can only be digits, I could put the digit
sequences into the dictionary and have a high bigram probability for
that. Would that work?

Best regards,
Marcus


Speedy

Feb 3, 2012, 5:07:14 PM
to tesser...@googlegroups.com
Getting packages into Ubuntu precise would be awesome!  As someone involved in putting together Vinux, a distribution of Ubuntu for the blind and visually impaired, OCR is essential.  We have several utilities people have built to simplify these tasks. 

Is tesseract version 3.02 backward compatible with version 2.04?  Perhaps it is better to ask whether bash and Python scripts written for the tesseract 2.04 command line will break if used with the new tesseract version 3.02.  I realize that this would not take advantage of most of the great new features.  I am excited about the new capabilities and am eager to use them!  I am curious whether ocropus or any of our other OCR utilities will break.

By the way, the lead developer for Vinux at the moment is Luke Yelavich, an Ubuntu Engineer tasked with accessibility.  If there are issues with packaging or getting into precise, perhaps we can help. 


Don Marang
Vinux Package Development Coordinator - vinuxproject.org

zdenko podobny

Feb 3, 2012, 5:32:28 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
I just uploaded some fixes to VC2008 build - target was to compile and run tesseract.exe ("tesseract.exe eurotext.tif eurotext" produced output :-) )

Please test it. Feel free to improve it.

I will continue to support the current "vs2008 structure".  When Tom finalizes his contribution [1], I will adapt it to version 3.02 and use it for the next tesseract release.

Zdenko

Sriranga(78yrsold)

Feb 4, 2012, 2:31:15 AM
to tesser...@googlegroups.com
Zdenko,
Thanks for the valuable guidance. In fact, I had followed http://code.google.com/p/tesseract-ocr/wiki/TesseractSvnInstallation, which led to confusion. This time I followed your guidance and downloaded all the required items as per the readme at http://code.google.com/p/tesseract-ocr/wiki/ReadMe. I tried to install on Ubuntu 11.10 but failed; see the attached typescript and Untitled Document. Kindly let me know where I made a mistake.

I will test on WinXP now.

With warmest Regards,
-Sriranga(79yrs)
typescript
Untitled Document

Zdenko Podobný

Feb 4, 2012, 3:30:10 AM
to tesser...@googlegroups.com
It seems you are not able to compile any C++ program from source on Linux. Teaching you how to compile from source is out of tesseract's scope.
You should first read a manual on how to compile a program from source.

Zdenko

Sven Pedersen

Feb 4, 2012, 12:23:51 PM
to tesser...@googlegroups.com
Hi Sriranga,
You need to install the development tools for C++ on your Ubuntu
system. Something like
sudo apt-get install build-essential
sudo apt-get install autoconf automake libtool

Then try the instructions again as you did. You've been very helpful
to people on this list in the past. Sorry Zdenko was a bit rude --
developers often don't like to explain the fundamentals of compiling
software because they have to do it a lot.
Good luck!
--Sven

--
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

Patrick Questembert

Feb 5, 2012, 8:45:25 AM
to tesser...@googlegroups.com
I just did and I get this error:
"Error opening data file tessdata/eng+ell.traineddata"

I am passing "eng+ell" as the language parameter (2nd parameter) in:

myTess->Init(tessDataDir.c_str(), language, OEM_DEFAULT, NULL, 0, false);

No issue when using just "ell" or "eng". Should I be using a different/new API?

Thanks,
Patrick
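
One thing that can be ruled out from the calling side is a missing data file. The sketch below is only a sanity check and does not explain the error: it assumes a combined code such as "eng+ell" is split on "+" and that each part needs its own `<lang>.traineddata` file in the tessdata directory. The function name is made up for illustration.

```python
import os


def missing_traineddata(tessdata_dir, language_code):
    """List the <lang>.traineddata files that are absent from tessdata_dir.

    Assumes a combined code such as "eng+ell" is split on "+" and that
    each part needs its own <lang>.traineddata file.
    """
    missing = []
    for lang in language_code.split("+"):
        path = os.path.join(tessdata_dir, lang + ".traineddata")
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

If this returns an empty list for "eng+ell" and the error still mentions "eng+ell.traineddata" as a single file, the library being linked may not be treating "+" as a separator at all.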

zdenko podobny

Feb 5, 2012, 12:00:38 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Ray,

I got 'Empty page!!' message from tesseract 3.02 for attached image (created as 'convert -rotate 10 phototest.tif phototest-r.png'). Tesseract 3.01 was able to handle it [1]...

Zdenko


On Thu, Feb 2, 2012 at 7:55 PM, Ray Smith <thera...@gmail.com> wrote:
phototest-r.png

Sriranga(78yrs)

Feb 5, 2012, 4:49:09 AM
to tesser...@googlegroups.com
Downloaded r666 from svn today and generated the exe files using VS2008. Result: == Build: 21 succeeded, 4 failed, 0 up-to-date, 0 skipped ==. I am shocked to find tesseract.exe and ambiguous_words.exe missing. The bin folder contains only 7 exe files: (1) cntraining.exe, (2) combine_tessdata.exe, (3) mftraining.exe, (4) unicharset_extractor.exe, (5) shapeclustering.exe, (6) wordlist2dawg.exe and (7) dawg2wordlist.exe.
-sriranga(79yrs)



Sriranga(78yrs)

Feb 5, 2012, 12:14:17 PM
to tesser...@googlegroups.com
I failed to attach exp0.tr in the previous mail and have now tried to attach it again; it may have been rejected by Gmail.


Sriranga(78yrs)

Feb 4, 2012, 12:45:26 PM
to tesser...@googlegroups.com
Sven Pedersen,
I am really glad you still remember me! Thanks for the valuable guidance. Yes,
I followed your guidance and was able to run tesseract successfully on Ubuntu 11.10.
However, I faced a problem with "export=TESSDATA_PREFIX='/usr/local/share/' ", and then,
as a last attempt, I tried "sudo cp *.traineddata /usr/local/share/.", which copied all the files without any further problem. I have much to learn as a student under people like you. I cannot forget the great help you gave me last year.

I am also thankful to Zdenko for his valuable guidance for installing in Winxp successfully. It is natural that Developers are often under hectic pressure of work as well as problems faced by them.
With Warmest Regards,
-sriranga(79yrs)

Sriranga(78yrs)

Feb 4, 2012, 3:16:04 AM
to tesser...@googlegroups.com
zdenko,
in WinXP, I was able to build: === Build: 22 succeeded, 3 failed, 0 up-to-date, 0 skipped ===
When I checked bin.dbg, it contained (1) cntraining.exe, (2) combine_tessdata.exe, (3) mftraining.exe, (4) tesseract.exe, (5) unicharset_extractor.exe and (6) wordlist2dawg.exe, and also (7) liblept168d.dll. The Debug folder contains (8) ambiguous_words.exe.

Thus I was able to locate 7 exe files and one DLL file; the rest (14 files out of the 22 that succeeded) could not be located. Kindly let me know where I made a mistake, if any.

I also tested as suggested and the output was fine; see the attached files for perusal. I am thankful to you for your valuable guidance.
With Warmest Regards,
-sriranga(79yrs)




On Sat, Feb 4, 2012 at 1:01 PM, Sriranga(78yrsold) <withbl...@gmail.com> wrote:
testeuro.txt
testphototif.txt

Sriranga(78yrs)

Feb 5, 2012, 12:10:41 PM
to tesser...@googlegroups.com
Tested using tesseract-ocr 3.02 on WinXP (SP3).
I tried to generate the .tr file using the following command lines. exp0.box was generated successfully, but exp0.tr was not generated. exp0.txt is attached herewith for perusal.
M:\rao- files\chilume\test-3.02>tesseract exp0.tif exp0 batch.nochop makebox
Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 0
M:\rao- files\chilume\test-3.02>tesseract exp0.tif exp0 nobatch box.train logfile
Tesseract Open Source OCR Engine v3.02 with Leptonica
M:\rao- files\chilume\test-3.02>

Guidance is requested.
-sriranga(79yrs)
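
A small sketch for re-running the two steps from a script and checking which output files actually appear. The helper names are made up for illustration; the arguments are copied from the command lines above, and the expectation that the second step writes `<base>.tr` is taken from the post rather than verified here.

```python
import os


def training_commands(image, base, extra_configs=()):
    """Build the makebox and box.train command lines as argv lists.

    The fixed arguments are copied from the commands quoted above;
    extra_configs is for additional config names such as "logfile".
    """
    make_box = ["tesseract", image, base, "batch.nochop", "makebox"]
    box_train = ["tesseract", image, base, "nobatch", "box.train"]
    box_train.extend(extra_configs)
    return make_box, box_train


def missing_outputs(base, directory="."):
    """Return which of <base>.box and <base>.tr are absent from directory."""
    expected = [base + ".box", base + ".tr"]
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]
```

Calling `missing_outputs("exp0")` after both steps shows directly whether exp0.tr was written. Running the second step once without the "logfile" config might also be worth a try, in case it is redirecting the messages that would explain the failure; that is a guess, not something confirmed in this thread.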

On Sun, Feb 5, 2012 at 7:15 PM, Patrick Questembert <patrick.q...@gmail.com> wrote:
exp0.txt
exp0.tif
exp0.box

Sriranga(78yrsold)

Feb 7, 2012, 7:54:47 AM
to tesser...@googlegroups.com
Zdenko,
Downloaded r667 from svn today. I tried to generate the exe files using VS2008 and got the following results (first the debug version, then the release version):

Debug version = 25 succeeded, 0 failed, 0 skipped; the "bin.dbg" folder contains 10 exe files.

Release version = 21 succeeded, 4 failed, 0 skipped; the "bin" folder contains 7 exe files.
Release exe files missing from the "bin" folder: (1) tesseract.exe, (2) ambiguous_words.exe, (3) classifier_tester.exe.
This is brought to your kind notice. Kindly let me know where I made a mistake.
With regards,
-sriranga(79yrs)
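
Counting exe files by hand after every build gets tedious, so here is a minimal sketch that lists which executables are missing from an output folder. It assumes the ten names mentioned across the posts in this thread are the full expected set, which nobody has confirmed; the function name is made up for illustration.

```python
import os

# The ten executables named across the posts in this thread (assumed complete).
EXPECTED_EXES = [
    "ambiguous_words.exe",
    "classifier_tester.exe",
    "cntraining.exe",
    "combine_tessdata.exe",
    "dawg2wordlist.exe",
    "mftraining.exe",
    "shapeclustering.exe",
    "tesseract.exe",
    "unicharset_extractor.exe",
    "wordlist2dawg.exe",
]


def missing_exes(bin_dir, expected=EXPECTED_EXES):
    """Return the expected executables that are not present in bin_dir."""
    # Compare case-insensitively, since Windows file names ignore case.
    present = {name.lower() for name in os.listdir(bin_dir)}
    return [name for name in expected if name.lower() not in present]
```

Running it against both "bin" and "bin.dbg" after a build gives the same information as the lists above.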

zdenko podobny

Feb 7, 2012, 8:53:34 AM
to tesser...@googlegroups.com
try r668.

Zd.

Sriranga(78yrs)

Feb 7, 2012, 12:26:00 PM
to tesser...@googlegroups.com
Zdenko,
Downloaded r668 from svn. I tried to generate the exe files using VS2008 and got the following results (first the debug version, then the release version):

Debug version = 21 succeeded, 4 failed, 0 skipped; the "bin.dbg" folder contains 6 exe files.
The missing exe files are (1) ambiguous_words.exe, (2) classifier_tester.exe, (3) dawg2wordlist.exe and (4) shapeclustering.exe. (Note: I tried twice, with the same result.)

Release version = 25 succeeded, 0 failed, 0 skipped; the "bin" folder contains 10 exe files.

This is brought to your kind notice for the needful.
With Warmest Regards,
-sriranga(79yrs)

zdenko podobny

Feb 7, 2012, 1:12:44 PM
to tesser...@googlegroups.com
there are:
  • debug configuration
  • release.static configuration
  • release.dynamic configuration
If you see a "release" configuration then something is wrong. Try deleting the vs2008 directory, running 'svn update', and building again.

Zd.

Sriranga(78yrs)

Feb 7, 2012, 10:23:00 PM
to tesser...@googlegroups.com
Zdenko,
Thank you for the valuable guidance. Yes, VS2008 shows the following configurations:
  • debug configuration
  • release.dynamic configuration
  • release.static configuration
When I run VS2008, I first see "debug", which is the default. I press F7, then clean the build and also clean cntraining. Next I select "release.static" and run "rebuild solution".

As suggested, I shall delete the vs2008 folder, run svn update on Ubuntu 11.10, build again on WinXP (since I don't know how to download or update from svn on WinXP itself),
and report back to you.

With Warmest Regards,
-sriranga(79yrs)- INDIA

Sriranga(78yrs)

Feb 7, 2012, 11:23:11 PM
to tesser...@googlegroups.com
Zdenko,
Congratulations! I followed your valuable guidance and succeeded in generating the debug version (25 succeeded, 0 failed) and also the release version (25 succeeded, 0 failed).

Attached is "--help list.rtf" for your perusal. It would be nice if you could tell me the command lines to use for each generated exe file, so that I can test them and report back to you.

With Warmest Regards,
-sriranga(79yrs)
--help list.rtf

asmwarrior

Feb 8, 2012, 1:27:29 AM
to tesser...@googlegroups.com, tesser...@googlegroups.com
I'm a guy from the MinGW/MSYS world, but so far I have had no success in building tesseract under MSYS, see:
https://groups.google.com/d/topic/tesseract-ocr/7MwfC1JdXyA/discussion
I'd like to ask whether some developer could fix this in the next release. Thanks.

Asmwarrior
ollydbg from Codeblocks' forum

Sriranga(78yrs)

Feb 8, 2012, 10:47:48 AM
to tesser...@googlegroups.com
Zdenko,
I forgot to inform you that there are 10 exe files in total in the debug version and 10 exe files in the release static version. The remaining 15 files of the debug or release build could not be located. I therefore presume that only 10 exe files are expected from the VS2008 build. Kindly confirm.

I now await guidance or documentation on how to use the 10 exe files to generate a traineddata file.
With regards,
-sriranga(79yrs)

Sriranga(78yrs)

Feb 8, 2012, 11:32:32 AM
to tesser...@googlegroups.com, Ray Smith
Derek,
As suggested by Ray (to combine eng+hin), I tested version 3.02 with the combination eng+kan; see the extract of CMD below***.
I have also attached the sample untitled.tif and the output file testuntitled.txt. This confirms "Added simultaneous multi-language capability".

***extract of CMD:
M:\rao- files\chilume\test-3.02>tesseract untitled.TIF  testuntitled -l eng+kan
Error: unichar |:|0n2 in normproto file is not in unichar set.
Error: unichar |:|1n2 in normproto file is not in unichar set.
Error: unichar |!|0n2 in normproto file is not in unichar set.
Error: unichar |!|1n2 in normproto file is not in unichar set.
Error: unichar |;|0n2 in normproto file is not in unichar set.
Error: unichar |;|1n2 in normproto file is not in unichar set.
Error: unichar |ರಂ|0n2 in normproto file is not in unichar set.
Error: unichar |ರಂ|1n2 in normproto file is not in unichar set.
Error: unichar |ರಿಂ|0n2 in normproto file is not in unichar set.
Error: unichar |ರಿಂ|1n2 in normproto file is not in unichar set.
Error: unichar |%|0n3 in normproto file is not in unichar set.
Error: unichar |%|1n3 in normproto file is not in unichar set.
Error: unichar |%|2n3 in normproto file is not in unichar set.
Error: unichar |ರೀಂ|0n3 in normproto file is not in unichar set.
Error: unichar |ರೀಂ|1n3 in normproto file is not in unichar set.
Error: unichar |ರೀಂ|2n3 in normproto file is not in unichar set.
Error: unichar |ಲಂ|0n2 in normproto file is not in unichar set.
Error: unichar |ಲಂ|1n2 in normproto file is not in unichar set.

Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 0
M:\rao- files\chilume\test-3.02>

cheers,
-sriranga(79yrs)
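
When there are many of these messages, it can help to reduce them to the distinct unichars they name. A minimal sketch, assuming every message has exactly the form shown in the extract above; it says nothing about what the errors mean or whether they affect accuracy, and the function name is made up for illustration.

```python
import re

# Matches lines such as:
#   Error: unichar |:|0n2 in normproto file is not in unichar set.
ERROR_RE = re.compile(
    r"^Error: unichar \|(.+)\|\d+n\d+ in normproto file is not in unichar set\.$"
)


def unichars_missing_from_set(log_text):
    """Return the distinct unichars named in the error lines, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Feeding it the captured console output gives a short list that is easier to compare against the unicharset than the raw log.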

=================================================================



On Sun, Feb 5, 2012 at 7:15 PM, Patrick Questembert <patrick.q...@gmail.com> wrote:
testuntitled.txt
untitled.TIF

Sriranga(78yrs)

Feb 9, 2012, 2:40:39 AM
to tesser...@googlegroups.com, Ray Smith
Derek,
I tested version 3.02 again with a combination of four traineddata files, eng+kan+tam+tel; the extract of CMD is attached. The output-testing.txt for testing.tif is also attached.
Cheers,
-sriranga(79yrs)

2012/2/8 Sriranga(78yrs) <withblessing....@gmail.com>
output-testing.txt
testing.tif
Extract of CMD.txt

Derek Dohler

Feb 9, 2012, 2:45:24 AM
to tesser...@googlegroups.com, Ray Smith
Hi Sriranga,

Many thanks for doing this -- I haven't had time to test it myself yet. What is your assessment of the effect on processing time?

Cheers,
Derek

2012/2/9 Sriranga(78yrs) <withblessing....@gmail.com>

Sriranga(78yrs)

Feb 9, 2012, 2:55:39 AM
to tesser...@googlegroups.com
Hi Derek,
The processing time for "tesseract <imagefile> <output> -l eng+kan+tam+tel" was the same as for "tesseract <imagefile> <output> -l eng". I am using WinXP (SP3). The image was taken using Print Screen and saved in Paintbrush, which takes a few minutes.
Cheers,
-sriranga(79yrs)
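
For a more repeatable comparison, the two runs could be timed from a script. A minimal sketch, with made-up function names; the `runner` parameter defaults to `subprocess.call` and exists only so the timing logic can be tried out without tesseract installed.

```python
import subprocess
import time


def time_command(argv, runner=subprocess.call):
    """Run argv once and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    exit_code = runner(argv)
    return exit_code, time.perf_counter() - start


def compare_language_codes(image, output_base, codes, runner=subprocess.call):
    """Time one tesseract run per language code; return {code: seconds}."""
    results = {}
    for code in codes:
        argv = ["tesseract", image, output_base, "-l", code]
        results[code] = time_command(argv, runner)[1]
    return results
```

For example, `compare_language_codes("testing.tif", "out", ["eng", "eng+kan+tam+tel"])` would time both cases on the same image. A single run each is a rough measure, so repeating it a few times is advisable.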

Renard Wellnitz

Mar 17, 2012, 4:40:43 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Hi all,

first of all I would like to express my heartfelt thanks for this great piece of software that is tesseract. :-)

I am currently making an OCR Android app with tesseract and the results I have got so far are very good.

But I encountered a strange issue with tesseract 3.01 and also 3.02.
When running tesseract on the supplied file, tesseract fails to correctly recognize some characters. In particular, in line 8 it gives "wwwxegio-bahnde" instead of "www.regio-bahn.de".
I then ran the makebox command to see what was going on. To my surprise, I found that the boxes and characters were all 100% correct!
I guess there is no easy fix or config value that I can experiment with?

Cheers
Renard



binarized.box
content.txt
binarized.jpg

Zdenko Podobný

Mar 18, 2012, 8:00:25 AM
to tesser...@googlegroups.com
Hi,

you did not give all the details, so I need to guess some of them:

1. I guess that you ran something like this:
$ tesseract binarized.jpg content -l deu
but you created the makebox file with the command
$ tesseract binarized.jpg binarized makebox
If so, then the difference is in the language file used.

2. I tried running OCR with the eng and then with the deu language file. With eng the URL
was OK (see binarized-eng), but some German words were not correct. It
looks like the "problem" is in the German language file (dictionary?) and not in
the tesseract library. This is just a quick opinion, so maybe I am wrong. As a
workaround you can combine the English and German files in tesseract 3.02 (see
the result in binarized-eng_deu.txt):
$ tesseract binarized.jpg binarized-eng_deu -l eng+deu
Zdenko
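
To avoid this kind of mismatch when comparing OCR output with makebox output, both command lines can be generated from the same language setting. A minimal sketch with a made-up helper name; it only builds the argument lists, placing `-l` before the config name as in the usual "imagename outputbase [-l lang] [configfile...]" order.

```python
def ocr_and_makebox_commands(image, output_base, box_base, language):
    """Return (ocr_argv, makebox_argv) that share one language code."""
    ocr = ["tesseract", image, output_base, "-l", language]
    makebox = ["tesseract", image, box_base, "-l", language, "makebox"]
    return ocr, makebox
```

With `language="deu"` the box file then reflects what the German data actually recognized, and with `language="eng+deu"` it reflects the combined workaround.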

binarized-eng.txt
binarized-eng_deu.txt

Renard Wellnitz

Mar 18, 2012, 1:47:43 PM
to tesser...@googlegroups.com, zde...@gmail.com
Hi,

thank you for putting me on the right track. I was indeed using different training data for each command!
But interestingly the boxes generated with deu.traineddata were the same as those I got with eng.traineddata.
Nonetheless, I successfully compiled 3.02 for Android, and the workaround of using both sets of training data simultaneously works quite well :-)

Cheers 
Renard


Haben Sie noch Fragen?
Unsere Mitarbeiter/-innen helfen lhnen gem weiter:

KundenCenter Regiobahn
An der Regiobahn 13
40822 Mettmann
Telefon: 02104 305-400
Telefax: 02104 305-403

www.regio-bahn.de
in...@regio-bahn.de

Schlaue Nummer 0 180 3/50 40 30
(Festnetzpreis 0,09 €/Minute;
mobil max. 0,42 €/Minute)

Gute Fahrtwtmscht lhnen lhre REGIOBAHN

Haben Sie noch Fragen?
Unsere Mitarbeiter/-innen helfen Ihnen gern weiter:

KundenCenter Regiobahn
An der Regiobahn 13
40822 Mettmann
Telefon: 02104 305-400
Telefax: 02104 305-403

www.regio-bahn.de
in...@regio-bahn.de

Schlaue Nummer 0 180 3/50 40 30
(Festnetzpreis 0,09 €/Minute;
mobil max. 0,42 €/Minute)

Gute Fahrt wünscht Ihnen Ihre REGIOBAHN

zdenko podobny

Apr 26, 2012, 4:59:35 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
On Thu, Feb 2, 2012 at 7:55 PM, Ray Smith <thera...@gmail.com> wrote:
Tesseract 3.02 is now available in svn for preliminary testing, currently Linux-only.

There are now 65 languages and some big improvements in layout analysis and character accuracy.
This version will with luck make it into Ubuntu LTS Precise Pangolin, so please test to see if your favorite issue is resolved.

Thanks and enjoy!

Ray.

Ray,

can you please clarify status of tesseract-3.02 release?

Ubuntu 12.04 LTS Precise Pangolin was released today [1] and it provides a tesseract-3.02 package. I analyzed it quickly ([2]) and it looks like 3.02 = r675 (the current revision at the moment is 724).


--
Zdenko

troplin

Apr 27, 2012, 6:07:01 AM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Hello Zdenko,

I know that on Linux/Unix, it is usual to name a shared library libtesseract302.so, and have a symlink libtesseract.so -> libtesseract302.so.
On Windows, you don't have the possibility of symlinks, so usually you don't code the version into the name of the DLL. At least not minor versions.
Instead you embed the version number into the resources, and if a client wants to restrict itself to a specific version of a DLL, he can do that with an application manifest. That's the "Windows way".
The "lib" prefix is also a bit strange on Windows.
Personally I would prefer the name tesseract3.dll.

Am Freitag, 3. Februar 2012 23:32:28 UTC+1 schrieb Zdenko Podobný:
I just uploaded some fixes to VC2008 build - target was to compile and run tesseract.exe ("tesseract.exe eurotext.tif eurotext" produced output :-) )

Please test it. Feel free to improve it.

I will continue to support the current "vs2008 structure".  When Tom finalizes his contribution [1], I will adapt it to version 3.02 and use it for the next tesseract release.

Zdenko

Andres

Apr 27, 2012, 11:57:12 AM
to tesser...@googlegroups.com
Hi,

I couldn't find info about the improvements in 3.02 against 3.01. Could you provide a link ?

Thanks,

Andres
--

2012/4/27 troplin <tro...@gmail.com>

Nick White

Apr 27, 2012, 12:14:00 PM
to tesser...@googlegroups.com
On Fri, Apr 27, 2012 at 12:57:12PM -0300, Andres wrote:
> I couldn't find info about the improvements in 3.02 against 3.01. Could you
> provide a link ?

Read the ReleaseNotes file in SVN, and see this thread:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/ef8c6819fc5385f/665905218585219f

Sushil K

Aug 23, 2016, 6:32:27 AM
to tesseract-ocr, thera...@gmail.com
Well, this is a really old thread but I'm hoping some of you are still around. What do those Error messages mean? I am using tesseract on some Kannada files and I get these messages. Since I'm processing hundreds of pages, I cannot tell whether or not the OCR is accurate. Error messages are worrisome.

Sushil

Tom Morris

Aug 23, 2016, 6:36:38 PM
to tesseract-ocr
On Tuesday, August 23, 2016 at 6:32:27 AM UTC-4, Sushil K wrote:
Well, this is a really old thread but I'm hoping some of you are still around. What do those Error messages mean? I am using tesseract on some Kannada files and I get these messages. Since I'm processing hundreds of pages, I cannot tell whether or not the OCR is accurate. Error messages are worrisome.

Yup, really old and about an obsolete version of Tesseract. You should update to the latest release and, if you're still having problems, create a new thread which describes what version you're using, what operating system, gives some example images, etc. In other words a complete description of your configuration.

Tom