Tesseract in Subtitle Edit

Hallur Guðjónsson

May 22, 2012, 2:53:47 PM
to tesseract-ocr
I don't know if any of you guys have used Subtitle Edit for Windows,
but it uses Tesseract to OCR the subpictures ripped from DVDs. After
I added the Icelandic language pack (which I extracted from a
Debian package file), it crashes. Is there a Windows version of the Icelandic
language pack? How can I get this to work? I am ripping tons of
subtitles, and it would really speed things up to get this OCR working.

Thank you

Sincerely

Hallur Örn

zdenko podobny

May 23, 2012, 2:27:30 AM
to tesser...@googlegroups.com
Packages are platform independent. I would expect that the language pack you used is from a different (newer) Tesseract version than the version used in Subtitle Edit.

Please try to post the error message (if you see any), or contact the author of Subtitle Edit for help.

--
Zdenko
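
One way to capture the error text requested above is to run the tesseract executable directly and collect whatever it prints. The following is a minimal Python sketch, assuming the usual command-line form `tesseract imagename outputbase -l lang` and an executable named `tesseract` on the PATH; the helper names `build_command` and `run_and_capture` and the file names are hypothetical, not part of Tesseract or Subtitle Edit.

```python
import subprocess


def build_command(image_path, output_base, lang="isl", exe="tesseract"):
    """Build the argument list for one OCR run.

    Assumes the command-line form: tesseract imagename outputbase -l lang
    """
    return [exe, image_path, output_base, "-l", lang]


def run_and_capture(cmd):
    """Run the command and return (exit code, combined stdout and stderr text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    # "subpicture.png" and "out" are placeholder names for illustration only.
    code, text = run_and_capture(build_command("subpicture.png", "out"))
    print("exit code:", code)
    print(text)
```

A failed assertion should show up in the captured text, which can then be pasted into a reply.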

Hallur Guðjónsson

May 23, 2012, 7:19:33 AM
to tesseract-ocr
Yeah I tried to run it through CMD to see what the error was, and it
gives me this:

actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file ..\ccutil\tessdatamanager.cpp, line 48

The author of Subtitle Edit pointed to this website for acquiring new
language packs, but I didn't find an Icelandic pack. So I googled an
Icelandic language pack and found the Debian one, and I thought it
would be the same type of file and that only the program itself would
differ between platforms. I still haven't found a Windows version
of Icelandic; maybe it simply doesn't exist. But is there a way to
convert this to a Windows-compatible version?
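
The assertion quoted in this message compares a count stored in the language file against a limit compiled into the Tesseract build. The sketch below shows how one might inspect that count in Python. It assumes the 3.0x `.traineddata` layout, in which the file begins with a 32-bit little-endian entry count; that layout is an assumption based on reading `tessdatamanager.cpp`, not something stated in this thread. The function names and the `max_entries` parameter are hypothetical, and the real limit depends on the build in use.

```python
import struct


def entry_count_from_header(header):
    """Decode the leading entry count from the first four bytes of a file.

    Assumption: the count is a 32-bit little-endian signed integer.
    """
    if len(header) < 4:
        raise ValueError("need at least 4 bytes to read the entry count")
    (count,) = struct.unpack("<i", header[:4])
    return count


def read_entry_count(path):
    """Read the entry count from a .traineddata file on disk."""
    with open(path, "rb") as f:
        return entry_count_from_header(f.read(4))


def build_can_load(entry_count, max_entries):
    """Mirror the failing assertion: the file's count must not exceed the build's limit."""
    return entry_count <= max_entries
```

If `read_entry_count("isl.traineddata")` returns a number larger than the limit of the Tesseract build bundled with Subtitle Edit, the file was most likely produced for a newer version, and no conversion between platforms would help.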

zdenko podobny

May 23, 2012, 7:51:36 AM
to tesser...@googlegroups.com
Did you read my reply carefully?
See also the FAQ [1] (IMO the line number is not important in this case).
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

Hallur Guðjónsson

May 23, 2012, 10:19:14 AM
to tesser...@googlegroups.com
Yes, I read it carefully, but I misunderstood it at first. Is there some place to get the 3.02 Windows version of Tesseract? Or do I have to compile it myself (because I'm a dumbass and don't know how to do that)?

Sincerely

Hallur Örn

zdenko podobny

May 23, 2012, 4:17:26 PM
to tesser...@googlegroups.com
Officially, 3.02 has not been released, so there is no official (Windows) binary version (you would have to compile it yourself)...
Anyway, I can post a current SVN build somewhere if needed (no support and no installer will be provided for it :-) ).

-- 
Zdenko

Sven Pedersen

May 23, 2012, 4:20:39 PM
to tesser...@googlegroups.com
Hei Hallur,
You can get the isl.traineddata file from subversion (SVN):
http://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/?r=656

You can perhaps use that language file with the 3.01 version. You can
get Microsoft's free compiler and follow the recipe on the Wiki,
though it might be hard for a non-programmer (that does not make you a
dumb person :-) ).
--Sven

--
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

Hallur Guðjónsson

May 23, 2012, 5:19:52 PM
to tesser...@googlegroups.com
Yes, please post it here somewhere, and I will try to compile it myself.

Thank you

Sincerely

Hallur Orn

zdenko podobny

May 23, 2012, 5:30:35 PM
to tesser...@googlegroups.com
On Wed, May 23, 2012 at 10:20 PM, Sven Pedersen <sven.p...@gmail.com> wrote:
> Hei Hallur,
> You can get the isl.traineddata file from subversion (SVN):
> http://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/?r=656
>
> You can perhaps use that language file with the 3.01 version.

No, he cannot. That is the reason he got that error.
He can use a 3.01 language file in 3.02, but he cannot use a 3.02 language file in 3.01. The same rule applies to 3.01 and 3.00.
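
The rule in this reply can be written as a simple version comparison: a language file loads only if it was made for the same or an older Tesseract version than the engine reading it. This is a minimal Python sketch of that rule as stated here for 3.00, 3.01, and 3.02; treating it as a plain ordering for other versions is an assumption, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn a version string such as "3.01" into a comparable tuple, here (3, 1)."""
    return tuple(int(part) for part in text.split("."))


def can_load(data_version, engine_version):
    """Return True if a language file for data_version should load in engine_version.

    Encodes the rule from this thread: older data works in a newer engine,
    but newer data does not work in an older engine.
    """
    return parse_version(data_version) <= parse_version(engine_version)


if __name__ == "__main__":
    for data, engine in [("3.01", "3.02"), ("3.02", "3.01"), ("3.00", "3.01")]:
        print(data, "data in", engine, "engine:", can_load(data, engine))
```

In the situation discussed in this thread, the language file is newer than the engine, which is the case the rule rejects.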

TP

May 23, 2012, 10:02:34 PM
to tesser...@googlegroups.com
On Wed, May 23, 2012 at 7:19 AM, Hallur Guðjónsson <hal...@lottobaes.com> wrote:
> Yes I read it carefully but I understood wrong at first, is there some place
> to get the 3.02 windows version of tesseract? do I have to compile it myself
> (because I'm a dumbass and don't know how to do that)

Now that I have written some in-depth documentation on the process
[1], I would hope that building tesseract on Windows using Visual
Studio 2008 (or the free VC++ 2008 Express) is relatively painless. It
is slightly out of date (and needs a bit of re-arranging), but I
continue to appreciate any feedback from "newbies".

[1] Visual Studio 2008 Developer Notes for Tesseract-OCR
http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/index.html

Taha Alasli

May 24, 2012, 2:41:35 AM
to tesser...@googlegroups.com
Hallur Guðjónsson,

Do you want the compiled Tesseract3.02.exe?

If so, I'll send it to you.

Hallur Guðjónsson

May 24, 2012, 4:36:50 PM
to tesser...@googlegroups.com
Yes I would appreciate if you could send it :D

Thank you

Taha Alasli

May 26, 2012, 2:46:23 AM
to tesser...@googlegroups.com
Find it here.
 
Good Luck.


 