OCR on subtitles

franck dev

May 3, 2015, 3:35:51 PM
to tesser...@googlegroups.com
Hi, I have tried to run OCR on subtitle pictures, but whether it works depends on the color of the subtitle font.
When the subtitle characters are green, it doesn't work.
Could you give me some advice on which Tesseract parameters I should use, or on how I should transform the picture before calling Tesseract?

Thanks,
Franck
sub0001.png

Dmitri Silaev

May 3, 2015, 4:28:09 PM
to tesser...@googlegroups.com
Hi Franck,

It seems the problem here is not the character color itself but rather the character outlines and the image transparency.

To get rid of all of these problems at once, try converting your images to monochrome (using good old ImageMagick), like this (see the attachment). Tess then produces perfect results. If your other subtitle images have a similar structure, this method should work regardless of the character color.

Best regards,
Dmitri Silaev
www.CustomOCR.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/36028dbe-721f-4806-b120-b38a96d45bf1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

inet010_bw.png

Sriranga(82yrsold)

May 3, 2015, 10:28:47 PM
to tesser...@googlegroups.com
Tested in VietOCR 3.6; it works fine. I've attached the PNGs and the output .txt files.

sub0001.png.txt
sub0001.png
sub0001a.png
sub0001a.png.txt

franck dev

May 4, 2015, 6:29:43 AM
to tesser...@googlegroups.com
Hi,

I tried these ImageMagick options:
-colorspace Gray
-negate

But Tesseract doesn't work after this conversion.
Which command did you use with ImageMagick?

With your picture, the OCR works perfectly.
Thanks
sub0001_bw_inv.png

Sriranga(81+yrsold)

May 4, 2015, 8:45:16 AM
to Michael Reimer
I used IrfanView to convert the image to grayscale.

Dmitri Silaev

May 4, 2015, 2:03:36 PM
to tesser...@googlegroups.com
I used FastStone.

For ImageMagick, use this:

- Source
(inet010.png)

- Get rid of transparency
>convert inet010.png -background black -alpha remove inet010_ntransp.png
(inet010_ntransp.png)

- Threshold
>convert inet010_ntransp.png -threshold 40% inet010_ntransp_ts.png
(inet010_ntransp_ts.png)

- Run Tess: perfect result
>tesseract.exe inet010_ntransp_ts.png inet010_ntransp_ts.png
(inet010_ntransp_ts.png.txt)
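The two ImageMagick steps above can also be chained into a single `convert` call and run over a whole folder of subtitle frames. This is a minimal sketch, not the exact commands used in this thread. It assumes ImageMagick 6's `convert` (ImageMagick 7 uses `magick`) and the `tesseract` CLI are on PATH, that the frames are named `sub*.png`, and that the 40% threshold and black background suit all of them. The `bw_name` and `ocr_subtitle` helpers and the `_bw` suffix are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: assumes ImageMagick 6 `convert` and `tesseract` are on PATH,
# and that frames are named sub*.png. Threshold/background taken from this thread.

# Hypothetical helper: derive the output base name, e.g. sub0001.png -> sub0001_bw
bw_name() {
    printf '%s_bw\n' "${1%.png}"
}

# Hypothetical helper: flatten transparency onto black, binarize at 40%,
# then OCR the result. Tesseract appends .txt to the output base it is given.
ocr_subtitle() {
    src=$1
    base=$(bw_name "$src")
    convert "$src" -background black -alpha remove -threshold 40% "$base.png" &&
        tesseract "$base.png" "$base"
}

for f in sub*.png; do
    [ -e "$f" ] || continue                  # no match: the glob stays literal, skip it
    case $f in *_bw.png) continue ;; esac    # don't re-process our own output
    ocr_subtitle "$f"
done
```

Run in the folder holding the frames, this writes `sub0001_bw.png` and `sub0001_bw.txt` next to `sub0001.png`. Files already ending in `_bw.png` are skipped, so a second run does not binarize its own output again.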

Best regards,
Dmitri Silaev
www.CustomOCR.com

inet010_ntransp_ts.png.txt
inet010.png
inet010_ntransp.png
inet010_ntransp_ts.png