Tesseract ignores tessedit_char_whitelist parameter

Ľuboš Katrinec

Oct 13, 2017, 5:43:46 AM
to tesseract-ocr
Hello,

I'm trying to solve captcha images just for fun (or rather as a challenge ;-) ). I'm passing the tessedit_char_whitelist and tessedit_char_blacklist parameters, but they seem to be ignored. Perhaps I'm just missing something.

> tesseract -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ -c tessedit_char_blacklist=abcdefghijklmnopqrstuvwxyz  threshold_problem1.jpeg stdout
Warning. Invalid resolution 0 dpi. Using 70 instead.
R x C
Eo e

I'm using a Windows version:

> tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74.1
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0


I'm running it on a JPEG; could that be a problem?

Thanks and regards,
Lubos
threshold_problem1.jpeg

Dan9er

Oct 14, 2017, 10:55:42 AM
to tesseract-ocr
-c goes at the very end of the command, and you can combine those two arguments. Try this:

> tesseract threshold_problem1.jpeg stdout -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ tessedit_char_blacklist=abcdefghijklmnopqrstuvwxyz
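
For reference, the Tesseract manual describes -c as taking a single CONFIGVAR=VALUE pair and allows the flag to be repeated, so the conservative form gives each parameter its own -c after the image and output base. Below is a minimal Python sketch of building such a command line. The helper name build_tesseract_args is hypothetical, and whether the engine then honors the parameters is a separate question from how they are passed.

```python
from typing import Dict, List


def build_tesseract_args(image: str, output_base: str,
                         params: Dict[str, str]) -> List[str]:
    """Return an argv list in the documented order:
    tesseract IMAGE OUTPUTBASE [-c VAR=VALUE]...
    """
    args = ["tesseract", image, output_base]
    for name, value in params.items():
        # One -c flag per parameter. A bare VAR=VALUE token without -c
        # is likely to be treated as a config file name instead.
        args += ["-c", f"{name}={value}"]
    return args


if __name__ == "__main__":
    import string

    argv = build_tesseract_args(
        "threshold_problem1.jpeg", "stdout",
        {"tessedit_char_whitelist": string.ascii_uppercase,
         "tessedit_char_blacklist": string.ascii_lowercase},
    )
    print(" ".join(argv))
    # To actually run it (requires tesseract on PATH):
    # import subprocess
    # subprocess.run(argv, capture_output=True, text=True)
```

Passing the list straight to subprocess.run avoids shell quoting problems if a whitelist ever contains spaces or punctuation.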

ShreeDevi Kumar

Oct 14, 2017, 11:43:16 AM
to tesser...@googlegroups.com
The whitelist parameter does not work with Tesseract 4.0x.

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
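
If the engine ignores the whitelist, one engine-independent fallback is to filter the recognized text afterwards. The sketch below is only an assumption about what might be acceptable for the captcha use case: it discards disallowed characters rather than making the recognizer choose among allowed ones, so it is weaker than a real whitelist. The function name filter_to_whitelist is hypothetical.

```python
import string


def filter_to_whitelist(text: str, whitelist: str, keep: str = " \n") -> str:
    """Drop every character not in `whitelist`.

    Characters in `keep` (spaces and newlines by default) survive so the
    line layout of the OCR output is preserved.
    """
    allowed = set(whitelist) | set(keep)
    return "".join(ch for ch in text if ch in allowed)


if __name__ == "__main__":
    raw = "R x C\nEo e\n"  # the output reported in the first post
    print(filter_to_whitelist(raw, string.ascii_uppercase))
```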

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7036c184-2d91-43f1-874f-44f2c29f3d61%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ľuboš Katrinec

Oct 19, 2017, 10:15:54 AM
to tesseract-ocr
I already tried this; it didn't help at all.

Ľuboš Katrinec

Oct 19, 2017, 10:19:08 AM
to tesseract-ocr
I ran --print-parameters with this version and the parameter is included in the list. Do you think it is ignored even though it is listed? Is the same true of tessedit_char_blacklist? Is there an alternative?

Thanks and regards,
Lubos
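
A parameter appearing in the --print-parameters output only shows that the variable is registered, not that the active recognition engine consults it. For checking programmatically whether a name is listed, here is a small sketch. It assumes each parameter line is tab-separated as name, value, description; that format is an assumption and may differ between builds. parse_parameters is a hypothetical helper and the sample text is illustrative, not captured from a real run.

```python
from typing import Dict


def parse_parameters(dump: str) -> Dict[str, str]:
    """Map parameter name -> current value from a --print-parameters dump."""
    params: Dict[str, str] = {}
    for line in dump.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # header or blank line, no tab-separated fields
        params[fields[0].strip()] = fields[1].strip()
    return params


if __name__ == "__main__":
    # Illustrative sample only (assumed format, made-up descriptions).
    sample = (
        "Tesseract parameters:\n"
        "tessedit_char_whitelist\t\tWhitelist of chars to recognize\n"
        "tessedit_char_blacklist\t\tBlacklist of chars not to recognize\n"
    )
    listed = parse_parameters(sample)
    print("tessedit_char_whitelist" in listed)
```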


Quan Nguyen

Oct 19, 2017, 3:11:45 PM
to tesseract-ocr
https://github.com/tesseract-ocr/tesseract/issues/751

Use the current 3.05.x version, if possible.
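
A script that depends on the whitelist could check the installed version before relying on it. The sketch below encodes only what this thread states: 3.x is the suggested version and 4.0x does not honor the parameter. Anything else is reported as unknown. Both function names are hypothetical, and the regular expression assumes the first line of `tesseract -v` looks like "tesseract 4.00.00alpha".

```python
import re
from typing import Optional, Tuple


def parse_tesseract_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the output of `tesseract -v`."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def whitelist_expected_to_work(banner: str) -> Optional[bool]:
    """True for 3.x, False for 4.0x (per this thread), None if unknown."""
    version = parse_tesseract_version(banner)
    if version is None:
        return None
    major, minor = version
    if major == 3:
        return True
    if (major, minor) == (4, 0):
        return False
    return None  # other releases are not covered by this thread


if __name__ == "__main__":
    print(whitelist_expected_to_work("tesseract 4.00.00alpha"))
```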