Captcha Error


Peter Lee

Nov 18, 2023, 7:41:02 AM
to Selenium Users
Hello everyone, I'm a beginner user of Python and Selenium,
and this is my very first post in this group. I hope someone
can help me out with my problem.

I'm using Python and Selenium to convert a text-based
captcha to a text string.


When executing the following statement,

captcha_text = pytesseract.image_to_string(captcha_image, lang='eng')

I get the error messages below:

  File "C:\Users\Peter\AppData\Local\Programs\Python\Python37\lib\site-packages\pytesseract\pytesseract.py", line 427, in image_to_string
    }[output_type]()
  File "C:\Users\Peter\AppData\Local\Programs\Python\Python37\lib\site-packages\pytesseract\pytesseract.py", line 426, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\Peter\AppData\Local\Programs\Python\Python37\lib\site-packages\pytesseract\pytesseract.py", line 288, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\Peter\AppData\Local\Programs\Python\Python37\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')

The last error message seems to be complaining about an incorrect usage of
the language parameter.

Am I using the wrong syntax for pytesseract.image_to_string()?

I'm using the following versions:

Python 3.7
Selenium 4.11.2
Tesseract 5.3.3

Thank you very much,

Peter

ddlionx

Nov 18, 2023, 8:13:10 AM
to seleniu...@googlegroups.com
Could be a hundred different things. Is the image the correct format for Pytesseract? Have you installed the language files? Are you sure the error is coming from Pytesseract (and can you prove that with more error handling)? Are there permission issues with the images?

This is to say nothing of the validity of using something like Pytesseract on Captchas to begin with, which are designed to elude detection from these kinds of libraries...
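
To illustrate the "more error handling" idea above, here is a minimal sketch that runs each step separately so a failure can be attributed to opening the image or to the OCR call. The helper names `run_stage` and `diagnose` are made up for this example, and it assumes Pillow and pytesseract are installed; treat it as a starting point rather than a definitive diagnostic.

```python
from typing import Any, Callable, Tuple


def run_stage(name: str, func: Callable[[], Any]) -> Tuple[bool, Any]:
    """Run one step; return (True, result) or (False, description of the failure)."""
    try:
        return True, func()
    except Exception as exc:  # broad on purpose: this is only a diagnostic wrapper
        return False, f"{name} failed: {type(exc).__name__}: {exc}"


def diagnose(image_path: str) -> None:
    """Hypothetical helper: report which stage of the OCR pipeline breaks."""
    # Imported here so run_stage can be used even without these packages.
    from PIL import Image
    import pytesseract

    ok, image = run_stage("open image", lambda: Image.open(image_path))
    if not ok:
        print(image)  # holds the failure description in this case
        return

    ok, text = run_stage(
        "tesseract OCR",
        lambda: pytesseract.image_to_string(image, lang="eng"),
    )
    print(text)  # the recognized text, or the failure description
```

Calling `diagnose("captcha.png")` (the file name is a placeholder) prints either the recognized text or a one-line message naming the step that failed.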


Peter Lee

Nov 18, 2023, 9:16:04 PM
to Selenium Users
That Guy,

Thanks very much for your reply and feedback.

When I installed Tesseract, I selected the default
English as the language. Also, at the beginning of
the install process, it asks which components to
install, and I checked off "language data", so I'm
pretty confident that the necessary English language
files have been installed.

The captcha image was saved as a .png file, and I can
open it with Microsoft Paint. In addition, inside
my Python program I was able to open the captcha image file
without any errors, so I think it's safe to eliminate
any format or permission problems with the input captcha file.

Do you have any other suggestions for debugging this problem?

Thank you,

Peter


Peter Lee

Nov 19, 2023, 8:02:59 AM
to Selenium Users
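Some good news. The problem turned out to be in my pytesseract.pytesseract.tesseract_cmd = 'path' statement.
I had the path pointing to pytesseract.exe. When I changed the path to point to tesseract.exe instead, the previous errors disappeared. That would explain the 'Usage: pytesseract [-l lang] input_file' message, which looks like the usage text of the pytesseract command-line wrapper rather than of Tesseract itself.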
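
For anyone hitting the same error, a minimal sketch of the corrected setup follows. The install path in the usage note and the helper names `looks_like_tesseract_binary` and `configure_pytesseract` are assumptions for illustration; only `pytesseract.pytesseract.tesseract_cmd` comes from the thread.

```python
from pathlib import PureWindowsPath


def looks_like_tesseract_binary(cmd_path: str) -> bool:
    """Return True if the file name is the Tesseract engine, not the Python wrapper."""
    # PureWindowsPath parses Windows-style paths on any operating system.
    name = PureWindowsPath(cmd_path).name.lower()
    return name in ("tesseract", "tesseract.exe")


def configure_pytesseract(cmd_path: str) -> None:
    """Hypothetical helper: point pytesseract at the Tesseract executable."""
    if not looks_like_tesseract_binary(cmd_path):
        raise ValueError(
            f"{cmd_path!r} does not look like tesseract.exe; "
            "pointing tesseract_cmd at pytesseract.exe causes the usage error"
        )
    import pytesseract  # imported lazily so the check above works without it

    pytesseract.pytesseract.tesseract_cmd = cmd_path
```

Usage would look like `configure_pytesseract(r"C:\Program Files\Tesseract-OCR\tesseract.exe")`, where the path is only an example location and should be replaced with wherever Tesseract is actually installed.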

The problem now is that the accuracy of the result after converting from image to text is very poor. I tried all the page segmentation modes from 0 to 13, and 13 seems to work the best in terms of accuracy and consistency. However, it's far from accurate.
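
A rough sketch of how the page segmentation modes can be compared side by side is shown here. The function names and the default list of modes are assumptions for illustration; the `--psm` value is passed through the `config` argument of `pytesseract.image_to_string`. Since captchas are built to resist OCR, none of the modes may give usable output.

```python
from typing import Dict, Iterable


def psm_config(mode: int) -> str:
    """Build the Tesseract config string for one page segmentation mode."""
    if not 0 <= mode <= 13:
        raise ValueError("Tesseract page segmentation modes range from 0 to 13")
    return f"--psm {mode}"


def compare_psm_modes(image, modes: Iterable[int] = (6, 7, 8, 13)) -> Dict[int, str]:
    """Hypothetical helper: OCR the same image under several modes."""
    import pytesseract  # imported lazily so psm_config works without it

    results = {}
    for mode in modes:
        try:
            text = pytesseract.image_to_string(
                image, lang="eng", config=psm_config(mode)
            )
        except pytesseract.TesseractError as exc:
            # Some modes can fail on some images; record the error and continue.
            text = f"<error: {exc}>"
        results[mode] = text.strip()
    return results
```

Here `image` would be a PIL image, for example the result of `Image.open(...)` on the saved .png file.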

I've included two examples of the captcha images I'm dealing with. 

Is there anyone who has succeeded in dealing with this particular style of captcha? Or any suggestions for dealing with this kind of captcha?

Thank you,

Peter
captcha10.png
captcha9.png

ddlionx

Nov 19, 2023, 8:32:24 AM
to seleniu...@googlegroups.com
"Dealing with" in what sense? If you are running automated tests, this FAQ answer will be very helpful:


If you are not doing automated testing, and are trying to bypass a real-world captcha, I'm not sure what people can do to help you. As I mentioned before, Captchas are designed to not be usable by robots.

Peter Lee

Nov 20, 2023, 7:19:54 AM
to Selenium Users
Ok, I understand what you're saying.

Thanks, anyway.
