Getting Tesseract Output as ANSI Encoding


Manankumar Bhatt

Jan 9, 2020, 3:37:51 AM
to tesseract-ocr
Hello there,

I have been using Tesseract 4. When I run it from the command line, the output text file is "ANSI" encoded instead of "UTF-8".

I have tried creating a new file and saving it with UTF-8 encoding, but when I run Tesseract from the command line, it still generates an ANSI-encoded file by default.

Can you please help with this?

Thanks,
Manan Bhatt

 

universal reseller

Jan 9, 2020, 4:45:29 AM
to tesser...@googlegroups.com
Do you mean printing the output on the command line? Or running the command from another programming language and putting the resulting string in a variable?
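For the second case (running Tesseract from another language and keeping the text in a variable), here is a minimal Python sketch. It assumes Tesseract 4 is on the PATH as "tesseract". Passing "stdout" as the output base makes Tesseract print the text instead of writing output.txt. The sketch assumes those bytes are UTF-8 and decodes them explicitly rather than relying on the Windows console code page. Tesseract 4 documents the page segmentation option as "--psm"; the "-psm" spelling used later in this thread is the older form. The helper names build_command, decode_output and run_ocr are hypothetical, not part of Tesseract.

```python
import subprocess


def build_command(image_path, lang="eng", psm=6):
    """Build a Tesseract 4 command line that prints the OCR text to stdout.

    Using "stdout" as the output base tells tesseract not to create a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def decode_output(raw):
    """Decode tesseract's stdout bytes as UTF-8 (assumed), not the ANSI code page."""
    return raw.decode("utf-8")


def run_ocr(image_path, lang="eng", psm=6):
    """Run tesseract and return the recognised text as a Python str.

    Raises subprocess.CalledProcessError if tesseract exits with an error.
    """
    result = subprocess.run(
        build_command(image_path, lang, psm),
        stdout=subprocess.PIPE,  # the OCR text
        stderr=subprocess.PIPE,  # banner and warnings, kept out of the text
        check=True,
    )
    return decode_output(result.stdout)
```

Usage would be something like text = run_ocr("image.jpg"). Decoding in one place keeps the encoding decision explicit instead of leaving it to whatever the console or editor guesses.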

Shree Devi Kumar

Jan 9, 2020, 7:11:57 AM
to tesseract-ocr
The output is UTF-8. How are you opening it? What is your locale?
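One way to check what Tesseract actually wrote is to inspect the raw bytes of output.txt instead of relying on an editor's label. If the OCR result contains only ASCII characters, which is common with -l eng, the file is byte-for-byte identical in UTF-8 and in Windows-1252 ("ANSI"). An editor such as Notepad then has no way to tell the two apart and may report the file as ANSI. Below is a minimal Python sketch of such a check; describe_encoding and describe_file are hypothetical helper names.

```python
import codecs
from pathlib import Path


def describe_encoding(data):
    """Classify raw file bytes by what an editor could infer from them."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        data.decode("ascii")
        # Pure ASCII is identical in UTF-8 and Windows-1252 ("ANSI"),
        # so the encoding cannot be determined from the bytes alone.
        return "ascii (same bytes in utf-8 and ansi)"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8 without BOM"
    except UnicodeDecodeError:
        return "not valid utf-8"


def describe_file(path):
    """Read a file as bytes (no decoding) and classify its encoding."""
    return describe_encoding(Path(path).read_bytes())
```

For example, describe_file("output.txt") after running Tesseract. A result of "ascii (same bytes in utf-8 and ansi)" would mean the file is valid UTF-8 even though an editor labels it ANSI.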




Manankumar Bhatt

Jan 9, 2020, 7:12:37 AM
to tesseract-ocr
I am running the command "Tesseract image.jpg output -l eng -psm 6", which generates an output.txt file. The generated file has ANSI encoding; however, UTF-8 encoding is desired.
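If something downstream needs the file to be marked explicitly as UTF-8, one workaround is to prepend a UTF-8 byte order mark (bytes EF BB BF) after Tesseract has finished. Windows Notepad reports files that start with this mark as UTF-8. The sketch below is a post-processing step in Python, not a Tesseract option, and add_utf8_bom is a hypothetical helper. Some tools do not expect a BOM and may show it as stray characters, so this only makes sense if the consumer wants one.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path):
    """Prepend a UTF-8 byte order mark to a file that does not already have one.

    Works on raw bytes, so existing line endings are left untouched.
    Returns True if the file was changed, False if it already had a BOM.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    data.decode("utf-8")  # raises UnicodeDecodeError if the file is not UTF-8
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Typical use would be to run the tesseract command and then call add_utf8_bom("output.txt"). Reading and writing bytes rather than text avoids Python's newline translation on Windows, which could otherwise turn existing "\r\n" line endings into "\r\r\n".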

Manankumar Bhatt

Jan 9, 2020, 7:15:38 AM
to tesseract-ocr
The locale is English (United States) and the OS is Windows 7.



Zdenko Podobny

Jan 9, 2020, 8:16:52 AM
to tesser...@googlegroups.com
Please also provide the details needed to reproduce the problem: the input image and output file, the Tesseract version, how you installed Tesseract, and which model you use for OCR.

Zdenko

