Tesseract: multiple .png files to a single .txt file


Lako

Mar 16, 2017, 11:43:43 AM
to tesseract-ocr
Hi,

Apologies for the beginner question; I am fairly new to Tesseract and to coding. I have a large number of .png files (each containing one line of uppercase code) and would like to create a single text file in which the results are separated by a semicolon, or even a space, to get the entire list.

I have managed to convert a single .png to .txt but cannot get a batch to work. I have looked at other posts, but the explanations are often beyond my level of understanding.

Many thanks for any help! 


ShreeDevi Kumar

Mar 16, 2017, 11:50:48 AM
to tesser...@googlegroups.com
Please let us know which environment you are running in (Linux, Windows, etc.).

Basically, you need to set up a loop that processes all the .png files and concatenates the OCR results.

- excuse the brevity, sent from mobile

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7c05d9c3-9da9-435b-8d21-40892a58034b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ShreeDevi Kumar

Mar 16, 2017, 11:52:21 AM
to tesser...@googlegroups.com
GUI front-ends for Tesseract, such as VietOCR and gImageReader, also allow batch processing of multiple files.


- excuse the brevity, sent from mobile

Greg Dunkel

Mar 16, 2017, 1:30:56 PM
to tesser...@googlegroups.com
For a large number of files, it is better to process them a chunk at a time,
catch any errors, and then concatenate each chunk's output.
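
A rough sketch of that chunked approach, assuming bash and a `tesseract` binary on the PATH. The function name `ocr_in_chunks`, the `failed.txt` log, and the default chunk size of 100 are hypothetical choices for illustration, not anything Tesseract provides. It relies on `stdout` being accepted as the output base so that the recognized text can be captured instead of written to a file.

```shell
#!/bin/bash
# Sketch only: OCR the .png files in a directory in chunks, log the files
# that fail, and append each finished chunk to one combined output file.
# ocr_in_chunks, failed.txt and the chunk size are illustrative assumptions.
ocr_in_chunks() {
  local dir=$1 out=$2 size=${3:-100} count=0 img text chunk
  chunk=$(mktemp)            # temporary file holding the current chunk
  : > "$out"                 # start with an empty combined file
  : > "$dir/failed.txt"      # start with an empty failure log
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue            # no .png files matched the glob
    # Capture the text first so a failed run adds nothing to the chunk.
    if text=$(tesseract "$img" stdout -l eng 2>/dev/null); then
      printf '%s\n' "$text" >> "$chunk"
    else
      echo "$img" >> "$dir/failed.txt"   # remember the file for a retry
    fi
    count=$((count + 1))
    # After every $size files, flush the chunk into the combined file.
    if [ $((count % size)) -eq 0 ]; then
      cat "$chunk" >> "$out"
      : > "$chunk"
    fi
  done
  cat "$chunk" >> "$out"     # flush whatever is left in the last chunk
  rm -f "$chunk"
}
```

It could be called as `ocr_in_chunks . combined.txt 50`; afterwards `failed.txt` lists the images worth re-running or inspecting by hand.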




--
/greg

Lako

Mar 17, 2017, 4:56:59 AM
to tesseract-ocr
Many thanks for your reply. I am running on a Mac! I will test the options suggested. Thank you again for helping me out on this issue!

ShreeDevi Kumar

Mar 17, 2017, 5:11:16 AM
to tesser...@googlegroups.com
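On a Unix-like environment, you can set up a shell script similar to this:

#!/bin/bash
# Directory containing the traineddata files; adjust this to your own setup.
export TESSDATA_PREFIX=/mnt/c/Users/User/shree

# Start from an empty output file so that reruns do not append duplicates.
: > combined.txt

# Loop over the glob directly; quoting keeps filenames with spaces intact.
for img_file in *.png; do
  [ -e "$img_file" ] || continue   # nothing to do if no .png files match
  # Writes the recognized text to <image name without extension>.txt
  tesseract "$img_file" "${img_file%.*}" --oem 1 -l eng
  # Append this image's text to the combined file.
  cat "${img_file%.*}.txt" >> combined.txt
done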
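
If the results should end up on one line separated by semicolons (or spaces), as asked in the original question, a variant along these lines might work. This is a sketch under some assumptions: `ocr_join` is a hypothetical helper name, `--psm 7` (treat the image as a single text line) is a guess based on the "one line" description, and `stdout` is used as the output base so that the text is printed rather than written to a file.

```shell
#!/bin/bash
# Sketch only: OCR every .png in a directory and join the results with a
# separator. ocr_join is an illustrative helper, not part of Tesseract.
ocr_join() {
  local dir=$1 sep=${2:-";"} out="" first=1 img text
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue            # no .png files matched the glob
    # Print the recognized text and strip newlines and form feeds so that
    # each image contributes exactly one field.
    text=$(tesseract "$img" stdout --psm 7 -l eng 2>/dev/null | tr -d '\n\f')
    if [ "$first" -eq 1 ]; then
      out=$text
      first=0
    else
      out="$out$sep$text"
    fi
  done
  printf '%s\n' "$out"
}
```

For example, `ocr_join . > combined.txt` would give a semicolon-separated list for the current directory, and `ocr_join . " " > combined.txt` a space-separated one.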
