Saurabh Srivastav

Mar 3, 2017, 2:09:39 AM

to tesseract-ocr

How do I train Tesseract 4.0? Please help me.

thanks,
Saurabh Srivastav

Screenshot from 2017-03-03 12-15-12.png

ShreeDevi Kumar

Mar 3, 2017, 2:23:31 AM

to tesser...@googlegroups.com

The warning in your screenshot means that your image does not contain resolution information. Your OCR output file should still have been created.

Training 4.0 is not easy. Please see https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f1782fd1-97a1-40db-8ba0-f003052f39ae%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Saurabh Srivastav

Mar 22, 2017, 2:01:18 PM

to tesseract-ocr

Thank you, Shree, for your valuable reply. I have now created box files for a particular image and trained on it, but I am still missing something. Could you please tell me what I have to do after creating the box file for that image to make Tesseract read the characters from it?

Thanks and regards.

ShreeDevi Kumar

Mar 23, 2017, 7:54:59 AM

to tesser...@googlegroups.com

To read characters from an image, it is not necessary to train it. Just use an appropriate traineddata.

Training is required only if it is a new language or font or some such special circumstance.

Read the wiki for documentation.

https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Saurabh Srivastav

Apr 3, 2017, 9:40:05 AM

to tesseract-ocr

Hello Shree! Thank you for your help.
Could you please help me write a bash script for tesseract?

ShreeDevi Kumar

Apr 3, 2017, 10:41:33 AM

to tesser...@googlegroups.com

Saurabh,

It depends on what you want to do with the bash script.

Here is a sample of a script I used to compare results from different traineddata files by looping through a set of image files. Google the bash commands to figure out what they do!

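#!/bin/bash
# Print each command as it is read (-v) and as it is executed (-x).
set -vx
export TESSDATA_PREFIX=/mnt/c/Users/User/shree/tesseract-ocr

# Loop over the .jpeg files directly instead of parsing ls output.
for img_file in *.jpeg; do
    # Run the same image through three traineddata files and time each run.
    time tesseract "${img_file}" "${img_file%.*}-ssd" -l ssd
    time tesseract "${img_file}" "${img_file%.*}-ssdsmall" --psm 6 --oem 1 -l ssdsmall
    time tesseract "${img_file}" "${img_file%.*}-eng" --psm 6 --oem 1 -l eng
done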

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Saurabh Srivastav

Apr 3, 2017, 11:38:11 AM

to tesseract-ocr

Shree,
Actually, I want a bash script that runs tesseract and stores the output files in a folder.

Kindly help me write this type of bash script.

Thank you.
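
A minimal sketch of such a script, assuming the images are .jpeg files in a single directory and that the tesseract binary is on the PATH. The function name ocr_to_folder, the OCR_BIN override variable, and the ocr-output folder name are illustrative assumptions, not part of Tesseract.

```shell
#!/bin/bash
# Sketch: OCR every .jpeg in a directory and write the results into a
# separate output folder. OCR_BIN is a hypothetical override so the
# command can be swapped out; by default it is the tesseract binary.
OCR_BIN="${OCR_BIN:-tesseract}"

ocr_to_folder() {
    local in_dir="$1" out_dir="$2" lang="${3:-eng}"
    local img base
    mkdir -p "$out_dir" || return 1
    for img in "$in_dir"/*.jpeg; do
        # A pattern with no match stays unexpanded; skip it.
        [ -e "$img" ] || continue
        base="$(basename "${img%.*}")"
        # tesseract appends the .txt extension to the output base itself.
        "$OCR_BIN" "$img" "$out_dir/$base" -l "$lang"
    done
}

# Example call (commented out so that sourcing the file has no side effects):
# ocr_to_folder . ./ocr-output eng
```

Keeping the output base name equal to the image name means each result can be traced back to its source image.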

srn...@gmail.com

Apr 4, 2017, 8:36:52 AM

to tesseract-ocr

Hello ShreeDevi,

https://medium.com/apegroup-texts/training-tesseract-for-labels-receipts-and-such-690f452e8f79

The link above is a full-fledged tutorial on using and training Tesseract 3.0. Can you please clarify the points below?

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00

I feel the GitHub page above would be better if it went into more detail.

1) How should I train Tesseract if I do not know the fonts in advance, or if the image files may contain arbitrary fonts?

2) The GitHub tutorial says that the clustering steps (mftraining, cntraining, shapeclustering) should be skipped. Is that correct?

3) I want to generate a traineddata file and merge it with the tessdata that is already present, without replacing it. How can I do that?

Can you please explain how to achieve these steps?

Thank You.

srn...@gmail.com

Apr 4, 2017, 8:47:47 AM

to tesseract-ocr

Hello ShreeDevi,

Can you elaborate on the LSTM step, which is new in Tesseract 4.0, and on the new steps I need to follow to train Tesseract 4?

Thank you

srn...@gmail.com

Apr 4, 2017, 8:48:24 AM

to tesseract-ocr

Have you made any progress, Saurabh?

ShreeDevi Kumar

Apr 4, 2017, 8:53:33 AM

to tesser...@googlegroups.com

See

https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh

https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh

https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh

srn...@gmail.com

Apr 4, 2017, 9:31:08 AM

to tesseract-ocr

I am trying to train Tesseract 4, and I am getting the following error.

Command used:

mkdir -p /home/p/Documents/T/engoutput
/home/p/Documents/T/tesseract-master/training/lstmtraining -U /home/p/Documents/T/img_frm_3/unicharset \
--script_dir /home/p/Documents/T/TESS_4_ALPHA/langdata-master --debug_interval 100 \
--train_listfile /home/p/Documents/T/TESS_4_ALPHA/langdata-master/eng/eng.training_files \
--eval_listfile /home/p/Documents/T/TESS_4_ALPHA/langdata-master/eng/eng.training_files \
--max_iterations 5000 &>/home/p/Documents/T/basetrain.log

Log output:
tail -f basetrain.log
Failed to load list of training filenames from /home/p/Documents/T/TESS_4_ALPHA/langdata-master/eng/eng.training_files
tail: basetrain.log: file truncated

Error:
Failed to load list of training filenames from /home/p/Documents/T/TESS_4_ALPHA/langdata-master/eng/eng.training_files

ShreeDevi Kumar

Apr 4, 2017, 10:28:44 AM

to tesser...@googlegroups.com

tesstrain.sh generates a file called eng.training_files.txt.

Your command uses the name without the .txt extension.

Check the name of the generated file and use that.

I have found that editing that file also gives errors.
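
One way to catch this kind of mismatch before starting a long training run is to check the list file first. The sketch below assumes that the list file is a plain-text file with one training-data path per line; the function name check_listfile is hypothetical and not part of Tesseract.

```shell
#!/bin/bash
# Sketch: verify that a training list file exists and that every path
# listed in it points to a readable file. Returns non-zero if the list
# file itself or any listed file is missing.
check_listfile() {
    local listfile="$1" line missing=0
    if [ ! -r "$listfile" ]; then
        echo "list file not found or not readable: $listfile" >&2
        return 1
    fi
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue          # ignore empty lines
        if [ ! -r "$line" ]; then
            echo "listed file is missing: $line" >&2
            missing=$((missing + 1))
        fi
    done < "$listfile"
    [ "$missing" -eq 0 ]
}

# Example call (the path is a placeholder):
# check_listfile path/to/eng.training_files.txt
```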

- excuse the brevity, sent from mobile

Saurabh Srivastav

Apr 4, 2017, 11:57:06 AM

to tesseract-ocr

Yes, I trained Tesseract for an English font and got it to read the characters from the image.

thanks,
Saurabh Srivastav

Saurabh Srivastav

Apr 4, 2017, 12:08:24 PM

to tesseract-ocr

Thank you, Shree.
You always help me.

I still have one problem. I wrote a bash script that processes all images with the .jpg extension and names each output file after its image.
I would like the script to also pick up images with other extensions, such as .jpg, .jpeg, and .png. Is that possible? If it is, please help me out.

Thank you, Shree.
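
A minimal sketch of one way to do this in bash: list several glob patterns in the same for loop. The function name list_images is a hypothetical helper, and the three extensions are the ones named in the question.

```shell
#!/bin/bash
# Sketch: print every image in a directory whose extension is .jpg,
# .jpeg, or .png, one path per line. Other files are ignored.
list_images() {
    local dir="$1" f
    for f in "$dir"/*.jpg "$dir"/*.jpeg "$dir"/*.png; do
        # A pattern with no match stays unexpanded; skip it.
        [ -f "$f" ] || continue
        printf '%s\n' "$f"
    done
}

# Example use: OCR each listed image, naming the output after the image.
# list_images . | while IFS= read -r img; do
#     tesseract "$img" "${img%.*}"
# done
```

The patterns are case-sensitive, so files named like PHOTO.JPG would need their own pattern.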

srn...@gmail.com

Apr 4, 2017, 3:24:26 PM

to tesseract-ocr

Can you please share some of your experiences in this thread, as there are no posts on training Tesseract 4?

1) Also, is there any way to add the newly trained data file to the old traineddata file without replacing the old file?
2) If we do not know which fonts may appear in our images, how should we proceed with training Tesseract?

ShreeDevi Kumar

Apr 4, 2017, 11:37:40 PM

to tesser...@googlegroups.com

Read

https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer

and

https://github.com/tesseract-ocr/tesseract/wiki/Documentation

https://github.com/tesseract-ocr/tesseract/wiki/Fonts

https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality

https://github.com/tesseract-ocr/tesseract/wiki/FAQ

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

srn...@gmail.com

Apr 5, 2017, 4:25:21 AM

to tesseract-ocr

Following your advice, I tried two approaches, and I am stuck at the LSTM step in both.

Training:

Command used:

/home/p/Documents/T/tesseract-master/training/lstmtraining -U /home/p/Documents/T/img_frm_3/eng.unicharset \
  --script_dir /home/p/Documents/T/TESS_4_ALPHA/langdata-master --debug_interval 100 \
  --net_spec '[1,36,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]' \
  --model_output /home/p/Documents/T/ \
  --train_listfile /home/p/Documents/T/img_frm_3/eng.ArialBold.exp0.txt \
  --eval_listfile /home/p/Documents/T/img_frm_3/eng.ArialBold.exp0.txt \
  --max_iterations 5000 &>/home/p/Documents/T/basetrain.log

tail -f basetrain.log
The error is:

Deserialize header failed: BnO. 005 SUBHISHIs TOWN CENTRE
Deserialize header failed: MOKILA SHAKARPALLY
Deserialize header failed: PHONE: 040-8989898989
Load of page 0 failed!
Load of images failed!!
Deserialize header failed: TIN: 8989898989
Deserialize header failed: Station 1D: 01 Time: 03:26:46 PM
Deserialize header failed: CASHIER ID:; 3001 Date: 21-02-2017
Deserialize header failed: (null)
Deserialize header failed: (null)

Fine tuning:

Command used:

/home/plianto/Documents/Tvat/tesseract-master/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
--training_text /home/plianto/Documents/Tvat/img_frm_3/eng.ArialBold.exp0.txt \
--langdata_dir /home/plianto/Documents/Tvat/TESS_4_ALPHA/langdata-master --tessdata_dir /usr/share/tesseract-ocr/tessdata \
--fontlist "Arial Bold" \
--output_dir /home/plianto/Documents/Tvat/engoutput/

error:

=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=/usr/share/tesseract-ocr/tessdata
[Wed Apr 5 13:53:05 IST 2017] /usr/local/bin/tesseract /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0.tif /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0 lstm.train
read_params_file: Can't open lstm.train
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
ERROR: /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0.lstmf does not exist or is not readable

ShreeDevi Kumar

Apr 5, 2017, 4:29:05 AM

to tesser...@googlegroups.com

4.0 is alpha software. Please use an older released version.

- excuse the brevity, sent from mobile

ShreeDevi Kumar

Apr 5, 2017, 4:29:56 AM

to tesser...@googlegroups.com

You do not have the lstm.train config file.
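
A quick way to check for the file is sketched below. It assumes that config files such as lstm.train live in a configs directory under the tessdata directory, and that TESSDATA_PREFIX may point either at tessdata itself or at its parent, depending on the version; both points should be verified against your own installation. The function name find_config is hypothetical.

```shell
#!/bin/bash
# Sketch: look for a named config file under a tessdata prefix.
# Both <prefix>/configs and <prefix>/tessdata/configs are tried, because
# the meaning of the prefix is assumed to differ between versions.
find_config() {
    local prefix="${1%/}" name="$2" candidate
    for candidate in "$prefix/configs/$name" "$prefix/tessdata/configs/$name"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "config '$name' not found under $prefix" >&2
    return 1
}

# Example call:
# find_config "$TESSDATA_PREFIX" lstm.train
```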

- excuse the brevity, sent from mobile

srn...@gmail.com

Apr 5, 2017, 4:32:06 AM

to tesseract-ocr

Overview of Training Process

The overall training process is similar to training 3.04. Conceptually, the steps are the same:

1) Prepare training text.
2) Render text to image + box file. (Or create hand-made box files for existing image data.)
3) Make unicharset file.
4) Optionally make dictionary data.
5) Run tesseract to process image + box file to make training data set.
6) Run training on training data set.
7) Combine data files.

The key differences are:

The boxes only need to be at the textline level. It is thus far easier to make training data from existing image data.
The .tr files are replaced by .lstmf data files.
Fonts can and should be mixed freely instead of being separate.
The clustering steps (mftraining, cntraining, shapeclustering) are replaced with a single slow lstmtraining step.
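
The "make training data set" step can be sketched as follows. The lstm.train invocation mirrors the command that tesstrain.sh prints in the log earlier in this thread. The assumption that the list file passed to lstmtraining holds one .lstmf path per line, and the helper names make_lstmf and write_listfile, are illustrative and not taken from the Tesseract documentation.

```shell
#!/bin/bash
# Sketch: run tesseract with the lstm.train config to turn an image
# (with its box file next to it) into an .lstmf file with the same base.
make_lstmf() {
    local tif="$1"
    tesseract "$tif" "${tif%.tif}" lstm.train
}

# Sketch: collect all .lstmf files in a directory into a list file,
# one path per line, for use as a train or eval list.
write_listfile() {
    local dir="$1" listfile="$2" f
    : > "$listfile"                      # create or truncate the list
    for f in "$dir"/*.lstmf; do
        [ -f "$f" ] || continue          # skip an unmatched pattern
        printf '%s\n' "$f" >> "$listfile"
    done
}

# Example calls (paths are placeholders):
# make_lstmf train/eng.Arial_Bold.exp0.tif
# write_listfile train train/eng.training_files.txt
```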

Hello ShreeDevi,

I request you to guide me by elaborating on the steps marked above, as I am not able to find the relevant instructions for them.

The steps I am following give the errors shown in my previous reply.

Please guide me.

srn...@gmail.com

Apr 5, 2017, 4:35:19 AM

to tesseract-ocr

You can use *.* to match the files, but make sure that only image files are supplied, because * matches every file in the directory.

1) I request your help with the posts I made today.
2) Please also guide me on how to generate the LSTM files for the images I need to use, and please explain the steps you followed.

srn...@gmail.com

Apr 5, 2017, 4:37:03 AM

to tesseract-ocr

Please tell me how I can get the lstm.train config file. I need to work with Tesseract 4 only; I do not have any other option.

srn...@gmail.com

Apr 5, 2017, 7:20:08 AM

to tesseract-ocr

Hello ShreeDevi,

I solved the lstm.train error; I had given the wrong path.

mkdir -p ~/tesstutorial/engoutput
training/lstmtraining -U ~/tesstutorial/engtrain/eng.unicharset \
  --script_dir ../langdata --debug_interval 100 \
  --net_spec '[1,36,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]' \
  --model_output ~/tesstutorial/engoutput/base \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log

1) Can you please tell me how to generate the unicharset file for my image files after generating box files with tesseract?
2) Please also clarify the net_spec parameter and what input should be given to it.

Thanks

Saurabh Srivastav

Apr 10, 2017, 5:26:06 AM

to tesseract-ocr

Hello srn,
Can you please let me know about your progress?

srn...@gmail.com

Apr 12, 2017, 6:09:01 AM

to tesseract-ocr

I am able to train Tesseract using the fine-tuning technique with some training text (not images), and I want to know how to train Tesseract with images and box files. I am confused because when I run this command:

tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train

it produces .tr files (my Tesseract is the 4.0 alpha version).

I will post my tutorial or experiences this weekend.

Can you please give an overview of how to train Tesseract with some (blurred) images, and what changes I need to make to the steps in this link?

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00

Saurabh Srivastav

Apr 25, 2017, 5:07:27 AM

to tesseract-ocr

Edit your box files with the correct data, then make a traineddata file and copy it to /usr/local/share/tessdata.
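
A sketch of that last step, written so that it does not replace data that is already installed. It copies a new traineddata file into a tessdata directory under its own file name and refuses to overwrite an existing file. The helper name install_traineddata and the language name mylang are hypothetical.

```shell
#!/bin/bash
# Sketch: copy a .traineddata file into a tessdata directory without
# overwriting anything that is already there.
install_traineddata() {
    local src="$1" tessdata_dir="$2" dest
    case "$src" in
        *.traineddata) ;;
        *) echo "not a .traineddata file: $src" >&2; return 1 ;;
    esac
    dest="$tessdata_dir/$(basename "$src")"
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    cp "$src" "$dest"
}

# Example calls (mylang is a placeholder language name):
# install_traineddata ./mylang.traineddata /usr/local/share/tessdata
# tesseract image.png out -l mylang
```

Because the new file keeps its own name, the existing traineddata files stay usable. The -l option also accepts several languages joined with +, for example -l eng+mylang, which loads both files for one run.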

kislay bajpai

Oct 16, 2018, 8:33:53 AM

to tesseract-ocr

Hello Shree,

I am confused about how to train Tesseract 4.0 alpha for a new font (E 13B). Please help me with it.

Shree Devi Kumar

Oct 16, 2018, 9:40:48 AM

to tesser...@googlegroups.com

Please do not use Tesseract 4.0 alpha. There have been many changes since then.

Use the latest code from GitHub, which is 4.0.0-rc3, or install from Alex's PPA or from UB Mannheim (for Windows).

Please read the wiki pages about training Tesseract 4 for a new font, in particular the section on fine tuning for Impact.

kislay bajpai

Oct 22, 2018, 6:59:58 AM

to tesser...@googlegroups.com

Hello,

Sorry to disturb you. I am very new to Tesseract and have no idea how to train it.

Please help me out. I am in big trouble.

Version: Tesseract 4.0 alpha

OS: Ubuntu 16.04 or RHEL 7.3 (I can use either one)

--

Thanks and regards

Kislay Bajpai

Shree Devi Kumar

Oct 22, 2018, 12:42:17 PM

to tesser...@googlegroups.com

Please see https://github.com/tesseract-ocr/tesseract/wiki and https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact

saman ukh

Feb 22, 2020, 10:02:27 AM

to tesseract-ocr

Hello all,

I am using Tesseract 4.0, which uses LSTM.

I have searched a lot for information on training new characters, but I have found the training difficult to do.

I am trying to train the Arabic traineddata by adding a few new characters.

Can anyone help me with this?

What are the steps, and where should I start?

train tesseract OCR 4.0
