oem Detection


Ibr

Jun 13, 2017, 5:47:34 AM
to tesseract-ocr
Hi,

When running detection with Tesseract 4.00.00alpha, I use the command:

tesseract image results -l ara --tessdata-dir ./tessdata --oem 1

Here --oem 1 means "Neural nets LSTM only", but there is no Tesseract argument that specifies where to find the LSTM files, so how does Tesseract find them? I used to place the LSTM files inside the tesseract folder. I then deleted those LSTM files and ran detection again with --oem 1 (LSTM only), yet detection still worked. Does Tesseract search other folders for LSTM files? I had LSTM files in several different folders.

Thanks.

ShreeDevi Kumar

Jun 13, 2017, 7:36:54 AM
to tesser...@googlegroups.com
tesseract image results -l ara --tessdata-dir ./tessdata --oem 1

uses the LSTM model that is packed inside ara.traineddata in your tessdata directory.

Just placing lstm files in the tesseract folder is not going to change anything.

You need to create a new traineddata with the new lstm files and then test with it.
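
As a rough illustration of the lookup described above: with --tessdata-dir DIR and -l LANG, the file Tesseract loads is DIR/LANG.traineddata. The helper name traineddata_path below is hypothetical and only mirrors that naming rule; it does not call Tesseract.

```shell
# Hypothetical helper: print the traineddata file that
# `tesseract ... --tessdata-dir DIR -l LANG` would load,
# and return non-zero if that file is missing.
traineddata_path() {
  dir=${1%/}   # drop one trailing slash, if any
  lang=$2
  path="$dir/$lang.traineddata"
  printf '%s\n' "$path"
  [ -f "$path" ]
}

# Example (paths are placeholders):
#   traineddata_path ./tessdata ara
```

If the printed file exists, loose .lstm files lying in other folders play no part in recognition.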

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/eefc8290-c407-4075-b845-4b226094e752%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ibr

Jun 13, 2017, 7:55:33 AM
to tesseract-ocr
It seems so. To add or merge the new LSTM files into the traineddata, is this the correct command to use?

training/combine_tessdata -o tessdata/jpn.traineddata ~/tesstutorial/eng_from_chi/.lstm

But that gave me the following:
TessdataManager can't determine which tessdata component is represented by lstmf
TessdataManager combined tesseract data files.
Offset for type  0 (.traineddataconfig                ) is 172
Offset for type  1 (.traineddataunicharset            ) is 2745
Offset for type  2 (.traineddataunicharambigs         ) is 283372
Offset for type  3 (.traineddatainttemp               ) is 288048
Offset for type  4 (.traineddatapffmtable             ) is 30906394
Offset for type  5 (.traineddatanormproto             ) is 30942955
Offset for type  6 (.traineddatapunc-dawg             ) is 31395690
Offset for type  7 (.traineddataword-dawg             ) is 31398292
Offset for type  8 (.traineddatanumber-dawg           ) is 32406214
Offset for type  9 (.traineddatafreq-dawg             ) is 32406256
Offset for type 10 (.traineddatafixed-length-dawgs    ) is -1
Offset for type 11 (.traineddatacube-unicharset       ) is -1
Offset for type 12 (.traineddatacube-word-dawg        ) is -1
Offset for type 13 (.traineddatashapetable            ) is 32407402
Offset for type 14 (.traineddatabigram-dawg           ) is -1
Offset for type 15 (.traineddataunambig-dawg          ) is -1
Offset for type 16 (.traineddataparams-model          ) is 33071948
Offset for type 17 (.traineddatalstm                  ) is 33072647
Offset for type 18 (.traineddatalstm-punc-dawg        ) is 43371656
Offset for type 19 (.traineddatalstm-word-dawg        ) is 43374258
Offset for type 20 (.traineddatalstm-number-dawg      ) is 44380188

Any idea?
Thanks.



ShreeDevi Kumar

Jun 13, 2017, 8:03:40 AM
to tesser...@googlegroups.com
You have to be clear about which files you are combining.

The command you gave overwrites the Japanese traineddata. Is that what you want to do?

training/combine_tessdata -o tessdata/jpn.traineddata

Look at the help output of combine_tessdata for all of its options.

Figure out which files (lstm, dawg, etc.) you want to combine.

Give the appropriate command options and files to create the new traineddata.
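
Since -o rewrites the target traineddata in place, one cautious approach is to keep a copy first. This is only a sketch: backup_traineddata is a hypothetical helper, the combine_tessdata call is left commented out, and its paths are placeholders.

```shell
# Hypothetical helper: copy a traineddata file to FILE.bak before
# combine_tessdata -o modifies it. Prints the backup path on success.
backup_traineddata() {
  src=$1
  if [ ! -f "$src" ]; then
    echo "no such file: $src" >&2
    return 1
  fi
  cp -p "$src" "$src.bak" && printf '%s\n' "$src.bak"
}

# Example (placeholder paths; needs combine_tessdata on PATH):
#   backup_traineddata tessdata/jpn.traineddata &&
#     combine_tessdata -o tessdata/jpn.traineddata /path/to/jpn.lstm
```

The component file is named LANG.lstm here on the assumption that combine_tessdata infers the component type from the file extension, which is what the "can't determine which tessdata component" message suggests.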

ShreeDevi

Ibr

Jun 13, 2017, 8:39:07 AM
to tesseract-ocr
Thanks for the response. I actually wrote the command wrong: I wanted to combine files, and I also had not extracted the lstm file before combining. That brings up another question.

If I use tesstrain.sh, it creates .lstmf files, correct? But combine_tessdata -e creates an lstm file, so what is the difference between the two?
I know that .lstmf files are a substitute for the .tr files. I would be grateful for a short explanation of both, since there is not much explanation of them on the web.

Thanks in advance.

ShreeDevi Kumar

Jun 13, 2017, 9:28:21 AM
to tesser...@googlegroups.com
combine_tessdata -e

extracts the lstm file from the traineddata produced by Google's original training.

-----------------
tesstrain.sh creates .lstmf files

Yes. These are created from the box/tiff pairs generated from the training text and fonts.

---------------------------

The lstmtraining program takes all of these .lstmf files (via the list file that contains all the .lstmf filenames) and creates intermediate .lstm files and _checkpoint files.

-------------------------------
These can be converted to the final .lstm file for use in traineddata.
--------------------------
The final .lstm file has to be combined using combine_tessdata to create the new traineddata.
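
The last two steps above can be sketched as a dry run. This assumes lstmtraining's --stop_training flag for converting a checkpoint into a final model, as I understand it from the Tesseract 4 training documentation; some builds may need further arguments (for example --traineddata), so check lstmtraining --help first. The run and finalize_model names and all paths are hypothetical.

```shell
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to really execute (needs the training tools on PATH).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: turn a checkpoint into LANG.lstm, then pack
# that file into an existing traineddata.
finalize_model() {
  checkpoint=$1
  lang=$2
  traineddata=$3
  run lstmtraining --stop_training \
      --continue_from "$checkpoint" \
      --model_output "$lang.lstm" &&
  run combine_tessdata -o "$traineddata" "$lang.lstm"
}

# Example (placeholder paths):
#   finalize_model impact_checkpoint jpn tessdata/jpn.traineddata
```

Printing the commands first makes it easy to confirm which traineddata would be overwritten before anything is changed.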


ShreeDevi

Ibr

Jun 14, 2017, 3:27:30 AM
to tesseract-ocr
Thanks

Ibr

Jun 14, 2017, 9:17:24 AM
to tesseract-ocr
Is this command correct for creating the intermediate .lstm and _checkpoint files?

training/lstmtraining --model_output ~/tesstutorial/impact_from_small/impact \
  --train_listfile ~/tesstutorial/jpntrain/jpn.training_files.txt \
  --continue_from ~/tesstutorial/impact_from_full/jpn.lstm

As for --continue_from, it is mentioned here that it can be a recognition model, which would be the .lstm file. If not, what is the existing model? I ask because when I run the command above it says:
Loaded file /home/ibr/tesstutorial/impact_from_full/jpn.traineddata, unpacking...
Failed to continue from: /home/ibr/tesstutorial/impact_from_full/jpn.traineddata



ShreeDevi Kumar

Jun 14, 2017, 9:49:51 AM
to tesser...@googlegroups.com
You need to extract the .lstm file from the traineddata.

E.g. (change the folder names to match your setup):

combine_tessdata -e  ../tessdata/jpn.traineddata jpn.lstm
Extracting tessdata components from ../tessdata/jpn.traineddata
Wrote jpn.lstm
0:config:size=2573, offset=168
1:unicharset:size=280627, offset=2741
2:unicharambigs:size=4676, offset=283368
3:inttemp:size=30618346, offset=288044
4:pffmtable:size=36561, offset=30906390
5:normproto:size=452735, offset=30942951
6:punc-dawg:size=2602, offset=31395686
7:word-dawg:size=1007922, offset=31398288
8:number-dawg:size=42, offset=32406210
9:freq-dawg:size=1146, offset=32406252
13:shapetable:size=664546, offset=32407398
16:params-model:size=699, offset=33071944
17:lstm:size=10299009, offset=33072643
18:lstm-punc-dawg:size=2602, offset=43371652
19:lstm-word-dawg:size=1005930, offset=43374254
20:lstm-number-dawg:size=50, offset=44380184
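
The listing above also shows whether a traineddata contains an LSTM model at all (entry 17:lstm). A small filter can check for it; has_lstm_component is a hypothetical name, and the pattern assumes the "index:name:size=..., offset=..." format shown in this output.

```shell
# Hypothetical filter: read a component listing in the format shown
# above on stdin and succeed only if an "lstm" component is listed.
# Entries such as "lstm-punc-dawg" do not count.
has_lstm_component() {
  grep -q '^[0-9][0-9]*:lstm:'
}

# Example (placeholder paths; 2>&1 is used in case the tool writes
# the listing to stderr rather than stdout):
#   combine_tessdata -e ../tessdata/jpn.traineddata jpn.lstm 2>&1 |
#     has_lstm_component && echo "LSTM model present"
```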


ShreeDevi

Ibr

Jun 14, 2017, 9:58:47 AM
to tesseract-ocr
Yes, I already extracted the lstm file and passed it to the --continue_from argument: --continue_from ~/tesstutorial/impact_from_full/jpn.lstm
Shouldn't this step do it?
Yet the error keeps coming:

Loaded file /home/ibr/tesstutorial/impact_from_full/jpn.lstm, unpacking...
Failed to continue from: /home/ibr/tesstutorial/impact_from_full/jpn.lstm

Thanks for the response

ShreeDevi Kumar

Jun 14, 2017, 10:53:35 AM
to tesser...@googlegroups.com
Check that the file is there:

ls -l  /home/ibr/tesstutorial/impact_from_full/jpn.lstm
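
Beyond ls -l, a few more checks can rule out an empty or unreadable file. This is a minimal sketch: check_model_file is a hypothetical helper, and passing these checks does not guarantee that lstmtraining accepts the file's contents.

```shell
# Hypothetical helper: report whether a model file exists, is
# readable and is non-empty. Returns non-zero on the first failure.
check_model_file() {
  f=$1
  if [ ! -e "$f" ]; then echo "missing: $f"; return 1; fi
  if [ ! -r "$f" ]; then echo "not readable: $f"; return 1; fi
  if [ ! -s "$f" ]; then echo "empty: $f"; return 1; fi
  echo "ok: $f"
}

# Example:
#   check_model_file /home/ibr/tesstutorial/impact_from_full/jpn.lstm
```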

ShreeDevi