Combining tessdata files Error opening unicharset file

Miguel Goyanes

7 May 2015, 2:00:45 a.m.
to tesser...@googlegroups.com
Hello.

I'm following the tutorial on how to create traineddata, and I ran into the error in the title at the very last step.

I'm running Ubuntu 14.04 LTS on a Virtual machine.
Tesseract version is 3.02.02 and Leptonica is 1.72.

I have a por.monospaced.exp0.tif file. Here are the steps I ran and the resulting output:

miguel@miguel:~/Desktop/TessTests$ tesseract por.monospaced.exp0.tif por.monospaced.exp0 batch.nochop makebox
Tesseract Open Source OCR Engine v3.02.02 with Leptonica


Then:

miguel@miguel:~/Desktop/TessTests$ tesseract por.monospaced.exp0.tif por.monospaced.exp0 box.train
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
APPLY_BOXES:
   Boxes read from boxfile:      59
   Found 59 good blobs.
   Leaving 2 unlabelled blobs in 0 words.
TRAINING ... Font name = monospaced
Generated training data for 4 words

And  

miguel@miguel:~/Desktop/TessTests$ unicharset_extractor por.monospaced.exp0.box
Extracting unicharset from por.monospaced.exp0.box
Wrote unicharset file ./unicharset.

I created the font_properties file:
 
miguel@miguel:~/Desktop/TessTests$ echo monospaced 0 0 0 0 0 > font_properties
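In the font_properties file, each line is the font name followed by five 0/1 flags in the order italic, bold, fixed_pitch, serif, fraktur, and the font name should match the one used in the training file names (monospaced in por.monospaced.exp0). A minimal sketch of a helper that prints such a line and rejects any flag other than 0 or 1; the function name font_properties_line is hypothetical and not part of Tesseract:

```shell
#!/bin/sh
# Hypothetical helper: print one font_properties line.
# Usage: font_properties_line <font> <italic> <bold> <fixed_pitch> <serif> <fraktur>
font_properties_line() {
    if [ "$#" -ne 6 ]; then
        echo "usage: font_properties_line font italic bold fixed_pitch serif fraktur" >&2
        return 1
    fi
    font=$1
    shift
    # Every remaining argument is a flag and must be 0 or 1.
    for flag in "$@"; do
        case $flag in
            0|1) ;;
            *) echo "flag must be 0 or 1, got: $flag" >&2; return 1 ;;
        esac
    done
    echo "$font $*"
}
```

For example, `font_properties_line monospaced 0 0 0 0 0 > font_properties` writes the same line as the echo command above.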

And

miguel@miguel:~/Desktop/TessTests$ shapeclustering -F font_properties -U unicharset por.monospaced.exp0.tr
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Distance = 0.000000: Stopped with 1 merged, min dist 0.165217
Master shape_table:Number of shapes = 20 max unichars = 2 number with multiple unichars = 1
miguel@miguel:~/Desktop/TessTests$ mftraining -F font_properties -U unicharset -O por.unicharset por.monospaced.exp0.tr
Read shape table shapetable of 20 shapes
Done!

And then

miguel@miguel:~/Desktop/TessTests$ cntraining por.monospaced.exp0.tr
Clustering ...

Writing normproto ...

Finally, the last command fails with this error:

miguel@miguel:~/Desktop/TessTests$ combine_tessdata pass.
Combining tessdata files
Error opening unicharset file
Error combining tessdata files into pass.traineddata


What am I doing wrong?

I've attached all the files.

Thanks

training.zip

Quan Nguyen

8 May 2015, 3:32:07 p.m.
to tesser...@googlegroups.com
It appears that you left out some steps, such as renaming the files.

If por is the language code, then the combine command would be:

combine_tessdata por.

Use a training tool, such as jTessBoxEditor, if possible.
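The argument to combine_tessdata is a filename prefix: with por. it looks for files whose names start with por., such as por.unicharset, por.inttemp, por.pffmtable, por.normproto and por.shapetable, and writes por.traineddata. With pass. it would need pass.unicharset, which does not exist here, hence "Error opening unicharset file". A minimal pre-flight sketch (the function name list_components is hypothetical) that reports which of these legacy components exist for a prefix; it checks only the files produced in this thread, not optional components such as dictionaries:

```shell
#!/bin/sh
# Hypothetical pre-flight check: report which legacy (3.0x) training
# components exist for a combine_tessdata prefix such as "por.".
# Returns non-zero if any of the listed components is missing.
list_components() {
    prefix=$1
    missing=0
    for part in unicharset inttemp pffmtable normproto shapetable; do
        if [ -f "${prefix}${part}" ]; then
            echo "found:   ${prefix}${part}"
        else
            echo "missing: ${prefix}${part}"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `list_components por. && combine_tessdata por.`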

Miguel Goyanes

3 Jun 2015, 12:30:55 p.m.
to tesser...@googlegroups.com
Sorry for the late response.

Yep. I renamed the files and everything went fine.

Thanks.

ahmed.ba...@gmail.com

27 Jul 2017, 11:46:17 a.m.
to tesseract-ocr
Hello,

I have already tried this step, but in the end I got this error:

Combining tessdata files
Error: traineddata file must contain at least (a unicharset fileand inttemp) OR an lstm file.
Error combining tessdata files into por.traineddata
Version string:4.00.00alpha
1:unicharset:size=1124, offset=192
23:version:size=12, offset=1316

Please, can you help me?

Ahmed Barbouche

ShreeDevi Kumar

27 Jul 2017, 11:50:27 a.m.
to tesser...@googlegroups.com
What command did you use?

Make sure that all the components listed are present.

It looks like only the unicharset was available for building your traineddata.
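The 4.00 alpha error message itself states the minimum that combine_tessdata accepts: a unicharset together with an inttemp, or an LSTM model. The listing printed after the error shows only the unicharset and the version string were found. A sketch of that rule as a shell check, assuming all components share one prefix and that the LSTM component file is named <prefix>lstm (an assumption based on the wording of the error); has_minimum_components is a hypothetical name:

```shell
#!/bin/sh
# Sketch of the rule quoted in the 4.00 alpha error message:
#   (unicharset AND inttemp) OR lstm, all under the same prefix.
# Assumption: the LSTM component is a file named "<prefix>lstm".
has_minimum_components() {
    prefix=$1
    if [ -f "${prefix}unicharset" ] && [ -f "${prefix}inttemp" ]; then
        return 0
    fi
    [ -f "${prefix}lstm" ]
}
```

Usage: `has_minimum_components por. && combine_tessdata por.`; if it fails, por.inttemp (or por.lstm) is the missing piece.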

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/0c123c19-01cd-469f-97f7-3e7d0fc331a9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

This message has been deleted.

ahmed.ba...@gmail.com

28 Jul 2017, 6:47:49 a.m.
to tesseract-ocr
Hi,

My objective is to make a new traineddata file. I followed the command-line process for training Tesseract on Linux (PDF, TIFF, fixing the size of each character with jTessBoxEditor, then making the unicharset ...). I have all the files ready, but at the final step I have a problem when I run the command

combine_tessdata por.    (por is the name of the file)

This is the error shown:

Combining tessdata files
Error: traineddata file must contain at least (a unicharset fileand inttemp) OR an lstm file.
Error combining tessdata files into por.traineddata
Version string:4.00.00alpha

Thanks

Barbouche.


ahmed.ba...@gmail.com

28 Jul 2017, 6:53:26 a.m.
to tesseract-ocr
Here is my attempt:
my essay.rar

ShreeDevi Kumar

28 Jul 2017, 11:51:58 a.m.
to tesser...@googlegroups.com
You need to move (mv) or rename the files so they have the por. prefix.

Then, when you run the combine_tessdata command, it will use all the por. files to create the traineddata.


    # Set these for your setup, for example:
    TRAINING_DIR=.
    LANG_CODE=por
    mv ${TRAINING_DIR}/inttemp ${TRAINING_DIR}/${LANG_CODE}.inttemp
    mv ${TRAINING_DIR}/shapetable ${TRAINING_DIR}/${LANG_CODE}.shapetable
    mv ${TRAINING_DIR}/pffmtable ${TRAINING_DIR}/${LANG_CODE}.pffmtable

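The same renaming can be wrapped in a self-contained script. TRAINING_DIR and LANG_CODE get default values here (the current directory and por) purely as an assumption, and normproto, which cntraining writes, is renamed too, since combine_tessdata would otherwise not pick it up under the por. prefix. Components that are missing or already renamed are reported instead of being skipped silently. The function name add_lang_prefix is hypothetical:

```shell
#!/bin/sh
# Rename the unprefixed outputs of mftraining and cntraining so that
# "combine_tessdata ${LANG_CODE}." can find them.
# Assumed defaults; override by setting TRAINING_DIR / LANG_CODE first.
TRAINING_DIR=${TRAINING_DIR:-.}
LANG_CODE=${LANG_CODE:-por}

add_lang_prefix() {
    status=0
    for part in inttemp shapetable pffmtable normproto; do
        src="${TRAINING_DIR}/${part}"
        dst="${TRAINING_DIR}/${LANG_CODE}.${part}"
        if [ -f "$src" ]; then
            mv "$src" "$dst"
        elif [ -f "$dst" ]; then
            echo "already prefixed: $dst"
        else
            echo "missing: $src" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `add_lang_prefix && combine_tessdata por.`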
ShreeDevi