Can ".traineddata" for Tesseract 3 be used for Tesseract 4?

chandra churh chatterjee

13 June 2018 09:16:30
to tesseract-ocr
I have trained Tesseract 3 with 64 fonts using the respective box and .tr files. Now I want to use the same training data to train Tesseract 4, after creating the starter traineddata by following the instructions quoted below:

Using tesstrain

The setup for running tesstrain.sh is the same as for base Tesseract. Use --linedata_only option for LSTM training. Note that it is beneficial to have more training text and make more pages though, as neural nets don't generalize as well and need to train on something similar to what they will be running on. If the target domain is severely limited, then all the dire warnings about needing a lot of training data may not apply, but the network specification may need to be changed.

Training data is created using tesstrain.sh as follows (note that your fonts location may vary):

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain

The above command makes LSTM training data equivalent to the data used to train base Tesseract for English. For making a general-purpose LSTM-based OCR engine, it is woefully inadequate, but makes a good tutorial demo.

Now try this to make eval data for the 'Impact' font:

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Impact Condensed" --output_dir ~/tesstutorial/engeval



Now I want to proceed with the training using my previous training data. The problem is that this data consists of .tr files and box files, whereas Tesseract 4 requires .lstmf files.
Is there any solution?

ShreeDevi Kumar

13 June 2018 11:08:07
to tesser...@googlegroups.com
If you have box/tiff pairs in Tesseract 4 format, you can generate the .lstmf files by running:

tesseract lang.file.exp0.tif lang.file.exp0 lstm.train

lstm.train is a config file.
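
For a directory holding many image/box pairs, the command above could be wrapped in a loop. The following is only a minimal sketch, not a tested recipe. The function name `make_lstmf`, its optional extension argument, and the `DRY_RUN` variable are made up for illustration. It assumes each box file sits next to its image with the same base name, that the box files are already in the format Tesseract 4 expects, that the Tesseract build can read the chosen image format, and that `tesseract` is on the `PATH`. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run "tesseract <image> <base> lstm.train" for every image in a
# directory that has a matching .box file.
# make_lstmf, its EXT argument and DRY_RUN are hypothetical names.

make_lstmf() {
    dir=$1
    ext=${2:-tif}                      # image extension, defaults to tif
    for img in "$dir"/*."$ext"; do
        [ -e "$img" ] || continue      # the glob matched nothing
        base=${img%."$ext"}            # strip the image extension
        if [ ! -e "$base.box" ]; then
            echo "skip (no box file): $img" >&2
            continue
        fi
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (the default): only show the command.
            echo "tesseract $img $base lstm.train"
        else
            tesseract "$img" "$base" lstm.train
        fi
    done
}
```

Calling `make_lstmf ./mydata` prints the planned commands for the .tif files in a hypothetical `./mydata` directory, and `DRY_RUN=0 make_lstmf ./mydata` would actually run them. Images without a box file are reported on standard error and skipped.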

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


chandra churh chatterjee

14 June 2018 01:36:07
to tesser...@googlegroups.com
Can you tell me which directory the command should be run from, and what the arguments should be, given that our training data contains the following files?
-07-2016     12:45             11 digits.f4.exp0.txt
-a----       08-07-2016     12:37            198 digits.f5.exp0.box
-a----       08-07-2016     12:10          14044 digits.f5.exp0.jpg
-a----       08-07-2016     12:45          16309 digits.f5.exp0.tr
-a----       08-07-2016     12:45             11 digits.f5.exp0.txt
-a----       08-07-2016     12:31            188 digits.f6.exp0.box
-a----       23-06-2016     13:06           9824 digits.f6.exp0.jpg
-a----       08-07-2016     12:45          17538 digits.f6.exp0.tr
-a----       08-07-2016     12:45             11 digits.f6.exp0.txt
-a----       08-07-2016     12:38            199 digits.f7.exp0.box
-a----       08-07-2016     12:11          13178 digits.f7.exp0.jpg
-a----       08-07-2016     12:45          16019 digits.f7.exp0.tr
-a----       08-07-2016     12:45             11 digits.f7.exp0.txt
-a----       08-07-2016     12:38            198 digits.f8.exp0.box
-a----       23-06-2016     13:06           9485 digits.f8.exp0.jpg
-a----       08-07-2016     12:45          17078 digits.f8.exp0.tr
-a----       08-07-2016     12:45             11 digits.f8.exp0.txt
-a----       08-07-2016     12:38            199 digits.f9.exp0.box
-a----       08-07-2016     12:11          13411 digits.f9.exp0.jpg
-a----       08-07-2016     12:45          15916 digits.f9.exp0.tr
-a----       08-07-2016     12:45             11 digits.f9.exp0.txt
-a----       08-07-2016     12:57            543 digits.font_properties
-a----       08-07-2016     12:59         184521 digits.inttemp
-a----       08-07-2016     13:00           4832 digits.normproto
-a----       08-07-2016     12:59             84 digits.pffmtable
-a----       08-07-2016     12:59           6520 digits.shapetable
-a----       08-07-2016     13:01         196755 digits.traineddata
-a----       08-07-2016     12:59            658 digits.unicharset
-a----       08-07-2016     12:55            648 unicharset

How do I convert these files, and from where should I run the command you suggested?
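
Of the files in a listing like the one above, the images and their .box files are the inputs to the command suggested earlier in the thread; the .tr files are feature files for the older Tesseract 3 classifier. As a rough sketch, a small helper could report which base names have both an image and a box file before any conversion is attempted. The name `list_pairs` and its arguments are hypothetical, and the default extension `jpg` is only taken from the listing above.

```shell
#!/bin/sh
# Sketch: list base names in a directory that have both an image and a
# .box file. list_pairs is a hypothetical helper name.

list_pairs() {
    dir=${1:-.}          # default: the current directory
    ext=${2:-jpg}        # image extension, as seen in the listing above
    for box in "$dir"/*.box; do
        [ -e "$box" ] || continue      # the glob matched nothing
        base=${box%.box}               # strip the .box extension
        if [ -e "$base.$ext" ]; then
            echo "pair: ${base##*/}"
        else
            echo "box without image: ${base##*/}"
        fi
    done
}
```

Run from the directory that holds the files, `list_pairs` with no arguments checks the current directory, so relative file names such as `digits.f5.exp0.jpg` can then be passed to `tesseract` unchanged.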

chandra churh chatterjee

14 June 2018 05:56:01
to tesser...@googlegroups.com
How can I convert the images listed above into fonts, so that I can run the tesstrain.sh command, which generates image files along with box and .lstmf files?
