Hello,
I am trying to improve accuracy for my use case by fine-tuning. Currently I'm getting 80-90% accuracy on my scanned images, and around 60% on images taken with a phone.
I'm running on a Jetson Nano, using:
```
tesseract 4.1.1-rc2-21-gf4ef
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6 liblz4/1.7.1
```
To start, I'm training on a single image, just to understand the mechanism. As an example, I'm using a scanned receipt at 600 dpi; ImageMagick's `identify` reports it as 1696x3930.
Training on a single image confuses me a bit, since the script still runs and the error rate keeps dropping.
I've read the tutorials, the examples, and the scripts, but it's all a bit much for now; I've been at it for about 2-3 weeks.
A few things are still unclear to me, so I have some questions:
1. Do I need to create single-line images from each image I have (~3000)?
2. Would it help if I create ground-truth text files - and if so, for the entire image, or only for a single line? (In other words, do I need a tiff, a box file, and a ground-truth file for each image?)
3. Some of the words in my images are not found in eng.training_files.txt; would adding them speed things up or help accuracy?
4. Is there a way to fine-tune with my own images and my own eng.training_files.txt data, without running tesstrain.sh?
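Regarding questions 1 and 2, my current understanding of what the newer (Makefile-based) tesstrain expects is a directory of single-line images, each paired with a one-line transcription file - something like the layout below (the directory and file names are just placeholders I made up):

```
data/receipts-ground-truth/
    line_0001.tif       image of ONE text line
    line_0001.gt.txt    its transcription, a single line of UTF-8 text
    line_0002.tif
    line_0002.gt.txt
    ...
```

Please correct me if full-page images with box files are also acceptable for fine-tuning.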
I could not find details about how to train/fine-tune with my own tif/box pairs. I created a folder with my data and passed it to tesstrain.sh via my_box_tiff_dir, but as far as I can tell it isn't using those files, since it generates synthetic data instead.
As mentioned above, it's also unclear to me whether I need to generate the ground-truth data as well, whether I still need to fiddle with/fix the box files, etc.
Sorry if I've asked too many questions; I've invested a lot of time in this, and I'm not sure where exactly I'm going wrong.
I've followed the steps from a few of the questions posted in this group, and I'm getting decent results; however, they are not as good as using the best traineddata on its own.
The steps I've taken were:
Method 1
1. create box files via lstmbox and fix any mistakes: `tesseract img.tif img --dpi 600 lstmbox`
2. extract the lstm model from the best eng.traineddata
3. run lstmtraining for fine-tuning: `lstmtraining --continue_from ...`
4. generate eng.traineddata: `lstmtraining --stop_training ...`
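In full, what I'm running for Method 1 looks roughly like this (the `out/` paths, `train_files.txt`, and the `--max_iterations` value are arbitrary choices on my part):

```shell
# 1. Extract the LSTM model from the best traineddata
combine_tessdata -e eng.traineddata eng.lstm

# 2. Produce an .lstmf file for each training image
#    (after the box files have been fixed by hand)
tesseract img.tif img --dpi 600 lstm.train

# 3. List the .lstmf files and fine-tune from the extracted model
ls *.lstmf > train_files.txt
lstmtraining \
  --model_output out/finetune \
  --continue_from eng.lstm \
  --traineddata eng.traineddata \
  --train_listfile train_files.txt \
  --max_iterations 400

# 4. Freeze the best checkpoint into a usable traineddata
lstmtraining --stop_training \
  --continue_from out/finetune_checkpoint \
  --traineddata eng.traineddata \
  --model_output eng_finetuned.traineddata
```

If any of these steps is wrong for fine-tuning on own scans, I'd appreciate a pointer.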
Method 2
1. create box files via lstmbox and fix any mistakes: `tesseract img.tif img --dpi 600 lstmbox`
2. create lstmf files: `tesseract img.tif img --dpi 600 lstm.train`
3. extract the unicharset: `unicharset_extractor *.box`
4. `shapeclustering -F font_properties -U unicharset *.tr`
5. `mftraining -F font_properties -U unicharset -O eng.unicharset *.tr`
6. `cntraining *.tr`
7. rename inttemp, normproto, pffmtable, shapetable with the eng. prefix
8. `combine_tessdata eng.`
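Spelled out, my understanding of Method 2 (the legacy, non-LSTM path) is roughly the script below; the `eng.myfont.exp0` naming is just the conventional placeholder. One thing I'm unsure about: the clustering tools consume `.tr` files, which as far as I can tell come from the `box.train` config rather than `lstm.train` (which emits `.lstmf` for the LSTM engine), so step 2 above may be the wrong config for this path.

```shell
# Legacy (non-LSTM) training path - a sketch, not a verified recipe.
tesseract eng.myfont.exp0.tif eng.myfont.exp0 --dpi 600 box.train
unicharset_extractor eng.myfont.exp0.box
shapeclustering -F font_properties -U unicharset eng.myfont.exp0.tr
mftraining -F font_properties -U unicharset -O eng.unicharset eng.myfont.exp0.tr
cntraining eng.myfont.exp0.tr

# Prefix the outputs with the language code before combining
mv inttemp eng.inttemp
mv normproto eng.normproto
mv pffmtable eng.pffmtable
mv shapetable eng.shapetable
combine_tessdata eng.
```

Is this legacy path even relevant when I'm ultimately targeting the LSTM engine, or should I stick with Method 1 only?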
Thank you for your support and help with my endeavor.