--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/50c6b233-602e-4479-a518-3bfd6baa10c9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Regarding the smaller traineddata size: it may be related to the size of the word-list dictionary. You can unpack the original traineddata and compare its word-list size with that of the one you built.
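For anyone who wants to script that comparison, here is a minimal sketch. It assumes the `combine_tessdata` tool that ships with Tesseract is on your PATH and that its `-u` option takes the traineddata file followed by an output prefix ending in `<lang>.`. The helper names (`unpack_command`, `unpack`, `component_sizes`) are made up for illustration and are not part of Tesseract.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def unpack_command(traineddata: str, out_dir: str) -> list[str]:
    """Build the combine_tessdata call that unpacks every component."""
    # "eng.traineddata" -> "eng"
    lang = Path(traineddata).name.split(".")[0]
    # The output prefix must end in "<lang>." so that the components
    # are written as eng.lstm, eng.lstm-word-dawg, and so on.
    prefix = str(Path(out_dir) / f"{lang}.")
    return ["combine_tessdata", "-u", traineddata, prefix]


def unpack(traineddata: str, out_dir: str) -> None:
    """Run the unpack step; raises if combine_tessdata exits with an error."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(unpack_command(traineddata, out_dir), check=True)


def component_sizes(out_dir: str, lang: str) -> dict[str, int]:
    """Map each unpacked component file name to its size in bytes."""
    return {
        p.name: p.stat().st_size
        for p in sorted(Path(out_dir).glob(f"{lang}.*"))
        if p.is_file()
    }
```

Calling `unpack("eng.traineddata", "orig")` followed by `component_sizes("orig", "eng")` should give roughly the same information as the `ls -l` listings in this thread, in a form that is easy to diff against a second unpacked file.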
-rw-r--r-- 1 klein staff 11689099 Dec 7 21:22 eng.lstm
-rw-r--r-- 1 klein staff 4738 Dec 7 21:22 eng.lstm-number-dawg
-rw-r--r-- 1 klein staff 4322 Dec 7 21:22 eng.lstm-punc-dawg
-rw-r--r-- 1 klein staff 1012 Dec 7 21:22 eng.lstm-recoder
-rw-r--r-- 1 klein staff 6360 Dec 7 21:22 eng.lstm-unicharset
-rw-r--r-- 1 klein staff 3694794 Dec 7 21:22 eng.lstm-word-dawg
-rw-r--r-- 1 klein staff 80 Dec 7 21:22 eng.version -- CONTENT is 4.00.00alpha:eng:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
Then I unpacked the one I created by adding the characters, and I get:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx eng.lstm is missing!
-rw-r--r-- 1 klein staff 3506 Dec 7 21:26 eng.lstm-number-dawg
-rw-r--r-- 1 klein staff 4322 Dec 7 21:26 eng.lstm-punc-dawg
-rw-r--r-- 1 klein staff 1030 Dec 7 21:26 eng.lstm-recoder
-rw-r--r-- 1 klein staff 9379 Dec 7 21:26 eng.lstm-unicharset
-rw-r--r-- 1 klein staff 4153402 Dec 7 21:26 eng.lstm-word-dawg
-rw-r--r-- 1 klein staff 12 Dec 7 21:26 eng.version -- CONTENT IS '4.00.00alpha'
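To make the difference between the two listings explicit, here is a small sketch that diffs them. The byte counts are copied from the listings above; the `compare` helper is just illustrative.

```python
# Component sizes in bytes, copied from the two listings in this thread.
ORIGINAL = {
    "eng.lstm": 11689099,
    "eng.lstm-number-dawg": 4738,
    "eng.lstm-punc-dawg": 4322,
    "eng.lstm-recoder": 1012,
    "eng.lstm-unicharset": 6360,
    "eng.lstm-word-dawg": 3694794,
    "eng.version": 80,
}
REBUILT = {
    "eng.lstm-number-dawg": 3506,
    "eng.lstm-punc-dawg": 4322,
    "eng.lstm-recoder": 1030,
    "eng.lstm-unicharset": 9379,
    "eng.lstm-word-dawg": 4153402,
    "eng.version": 12,
}


def compare(original, rebuilt):
    """Return (components missing from rebuilt, size deltas of changed ones)."""
    missing = sorted(set(original) - set(rebuilt))
    changed = {
        name: rebuilt[name] - original[name]
        for name in sorted(original)
        if name in rebuilt and rebuilt[name] != original[name]
    }
    return missing, changed


missing, changed = compare(ORIGINAL, REBUILT)
print("missing:", missing)
for name, delta in changed.items():
    print(f"{name}: {delta:+d} bytes")
print("bytes in missing components:", sum(ORIGINAL[m] for m in missing))
```

The only missing component is `eng.lstm`, at about 11.7 MB, which on its own accounts for most of the size gap. The word dawg in the rebuilt file is actually larger than the original, so the word list does not appear to be what made the file smaller.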
But this eng.traineddata was 5 MB, whereas the original one was 15.4 MB.
Please check the last section on
I have the same problem. Why doesn't the new fine-tuned traineddata include the old word list? It is supposed to. I followed the instructions in the wiki but ran into the same issue. Any help?