The rules are not bidirectional, so if you want 'rn' to be considered when 'm' is detected and vice versa, you need a rule for each direction.
Versions 3.03 and later support a new, simpler format for the unicharambigs file:
v2
'' " 1
m rn 0
iii m 0
In this format, the "error" and "correction" are simple UTF-8 strings separated by a space, followed, after another space, by the same type specifier as in v1 (0 for optional and 1 for mandatory substitution). Note that the downside of this simpler format is that Tesseract has to encode the UTF-8 strings into the components of the unicharset. In complex scripts, this encoding may be ambiguous. In this case, the encoding is chosen so as to use the fewest UTF-8 characters for each component, i.e. the shortest unicharset components will make up the encoding.
Like most other files used in training, the 'unicharambigs' file must be encoded as UTF-8 and must end with a newline character. The unicharambigs format is also described in the unicharambigs(5) man page.
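As a rough illustration of the v2 layout described above, here is a small Python sketch that checks a file's bytes and splits each rule into its three fields. It is not Tesseract's own parser: the names AmbigRule and parse_v2_unicharambigs are made up for this example, and the strict handling (exactly one space between fields, no blank lines) is an assumption about the format rather than confirmed Tesseract behaviour.

```python
from typing import List, NamedTuple


class AmbigRule(NamedTuple):
    error: str        # string the recognizer may have produced
    correction: str   # string to consider in its place
    mandatory: bool   # True for type 1, False for type 0


def parse_v2_unicharambigs(data: bytes) -> List[AmbigRule]:
    """Parse the bytes of a v2-format unicharambigs file (illustrative only)."""
    if not data.endswith(b"\n"):
        raise ValueError("file must end with a newline character")
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    lines = text.split("\n")[:-1]  # drop the empty piece after the final newline
    if not lines or lines[0] != "v2":
        raise ValueError("first line must be 'v2'")
    rules = []
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split(" ")
        # Assumption: exactly three non-empty, single-space-separated fields.
        if len(fields) != 3 or "" in fields or fields[2] not in ("0", "1"):
            raise ValueError(f"line {number}: expected 'error correction type'")
        rules.append(AmbigRule(fields[0], fields[1], fields[2] == "1"))
    return rules


# The three rules from the example above, as the bytes of a file.
sample = "v2\n'' \" 1\nm rn 0\niii m 0\n".encode("utf-8")
rules = parse_v2_unicharambigs(sample)
for rule in rules:
    print(rule)
```

Note that the sample has a rule for m to rn but none for rn to m; since rules are not bidirectional, the reverse direction would need its own line.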
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/0d30025d-cc11-4f69-9e98-ec919d3f43df%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Tom, thanks for the hints. I have just tested the eng.unicharambigs I created and found that it works. The attached files speak for themselves. I have also attached the output "unicharamtest.txt" for perusal. In it, however, I noticed that the last line, "luck good", was not changed to "good luck". Where did I make a mistake?