Query possible characters at runtime


Matt Hill

Aug 10, 2015, 2:49:17 PM
to tesseract-ocr
Is it possible to find out what characters are included in a language set?  Ideally, I'm looking for some function that gives me all possible string values in the charset.  For example, if I just trained Tesseract with the characters ABC123 in my language set, I'd like to get a list of these 6 characters.

I see this function in baseapi.h:

  // Returns true if utf8_character is defined in the UniCharset.
  bool IsValidCharacter(const char *utf8_character);

But to get what I need, I'd potentially have to iterate through every UTF-8 character. Is there another way that would work?
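For reference, the brute-force approach would look roughly like the sketch below. It is only an illustration under some assumptions: `EncodeUtf8` and `CollectValidCharacters` are hypothetical helper names (not part of Tesseract), and the `is_valid` callback stands in for `IsValidCharacter` so the snippet compiles without the library. It walks every Unicode code point, encodes it as UTF-8, and keeps the ones the callback accepts.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: encode a single Unicode code point as UTF-8.
std::string EncodeUtf8(std::uint32_t cp) {
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Hypothetical helper: try every code point up to max_code_point and
// collect those for which is_valid returns true. The callback is an
// assumption standing in for TessBaseAPI::IsValidCharacter.
std::vector<std::string> CollectValidCharacters(
    const std::function<bool(const char*)>& is_valid,
    std::uint32_t max_code_point = 0x10FFFF) {
  std::vector<std::string> found;
  // Start at 1: U+0000 would be an empty C string.
  for (std::uint32_t cp = 1; cp <= max_code_point; ++cp) {
    if (cp >= 0xD800 && cp <= 0xDFFF) continue;  // skip surrogates
    const std::string utf8 = EncodeUtf8(cp);
    if (is_valid(utf8.c_str())) found.push_back(utf8);
  }
  return found;
}
```

With an initialized `tesseract::TessBaseAPI` object (say `api`), the callback would presumably be a lambda such as `[&api](const char* s) { return api.IsValidCharacter(s); }`. That is over a million calls, and it would likely miss any charset entries that are longer than one code point, which is why I'm hoping for something more direct.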