Query possible characters at runtime

Matt Hill

Aug 10, 2015, 2:49:17 PM

to tesseract-ocr

Is it possible to find out what characters are included in a language set? Ideally, I'm looking for some function that gives me all possible string values in the charset. For example, if I just trained Tesseract with the characters ABC123 in my language set, I'd like to get a list of these 6 characters.

I see this function in baseapi.h:

// Returns true if utf8_character is defined in the UniCharset.
bool IsValidCharacter(const char *utf8_character);

But with that I'd potentially have to iterate through every UTF-8 character to get what I need. Is there any other way that would work?
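For reference, the brute-force approach described above could look roughly like the sketch below. It is only an illustration: EncodeUtf8, CharPredicate and CollectValidCharacters are hypothetical helper names, not part of Tesseract, and the predicate is a stand-in for a call to IsValidCharacter so that the code compiles without the library. It walks every Unicode code point, encodes it as UTF-8, and keeps the ones the predicate accepts.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and out-of-range values.
std::string EncodeUtf8(std::uint32_t cp) {
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp <= 0x10FFFF) {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Stand-in for a call such as IsValidCharacter(const char*); an assumption
// made so the sketch does not depend on the Tesseract headers.
using CharPredicate = std::function<bool(const char*)>;

// Hypothetical helper: test every code point in [first, last] and collect
// the UTF-8 strings the predicate accepts. Starts at 1 by default because
// code point 0 would encode to an empty C string.
std::vector<std::string> CollectValidCharacters(const CharPredicate& is_valid,
                                                std::uint32_t first = 1,
                                                std::uint32_t last = 0x10FFFF) {
  std::vector<std::string> found;
  for (std::uint32_t cp = first; cp <= last; ++cp) {
    const std::string utf8 = EncodeUtf8(cp);
    if (!utf8.empty() && is_valid(utf8.c_str())) {
      found.push_back(utf8);
    }
  }
  return found;
}
```

With an initialized API object (called api here purely for illustration), the predicate could be a lambda such as [&api](const char* s) { return api.IsValidCharacter(s); }. That is a little over a million calls, and it only discovers entries made of a single code point, so any charset entry spanning several code points would be missed. That limitation is part of why a direct way to list the charset would be preferable.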
