Filipino character set (alphabet) support

Constantine Dokolas

Sep 20, 2024, 12:03:47 PM
to tesseract-ocr
Hi, everyone.

I'm looking into Filipino support in Tesseract OCR. It appears that at least Ñ/ñ is not supported. It should be, as you can see here.

I'm being told that other Latin characters are also used, like those in Spanish. Is this true?

Thanks in advance,
C.D.


Tom Morris

Sep 22, 2024, 2:29:13 AM
to tesseract-ocr
On Friday, September 20, 2024 at 12:03:47 PM UTC-4 cdok...@gmail.com wrote:

I'm looking into Filipino support in Tesseract OCR. It appears that at least Ñ/ñ is not supported. It should be, as you can see here.

I'm being told that other Latin characters are also used, like those in Spanish. Is this true?

The Filipino support definitely looks incomplete. Neither fil.unicharset [1] nor the training text [2] includes Ñ/ñ. Since it sounds like they are principally used for Spanish loan words, one solution might be to use both languages (i.e. fil+spa). You could also try the generic Latin script data.
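
For anyone who wants to repeat the check, here is a minimal sketch in Python. It assumes the unicharset is the usual plain-text file whose first line is the entry count and whose remaining lines each start with the character itself. The helper names (missing_from_unicharset, tesseract_command) are made up for illustration, and script/Latin only works if that traineddata file is actually installed under your tessdata directory.

```python
import unicodedata


def missing_from_unicharset(unicharset_text, wanted):
    """Return the characters in `wanted` that have no entry in the unicharset.

    Assumption: line 1 is the entry count and every later line begins with
    the character (unichar) followed by space-separated properties.
    """
    entries = set()
    for line in unicharset_text.splitlines()[1:]:
        if not line.strip():
            continue
        unichar = line.split(" ", 1)[0]
        entries.add(unicodedata.normalize("NFC", unichar))
    return [ch for ch in wanted if unicodedata.normalize("NFC", ch) not in entries]


def tesseract_command(image_path, langs):
    """Build a tesseract CLI call that combines several language models."""
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs)]


if __name__ == "__main__":
    # Tiny made-up unicharset, only to show the shape of the check.
    sample = "3\nNULL 0 NULL 0\nn 3 0,255,0,255 NULL 0\nN 5 0,255,0,255 NULL 0\n"
    print(missing_from_unicharset(sample, "Ññ"))
    print(" ".join(tesseract_command("page.png", ["fil", "spa"])))
    print(" ".join(tesseract_command("page.png", ["script/Latin"])))
```

To run it against the real file, read fil.unicharset as UTF-8 and pass its contents as the first argument.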

Tom

Constantine Dokolas

Sep 22, 2024, 9:00:26 AM
to tesseract-ocr
Thanks for the feedback.

I've already tried with "fil+spa" with no success :(

One thing that worries me is that I cannot find a single sample image of Filipino text with Ñ/ñ in it, just to have an independently produced sample. All I have is a couple of small snippets of text, which produce only the plain characters.
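
In case it helps with testing, this is roughly how the "plain characters only" result could be told apart from other OCR errors once a ground-truth transcription is available. It is a sketch using only the Python standard library; the function names and the sample strings are made up for illustration and are not real OCR output.

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks, so that Ñ/ñ become plain N/n."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def lost_only_diacritics(expected, ocr_output):
    """True when the OCR text differs from the expected text only by dropped marks."""
    expected = unicodedata.normalize("NFC", expected)
    ocr_output = unicodedata.normalize("NFC", ocr_output)
    return expected != ocr_output and strip_marks(expected) == strip_marks(ocr_output)


if __name__ == "__main__":
    # Hypothetical ground truth and OCR result.
    print(lost_only_diacritics("Santo Niño, Parañaque", "Santo Nino, Paranaque"))
```

A True result means the letters were recognised but the tilde was dropped, which would point at the model's character set rather than at image quality.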

C.D.