Mapping font codes to glyphs - seeking a volunteer with a Windows computer

विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 12, 2021, 10:08:39 AM
to sanskrit-ocr, sanskrit-programmers
There is a problem and a solution (read the emails below, starting from the bottom). The solution requires a Windows computer (Fontmatrix on Linux failed for me) and involves producing a table of about 500 entries.

Would someone volunteer?  (Multiple people can coordinate and contribute.) The file you produce would unlock multiple useful texts published by the vedicreserve folks.

---------- Forwarded message ---------
Date: Tue, Oct 12, 2021 at 5:05 PM

Yes. The font is Unicode. Simple extraction will not work. The text needs to be extracted as CIDs (glyph IDs) and then converted. I have the mechanism for extraction. The CID-to-text map still has to be created, and it is pretty long. See the attached image. It runs from ID 438 (ष्ट) to ID 946 (ष्ट्व्र). The format is

438<TAB(s)>ष्ट
439<TAB(s)>ष्ठ
...
946<TAB(s)>ष्ट्व्र
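
A minimal sketch of how a table in this format could be loaded and applied, assuming the extracted CIDs are already available as a list of integers. The names `parse_cid_map` and `cids_to_text` are hypothetical and are not part of the extraction mechanism mentioned above. Any reordering needed between glyph order and Unicode order is not handled here.

```python
import re

def parse_cid_map(text):
    """Parse lines of the form '<CID><TAB(s)><text>' into a dict {int: str}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # One or more tabs separate the numeric ID from the Unicode text.
        parts = re.split(r"\t+", line, maxsplit=1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip placeholder or malformed lines
        mapping[int(parts[0])] = parts[1].strip()
    return mapping

def cids_to_text(cids, mapping, missing="\ufffd"):
    """Replace each CID by its mapped text; unmapped IDs become U+FFFD."""
    return "".join(mapping.get(cid, missing) for cid in cids)

# Three sample rows taken from the format shown above.
sample = "438\tष्ट\n439\t\tष्ठ\n...\n946\tष्ट्व्र\n"
cid_map = parse_cid_map(sample)
print(cids_to_text([438, 439, 946], cid_map))
```

Unmapped IDs are kept visible as U+FFFD rather than silently dropped, so gaps in the table are easy to spot in the converted text.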

If you can get it done, I will import the data and convert the text easily. I use VOLT (free from Microsoft) or Fontmatrix to get this data.

Image 2.png

On Tue, Oct 12, 2021 at 7:03 AM विश्वासो वासुकिजः (Vishvas Vasuki) wrote:
Another request - A course I'm currently taking cited https://vedicreserve.miu.edu/sthapatya_veda/svayambhuvagama.pdf pretty heavily - do you have a way of extracting plain text thence?
