Seeking help converting a few Kannada files to Unicode


विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 3, 2021, 8:47:14 AM
to sanskrit-programmers, Pooja पूजा प्रकाशसुता रामायणप्रिया, Bhasha IME
namo vaH (greetings to you all).

Many years ago, pUjA (cc-ed) kindly converted some files pertaining to the kumAra-vyAsa-bhArata to Unicode and gave them to me. Today I fixed a few conversion errors by text replacement, and a few others remain that are amenable to similar replacement. But some big chunks require a more systematic approach.

Could someone help? See below.
A pull request would be great.


--
Vishvas /विश्वासः

विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 12, 2021, 10:08:39 AM
to sanskrit-ocr, sanskrit-programmers
There is a problem and a solution (read the mails below, starting from the bottom). The solution requires a Windows computer (Fontmatrix on Linux failed for me) and producing a table of about 500 entries.

Would someone volunteer? (Multiple people can coordinate and contribute.) The file you produce would unlock multiple useful texts published by the vedicreserve folks.

---------- Forwarded message ---------
Date: Tue, Oct 12, 2021 at 5:05 PM

Yes. The font is Unicode, but simple extraction will not work. The text needs to be extracted as CIDs (glyph IDs) and then converted. I have the mechanism for extraction; what still has to be created is the CID-to-text map, which is pretty long (see attached image). It runs from ID 438 (ष्ट) to 946 (ष्ट्व्र). The format is:

438<TAB(s)>ष्ट
439<TAB(s)>ष्ठ
...
946<TAB(s)>ष्ट्व्र

If you can get it done, I will import the data and convert the text easily. I use VOLT (free from Microsoft) or Fontmatrix to get this data.

[Attached image: Image 2.png]
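For anyone volunteering, here is a minimal Python sketch of what the finished table enables, assuming it is saved as UTF-8 in the "438<TAB(s)>ष्ट" format described above. It parses the entries, reports which IDs between 438 and 946 are still missing (useful when several people split the work), and turns a sequence of glyph IDs into text. The function names and the "[CID n]" marker are hypothetical; the actual extraction mechanism mentioned in the mail is not shown, and IDs outside 438 to 946 are only flagged, since the thread does not say how those are handled.

```python
# Sketch only: the table format ("438<TAB(s)>ष्ट") comes from the thread;
# the function names and the "[CID n]" marker are hypothetical choices.
import re

# One entry per line: decimal glyph ID, one or more tabs, the Unicode text.
ENTRY = re.compile(r"(\d+)\t+(.+)")


def load_cid_map(lines):
    """Parse '<CID><TAB(s)><text>' lines into a dict {cid: text}.

    Blank lines are skipped; malformed lines and duplicate IDs raise ValueError.
    """
    cid_map = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        match = ENTRY.fullmatch(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected '<CID><TAB(s)><text>': {line!r}")
        cid = int(match.group(1))
        if cid in cid_map:
            raise ValueError(f"line {lineno}: duplicate CID {cid}")
        cid_map[cid] = match.group(2)
    return cid_map


def missing_cids(cid_map, first=438, last=946):
    """List the IDs in [first, last] that still have no entry (to track progress)."""
    return [cid for cid in range(first, last + 1) if cid not in cid_map]


def cids_to_text(cids, cid_map):
    """Join the text for each glyph ID; IDs without an entry become '[CID n]'."""
    return "".join(cid_map.get(cid, f"[CID {cid}]") for cid in cids)


if __name__ == "__main__":
    table = ["438\tष्ट", "439\t\tष्ठ"]
    cid_map = load_cid_map(table)
    print(cids_to_text([438, 439, 500], cid_map))  # ष्टष्ठ[CID 500]
    print(len(missing_cids(cid_map)))              # 507
```

To check a real file, pass an open file object, e.g. load_cid_map(open("cid_map.txt", encoding="utf-8")); the file name here is hypothetical.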






On Tue, Oct 12, 2021 at 7:03 AM विश्वासो वासुकिजः (Vishvas Vasuki) wrote:
Another request: a course I'm currently taking cites https://vedicreserve.miu.edu/sthapatya_veda/svayambhuvagama.pdf heavily. Do you have a way of extracting plain text from it?

उज्ज्वल राजपूत

Oct 14, 2021, 1:24:40 AM
to sanskrit-programmers
Even so, the text there is easy to get at. Running pdftext -raw produces a .txt file. Apart from out-of-place spaces and line breaks ( \n), no other error is seen in it.
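A minimal Python sketch of that route, assuming poppler's pdftotext (the tool name given in the reply below) is installed and on PATH. The -raw option keeps text in content-stream order, and "-" as the output file sends the text to stdout. The whitespace cleanup is a hypothetical, deliberately conservative step: it tidies repeated spaces and blank lines but cannot decide which of the out-of-place spaces and line breaks to remove, so those still need a manual pass.

```python
# Sketch only: assumes poppler's pdftotext is installed and on PATH.
# normalize_whitespace is a hypothetical, deliberately conservative cleanup;
# it cannot tell which spaces or line breaks are misplaced inside words.
import re
import subprocess


def pdftotext_raw_command(pdf_path):
    """Build the command line; '-' as the output file sends the text to stdout."""
    return ["pdftotext", "-raw", "-enc", "UTF-8", str(pdf_path), "-"]


def extract_raw_text(pdf_path):
    """Run pdftotext in -raw mode (content-stream order) and return its output."""
    result = subprocess.run(
        pdftotext_raw_command(pdf_path),
        capture_output=True,
        check=True,          # raise CalledProcessError if pdftotext fails
        encoding="utf-8",    # decode stdout as UTF-8 text
    )
    return result.stdout


def normalize_whitespace(text):
    """Turn page breaks into blank lines, collapse runs of spaces/tabs,
    drop spaces around line breaks, and shrink 3+ newlines to one blank line."""
    text = text.replace("\f", "\n\n")
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" *\n *", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

For example, after downloading the PDF linked above: print(normalize_whitespace(extract_raw_text("svayambhuvagama.pdf"))).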

विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 14, 2021, 3:07:47 AM
to sanskrit-programmers, venkaTeshaH BhashaIME वेङ्कटेशः पराङ्कुशसूनुरामानुज-सहकर्ता
On Thu, Oct 14, 2021 at 10:54 AM उज्ज्वल राजपूत <ujjwal....@gmail.com> wrote:
Even so, the text there is easy to get at. Running pdftext -raw produces a .txt file. Apart from out-of-place spaces and line breaks ( \n), no other error is seen in it.


pdftotext

Thank you, friend!

 


विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 15, 2021, 9:31:15 AM
to sanskrit-programmers, Pooja पूजा प्रकाशसुता रामायणप्रिया, Bhasha IME
(The above problem was solved thanks to shrI venkaTesh.)