Encoding Issues in Converting HK to Devanagari (संजय उवाच)


Mārcis Gasūns

Oct 22, 2013, 2:04:26 AM
to sanskrit-p...@googlegroups.com
Namaste,

  What could have gone wrong here?

Bhagavadgita 2
with the commentaries of Sridhara, Madhusudana, Visvanatha and Baladeva
Input by ... (Gaudiya Grantha Mandira)
****************************************************
भ्ग् २.१

संजय उवाच
तं तथा कृपयाविष्टम् अश्रुपूर्णाकुलेक्षणम् ।
विषीदन्तम् इदं वाक्यम् उवाच मधुसूदनः ॥१॥

Other texts from the same author convert perfectly.

  Bhagavadgita 14
with the commentaries of Sridhara, Visvanatha and Baladeva
Input by ... (Gaudiya Grantha Mandira)
****************************************************
भ्ग् १४.१

श्री-भगवान् उवाच
परं भूयः प्रवक्ष्यामि ज्ञानानां ज्ञानम् उत्तमम् ।
यज् ज्ञात्वा मुनयः सर्वे परां सिद्धिम् इतो गताः ॥१॥

श्रीधरः :

  Selecting UTF-8 in Chrome's encoding menu did not help. I am converting the texts with a PHP script that runs in my browser, and about a third of the texts come out with problems.
On an Indian PC there are no issues at all. The system language on my PC is Russian; maybe that is the reason.

Shreevatsa R

Oct 22, 2013, 2:52:27 AM
to Mārcis Gasūns, sanskrit-programmers
It is difficult to debug issues like this without knowing full details, such as the contents of the source text, the code of the tool used for transliteration, where this output is being viewed, and all "intermediate levels" of the text.

Usually the issue is that a stream of bytes is interpreted according to a different encoding than the one it is actually in, for example when a stream of UTF-8 bytes is interpreted as being in a Russian-specific encoding such as KOI8-R (https://en.wikipedia.org/wiki/Mojibake).
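
To make that failure mode concrete, here is a minimal Python sketch. It is only an illustration: the sample string is taken from the verse header above, and KOI8-R is an assumed wrong encoding, not a diagnosis of what actually happened in the PHP script.

```python
# Sketch: UTF-8 bytes misread as KOI8-R, and the reverse repair.
# KOI8-R is an assumption chosen for illustration only.
text = "संजय उवाच"

utf8_bytes = text.encode("utf-8")        # the bytes actually stored or sent
garbled = utf8_bytes.decode("koi8_r")    # what a layer with the wrong encoding sees

# KOI8-R assigns a character to every byte value, so the mistake can be
# undone as long as the garbled text was not modified afterwards.
recovered = garbled.encode("koi8_r").decode("utf-8")

print(garbled)              # one Cyrillic or box-drawing character per byte
print(recovered == text)
```

If the broken output looks like a run of Cyrillic and box-drawing characters roughly three times as long as the expected Devanagari, this kind of misinterpretation is a likely candidate.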

When the text is being viewed in a browser, *if* all the steps before the browser got hold of it went right and only the browser misdetected the encoding, then yes, overriding the auto-detected encoding in the browser will usually fix things. (E.g. in Chrome, Settings -> Tools -> Encoding -> UTF-8.) Since you said that did not work here, the issue is probably at some earlier level, but there are so many layers in between that it is almost impossible to debug without seeing the intermediate levels.
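
One way to look at an intermediate level is to inspect the raw bytes at that stage rather than the rendered text. The following Python sketch is a hypothetical helper (the name `describe_bytes` and the checks it performs are my own choices, not part of any tool mentioned in this thread). It reports whether a byte string starts with a UTF-8 byte order mark, whether it decodes as UTF-8 at all, and whether it contains any Devanagari code points.

```python
import codecs


def describe_bytes(raw: bytes) -> dict:
    """Report basic encoding facts about one stage of the text (hypothetical helper)."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        text = raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        text = ""
        valid_utf8 = False
    # The Devanagari block spans U+0900 to U+097F.
    has_devanagari = any("\u0900" <= ch <= "\u097f" for ch in text)
    return {
        "has_bom": has_bom,
        "valid_utf8": valid_utf8,
        "has_devanagari": has_devanagari,
    }


# Example: a Harvard-Kyoto source line (plain ASCII) and its converted form.
print(describe_bytes(b"saMjaya uvAca"))
print(describe_bytes("संजय उवाच".encode("utf-8")))
```

Running such a check on the source file, on the script's output, and on what the browser receives would show at which step the bytes stop being valid UTF-8, assuming that is the problem at all.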

Some very general links for information's sake (as this is a mailing list for programmers after all, to whom it should be of interest):
The "Introduction to Unicode" in the Python documentation (just the first section, before "Python’s Unicode Support")
"Dizzying but invisible depth" by Jean-Baptiste Queru, an essay about computers in general (useful whenever one is trying to say "there are so many layers..." as above.)

