Request to Samskruth Coders for a technical help


Dr.BVK Sastry

Jul 1, 2024, 11:43:42 PM
to भारतीयविद्वत्परिषत्

Namaste

 

Request to Samskruth Coders for a technical help .

 

The Devanagari Unicode block lists the following 'CHARACTER UNIT', called 'PRISHTHAMATRA':

Dependent vowel signs -   094E DEVANAGARI VOWEL SIGN PRISHTHAMATRA E

character has historic use only

combines with E to form AI, with AA to form O, and with O to form AU
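
To make the annotation above concrete, here is a minimal Python sketch. It looks up the character's name and general category in the standard `unicodedata` module, then rewrites the three combinations listed above into their modern single vowel signs. The order in which U+094E is stored relative to the other vowel sign (consonant, then U+094E, then the second sign) is an assumption of this sketch, and `modernize` is a hypothetical helper name, not part of any library.

```python
import unicodedata

PRISHTHAMATRA = "\u094E"  # DEVANAGARI VOWEL SIGN PRISHTHAMATRA E

# Assumption of this sketch: U+094E is stored BEFORE the sign it combines with.
PRISHTHAMATRA_TO_MODERN = {
    PRISHTHAMATRA + "\u0947": "\u0948",  # with E  -> AI
    PRISHTHAMATRA + "\u093E": "\u094B",  # with AA -> O
    PRISHTHAMATRA + "\u094B": "\u094C",  # with O  -> AU
}


def modernize(text: str) -> str:
    """Replace the three historic two-part sequences with modern vowel signs."""
    for historic, modern in PRISHTHAMATRA_TO_MODERN.items():
        text = text.replace(historic, modern)
    return text


if __name__ == "__main__":
    print(unicodedata.name(PRISHTHAMATRA))
    print(unicodedata.category(PRISHTHAMATRA))  # general category code
    historic_ko = "\u0915\u094E\u093E"          # KA + PRISHTHAMATRA E + AA
    print([f"U+{ord(c):04X}" for c in modernize(historic_ko)])
```

A prishthamatra standing alone (with no second sign) is left untouched here, because the annotation quoted above only describes the three combinations.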

 

How best can this 'allotted number / place-holder code' be used for current needs, beyond its 'historic, archaic value'?

 

The document n3731 (unicode.org), https://www.unicode.org/wg2/docs/n3731.pdf, describes the history of this character's inclusion as follows:

Doc Type: Working Group Document
Title: Consensus on Kashmiri additions for Devanagari
Source: Michael Everson
Status: Individual Contribution
Action: For consideration by JTC1/SC2/WG2 and UTC
Replaces: N3480, N3710
Date: 2009-09-18

"This document summarizes the Devanagari characters recommended for addition to the UCS by the Kashmiri Ad-Hoc group which met at the WG2 meeting in Tokyo on 2009-10-28. The following 10 characters were agreed for addition."

 

Font development - https://dev.kwayisi.org/apps/unicode/characters/094e.html

 

Question:  Has anyone used this character in any Samskruth programming or document creation, please?

All available detail on why and how this character code got into the Devanagari Unicode standard would be helpful, along with how Devanagari programmers have made use of it.

 

The puzzle I am facing is: how is the Unicode logic of allotting a unique alphanumeric code to be used programmatically to CODE and REPRESENT 'character-combination modifiers' for <Swara + Swara> across a variety of platforms, programming languages and converter applications (which all seem to begin with a hard-key mapping to a virtual key value by the operating system)?

 

The compact statement is: What is the Unicode-based programming technicality for the outcome of Samskruth 'SANDHI'? (Handling it without and with Swara is a higher-level technicality needed for text-to-speech and speech-to-text.)

 

The technical statement is: What is the programmatic CODING and REPRESENTATION of the Samskruth 'SANDHI' process? (Handling it without and with Swara is a higher-level technicality needed for text-to-speech and speech-to-text.)
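
To make the question concrete, here is a minimal Python sketch of how one small family of vowel sandhi rules could be expressed over Unicode code points. It assumes that the standard encodes only the characters, so the sandhi process itself has to live in application code. The sketch covers only a first word ending in a or aa joined to a second word beginning with an independent vowel. Consonant sandhi, visarga, vocalic r and Vedic accents are not handled, and `join_a_sandhi` is a hypothetical helper name, not part of any library.

```python
AA_SIGN = "\u093E"  # DEVANAGARI VOWEL SIGN AA

# Initial independent vowel of the second word -> vowel sign of the merged
# syllable, when the first word ends in a or aa.
RESULT_SIGN = {
    "\u0905": "\u093E",  # a/aa + a  -> aa (savarna-dirgha)
    "\u0906": "\u093E",  # a/aa + aa -> aa
    "\u0907": "\u0947",  # a/aa + i  -> e  (guna)
    "\u0908": "\u0947",  # a/aa + ii -> e
    "\u0909": "\u094B",  # a/aa + u  -> o
    "\u090A": "\u094B",  # a/aa + uu -> o
    "\u090F": "\u0948",  # a/aa + e  -> ai (vriddhi)
    "\u0910": "\u0948",  # a/aa + ai -> ai
    "\u0913": "\u094C",  # a/aa + o  -> au
    "\u0914": "\u094C",  # a/aa + au -> au
}


def is_consonant(ch: str) -> bool:
    """True for the Devanagari consonants KA (U+0915) through HA (U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def join_a_sandhi(first: str, second: str) -> str:
    """Join two words, merging a final a/aa with an initial vowel."""
    if not first or not second or second[0] not in RESULT_SIGN:
        return first + second
    if first.endswith(AA_SIGN):
        stem = first[:-1]            # drop the explicit aa sign
    elif is_consonant(first[-1]):
        stem = first                 # bare consonant carries the inherent a
    else:
        return first + second        # outside the scope of this sketch
    return stem + RESULT_SIGN[second[0]] + second[1:]


if __name__ == "__main__":
    print(join_a_sandhi("गण", "ईश"))
    print(join_a_sandhi("महा", "उत्सव"))
    print(join_a_sandhi("देव", "आलय"))
```

The rule table works on the stored sequence of code points, not on rendered glyphs, so the result does not depend on the font or the keyboard mapping.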

 

 

Resources I have explored in this connection :  

 

·         U+094E: Devanagari Vowel Sign Prishthamatra E (Unicode Character) (unicodeplus.com)  

https://unicodeplus.com/U+094E    [The character (Devanagari Vowel Sign Prishthamatra E) is represented by the Unicode codepoint U+094E. It is encoded in the Devanagari block, which belongs to the Basic Multilingual Plane. It was added to Unicode in version 5.2 (October, 2009). It is HTML encoded as &#x094E;.   It is a SPACING MARK. ]  

 

·         Unicode Character 'DEVANAGARI VOWEL SIGN PRISHTHAMATRA E' (U+094E) (fileformat.info)

https://www.fileformat.info/info/unicode/char/094e/index.htm  [character has historic use only

combines with E to form AI, with AA to form O, and with O to form AU  ;   Category : Mark, Spacing Combining ]

 

·         https://www.unicodepedia.com/unicode/devanagari/94e/devanagari-vowel-sign-prishthamatra-e/

[Char type :   ENCLOSING MARK ]     

 

·         Devanagari Vowel Sign Prishthamatra E, Unicode Number: U+094E (SYMBL)   https://symbl.cc/en/094E/

The symbol “Devanagari Vowel Sign Prishthamatra E” is included in the “Dependent vowel signs” subblock of the “Devanagari” block and was approved as part of Unicode version 5.2 in 2009.

 

·         Indian Script Code for Information Interchange - Wikipedia  https://en.wikipedia.org/wiki/Indian_Script_Code_for_Information_Interchange   [  The Brahmi-derived writing systems have similar structure. So ISCII encodes letters with the same phonetic value at the same code point, overlaying the various scripts. For example, the ISCII codes 0xB3 0xDB represent [ki]. This will be rendered as കി in Malayalam, कि in Devanagari, as ਕਿ in Gurmukhi, and as கி in Tamil. The writing system can be selected in rich text by markup or in plain text by means of the ATR code described below. One motivation for the use of a single encoding is the idea that it will allow easy transliteration from one writing system to another. However, there are enough incompatibilities that this is not really a practical idea.    ISCII is an 8-bit encoding. The lower 128 code points are plain ASCII, the upper 128 code points are ISCII-specific. In addition to the code points representing characters, ISCII makes use of a code point with mnemonic ATR that indicates that the following byte contains one of two kinds of information. One set of values changes the writing system until the next writing system indicator or end-of-line. Another set of values select display modes such as bold and italic. ISCII does not provide a means of indicating the default writing system.   

Nukta character ़—code point E9 (233) :   The nukta character after another ISCII character is used for a number of rarer characters which don't exist in the main ISCII set. For example क (ka) + ़ (nukta) = क़ (qa). These characters have pre-composed forms in Unicode, as shown in the following table.  …….   A code for all the Indian scripts is made possible by their common origin from the Brahmi script. An optimal keyboard overlay for all the Indian scripts, is made possible by the phonetic nature of the alphabet. …..  The 8-bit ISCII code retains the standard ASCII code, while the Indian script keyboard overlay is designed for the standard English QWERTY overlay . This ensures that English can co-exist with the Indian scripts. This approach also makes it feasible to use Indian scripts along with existing English computers and software, so long as 8-bit character codes are allowed.     ….. 

Vowels and Vowel signs (Matras) :    There are separate symbols for all the vowels in Indian scripts which are pronounced independently (either at the beginning of a word, or after a vowel sound). The consonants in the Indian scripts themselves have an implicit vowel अ (a). To indicate a vowel sound other than the implicit one, a vowel sign (Matra) is attached to the consonant. Thus there are equivalent Matras for all the vowels, excepting the अ vowel.  

 

Vowel Omission Sign: Halant (्):   In Indian scripts consonants are assumed to have an implicit vowel अ "a" within them unless an explicit Matra (vowel sign) is attached. Thus a special sign, Halant (्), is needed for indicating that the consonant does not have the implicit अ vowel in it.  In Northern languages, the Halant at the end of a word generally gets dropped, though the ending still gets pronounced without a vowel.  .. This doesn't happen in Southern languages and Sanskrit, where a Halant is always used to indicate a vowel-less ending.  

The ISCII code contains separate vowels and Matras (vowel signs). While a vowel can be used independently, the Matra sign is valid only after a consonant.   ….  In practice, a Halant sign is shown only if the consonants do not change their shape by joining up. Tamil script has no conjuncts, and thus an explicit Halant sign always gets used.   …

 

A Halant is used between consonants to form conjuncts. But many times in Sanskrit and Vedic texts, one may wish to show an Explicit Halant which would be shown on the previous consonant, and which would prevent the consonant from joining with the next one. Two consecutive Halants form an Explicit Halant.  ….

A Soft Halant is formed by typing a Nukta character after a Halant.  In Devanagari the Soft Halant allows retention of the "half form" for the preceding consonant, and prevents it from combining with the following consonant.  

Soft Halant is used in Malayalam along with some consonants to derive separate pure consonant shapes which do not show an attached Halant symbol. ….   In the ISCII code the same Nukta character is thought of as an operator to derive some of the lesser used Sanskrit characters which are not directly available on the Inscript keyboard. A Nukta can be typed after a Halant to form a Soft Halant

 

·         Unique Spellings :   By using only the basic characters in ISCII, there is only one unique way of typing a word. This would not have been possible if conjuncts like क्ष, त्र, ज्ञ etc. had been given separate codes. The spelling of a word is now the phonetic order of the constituent basic characters. This provides a unique spelling for each word, which is not affected by the display rendition. For obtaining unique spellings, Soft Halant, Explicit Halant, and INV characters should not be used. These have been provided only for deriving different display renditions, and are not needed normally. The spelling of a word contains all the information necessary for display composition, which can be automatically done through display algorithms. It becomes possible to type in a text, without even looking at the display. When the tedium of composing goes away, on-line authoring becomes possible, where an author can think out new text while he is typing it. Unique spellings are essential for making spelling checkers and dictionaries. They are also essential to facilitate finding of words in a word-processor, or for information retrieval from a data-base.

 

·          iscii91.pdf (sourceforge.net)          https://varamozhi.sourceforge.net/iscii91.pdf   

·         Devanagari vowel sign prishthamatra e (U+094E) @ Graphemica   https://graphemica.com/%E0%A5%8E    
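
The ISCII excerpt above describes one code overlaid on several scripts. Unicode keeps a trace of that design: the major Indic blocks are laid out in parallel, so the same letter usually sits at the same offset inside each block. The Python sketch below illustrates this with the [ki] example from the excerpt. It is only an illustration of the layout, not a transliteration tool: some offsets are unassigned in some scripts (Tamil, for instance, lacks many consonants), and shared punctuation such as the danda would also be shifted. `shift_script` is a hypothetical helper name.

```python
# Start of each Unicode block; every block listed here is 0x80 code points wide.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def shift_script(text: str, source: str, target: str) -> str:
    """Move every code point of the source block to the same offset in the target block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))  # same offset, different block
        else:
            out.append(ch)                   # leave everything else alone
    return "".join(out)


if __name__ == "__main__":
    ki = "\u0915\u093F"  # Devanagari KA + VOWEL SIGN I
    for script in ("malayalam", "gurmukhi", "tamil"):
        print(script, shift_script(ki, "devanagari", script))
```

As the excerpt itself notes, the scripts differ enough that a purely mechanical shift like this is not a practical transliteration method on its own.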

 

Thanks in advance for the help.

 

Regards

BVK Sastry

 

Narayan Prasad

Jul 2, 2024, 4:33:05 AM
to भारतीयविद्वत्परिषत्
Namaste.
Thank you sir for pointing to this unicode entity.
Hopefully, it is used in the case of Bengali, Oriya, Malayalam and Tamil. Examples are shown below, after the Devanagari कॆ के कै कॊ को कौ:

কে কৈ কো কৌ
କେ କୈ କୋ କୌ
கெ கே கை கொ கோ கௌ
കെ കേ കൈ കൊ കോ കൌ

Regards
Narayan Prasad

BVK Sastry (G-S-Pop)

Jul 3, 2024, 12:49:06 AM
to bvpar...@googlegroups.com

Namaste Narayan Prasad ji

 

1.  On

 

<Thank you sir for pointing to this unicode entity. Hopefully, it is used in case of Bengali, Oriya, Malayalam, Tamil.>

 

    "Hopefully" is the issue! Can we take a reality check, starting with 'PRISHTHA MATRA'?

 

BVK Sastry (1) :   The illustrations you have provided are 'locked in' to the bounds of a specific OS / app / font technology. They work in a limited sense, where the 'damage outweighs the benefit achieved'.

 

I prefer not to go into detail on this; techno-linguists have full knowledge and are better qualified to assess what this observation points to.

 

In the illustration you have provided, the use of the Unicode 'Halant' / Virama (as in Devanagari) across Indic language scripts for 'Maatra combination (modification / deletion)' seems to use CODES like <09c7, 0b4d, 0c4d, 004d, 0c4d>, appropriate for the VISUAL RENDERING and specific to the FONT and OS / text editor for the different scripts. The copy-paste function does not deliver document integrity across devices or apps.
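
One way to check which codes a pasted string really contains, independent of font or editor, is to list its code points. The Python sketch below does that for some of the examples quoted above. It also compares two strings after Unicode normalization, because some Indic vowel signs have a canonical two-part spelling (Bengali O, U+09CB, is canonically equivalent to U+09C7 followed by U+09BE). `code_points` and `same_text` are hypothetical helper names; `unicodedata.normalize` is the standard-library call.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List the code points actually stored in a string."""
    return [f"U+{ord(ch):04X}" for ch in text]


def same_text(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    for sample in ["के", "কে", "କେ", "கே", "കേ"]:
        print(sample, code_points(sample))

    one_part = "\u0995\u09CB"        # Bengali KA + VOWEL SIGN O
    two_part = "\u0995\u09C7\u09BE"  # Bengali KA + VOWEL SIGN E + VOWEL SIGN AA
    print(one_part == two_part, same_text(one_part, two_part))
```

If two copies of a text differ only in such equivalent spellings, normalizing both before comparison or search removes the difference. Differences that go beyond canonical equivalence need other handling.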

 

BVK Sastry (2) :   The Unicode entry (Year 2000 onwards) exists because Unicode appropriated the 'ISCII' standards (formulated and frozen around Year 1979). 

For an 'entity' to get a unique alphanumeric code in the standards, it needs strong backing: a lobby, hard evidence or language logic.

What might this have been? It may be best known to the ISCII standards committee and Indic language technology experts.

Was it based on evidence from (manu-)scripting standards, on language logic, on future provision, on historicity, or on the obsessional preference of scholars?

 

Now, given that the techno-linguistic standards provide a 'standard alphanumeric code (a place holder)' for the representation of a language visual, symbol or process, the practical issue is:

How has the provided symbol been used?

How is the symbol proposed to be used in applications?

The argument that 'a symbol of historic value was requested for incorporation in the standards' sounds illogical. On that basis, the Unicode place holders used by the character symbols of the CJK (Chinese-Japanese-Korean) family outweigh the allotment of place holders to the Indic group!

 

THIS IS AN IMPACT-REVIEW ASSESSMENT OF THE GIVEN TECHNOLOGY ON DIGITAL INDIC TEXT REFERRALS, FOR STUDY, RESEARCH AND EDUCATION THAT DEPEND ON THE USE OF TEXT DOCUMENTS AND ON ADAPTING ROMANIZATION / TRANSLITERATION / FONT-TECH ISSUES. The impacted areas are: printed texts composed using software, e-books, research and reference works, MT, AI, OCR and so on.

A simple example: the most commonly used, SOCIALLY IMPACTING VEDIC TEXT, the PURUSHA SUKTA (Rig-Veda 10-90), does not have a common, uniform text (by Indic scripts or by transliteration conventions) whose shape and frame could firm up research work. When the primary referral text stands at variance, interpreters use their preferred reading to interpret 'Culture', exercising 'Academic Freedom'.  

In such cases, the dialogue between a 'practicing Vedic traditionalist reciting and applying the Purusha Sukta' per tradition and a 'professor-academic teaching the same Purusha Sukta in a classroom', in an isolated and alienated context, language and objective, results in 'CULTURE CHAOS' for the 'CLASS-MASS OF STUDENTS'.  

 

Why this deliberation, and for what goal?

 

GOAL: To get a human-language document validated for 'DIGITAL DOCUMENT REPRESENTATION UNIFORMITY', with true, total and natural native equivalence.

 

This is necessitated by changes in the technology facilitating / impacting 'language document transition and transmission' across time.

Muscle memory and mind memory of language basics have undergone changes. Writing has yielded to keying in. Voiced reading is replaced by screen reading / voice file play. The integrity of <Eyes, Ears, Mouth, Hand: the four skills of language: Read, Listen, Speak, Write> is broken and shortened to <See, Key in, Listen>. Smarter machines seem to dull the human sensorium and skills.

   

The validation of a current-period digital document needs referencing to the human language framework of the original document, by language grammars, for semiotic appropriateness [accuracy + authenticity] and PRACTICAL SOCIO-CULTURAL CONNECT and CONTINUITY.

Techno-linguists cannot dictate and mutate human language standards by force-fitting them into tech limitations.

If techno-linguists have caused such damage, then they are responsible for making course corrections and setting right the damage caused to the transmission of 'classical language resources' in tech media.

Making a 'short representation' of human language basics in a tech-device framework and design, limited to the constraints of the 'historic visual script in society', needs a revisit.

When such short / unused / unusable codes occupy place-holder positions in standards, there is a necessity to address the same.

 

Regards

BVK Sastry
