Request to Samskruth Coders for a technical help


Dr.BVK Sastry

Jul 1, 2024, 11:43:42 PM
to भारतीयविद्वत्परिषत्

Namaste

 

A request to Samskruth coders for technical help.

 

The Devanagari block of Unicode lists the following character, called 'PRISHTHAMATRA':

Dependent vowel signs -   094E DEVANAGARI VOWEL SIGN PRISHTHAMATRA E

character has historic use only

combines with E to form AI, with AA to form O, and with O to form AU
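The properties quoted above can be checked against the Unicode database that ships with Python. This is only a minimal sketch: the variable names are my own, and the storage order consonant + PRISHTHAMATRA E + E is an assumption made for illustration. It also shows that normalization does not turn the historic sequence into the modern AI sign, so the "combines with" note describes how the glyphs look, not an equivalence between code points.

```python
import unicodedata

PRISHTHAMATRA_E = "\u094e"

print(unicodedata.name(PRISHTHAMATRA_E))       # DEVANAGARI VOWEL SIGN PRISHTHAMATRA E
print(unicodedata.category(PRISHTHAMATRA_E))   # Mc (Mark, Spacing Combining)
print(unicodedata.combining(PRISHTHAMATRA_E))  # 0 (no canonical reordering)

ka = "\u0915"       # DEVANAGARI LETTER KA
sign_e = "\u0947"   # DEVANAGARI VOWEL SIGN E
sign_ai = "\u0948"  # DEVANAGARI VOWEL SIGN AI

# Assumed order for illustration: consonant, prishthamatra, then the E sign.
historic_kai = ka + PRISHTHAMATRA_E + sign_e
nfc = unicodedata.normalize("NFC", historic_kai)

print(nfc == historic_kai)   # True: the sequence is left unchanged
print(nfc == ka + sign_ai)   # False: it is not made equal to KA + AI
```

Any program that wants the historic and modern spellings to match (for search or comparison, say) would therefore need its own mapping layer on top of Unicode.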

 

How can this allotted code point best be used for current needs, beyond its historic (archaic) value?

 

The document ref: n3731-kashmiri_n3671-abc (unicode.org)   https://www.unicode.org/wg2/docs/n3731.pdf describes the history of this character's inclusion as follows:

Doc Type: Working Group Document
Title: Consensus on Kashmiri additions for Devanagari
Source: Michael Everson
Status: Individual Contribution
Action: For consideration by JTC1/SC2/WG2 and UTC
Replaces: N3480, N3710
Date: 2009-09-18

This document summarizes the Devanagari characters recommended for addition to the UCS by the Kashmiri Ad-Hoc group which met at the WG2 meeting in Tokyo on 2009-10-28. The following 10 characters were agreed for addition.

 

Font development - https://dev.kwayisi.org/apps/unicode/characters/094e.html

 

Question: Has anyone used this character in any Samskruth programming or document-creation work?

Any available detail on why and how this character code got into the Devanagari Unicode standard would be helpful, along with how Devanagari programmers have made use of it.

 

The puzzle I am facing is: how is Unicode's logic of allotting a unique alphanumeric code to each character to be used programmatically to encode and represent character-combination modifiers for <Swara + Swara> across a variety of platforms, programming languages and converter applications (which all seem to begin with a hard key mapped to a virtual key value by the operating system)?

 

The compact statement is: What is the Unicode-based programming technique for the outcome of Samskruth 'SANDHI'? (Handling text without and with Swara is a higher-level technicality needed for text-to-speech and speech-to-text.)

 

The technical statement is: What is the programmatic coding and representation for the Samskruth 'SANDHI' process? (Handling text without and with Swara is a higher-level technicality needed for text-to-speech and speech-to-text.)
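To make the question concrete, here is a minimal sketch of what a code-point-level sandhi step might look like. It is not an established method, only an illustration under stated assumptions: the table and the helper names (`A_PLUS_VOWEL`, `ends_in_inherent_a`, `join_with_sandhi`) are hypothetical, and only the case "final inherent a + initial independent vowel" is covered. The point is that the result is written with an ordinary dependent vowel sign; no special combination character is involved.

```python
# Partial table: the left word ends in a bare consonant (inherent "a") and the
# right word starts with an independent vowel. The independent vowel is
# replaced by the dependent vowel sign of the sandhi result.
A_PLUS_VOWEL = {
    "\u0905": "\u093e",  # a + a  -> AA sign
    "\u0906": "\u093e",  # a + aa -> AA sign
    "\u0907": "\u0947",  # a + i  -> E sign
    "\u0908": "\u0947",  # a + ii -> E sign
    "\u0909": "\u094b",  # a + u  -> O sign
    "\u090a": "\u094b",  # a + uu -> O sign
    "\u090f": "\u0948",  # a + e  -> AI sign
    "\u0910": "\u0948",  # a + ai -> AI sign
    "\u0913": "\u094c",  # a + o  -> AU sign
    "\u0914": "\u094c",  # a + au -> AU sign
}


def ends_in_inherent_a(word: str) -> bool:
    """True if the last code point is a consonant letter (KA..HA) with no sign after it."""
    return bool(word) and "\u0915" <= word[-1] <= "\u0939"


def join_with_sandhi(left: str, right: str) -> str:
    """Hypothetical helper: apply the table above, otherwise concatenate unchanged."""
    if ends_in_inherent_a(left) and right and right[0] in A_PLUS_VOWEL:
        return left + A_PLUS_VOWEL[right[0]] + right[1:]
    return left + right


print(join_with_sandhi("देव", "इन्द्र"))  # deva + indra
print(join_with_sandhi("सूर्य", "उदय"))   # surya + udaya
```

A real implementation would need the remaining vowel and consonant sandhi rules, and would probably work on a phonemic transliteration rather than directly on Devanagari code points.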

 

 

Resources I have explored in this connection :  

 

·         U+094E: Devanagari Vowel Sign Prishthamatra E (Unicode Character) (unicodeplus.com)  

https://unicodeplus.com/U+094E    [The character (Devanagari Vowel Sign Prishthamatra E) is represented by the Unicode codepoint U+094E. It is encoded in the Devanagari block, which belongs to the Basic Multilingual Plane. It was added to Unicode in version 5.2 (October, 2009). It is HTML encoded as &#x094E;.   It is a SPACING MARK. ]  

 

·         Unicode Character 'DEVANAGARI VOWEL SIGN PRISHTHAMATRA E' (U+094E) (fileformat.info)

https://www.fileformat.info/info/unicode/char/094e/index.htm  [character has historic use only

combines with E to form AI, with AA to form O, and with O to form AU  ;   Category : Mark, Spacing Combining ]

 

·         https://www.unicodepedia.com/unicode/devanagari/94e/devanagari-vowel-sign-prishthamatra-e/

[Char type :   ENCLOSING MARK ]     

 

·         Devanagari Vowel Sign Prishthamatra E, Unicode Number: U+094E (SYMBL)   https://symbl.cc/en/094E/   Symbol Meaning :   Devanagari Vowel Sign Prishthamatra E. Devanagari.

The symbol “Devanagari Vowel Sign Prishthamatra E” is included in the “Dependent vowel signs” subblock of the “Devanagari” block and was approved as part of Unicode version 5.2 in 2009.

 

·         Indian Script Code for Information Interchange - Wikipedia  https://en.wikipedia.org/wiki/Indian_Script_Code_for_Information_Interchange   [  The Brahmi-derived writing systems have similar structure. So ISCII encodes letters with the same phonetic value at the same code point, overlaying the various scripts. For example, the ISCII codes 0xB3 0xDB represent [ki]. This will be rendered as കി in Malayalam, कि in Devanagari, as ਕਿ in Gurmukhi, and as கி in Tamil. The writing system can be selected in rich text by markup or in plain text by means of the ATR code described below. One motivation for the use of a single encoding is the idea that it will allow easy transliteration from one writing system to another. However, there are enough incompatibilities that this is not really a practical idea.    ISCII is an 8-bit encoding. The lower 128 code points are plain ASCII, the upper 128 code points are ISCII-specific. In addition to the code points representing characters, ISCII makes use of a code point with mnemonic ATR that indicates that the following byte contains one of two kinds of information. One set of values changes the writing system until the next writing system indicator or end-of-line. Another set of values select display modes such as bold and italic. ISCII does not provide a means of indicating the default writing system.   

Nukta character ़—code point E9 (233) :   The nukta character after another ISCII character is used for a number of rarer characters which don't exist in the main ISCII set. For example क (ka) + ़ (nukta) = क़ (qa). These characters have pre-composed forms in Unicode, as shown in the following table.  …….   A code for all the Indian scripts is made possible by their common origin from the Brahmi script. An optimal keyboard overlay for all the Indian scripts, is made possible by the phonetic nature of the alphabet. …..  The 8-bit ISCII code retains the standard ASCII code, while the Indian script keyboard overlay is designed for the standard English QWERTY overlay . This ensures that English can co-exist with the Indian scripts. This approach also makes it feasible to use Indian scripts along with existing English computers and software, so long as 8-bit character codes are allowed.     ….. 

Vowels and Vowel signs (Matras) :    There are separate symbols for all the vowels in Indian scripts which are pronounced independently (either at the beginning of a word, or after a vowel sound). The consonants in the Indian script themselves have an implicit vowel अ (a). To indicate a vowel sound other than the implicit one, a vowel-sign (Matra) is attached to the consonant. Thus there are equivalent Matras for all the vowels, excepting the अ vowel.  

 

Vowel Omission Sign: Halant ◌्:   In Indian scripts consonants are assumed to have an implicit vowel अ "a" within them unless an explicit Matra (vowel-sign) is attached. Thus a special sign Halant (◌्) is needed for indicating that the consonant does not have the implicit अ vowel in it.  In Northern languages, the Halant at the end of a word generally gets dropped, though the ending still gets pronounced without a vowel.  ..

This doesn't happen in Southern languages and Sanskrit, where a Halant is always used to indicate a vowel-less ending.  

The ISCII code contains separate vowels and Matras (Vowel signs). While a vowel sign can be used independently, the Matra sign is valid only after a consonant.   ….  In practice, a Halant sign is shown only if the consonants do not change their shape by joining up. Tamil script has no conjuncts, and thus an explicit Halant sign always gets used.   …

 

A Halant is used between consonants to form conjuncts. But many times in Sanskrit and Vedic texts, one may wish to show an Explicit Halant which would be shown on the previous consonant, and which would prevent the consonant from joining with the next one. Two consecutive Halants form an Explicit Halant.  ….

A Soft Halant is formed by typing a Nukta character after a Halant.  In Devanagari the Soft Halant allows retention of the "half form" for the preceding consonant, and prevents it from combining with the following consonant.  

Soft Halant is used in Malayalam along with some consonants to derive separate pure consonant shapes which do not show an attached Halant symbol. ….   In the ISCII code the same Nukta character is thought of as an operator to derive some of the lesser used Sanskrit characters which are not directly available on the Inscript keyboard. A Nukta can be typed after a Halant to form a Soft Halant

 

·         Unique Spellings :   By using only the basic characters in ISCII, there is only one unique way of typing a word. This would not have been possible if conjuncts like क्ष, त्र, ज्ञ etc. had been given separate codes. The spelling of a word is now the phonetic order of the constituent basic characters. This provides a unique spelling for each word, which is not affected by the display rendition. For obtaining unique spellings, Soft Halant, Explicit Halant, and INV characters should not be used. These have been provided only for deriving different display renditions, and are not needed normally. The spelling of a word contains all the information necessary for display composition, which can be automatically done through display algorithms. It becomes possible to type in a text, without even looking at the display. When the tedium of composing goes away, on-line authoring becomes possible, where an author can think out new text while he is typing it. Unique spellings are essential for making spelling checkers and dictionaries. They are also essential to facilitate finding of words in a word-processor, or for information retrieval from a data-base.

 

·          iscii91.pdf (sourceforge.net)          https://varamozhi.sourceforge.net/iscii91.pdf   

·         | devanagari vowel sign prishthamatra e (U+094E) @ Graphemica   https://graphemica.com/%E0%A5%8E    
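Some of the ISCII notes quoted above have close counterparts in Unicode, and a short sketch may help relate the two. Assumptions: the function name `shift_script` and the dictionary `SCRIPT_BASES` are my own; the offset trick relies on the Indic blocks of Unicode following the ISCII layout, and it can produce unassigned code points for letters that a target script lacks, so it is an illustration and not a transliterator. The halant sequences at the end follow the usual Unicode convention (virama + ZWNJ for an explicit halant, virama + ZWJ for a half form), which I take to correspond to ISCII's Explicit Halant and Soft Halant.

```python
import unicodedata

DEVANAGARI_BASE = 0x0900
SCRIPT_BASES = {"Gurmukhi": 0x0A00, "Tamil": 0x0B80, "Malayalam": 0x0D00}


def shift_script(text: str, target_base: int) -> str:
    """Move Devanagari code points to the same offset in another Indic block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_BASE <= cp < DEVANAGARI_BASE + 0x80:
            out.append(chr(cp - DEVANAGARI_BASE + target_base))
        else:
            out.append(ch)
    return "".join(out)


# The [ki] example from the ISCII description: KA + VOWEL SIGN I.
ki = "\u0915\u093f"
for script, base in SCRIPT_BASES.items():
    shifted = shift_script(ki, base)
    print(script, shifted, [unicodedata.name(c) for c in shifted])

# Nukta: the precomposed QA (U+0958) decomposes to KA + NUKTA, and NFC does
# not recompose it, because U+0958 is excluded from composition.
qa = "\u0958"
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", qa)])  # ['U+0915', 'U+093C']
print(unicodedata.normalize("NFC", qa) == "\u0915\u093c")             # True

# Halant variants, written as plain code point sequences.
VIRAMA, ZWNJ, ZWJ = "\u094d", "\u200c", "\u200d"
conjunct = "\u0915" + VIRAMA + "\u0937"                # KA + virama + SSA
explicit_halant = "\u0915" + VIRAMA + ZWNJ + "\u0937"  # visible halant on KA
half_form = "\u0915" + VIRAMA + ZWJ + "\u0937"         # half-form KA kept
```

Since the three halant sequences differ only in an invisible joiner, a program that needs the "unique spelling" described in the ISCII text would have to strip ZWJ and ZWNJ before comparing words.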

 

Thanks in advance for the help.

 

Regards

BVK Sastry

 

Narayan Prasad

4:33 AM
to भारतीयविद्वत्परिषत्
Namaste.
Thank you sir for pointing to this unicode entity.
Hopefully, it is used in the case of Bengali, Oriya, Malayalam and Tamil. Examples are shown below for the Devanagari कॆ के कै कॊ को कौ

কে কৈ কো কৌ
କେ କୈ କୋ କୌ
கெ கே கை கொ கோ கௌ
കെ കേ കൈ കൊ കോ കൌ
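
One way to check which code points such examples actually contain is to decompose them. This is a rough sketch under an assumption: the sample strings below are typed by hand with each script's standard vowel signs, not extracted from the text above. With these inputs, the two-part "o" signs of Bengali, Oriya, Tamil and Malayalam decompose into that script's own E sign plus an AA sign, and U+094E does not appear in any of them.

```python
import unicodedata

PRISHTHAMATRA_E = "\u094e"

# Consonant KA plus an "o" vowel sign in each script (hand-typed assumption).
samples = {
    "Devanagari": "\u0915\u094b",
    "Bengali": "\u0995\u09cb",
    "Oriya": "\u0b15\u0b4b",
    "Tamil": "\u0b95\u0bca",      # short O sign
    "Malayalam": "\u0d15\u0d4a",  # short O sign
}

for script, text in samples.items():
    nfd = unicodedata.normalize("NFD", text)
    signs = [unicodedata.name(c) for c in nfd[1:]]  # everything after KA
    print(script, PRISHTHAMATRA_E in nfd, signs)
```

The Devanagari O sign stays a single code point, while the other four split into two signs from their own blocks.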

Regards
Narayan Prasad
