Re: Needed help in writing regular text replacement expression

विश्वासो वासुकिजः (Vishvas Vasuki)

Feb 9, 2017, 5:38:11 PM
to ken p, sanskrit-programmers
+sanskrit-programmers

namaste ken,

since i'm in a very busy time, I'll forward this challenge to sanskrit-programmers so that one of them can help you sooner.

2017-02-09 14:01 GMT-08:00 ken p <drk...@gmail.com>:
Hi Vishvaso,

I would like to replace the first consonant in conjunct consonants with
an underlined consonant in Roman-transliterated Indic text.

for example
kiss bulk hind pyaar kyaa kranti  mukhya inglish tren drive patni buddha  >>>>>

kis̠s bul̠k hin̠d p̠yaar k̠ranti muk̠ya in̠g̠lish t̠ren d̠rive pat̠ni bud̠dha

Please correct this RE and rewrite it as a working text-replacement regular expression.

([b,d,f,g,h,j,k,l,m,n,p,r,s,t,v,w,x,y,bh,dh,gh,ch,jh,ph,sh])([b,d,f,g,h,j,k,l,m,n,p,r,s,t,v,w,x,y,bh,dh,gh,ch,jh,ph,sh])>>>

$1([b̠,d̠,f̠,g̠,h̠,j̠,k̠,l̠,m̠,n̠,p̠,r̠,s̠,t̠,v̠,w̠,x̠,y̠,b̠h,d̠h,g̠h,c̠h,j̠h,p̠h,s̠h])$2([b,d,f,g,h,j,k,l,m,n,p,r,s,t,v,w,x,y,bh,dh,gh,ch,jh,ph,sh])

(consonants)(consonants) >$1 (first consonants with underline)(consonants)

Thanks,
ken
usa



--
--
Vishvas /विश्वासः

Anunad Singh

Feb 9, 2017, 11:54:33 PM
to sanskrit-p...@googlegroups.com
Although the 'grammar' of the transformation specified above is not sufficiently clear, I propose the following:

If we assume that the 'first consonant in conjunct consonants' means any consonant NOT followed by a vowel (a, e, i, o, u) or a space, then the above conversion can be done in 5 steps (also choose the 'case sensitive' option):

(1)

change  bh,dh,gh,ch,jh,ph,sh  TO bH dH gH cH jH pH sH etc

([bdgcjps])h  ----> $1H


(2)

([aeiou]) ---> $1_


(3)

FIND expression-

([a-z])([^aeiouH_\s])

REPLACE with

$1\u0320$2


(4)

Replace H with h


(5)

([aeiou])_  --->  $1
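
The five steps above can be sketched in Python roughly as follows. This is only an illustration: the function name underline_first_consonants is made up here, and Python's re module stands in for whatever editor or add-on actually runs the expressions. The mark is U+0320, as named in step (3).

```python
import re

UNDERLINE = "\u0320"  # the combining mark named in step (3)

def underline_first_consonants(text: str) -> str:
    # (1) Protect the 'h' of digraphs such as bh, dh, sh by upper-casing it.
    text = re.sub(r"([bdgcjps])h", r"\1H", text)
    # (2) Tag every vowel with a trailing underscore.
    text = re.sub(r"([aeiou])", r"\1_", text)
    # (3) A lowercase letter followed by anything other than a vowel, 'H',
    #     '_' or whitespace is taken to be the first member of a conjunct.
    #     A function is used as the replacement so the mark is inserted as a
    #     real character rather than as an escape sequence.
    text = re.sub(
        r"([a-z])([^aeiouH_\s])",
        lambda m: m.group(1) + UNDERLINE + m.group(2),
        text,
    )
    # (4) Restore the protected 'h'.
    text = text.replace("H", "h")
    # (5) Drop the vowel tags added in step (2).
    return re.sub(r"([aeiou])_", r"\1", text)

print(underline_first_consonants("kiss bulk patni buddha"))
```

Because the matches in step (3) cannot overlap, only the first consonant of a three-consonant cluster receives the mark in this sketch: in "inglish" the n is marked but the g is not.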



ken p

Feb 10, 2017, 11:54:39 AM
to sanskrit-programmers
Anunad ji,

With the above five REs I am getting this output.

kiss bulk hind pyaar kyaa kranti  mukhya inglish tren drive patni buddha
 
ki\u0320ss bu\u0320lk hi\u0320nd p\u0320yaa\u0320r k\u0320yaa k\u0320ra\u0320nti  mu\u0320kh\u0320ya i\u0320ng\u0320li\u0320sh t\u0320re\u0320n d\u0320ri\u0320ve pa\u0320tni bu\u0320ddha ....outputs

Please guide me until it's fixed.





With this RE.........
([bcdfgjklmnprstvwxyz]h?)([bcdfgjklmnprstvwxyz]h?)       $1^$2
 
kiss bulk hind pyaar kyaa kranti  mukhya inglish tren drive patni buddha
kis^s  bul^k  hin^d  p^y aar k^y aa k^r an^t i  mukh^y a in^g lish t^r en d^r ive pat^n i bud^dh a .....output

Or with _ I may get this
kis_s  bul_k  hin_d  p_y aar k_y aa k_r an_t i  mukh_y a in_g lish t_r en d_r ive pat_n i bud_dh a

But how can I get an output with underlined (b̠c̠d̠f̠g̠h̠j̠k̠l̠m̠n̠p̠r̠s̠t̠v̠w̠x̠y̠z̠) letters?

With this RE.....
([bcdfghjklmnprstvwxyz]h?)([bcdfghjklmnprstvwxyz]h?)    <u>$1</u>$2

kiss bulk hind pyaar kyaa kranti  mukhya inglish tren drive patni buddha

ki<u>s</u>s bu<u>l</u>k hi<u>n</u>d <u>p</u>yaar <u>k</u>yaa <u>k</u>ra<u>n</u>ti  mu<u>kh</u>ya i<u>n</u>gli<u>s</u>h <u>t</u>ren <u>d</u>rive pa<u>t</u>ni bu<u>d</u>dha ........output

Instead of....
kis̠s bul̠k hin̠d p̠yaar k̠ranti muk̠ya in̠g̠lish t̠ren d̠rive pat̠ni bud̠dha
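
One way to get real underlined letters instead of <u> tags is to insert the combining mark U+0320 directly after each non-final consonant of a cluster. The Python sketch below is an assumption-laden illustration, not a FoxReplace recipe: the names UNIT, CLUSTER, mark_cluster and underline_conjuncts are invented here, only lowercase ASCII letters are handled, and an aspirated digraph (kh, gh, ch, ...) is treated as one consonant whose first letter carries the mark.

```python
import re

UNDERLINE = "\u0320"  # combining mark drawn below the preceding letter

# One "unit" is an aspirated digraph (kh, gh, ch, ...) or a single consonant.
UNIT = re.compile(r"[bcdgjkpst]h|[b-df-hj-np-tv-z]")
# A cluster is any unbroken run of lowercase consonant letters.
CLUSTER = re.compile(r"[b-df-hj-np-tv-z]+")

def mark_cluster(match):
    units = UNIT.findall(match.group(0))
    # Underline the first letter of every unit except the last one.
    marked = [unit[0] + UNDERLINE + unit[1:] for unit in units[:-1]]
    return "".join(marked) + units[-1]

def underline_conjuncts(text):
    return CLUSTER.sub(mark_cluster, text)

sample = "kiss bulk hind pyaar kyaa kranti mukhya inglish tren drive patni buddha"
print(underline_conjuncts(sample))
```

With this sketch every consonant that is directly followed by another consonant gets the mark, so the n of "kranti" and the k of "kyaa" are underlined as well, which may or may not be what is wanted.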

 

Anunad Singh

Feb 11, 2017, 1:28:31 AM
to sanskrit-p...@googlegroups.com
You have not mentioned which text editor or word processor you are using to implement those conversions.  Actually, you should have selected 'regular expression' mode for those conversions which involve regular expression (the output you have got suggests that you have not done so.).

I tested the expressions in Geany on Ubuntu before posting and found that they give correct results.
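
The literal \u0320 text in ken's output suggests that the tool wrote the escape sequence as plain characters instead of the combining mark itself. Whether a given tool interprets \u escapes in its replacement field is not something this thread settles. As a rough Python illustration of the difference (the helper name decode_u_escapes is made up for this example), literal backslash-u sequences can be converted into the real characters like this:

```python
import re

def decode_u_escapes(text):
    """Turn literal backslash-u sequences (e.g. the six characters \\u0320)
    into the real characters they name."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

literal = r"pa\u0320tni"            # backslash, 'u' and four hex digits as plain text
fixed = decode_u_escapes(literal)   # 'pa' + U+0320 + 'tni'
print(len(literal), len(fixed))     # 11 6
```

If a tool does not understand the escape, pasting the actual U+0320 character into the replacement field is the usual workaround.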

-- anunAda

==========================

--

ken p

Feb 11, 2017, 12:39:31 PM
to sanskrit-programmers
Anunad ji,

I use FoxReplace by Marc, Replace RE mode and match case / no 
I am not familiar with Geany in Ubuntu.
Please test on FoxReplace and give me suggestions. 

damodarreddy challa

Feb 11, 2017, 6:33:19 PM
to sanskrit-programmers
Namaste. Use this command, which works for conjuncts of up to 4 consonants, such as 'क्ष्म्य्र' (kShmyra):
sed 's/\([kgcjTDtdpbsSx]\)h/\1⁜/g; s/\([kgcjTDNtdnpbmyrlLvwsShRx]\)\(⁜*\)\([kgcjTDNtdnpbmyrlLvwsShRx⁜]*\)\([kgcjTDNtdnpbmyrlLvwsShRx]\)/\1̠\2̠\3̠\4/g; s/̠̠/̠/g; s/\([kgcjTDNtdnpbmyrlLvwsShRx⁜]\)\([kgcjTDNtdnpbmyrlLvwsShRx⁜]\)̠/\1̠\2̠/g; s/⁜/h/g;' ./input.txt > ./output.txt

Here the 'h' part of mahapraNas (aspirates) is underlined too, e.g. mukhya >>> muk̠h̠ya.

If you do not want the 'h' part of mahapraNas underlined, for whatever reason, use this second command, which has a very small change:
sed 's/\([kgcjTDtdpbsSx]\)h/\1⁜/g; s/\([kgcjTDNtdnpbmyrlLvwsShRx]\)\(⁜*\)\([kgcjTDNtdnpbmyrlLvwsShRx⁜]*\)\([kgcjTDNtdnpbmyrlLvwsShRx]\)/\1̠\2̠\3̠\4/g; s/̠̠/̠/g; s/\([kgcjTDNtdnpbmyrlLvwsShRx⁜]\)\([kgcjTDNtdnpbmyrlLvwsShRx⁜]\)̠/\1̠\2̠/g; s/⁜̠/⁜/g;  s/⁜/h/g;' ./input.txt > ./output.txt


If a conjunct has more than 4 consonants (hypothetical), add the following part before the last s/⁜/h/g; part of the above command:
s/\([kgcjTDNtdnpbmyrlLvwsShRx⁜]\)\([kgcjTDNtdnpbmyrlLvwsShRx⁜]\)̠/\1̠\2̠/g; 
Add it once for each consonant beyond 4 in the conjunct.
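
For comparison, the same placeholder idea can be sketched in Python with a lookahead, which avoids repeating an expression for longer conjuncts. This is not a translation of the sed commands above. The function name underline_conjuncts and the underline_h switch are invented for the sketch, U+0320 is assumed as the underline mark, and the consonant classes are copied from the sed expressions.

```python
import re

UNDERLINE = "\u0320"    # assumed combining underline mark
PLACEHOLDER = "\u205c"  # the dotted-cross placeholder, stands in for aspirate 'h'
CONS = "kgcjTDNtdnpbmyrlLvwsShRx"   # consonant class from the sed expressions
ASPIRABLE = "kgcjTDtdpbsSx"         # letters that can take an aspirate 'h'

def underline_conjuncts(text, underline_h=True):
    # Protect the 'h' of aspirates so it is not read as a separate consonant.
    text = re.sub(f"([{ASPIRABLE}])h", r"\1" + PLACEHOLDER, text)
    # A consonant (plus its protected 'h', if any) directly followed by another
    # consonant is a non-final member of a conjunct. The lookahead does not
    # consume the following consonant, so that one can be matched in turn.
    pattern = re.compile(f"([{CONS}])({PLACEHOLDER}?)(?=[{CONS}])")

    def mark(m):
        base, aspirate = m.group(1), m.group(2)
        if aspirate and underline_h:
            aspirate += UNDERLINE
        return base + UNDERLINE + aspirate

    text = pattern.sub(mark, text)
    # Put the real 'h' back.
    return text.replace(PLACEHOLDER, "h")

print(underline_conjuncts("mukhya kShmyra"))
print(underline_conjuncts("mukhya kShmyra", underline_h=False))
```

The first call marks the 'h' of an aspirate as well, the second leaves it bare, mirroring the two variants described above.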

ॐ నమః శివాయ

ken p

Feb 12, 2017, 1:35:03 AM
to sanskrit-programmers
Mr. Challa,

How can I add the above regular expressions to this linked text replacement tool?

Replace RE....?.......With.......?........Match case?



damodarreddy challa

Feb 12, 2017, 3:54:16 AM
to sanskrit-programmers
The number of replacements depends on how complex the conjuncts are. I am giving replacements for conjuncts of up to 4 consonants; if that many are not needed, we can reduce it to 3 consonants. The add-on above has an import/export feature. I exported the substitutions to the following file, which you can import from the add-on menu. Case sensitivity depends on your need, whether you want it or not.
underline_FoxReplace.json

Anunad Singh

Feb 12, 2017, 5:45:08 AM
to sanskrit-p...@googlegroups.com
Please find attached the Foxreplace file for the conversions suggested by me.

FoxReplace_for_Dr_Ken.json

ken p

Feb 13, 2017, 1:32:35 PM
to sanskrit-programmers
Mr.Challa,

Very good, 
I did import and it works fine. I will let you know about further improvements if needed. 
Thanks.

By the way, can I have a similar importable add-on RE (regular expression) for replacing the schwa 'a' in Roman transliteration? I.e. (all consonants)a > (all consonants), or (all consonants)a > (all consonants)à

ka kha ga gha ca cha ja jha ṭa ṭha ḍa ḍha ṇa
ta tha da dha na pa pha ba bha ma ya ra la va
śa sa ṣa ha ḽa kṣa jña

ken p

Feb 13, 2017, 1:32:35 PM
to sanskrit-programmers
Anunad ji,

Very good, 
I did import and it works fine. I will let you know about further improvements if needed. 
Thanks.

By the way, can I have a similar importable add-on RE (regular expression) for replacing the schwa 'a' in Roman transliteration? I.e. (all consonants)a -> (all consonants), or (all consonants)a -> (all consonants)à
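
A minimal Python sketch of the schwa replacement asked about here, under several assumptions: the text is IAST-style Roman transliteration using the consonants ken lists, an 'a' that is followed by another a, i or u is left alone (to spare aa, ai and au), and every other inherent 'a' after a consonant is replaced without regard to pronunciation rules. The names CONSONANT_LETTERS, SCHWA and replace_schwa are invented for the example.

```python
import re
import unicodedata

# Final letters of the consonants in ken's list (kh, kṣ, jñ, ... all end in one
# of these). Normalised to NFC so precomposed and decomposed input agree.
CONSONANT_LETTERS = unicodedata.normalize("NFC", "kgcjṭḍṇtdnpbmyrlvśsṣhḽñ")

# An 'a' directly after a consonant letter and not followed by a, i or u.
SCHWA = re.compile(f"(?<=[{CONSONANT_LETTERS}])a(?![aiu])")

def replace_schwa(text, replacement=""):
    """Replace the inherent 'a' after a consonant; pass 'à' to mark it instead."""
    text = unicodedata.normalize("NFC", text)
    return SCHWA.sub(replacement, text)

print(replace_schwa("ka kha ga gha"))       # k kh g gh
print(replace_schwa("ka kha", "\u00e0"))    # kà khà
```

The same pattern could presumably be entered in a replace tool that supports lookbehind and lookahead, but that has not been checked against FoxReplace.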


