hb_regexreplace()


Qatan

May 21, 2014, 5:28:23 PM
to harbou...@googlegroups.com
Hello,
 
Is there such a function in Harbour or equivalent? I would like to avoid the xhb lib if possible.
 
What I am trying to do is find a string and replace it without affecting the rest, and it has to work case-insensitively...
 
Example:
 
Let’s say I have this string: “Abc Def Ghi abc DEF ghi ABC def ghi”
 
I want to replace every “Def”, whatever its case, with “123” without affecting the rest.
 
In the end I want to get: “Abc 123 Ghi abc 123 ghi ABC 123 ghi”
 
Any help is very welcome.
 
Regards,
 
Qatan
 

Klas Engwall

May 21, 2014, 6:22:21 PM
to harbou...@googlegroups.com
Hi Qatan,
It looks like it could be easily borrowed from the xhb contrib and put
in your project as a source file.

Take a look here: contrib\xhb\regexrpl.prg

Regards,
Klas

Mario H. Sabado

May 21, 2014, 6:39:54 PM
to harbou...@googlegroups.com
Hi,

Using Hb_StrReplace(), it would be like:

hb_StrReplace( "Abc Def Ghi abc DEF ghi ABC def ghi", { "Def", "DEF", "def" }, { "123", "123", "123" } )

Regards,
Mario
--
--
You received this message because you are subscribed to the Google
Groups "Harbour Users" group.
Unsubscribe: harbour-user...@googlegroups.com
Web: http://groups.google.com/group/harbour-users


Qatan

May 21, 2014, 6:59:13 PM
to harbou...@googlegroups.com
Hello Klas,


>It looks like it could be easily borrowed from the xhb contrib and put
>in your project as a source file.
>
>Take a look here: contrib\xhb\regexrpl.prg

Good idea, thanks.

Qatan

Qatan

May 21, 2014, 7:53:15 PM
to harbou...@googlegroups.com
Hello Mario,
 
>Using Hb_StrReplace(), it would be like:
>hb_StrReplace( "Abc Def Ghi abc DEF ghi ABC def ghi", { "Def", "DEF", "def" }, { "123", "123", "123" } )

 
 
Hmm... interesting... but this way I will have to list every possible case variant. That may get too big for more complex searches... This is why I thought about regex.
Of course I could cover the most obvious ones, but the idea that a variant could slip through makes me carefully consider any other possibility.
What do you think? Anyway, thanks a lot for your help.
 
Qatan

elch

May 21, 2014, 10:06:45 PM
to harbou...@googlegroups.com
Hi Qatan,

please show the regex solution along with a very simple benchmark (run it 100 times or so),
that would interest me ...

( e.g. cRepl == "dEf", cTrans == "123" )

---
FUNCTION nataq( cSource, cRepl, cTrans )
 LOCAL cTarget := ""
 LOCAL nPos

  DO WHILE LEN( cSource ) > 0
    IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0  // no more fun
      cTarget += cSource
      EXIT
    ENDIF
    cTarget += LEFT( cSource, nPos - 1 ) + cTrans
    cSource := SUBSTR( cSource, nPos + LEN( cRepl ) )
  ENDDO

RETURN cTarget
---

regards
Rolf
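
For reference, a minimal sketch of calling the function above on the string from the original question. The function body is repeated unchanged (apart from the comment) so that the snippet compiles on its own; Main() and the variable name cText are illustrative only.

```harbour
PROCEDURE Main()
   LOCAL cText := "Abc Def Ghi abc DEF ghi ABC def ghi"

   // case-insensitive search for "def"; every case variant becomes "123"
   ? nataq( cText, "def", "123" )   // Abc 123 Ghi abc 123 ghi ABC 123 ghi
   RETURN

FUNCTION nataq( cSource, cRepl, cTrans )
   LOCAL cTarget := ""
   LOCAL nPos

   DO WHILE Len( cSource ) > 0
      IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0   // no further match
         cTarget += cSource
         EXIT
      ENDIF
      // keep the text before the match, append the replacement,
      // then continue with whatever follows the match
      cTarget += Left( cSource, nPos - 1 ) + cTrans
      cSource := SubStr( cSource, nPos + Len( cRepl ) )
   ENDDO
   RETURN cTarget
```

The search text may be given in any case ("def", "DEF", "dEf"), since hb_AtI() compares case-insensitively.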

AL67

May 22, 2014, 1:55:55 AM
to harbou...@googlegroups.com


On Thursday, May 22, 2014 00:22:21 UTC+2, Klas Engwall wrote:
Hi Qatan,

Take a look here: contrib\xhb\regexrpl.prg 

Regards,
Klas

or try an extended version:

/**************************************************************************
*        Regular expression version of function STRTRAN()
*       --------------------------------------------------
*  hb_RegExStrTran(<cString>,<cpSearch>,[cReplace],[nStart],
*     [nCount],[lRegExCase],[lRegExNewLine])  --> cReturn
*  <cString>  The main string to search
*  <cpSearch> The string/regexp to locate in the main string
*  <cReplace> The string to replace
*  <nStart>   The first occurrence to be replaced  (default 1)
*  <nCount>   Number of occurrences to replace     (default ALL)
*  <lRegExCase> ,<lRegExNewLine> Options for regular expression
*  DESCRIPTION:
*  In <cReplace> the sign '$' is an extra token used to include backreferences (groups)
*   '$&' or '$0' - whole regex match
*   '$1'..'$99'  - backreference to group 1 .. 99  (if the group does not exist -> '')
*   '$$' - a single '$' (e.g. to insert '$2', use '$$2' )
*   '$\' - break (empty string)  (e.g. to insert Group1+'2' rather than Group12,
*          use '$1$\2' )
*   '$x' where x is not a digit or '&','$','\' -  as is
*  SAMPLE:
* hb_RegExStrTran("xxxA1yyyB2zzz",".\d","QQ") --> "xxxQQyyyQQzzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".\d","Q$&Q") --> "xxxQA1QyyyQB2Qzzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".(\d)","Q$1") --> "xxxQ1yyyQ2zzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".\d","$$0") --> "xxx$0yyy$0zzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".(\d)","Q$3") --> "xxxQyyyQzzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".(\d)","Q$13") --> "xxxQyyyQzzz"  !not group 13
* hb_RegExStrTran("xxxA1yyyB2zzz",".(\d)","Q$1$\3") --> "xxxQ13yyyQ23zzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".\d","Q$Q") --> "xxxQ$QyyyQ$Qzzz"
*************************************************************************/

FUNCTION hb_RegExStrTran(cString,cpSearch,cReplace,nStart,nCount,lCase,lNewLine)
LOCAL aMatch,nFind:=0,cRet:=""
LOCAL cRep:="",cRep0,pos,xG1,xG2,lAll

IF !VALTYPE(cString)$"CM"
   // do error ???
   RETURN nil  // or ""
ENDIF

IF !VALTYPE(cpSearch)$"CM"
   IF !HB_ISREGEX(cpSearch)
      // do error ???
      RETURN nil // or ""
   ENDIF
ENDIF

IF !VALTYPE(cReplace)$"CM"
  cReplace := ""
ENDIF

IF !VALTYPE(nStart) == "N"
  nStart := 1
ENDIF

IF !VALTYPE(nCount) == "N"
  nCount := 0
  lAll := .T.
ELSE
  lAll := .F.
ENDIF

//  StrTran() work this way:
IF !lAll .AND. nCount == 0
    RETURN ""
ENDIF
IF nCount < 0
    RETURN cString
ENDIF
IF nStart < 1
   RETURN cString
ENDIF


// START SEARCH
DO WHILE lAll .OR. nCount > 0
   aMatch:=HB_REGEXATX(cpSearch,cString,lCase,lNewLine)
     //aMatch: { {Find,Start,End} [,{FindGr1,StartGr1,EndGr1},...] }
   IF EMPTY(aMatch) //not find
      EXIT
   ENDIF
   nFind++
   IF nFind>=nStart
      // now change in cReplace "$..."
      cRep0:=cReplace
      cRep:=""
      DO WHILE (pos:=AT("$",cRep0)) > 0
         xG1:=SUBSTR(cRep0,pos+1,1)
         xG2:=SUBSTR(cRep0,pos+2,1)
         IF xG1=="$"                         // '$$' -> '$'
            cRep += LEFT(cRep0,pos)
            cRep0 := SUBSTR(cRep0,pos+2)
         ELSEIF xG1=="\"                         // '$\' -> ''
            cRep += LEFT(cRep0,pos-1)
            cRep0 := SUBSTR(cRep0,pos+2)
         ELSEIF xG1$"&0"                     // all finding text
            cRep += LEFT(cRep0,pos-1)+aMatch[1,1]
            cRep0 := SUBSTR(cRep0,pos+2)
         ELSEIF xG1$"123456789"             // $1 .. $9
            IF xG2$"0123456789"             //  $10 .. $99
               IF (xG2:=VAL(xG1+xG2)+1) <= LEN(aMatch)   //check group 10..99
                  cRep += LEFT(cRep0,pos-1)+aMatch[xG2,1]
                  cRep0 := SUBSTR(cRep0,pos+3)
               ELSE                           //group not exist -> empty
                  cRep += LEFT(cRep0,pos-1)
                  cRep0 := SUBSTR(cRep0,pos+3)
               ENDIF
            ELSE                           // check group 1..9
               IF (xG1:=VAL(xG1)+1) <= LEN(aMatch)   //group exist
                  cRep += LEFT(cRep0,pos-1)+aMatch[xG1,1]
                  cRep0 := SUBSTR(cRep0,pos+2)
               ELSE                           //group not exist -> ''
                  cRep += LEFT(cRep0,pos-1)
                  cRep0 := SUBSTR(cRep0,pos+2)
               ENDIF
            ENDIF
         ELSE                          // '$x' -> copy as is
            cRep += LEFT(cRep0,pos+1)
            cRep0 := SUBSTR(cRep0,pos+2)
         ENDIF
      ENDDO
      cRep += cRep0
      cRet += LEFT(cString,aMatch[1,2]-1)+cRep
      cString := SUBSTR(cString,aMatch[1,3]+1)
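      nCount--
   ELSE
      // occurrence before nStart: copy it unchanged and move past it,
      // otherwise the same match would be found again on the next pass
      cRet += LEFT(cString,aMatch[1,3])
      cString := SUBSTR(cString,aMatch[1,3]+1)
   ENDIF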
ENDDO
cRet += cString
RETURN cRet
/*  end FUNCTION hb_RegExStrTran() */
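
The function above builds everything on the array returned by hb_regexAtX(). A small sketch for inspecting that array may make the '$1' / '$2' handling easier to follow. It uses only the core call already seen in the listing; the sample text and variable names are made up for illustration, and the { text, start, end } layout is taken from the comment in the listing rather than checked against every Harbour version.

```harbour
PROCEDURE Main()
   LOCAL cText := "Number1 12,4 also"
   LOCAL aMatch := hb_regexAtX( "(\d),(\d)", cText )
   LOCAL aItem

   IF Empty( aMatch )
      ? "no match"
      RETURN
   ENDIF

   // aMatch[ 1 ] describes the whole match, aMatch[ 2 ] group 1, and so on.
   // Each entry is expected to be { cText, nStart, nEnd }.
   FOR EACH aItem IN aMatch
      ? aItem[ 1 ], aItem[ 2 ], aItem[ 3 ]
   NEXT

   // What a replacement of '$2.$1' would expand to for this match:
   // group 2, a dot, then group 1
   ? aMatch[ 3, 1 ] + "." + aMatch[ 2, 1 ]
   RETURN
```

Group n therefore lives at index n + 1, which is why the listing adds 1 to the number that follows '$'.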

Qatan

May 22, 2014, 3:20:57 AM
to harbou...@googlegroups.com
Hello Adam,
 
>...
>or try extend version:
>...
 
Thanks for your nice input.
Really interesting. I searched for it and it seems that you wrote this function in 2012 and that it was added to xHarbour... am I right?
Now just tell me: what is the difference and/or advantage of it over hb_regExReplace()?
And if it has an advantage... why has it not been added to Harbour? Any special reason?
Thanks for your help
 
Qatan

Qatan

May 22, 2014, 5:08:33 AM
to harbou...@googlegroups.com
Hello Rolf,
 
>show the regex solution, and please a very easy benchmark (measure 100 times or so),
>would interest me ...

 
It does the job and in very good time!
My simple benchmark with a TXT file (5 MB) gave the following results:

RegExStrTran: 237.40 seconds :-)
nataq.......: 240.30 seconds :-)
RegExReplace: 912.30 seconds :-(
 
 
Computer used for the test: Average Intel® Core™ i3-2310M CPU @ 2.10GHz with 4Gb RAM and 32-bit Windows 7 Professional
 
------8<------
PROCEDURE Main()
 
   LOCAL nTime
    LOCAL cFile1 := MEMOREAD( 'test1.txt' )
   LOCAL cFile2 := MEMOREAD( 'test2.txt' )
   LOCAL cFile3 := MEMOREAD( 'test3.txt' )
  
    ? 'Start...'
   
   nTime := SECONDS()
   cFile1 := nataq( cFile1, 'QataN', '   Q A T A N   ' )
   ? 'nataq', SECONDS() - nTime
 
   nTime := SECONDS()
   cFile2 := hb_regExReplace( 'QataN', cFile2, '   Q A T A N   ', .F. )
   ? 'regEx', SECONDS() - nTime
 
   nTime := SECONDS()
   cFile3 := hb_regExStrTran( cFile3, 'QataN', '   Q A T A N   ',,, .F. )
   ? 'StrTran', SECONDS() - nTime
 
     MEMOWRIT( 'test1.txt', cFile1 )
    MEMOWRIT( 'test2.txt', cFile2 )
    MEMOWRIT( 'test3.txt', cFile3 )
 
RETURN
 
 
FUNCTION nataq( cSource, cRepl, cTrans )
LOCAL cTarget := ""
LOCAL nPos
 
  DO WHILE LEN( cSource ) > 0
    IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0  // no more fun
      cTarget += cSource
      EXIT
    ENDIF
    cTarget += LEFT( cSource, nPos - 1 ) + cTrans
    cSource := SUBSTR( cSource, nPos + LEN( cRepl ) )
  ENDDO
 
RETURN cTarget
 
 
 
------>8------
 
It was slightly slower (less than 3 seconds) compared to RegExStrTran but your solution is much simpler / cleaner...
Can we introduce your solution to Harbour?
 
hb_StrTran( <cString>, <cSubString>, [<cReplace>], [<nStart>], [<nCount>], [<lCaseSensitive>] ) --> cNewString

...but case-insensitive by default...

hb_regExStrTran() also seems a very good solution (maybe with more power and flexibility?)
Why has it not been added to Harbour? Any special reason? Maybe it would be enough, although your solution is smart!
 
Regards,
 
 
Qatan
 
 

AL67

May 22, 2014, 5:49:46 AM
to harbou...@googlegroups.com
My function can use BACKREFERENCES with a special token: $

Example: I want to change all numbers in a string like 123,45 to 123.45 without changing the other commas
MyString := "Sample, ,string. Number1 12,4 also, number2, 67,89"
HB_RegExStrTran(MyString,"(\d),(\d)", "$1.$2")
result: "Sample, ,string. Number1 12.4 also, number2, 67.89"

or
HB_RegExStrTran("ABC 1 DEF 23 GHI 4","\d","digit:$&") -> "ABC digit:1 DEF digit:2digit:3 GHI digit:4"


Backreferences are the power of regular expressions.

Adam

Qatan

May 22, 2014, 5:57:24 AM
to harbou...@googlegroups.com
Hello Adam,
 
>...
>Backreference is power of regular expression.
 
 
That’s very interesting... why not in Harbour already?
Thanks for sharing such nice job.
Regards,
 
Qatan

elch

May 22, 2014, 7:46:29 AM
to harbou...@googlegroups.com
Hi Qatan,

 
>RegExStrTran: 237.40 seconds
>nataq.......: 240.30 seconds
>RegExReplace: 912.30 seconds


so nataQ ;) could win the *easy* game -- if we pull the .. + LEN( cRepl ) out of the DO WHILE loop,

do it only once at the start and then use .. + nLen in the loop.


An easy game, because from what I know about regex, you can do really *crazy* search[ and replace ] with reg[ular]ex[pressions].

You may google for PCRE and have a look at hbregex.c

I have no experience at all with what Harbour can do regarding grouping, back-referencing etc ...

And we have to distinguish whether they are implemented high up at 'prg-level' or at low level ..


best regards

Rolf

Qatan

May 22, 2014, 10:39:53 AM
to harbou...@googlegroups.com
Hello Rolf,
 
>...
>so nataQ ;) ...

That’s funny! I didn’t notice the name before. I thought nataq meant something in your language :-)

>...
>could win the *easy* game -- if we pull out of the DO WHILE loop the .. + LEN( cRepl ),
>do it only one time at start and use then in the loop: .. + nLen

 

Well... I do not know how to do that... can you please do the modification and post to us?

 

 

>Easy game, because for what i know about regex is, that you can do real *crazy* search[ and replace ] with reg[ular]ex[pressions].

>You may google for PCRE and may have a look into hbregex.c

>I'm completely unexperienced, what Harbour can do -- about grouping, back-referencing etc ...

>And we have to distinguish, if they are implemented high at 'prg-level' or low-level ..

 

I don’t have experience either but feel like regex is very powerful! But for my specific need nataq does the job very well.

Thanks for all your help and interest.

Regards,

 

Qatan

 


elch

May 22, 2014, 12:17:22 PM
to harbou...@googlegroups.com
Hi Qatan,


>...
>could win the *easy* game -- if we pull out of the DO WHILE loop the .. + LEN( cRepl ),
>do it only one time at start and use then in the loop: .. + nLen

Strategy: save function calls where possible ..
LEN() in Harbour is very fast, but decrementing a variable is faster, so:

---
FUNCTION nataQ( cSource, cRepl, cTrans )
 LOCAL cTarget := ""
 LOCAL nLen := LEN( cSource )
 LOCAL nRepl := LEN( cRepl )
 LOCAL nPos

  IF nRepl < 1  /* secure an exception */
    cTarget := cSource
  ELSE
    DO WHILE nLen > 0
      IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0  /* no more fun */

        cTarget += cSource
        EXIT
      ENDIF
      cTarget += LEFT( cSource, nPos - 1 ) + cTrans
      cSource := SUBSTR( cSource, nPos + nRepl )
      nLen--  /* loop guard only: an upper bound on iterations, not the remaining length */
    ENDDO
  ENDIF
RETURN cTarget
---

BTW, you surely noticed that cTrans can be longer, shorter or even "" for removing cRepl.
And the exception cRepl == "" is now caught ( it would lead to an endless loop )


best regards
Rolf

Qatan

May 22, 2014, 4:21:34 PM
to harbou...@googlegroups.com
Hello Rolf,
 
>
strategy: spare function calls where possible ..
>LEN() in Harbour is very fast, but a variable decrease faster, so
>...
 
Nice! I am using it now.
 
I made new tests:
 
    hb_regExReplace: 250.28s
    hb_regExStrTran: 112.39s
    nataQ (1st ver): 111.44s
    nataQ (2nd ver): 110.31s
 
Same computer but with a smaller TXT file (it took too long before).
So, as you can see the new version is faster.
 
Thanks for all your help and care.
Regards,
 
Qatan
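
Both nataQ versions cut cSource down on every hit (cSource := SUBSTR( ... )), so the rest of the text is copied again for each replacement. A possible variant, sketched here under assumptions, leaves cSource untouched and only moves a start position. It assumes that hb_AtI() accepts a start position as its third argument and returns a position counted from the beginning of the string, as hb_At() does; please check this against your Harbour version. The name nataQ3 and the variable nFrom are hypothetical, and no timing is claimed for it.

```harbour
PROCEDURE Main()
   ? nataQ3( "Abc Def Ghi abc DEF ghi", "def", "123" )   // Abc 123 Ghi abc 123 ghi
   ? nataQ3( "Tatat", "tat", "Tet" )                     // Tetat
   RETURN

FUNCTION nataQ3( cSource, cRepl, cTrans )
   LOCAL cTarget := ""
   LOCAL nRepl := Len( cRepl )
   LOCAL nFrom := 1        // where the next search starts in cSource
   LOCAL nPos

   IF nRepl == 0           // nothing to search for
      RETURN cSource
   ENDIF

   // cSource is never shortened; only the output string grows
   DO WHILE ( nPos := hb_AtI( cRepl, cSource, nFrom ) ) > 0
      // text between the previous match and this one, then the replacement
      cTarget += SubStr( cSource, nFrom, nPos - nFrom ) + cTrans
      nFrom := nPos + nRepl
   ENDDO

   // append whatever follows the last match
   RETURN cTarget + SubStr( cSource, nFrom )
```

The result is the same single-pass behaviour as nataQ: replaced text is never searched again.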
 
 

Pete

May 23, 2014, 4:21:16 AM
to harbou...@googlegroups.com

Hi Qatan
if you have time, could you try the sample below to see if it makes any sense?

8<--------------------------------------cut

FUNCTION Main()
/* compile with xhb.lib */

LOCAL cString := '("do some functions dream of a lightning fast replacement?")' + hb_EoL() +;
                 "aBc aBcaBcaBcaBcaBcaBcaBc xcr aBcaBcaBcaBc aBcaBcaBcaBcaBcaBcaBc aBcaBcaBcaBc" + hb_EoL() +;
                 "ABC ABCABCABCABCABCABCABC ABCABCABCABC ABCABCABCABCABCABCABC ABCABCABCABC" + hb_EoL() +;
                 "abc abcabcabcabcabcabcabc abcabcabcabc abcabcabcabcabcabcabc abcabcabcabc " + hb_EoL() +;
                 "ABC123 abc123 12abc3"

LOCAL cFind := "AbC"
LOCAL cReplace := "123"
LOCAL nTime, t1, t2, nI

nTime := Seconds()
FOR nI := 1 TO 100000 // a decent 'one hundred thousands' loop
    QuickRepl( cString, cFind, cReplace )
NEXT
? QuickRepl( cString, cFind, cReplace )
t1 := Seconds() - nTime
?

nTime := Seconds()
FOR nI := 1 TO 100000
    nataQ( cString, cFind, cReplace )
NEXT
? nataQ( cString, cFind, cReplace )
t2 := Seconds() - nTime

?
? "QuickRepl() spent about ->", t1 , "seconds"
? "nataQ()     spent about ->", t2 , "seconds"

wait
RETURN


FUNCTION QuickRepl( cString, cFind, cReplace )
/*beware the wolf.. (potential deadloop inside!)*/
LOCAL cSubString := HB_AtX( cFind, cString, .F. )

WHILE ! Empty( cSubString )

    cString := StrTran( cString, cSubString, cReplace )
  
    cSubString := HB_AtX( cFind, cString, .F. )

END

RETURN cString

  
  
FUNCTION nataQ( cSource, cRepl, cTrans )
 LOCAL cTarget := ""
 LOCAL nLen := LEN( cSource )
 LOCAL nRepl := LEN( cRepl )
 LOCAL nPos

  IF nRepl < 1  /* secure an exception */
    cTarget := cSource
  ELSE
    DO WHILE nLen > 0
      IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0  /* no more fun */

        cTarget += cSource
        EXIT
      ENDIF
      cTarget += LEFT( cSource, nPos - 1 ) + cTrans
      cSource := SUBSTR( cSource, nPos + nRepl )
      nLen--
    ENDDO
  ENDIF
RETURN cTarget

cut----------------------------------------->8

---
Pete

elch

May 23, 2014, 6:44:29 AM
to harbou...@googlegroups.com

Hi Pete,


well done! A convincingly fast solution.


hb_AtX() is new to me, but this way we can work around the case-sensitivity of StrTran() ...

Many thanks for the tip!


best regards

Rolf


elch

May 23, 2014, 8:14:01 AM
to harbou...@googlegroups.com
Hi again Pete,

nevertheless thanks for tip with hb_AtX() !,
but there is a hidden 'trapdoor':

try to replace:

"tat" with "Tet"


in string:

"you will see the pitfall in Tatat"


nataQ: "... Tetat"

QuickRepl: " .. TeTet"


best regards

Rolf

Qatan

May 23, 2014, 9:30:30 AM
to harbou...@googlegroups.com
Hello Pete,
 
>...
>...QuickRepl( cString, cFind, cReplace )
>...
 
Here are the results I got testing my simple way with a big TXT file (5 MB):
 
nataQ()........... 105s
nataQ2().......... 104s
hb_regExReplace(). 258s
hb_regExStrTran(). 121s
QuickRepl()....... 0.11s (!)
 
Your solution is really fast and seems to work fine.
Thanks for suggesting it!
 
Qatan

Qatan

May 23, 2014, 9:30:30 AM
to harbou...@googlegroups.com
Hello Rolf,
 
>nevertheless thanks for tip with hb_AtX() !,
>but there is a hidden 'trapdoor':

 
Good point... but is there any way to use it without the ‘trapdoor’?
I just thought about some solution because it’s amazing how fast this one goes...
Thanks for your care and for taking your precious time to test it!
 
Qatan

Qatan

May 23, 2014, 11:05:20 AM
to harbou...@googlegroups.com
Hello Pete,
 
>...
>cSubString := HB_AtX( cFind, cString, .F. )
>...
 
Just one question about hb_AtX()... according to xHarbour documentation it returns the first substring contained in <cString> that matches the regular expression <cRegEx>. If no match is found, the return value is NIL... so we need to add a protection in this case, right?
 
Qatan
 

Qatan

May 23, 2014, 11:20:53 AM
to harbou...@googlegroups.com
>nataQ: "... Tetat"

>QuickRepl: " .. TeTet"

 
Sorry Rolf, but why does it do that?
I do not understand...
 
Qatan
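
A short trace may help with this question. It replays by hand the two passes that QuickRepl() makes on "Tatat"; the two search strings are hard-coded here and stand for what the case-insensitive hb_AtX( "tat", cS, .F. ) call would be expected to report on each pass.

```harbour
PROCEDURE Main()
   LOCAL cS := "Tatat"

   // Pass 1: the case-insensitive search reports "Tat" (characters 1-3).
   // StrTran() is case-sensitive, so only that exact spelling is replaced.
   cS := StrTran( cS, "Tat", "Tet" )
   ? cS                               // Tetat

   // Pass 2: the loop searches the ALREADY MODIFIED string again. The "t"
   // written by pass 1 plus the leftover "at" now form a new "tat".
   cS := StrTran( cS, "tat", "Tet" )
   ? cS                               // TeTet
   RETURN
```

nataQ() stops at "Tetat" because it only ever searches the part of the source that has not been consumed yet, so text it has just written is never matched again.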

Pete

May 23, 2014, 12:30:13 PM
to harbou...@googlegroups.com

Hi Rolf,


On Friday, May 23, 2014 3:14:01 PM UTC+3, elch wrote:
>but there is a hidden 'trapdoor':

(sigh) there is always one. A 'trapdoor'. Everywhere. (Perhaps that keeps the game exciting.)

 
>try to replace:
>"tat" with "Tet"
>in string:
>"you will see the pitfall in Tatat"
>
>nataQ: "... Tetat"
>QuickRepl: " .. TeTet"


I'm not sure which one is correct. It depends on how you see it and what the intended result is.

If you want to replace every "tat" with "Tet" then nataQ does only half the work, since "Tetat" still contains a "tat".
If you want to replace the first 'tat' in every 'Tatat' then you could probably search for the 'Tatat's and manipulate them (perhaps with another function?)
(Not to mention that instead of 'tat', you should better choose to search for 'ta' since the last 't' seems redundant..)

On the other hand, one could claim that QuickRepl returns a rather unwanted ('dumb') capitalized result (the second capital 'T').
I suppose that this is due to the case-insensitive search that HB_AtX is forced to do (the .F. in the 3rd argument), but IMO it is an expected result.
Anyway, the case is a bit puzzling.. ;->

P.S. The pitfall I see with QuickRepl as it is now is a possible dead loop which could show up in some circumstances, but it can easily be overcome.

regards,

---
Pete

Pete

May 23, 2014, 12:53:09 PM
to harbou...@googlegroups.com


On Friday, May 23, 2014 7:30:13 PM UTC+3, Pete wrote:
(Not to mention that instead of 'tat', you should better choose to search for 'ta' since the last 't' seems redundant..)

That's a wrong remark! Please Ignore..
(got really puzzled  with  those tat's and tet's) ;->
 
---
Pete

Pete

May 23, 2014, 1:15:39 PM
to harbou...@googlegroups.com
Hi Qatan


On Friday, May 23, 2014 6:05:20 PM UTC+3, Qatan wrote: 
Just one question about hb_AtX()... according to xHarbour documentation it returns the first substring contained in <cString> that matches the regular expression <cRegEx>. If no match is found, the return value is NIL... so we need to add a protection in this case, right?
 

I don't understand; Protection from what?
Isn't  WHILE ! Empty( cSubString ) enough in the case of a returned NIL ?

(unless you mean something else which you might want to clarify since I can't guess.)

regards,

---
Pete

elch

May 23, 2014, 2:34:34 PM
to harbou...@googlegroups.com
Hi again Pete,


>I'm not sure which one is the correct. it depends on how you see it or what is the intended result.
>
>If you want to replace every "tat" with "Tet" then nataQ does a half work, since "Tetat" still contains a "tat".
>If you want to replace the first 'tat' in every 'Tatat' then you probably could search for 'Tatat' s and manipulate them (perhaps with an other function?)

My example with 'tat' to 'tet' ;-)) is certainly deliberately constructed: I tried to make an example as easy and short as possible.

The idea behind it is that the replace loop itself generates a new cFind which wasn't there before.

So the inner 't' of 'tatat' belongs either to the first 'tat', which is replaced, or to the remaining 'tat' in the word

-- in my example it is an overlapping letter, and in such cases your really fast QuickRepl() behaves 'differently'.


>P.S. The pitfall i see with QuickRepl as it is now, is a possible dead-loop which could show up in some circumstances, but can easily overcome it.

Yes: if cFind is part of cReplace, we will get a nearly infinite loop until memory is full ...


best regards

Rolf

Qatan

May 24, 2014, 3:33:56 PM
to harbou...@googlegroups.com
Hello Pete,
 
>I don't understand; Protection from what?
>Isn't  WHILE ! Empty( cSubString ) enough in the case of a returned NIL ?
>(unless you mean something else which you might want to clarify since I can't guess.)

 
Do you mean that this would work: EMPTY( NIL ) returns .T.?
I tested and you are right... I thought it would give an error...
Never mind. I learned one more thing!
Thanks for your help and care.
 
Regards,
 
Qatan
 

Qatan

May 25, 2014, 2:36:22 AM
to harbou...@googlegroups.com
Hello all,
 
>(sigh) always there is one. 'trapdoor'. everywhere. (perhaps that keeps game excitement)
 
Now I understand why QuickRepl() does that...
 
 
>Anyway the case is a bit puzzling.. ;->
 
For me also... so what to do?
 
 
>P.S. The pitfall i see with QuickRepl as it is now, is a possible dead-loop which could show up in some circumstances, but can easily overcome it.
That’s IMHO dangerous... how to overcome it?
It’s nice to exchange with you all. I learn a lot.
 
Regards,
 
Qatan

Qatan

May 25, 2014, 2:36:23 AM
to harbou...@googlegroups.com
Hello all,
 
>yes: if cFind is part of cReplace, we will get a nearly infinite loop up to memory is filled ...
 
Nice point, but how to protect against such a situation (and, if possible, not lose much performance)?
You all are smart! (or maybe I am getting a bit rusty or should I say LAZY?) ;-)
 
 
Qatan
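
One possible way to guard the multi-pass loop, sketched under assumptions rather than offered as a finished fix. The names QuickReplGuarded, SinglePass and nMaxPass are hypothetical. Two guards are used: if cFind occurs (case-insensitively) inside cReplace, the multi-pass route is skipped; and because that check alone is not enough (replacing "ab" by "bbaa" in "aab" keeps producing a fresh "ab" although "ab" is not part of "bbaa"), the number of passes is capped as well. In both cases the function falls back to a single-pass replacement of the original string. Note that hb_AtX() treats cFind as a regular expression, so search texts containing regex metacharacters would need escaping, and that the two routes can still give different results for overlapping matches such as the 'Tatat' example.

```harbour
PROCEDURE Main()
   ? QuickReplGuarded( "Abc abc", "abc", "x" )     // x x
   ? QuickReplGuarded( "a", "a", "aa" )            // aa     (guard 1)
   ? QuickReplGuarded( "aab", "ab", "bbaa" )       // abbaa  (guard 2)
   RETURN

FUNCTION QuickReplGuarded( cString, cFind, cReplace, nMaxPass )
   LOCAL cOrig := cString
   LOCAL nPass := 0
   LOCAL cSub

   hb_default( @nMaxPass, 100 )     // hypothetical safety limit

   IF Len( cFind ) == 0
      RETURN cString
   ENDIF

   // Guard 1: every replacement would re-create the search text
   IF hb_AtI( cFind, cReplace ) > 0
      RETURN SinglePass( cOrig, cFind, cReplace )
   ENDIF

   // Len() rather than Empty(): a match made of blanks is still a match
   DO WHILE HB_ISSTRING( cSub := hb_AtX( cFind, cString, .F. ) ) .AND. Len( cSub ) > 0
      // Guard 2: still finding matches after nMaxPass passes, give up
      IF ++nPass > nMaxPass
         RETURN SinglePass( cOrig, cFind, cReplace )
      ENDIF
      cString := StrTran( cString, cSub, cReplace )
   ENDDO
   RETURN cString

STATIC FUNCTION SinglePass( cSource, cFind, cReplace )
   LOCAL cTarget := ""
   LOCAL nPos

   // the 'string eater': replaced text is never searched again
   DO WHILE ( nPos := hb_AtI( cFind, cSource ) ) > 0
      cTarget += Left( cSource, nPos - 1 ) + cReplace
      cSource := SubStr( cSource, nPos + Len( cFind ) )
   ENDDO
   RETURN cTarget + cSource
```

The cap is a blunt instrument: a text with more than nMaxPass distinct case variants of cFind would also trigger the fallback, which is harmless but gives up the speed advantage.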

elch

May 25, 2014, 4:16:17 AM
to harbou...@googlegroups.com

Hi Qatan,


re-inventing the wheel: heureka !, it must be round ;-)


I like to call my technique 'the string eater' - because what is eaten can't be chewed again -- usually ;-)

Maybe QuickRepl() can be fixed with a bunch of extra logic -- but would it still be faster afterwards?


---

So your easy task is to take src/rtl/strtran.c,

duplicate (!) it, and add codepage-dependent case-insensitive behaviour

- but please as a duplicate, because doing it in the original would rob StrTran() of a remarkable amount of speed.

Then you have the fastest possible solution.
And please, contribute it !


best regards

Rolf

Qatan

May 25, 2014, 4:28:27 AM
to harbou...@googlegroups.com
Hello Rolf,
 
 

>I like to call my technic 'the string eater' - because what is eaten can't be chewed again -- usually ;-)

>Maybe there is a bunch of logic possible, that QuickRepl() can be fixed -- but if that is then afterwards still faster ?

 

I like your technique too, and I agree that QuickRepl() isn’t a good solution right now, because it is either unsafe or will no longer be faster...

 

 

>So your easy task is to take src/rtl/strtran.c,

>duplicate ! it, and add a codepage depending case insensitive behavior

 

...hmmm, at least I was obedient and opened it to see what it looks like, but I have to confess that I felt like an old and inflexible “redneck” trying to speak Chinese... too much for me to start with C at this point... sorry.

 

I think I will stay with nataQ() for its simplicity and good performance. After all it does the job!

 

Qatan

 

Charly 9000

Sep 3, 2021, 1:42:46 PM
to Harbour Users
Hi friends,

My problem is the following: I need to search in a text for the string

  [size=18] and replace with -> style="font-size:18px;"

18 is the variable part, so the expression would be 

[size=xx]

According to your tests, what would be the fastest technique to do this replacement?

Thank you
Carles.

Antonio Linares

Sep 4, 2021, 4:36:03 AM
to Harbour Users
Dear Charly,

function Main()

   // note: [1-9]+ does not match sizes containing a 0 (10, 20, ...); use [0-9]+ or \d+ for those
   local aTokens := hb_regexAll( '\[size=[1-9]+\]', "before [size=18] after",,,,,.F. )
   local aToken
   
   if Len( aTokens ) > 0
      for each aToken in aTokens
         ? aToken[ 1 ]
      next      
   endif

return nil

Antonio Linares

Sep 4, 2021, 4:50:27 AM
to Harbour Users

Charly 9000

Sep 5, 2021, 5:41:13 AM
to Harbour Users
Dear Antonio,

In your example this function returns an array of arrays. Each element is made up of {token, start, end} -> {"[size=18]", 8, 16}

But is there a possibility to know the variable value, in this case 18? (without having to use at (), substr (), ...).

Ok, you can create a little function, but I don't know if we can extract this value -> [1-9] +

Regards.
Charly

Antonio Linares

Sep 6, 2021, 6:51:41 AM
to Harbour Users
Charly,

Using the above excellent function hb_RegExStrTran()

function Main()

   ? hb_RegExStrTran( "before [size=18] after", '\[size=[1-9]+\]', "[size=555]" )

return nil


regards,

Charly 9000

Sep 6, 2021, 8:03:28 AM
to Harbour Users
Antonio,

Thanks for the tip. The goal was to build my own markup for HTML, similar to bb_code, in my application. As I mentioned, my problem was getting at the variable value of the expression. In the end I also built a function that works properly. In case someone wants to use it, here it is:

function main()

local cTxt := "And before [size=36] after [/size] [size=54]Ciao.[/size]"
? bb_code_size( cTxt )
cTxt := "And before [color=red] after [/color] [color=green]Ciao.[/color]"
? bb_code_color( cTxt )
cTxt := "And before [size=36][color=red] after [/color][/size] [size=54][color=green]Ciao.[/color][/size]"
cTxt := bb_code_size( cTxt )
cTxt := bb_code_color( cTxt )
? cTxt
retu nil 


function bb_code_size( cTxt )

cTxt := RegExStrTran( cTxt, '\[size=[1-9]+\]', '<span style="font-size:$px;">' ) 
cTxt := StrTran( cTxt, '[/size]', '</span>')

retu cTxt 

function bb_code_color( cTxt )

cTxt := RegExStrTran( cTxt, '\[color=[a-z]+\]', '<span style="color:$;">' ) 
cTxt := StrTran( cTxt, '[/color]', '</span>')

retu cTxt 

function RegExStrTran( cTxt, cPatron, cTag, cSeparator, cWillcard ) 

   local aTokens, cToken, nStart, nEnd, cBlockA, cBlockB, nOffset, cValue       
   
   hb_default( @cSeparator, '=' )
   hb_default( @cWillcard, '$' )      
   
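   // Take the first remaining match each time round; the loop ends once
   // no tag is left (the replacement text must not match cPatron itself).
   while Len( aTokens := hb_regexAll( cPatron, cTxt,,,1,,.F. ) ) > 0

      cToken := aTokens[1][1][1]     // matched text, e.g. "[size=36]"
      nStart := aTokens[1][1][2]     // position of its first character
      nEnd   := aTokens[1][1][3]     // position of its last character

      cBlockA := Left( cTxt, nStart - 1 )
      cBlockB := Substr( cTxt, nEnd + 1 )
      nOffset := At( cSeparator, cToken ) + 1
      cValue  := Substr( cToken, nOffset, Len( cToken ) - nOffset )   // value between separator and "]"
      cTxt    := cBlockA + StrTran( cTag, cWillcard, cValue ) + cBlockB
   end
retu cTxt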

Thanks to all
Regards.

Charly.

AL67

Sep 15, 2021, 5:06:37 AM
to Harbour Users
On Friday, September 3, 2021 at 19:42:46 UTC+2, Charly 9000 wrote:
Hi friends,

My problem is the following: I need to search in a text for the string

  [size=18] and replace with -> style="font-size:18px;"

18 is the variable part, so the expression would be 

[size=xx]

Use a backreference in the regular expression

aRet := HB_RegEx(  cString , "(.*)\[size=(\d+)\](.*)" ) // any text (gr1) + [size= + digits (gr2) + ] + any text (gr3)
aRet  ->  {cAllFound,cGroup1,cGroup2,cGroup3}
cNewString := aRet[2] + 'style="font-size:' +aRet[3] + 'px;"' + aRet[4]

Or use my function   HB_RegExStrTran()  (source in this thread)

cNewString := HB_RegExStrTran( cString , "\[size=(\d+)\]" , 'style="font-size:$1px;"')   //change all found

Adam



Charly 9000

Oct 12, 2021, 12:33:44 PM
to Harbour Users
Hi Adam,

Thanks for your response. I really didn't know how to do this with HB_RegEx(), thanks.
But either I don't understand it or I am doing something wrong, sorry.

This code does not work. Where is the error?

function main()

    local cString   := "Hello [size=18], how are you ?"
    local aRet      := HB_RegEx(  cString , "(.*)\[size=(\d+)\](.*)" ) 

    ? aRet
return nil


This code returns an empty array.

Thanks.
Charly.




AL67

Oct 14, 2021, 5:18:08 AM
to Harbour Users
On Tuesday, October 12, 2021 at 18:33:44 UTC+2, Charly 9000 wrote:
Hi Adam,

Thanks for your response. I really didn't know how to do this with HB_RegEx(), thanks.
But I don't understand it or I do something wrong, I'm sorry

This code does not work. Where this error?

function main()

    local cString   := "Hello [size=18], how are you ?"
    local aRet      := HB_RegEx(  cString , "(.*)\[size=(\d+)\](.*)" ) 

    ? aRet
aRet is of type array. If HB_RegEx() finds no match, the array is empty: {}
If a match is found, the 1st element is the matched string.
If the regex includes groups, the 2nd and following elements are the matched groups.

So in the example
aRet := { "Hello [size=18], how are you ?" , "Hello " , "18" ,  ", how are you ?"}

Adam

Charly 9000

Oct 20, 2021, 1:38:57 AM
to Harbour Users
Adam

    ? valtype(aRet)         // Return  A
    ? len(aRet)                // Return 0 

Thanks.
Charly

AL67

Oct 20, 2021, 4:39:45 AM
to Harbour Users
Hi.
Correct syntax:
aResult := hb_RegEx( cRegex, cString, [lCase], [lNewLine] )

So example:
function main()

    local cString   := "Hello [size=18], how are you ?"
    local aRet      := HB_RegEx(  "(.*)\[size=(\d+)\](.*)"  , cString ) 

    ? aRet[1]
    ? aRet[2]
    ? aRet[3]
    ? aRet[4]
return

Adam
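
The example above handles one tag per call, because the greedy (.*) groups swallow the rest of the text. A minimal sketch for converting every [size=N] tag, using only hb_RegEx() with the argument order shown above (pattern first, text second). The function name SizeTagsToStyle and the sample text are made up for illustration; the empty array on "no match" is the behaviour reported earlier in this thread.

```harbour
PROCEDURE Main()
   ? SizeTagsToStyle( "a [size=18] b [size=7] c" )
   // a style="font-size:18px;" b style="font-size:7px;" c
   RETURN

FUNCTION SizeTagsToStyle( cText )
   LOCAL cOut := ""
   LOCAL aRet, nPos

   // an empty array means no (further) tag was found
   DO WHILE Len( aRet := hb_regex( "\[size=(\d+)\]", cText ) ) > 0
      // aRet[ 1 ] is the whole tag, aRet[ 2 ] the captured digits
      nPos := At( aRet[ 1 ], cText )
      cOut += Left( cText, nPos - 1 ) + 'style="font-size:' + aRet[ 2 ] + 'px;"'
      cText := SubStr( cText, nPos + Len( aRet[ 1 ] ) )   // continue after the tag
   ENDDO
   RETURN cOut + cText
```

Since converted text is moved to cOut and never searched again, the loop ends once the remaining text holds no more tags.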

Charly 9000

Oct 20, 2021, 4:50:58 AM
to Harbour Users
Adam,

Yes now :-)
I understand how it works

Thank you.
C.
