hb_regexreplace()

Qatan

May 21, 2014, 5:28:23 PM

to harbou...@googlegroups.com

Hello,

Is there such a function in Harbour or equivalent? I would like to avoid the xhb lib if possible.

I found info about it here: http://www.creasolgroup.com/xOraclipLanguageReferenceGuide/xOraClip%20Language%20Reference/Functions/Hb_regexreplace_f.en.html

What I am trying to do is find a string and replace it without affecting the rest, and it has to work case-insensitively...

Example:

Let’s say I have this string: “Abc Def Ghi abc DEF ghi ABC def ghi”

I want to replace “Def” by “123” without affecting the rest.

In the end I want to get: “Abc 123 Ghi abc 123 ghi ABC 123 ghi”

Any help is very welcome.

Regards,

Qatan

Klas Engwall

May 21, 2014, 6:22:21 PM

to harbou...@googlegroups.com

Hi Qatan,

It looks like it could be easily borrowed from the xhb contrib and put
in your project as a source file.

Take a look here: contrib\xhb\regexrpl.prg

Regards,
Klas

Mario H. Sabado

May 21, 2014, 6:39:54 PM

to harbou...@googlegroups.com

Hi,

Using Hb_StrReplace(), it would be like:

Hb_StrReplace( "Abc Def Ghi abc DEF ghi ABC def ghi", { "Def", "DEF", "def" }, { "123", "123", "123" } )

Regards,
Mario

Qatan

May 21, 2014, 6:59:13 PM

to harbou...@googlegroups.com

Hello Klas,

>It looks like it could be easily borrowed from the xhb contrib and put
>in your project as a source file.
>
>Take a look here: contrib\xhb\regexrpl.prg

Good idea, thanks.

Qatan

Qatan

May 21, 2014, 7:53:15 PM

to harbou...@googlegroups.com

Hello Mario,

>Using Hb_StrReplace(), it would be like:
>Hb_StrReplace( "Abc Def Ghi abc DEF ghi ABC def ghi", { "Def", "DEF", "def" }, { "123", "123", "123" } )

Hmm... interesting... but this way I will have to add all the possibilities. It may be a bit too big for more complex searches... This is why I thought about regEx.

Of course I could get the most obvious but just the idea that something could get out of control makes me consider carefully any other possibility.

What do you think? Anyway thanks a lot for your help.

Qatan

elch

May 21, 2014, 10:06:45 PM

to harbou...@googlegroups.com

Hi Qatan,

please show the regex solution, along with a very simple benchmark (measure 100 runs or so);
that would interest me ...

( aka cRepl == "dEf", cTrans == "123" )

---
FUNCTION nataq( cSource, cRepl, cTrans )
LOCAL cTarget := ""
LOCAL nPos

DO WHILE LEN( cSource ) > 0
    IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0 // no more fun
      cTarget += cSource
      EXIT
    ENDIF
    cTarget += LEFT( cSource, nPos - 1 ) + cTrans
    cSource := SUBSTR( cSource, nPos + LEN( cRepl ) )
ENDDO

RETURN cTarget
---

regards
Rolf
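
To check the loop against the string from the first post, here is a minimal self-contained sketch. It repeats nataq() as posted above and assumes a Harbour build that provides hb_AtI() (the case-insensitive At()); Main() and the lower-case search string "def" are only illustrative.

```harbour
PROCEDURE Main()
   LOCAL cText := "Abc Def Ghi abc DEF ghi ABC def ghi"

   // every "def", whatever its case, is replaced by "123"
   ? nataq( cText, "def", "123" )   // Abc 123 Ghi abc 123 ghi ABC 123 ghi

   RETURN

FUNCTION nataq( cSource, cRepl, cTrans )
   LOCAL cTarget := ""
   LOCAL nPos

   DO WHILE Len( cSource ) > 0
      IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0   // nothing left to replace
         cTarget += cSource
         EXIT
      ENDIF
      // keep the text before the match, append the replacement,
      // then continue with whatever follows the match
      cTarget += Left( cSource, nPos - 1 ) + cTrans
      cSource := SubStr( cSource, nPos + Len( cRepl ) )
   ENDDO

   RETURN cTarget
```

The text before each match is copied untouched, so the case of the surrounding words is preserved.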

AL67

May 22, 2014, 1:55:55 AM

to harbou...@googlegroups.com

On Thursday, 22 May 2014 at 00:22:21 UTC+2, Klas Engwall wrote:

Hi Qatan,

Take a look here: contrib\xhb\regexrpl.prg

Regards,
Klas

or try the extended version:

/**************************************************************************
* Regular expression version of function STRTRAN()
* --------------------------------------------------
* hb_RegExStrTran(<cString>,<cpSearch>,[cReplace],[nStart],
*                 [nCount],[lRegExCase],[lRegExNewLine]) --> cReturn
* <cString>  The main string to search
* <cpSearch> The string/regexp to locate in the main string
* <cReplace> The replacement string
* <nStart>   The first occurrence to be replaced (default 1)
* <nCount>   Number of occurrences to replace (default ALL)
* <lRegExCase>, <lRegExNewLine> Options for regular expression
* DESCRIPTION:
* In <cReplace> the sign '$' is an extra token for including backreferences (groups):
* '$&' or '$0' - whole regex match
* '$1'..'$99'  - backreference to group 1 .. 99 (if the group does not exist -> '')
* '$$'         - a single '$' (e.g. to insert '$2', use '$$2')
* '$\'         - break (empty string) (e.g. to insert Group1+'2', not Group12,
*                use '$1$\2')
* '$x' where x is not a digit or '&','$','\' - as is
* SAMPLE:
* hb_RegExStrTran("xxxA1yyyB2zzz",".\d","QQ") --> "xxxQQyyyQQzzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".\d","Q$&Q") --> "xxxQA1QyyyQB2Qzzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".(\d)","Q$1") --> "xxxQ1yyyQ2zzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".\d","$$0") --> "xxx$0yyy$0zzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".(\d)","Q$3") --> "xxxQyyyQzzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".(\d)","Q$13") --> "xxxQyyyQzzz" !not group 13
* hb_RegExStrTran("xxxA1yyyB2zzz",".(\d)","Q$1$\3") --> "xxxQ13yyyQ23zzz"
* hb_RegExStrTran("xxxA1yyyB2zzz",".\d","Q$Q") --> "xxxQ$QyyyQ$Qzzz"
*************************************************************************/
FUNCTION hb_RegExStrTran(cString,cpSearch,cReplace,nStart,nCount,lCase,lNewLine)
   LOCAL aMatch,nFind:=0,cRet:=""
   LOCAL cRep:="",cRep0,pos,xG1,xG2,lAll

   IF !VALTYPE(cString)$"CM"
      // do error ???
      RETURN nil // or ""
   ENDIF
   IF !VALTYPE(cpSearch)$"CM"
      IF !HB_ISREGEX(cpSearch)
         // do error ???
         RETURN nil // or ""
      ENDIF
   ENDIF
   IF !VALTYPE(cReplace)$"CM"
      cReplace := ""
   ENDIF
   IF !VALTYPE(nStart) == "N"
      nStart := 1
   ENDIF
   IF !VALTYPE(nCount) == "N"
      nCount := 0
      lAll := .T.
   ELSE
      lAll := .F.
   ENDIF
   // StrTran() works this way:
   IF !lAll .AND. nCount == 0
      RETURN ""
   ENDIF
   IF nCount < 0
      RETURN cString
   ENDIF
   IF nStart < 1
      RETURN cString
   ENDIF

   // START SEARCH
   DO WHILE lAll .OR. nCount > 0
      aMatch:=HB_REGEXATX(cpSearch,cString,lCase,lNewLine)
      //aMatch: { {Find,Start,End} [,{FindGr1,StartGr1,EndGr1},...] }
      IF EMPTY(aMatch) //not found
         EXIT
      ENDIF
      nFind++
      IF nFind>=nStart
         // now expand the "$..." tokens in cReplace
         cRep0:=cReplace
         cRep:=""
         DO WHILE (pos:=AT("$",cRep0)) > 0
            xG1:=SUBSTR(cRep0,pos+1,1)
            xG2:=SUBSTR(cRep0,pos+2,1)
            IF xG1=="$" // '$$' -> '$'
               cRep += LEFT(cRep0,pos)
               cRep0 := SUBSTR(cRep0,pos+2)
            ELSEIF xG1=="\" // '$\' -> ''
               cRep += LEFT(cRep0,pos-1)
               cRep0 := SUBSTR(cRep0,pos+2)
            ELSEIF xG1$"&0" // whole matched text
               cRep += LEFT(cRep0,pos-1)+aMatch[1,1]
               cRep0 := SUBSTR(cRep0,pos+2)
            ELSEIF xG1$"123456789" // $1 .. $9
               IF xG2$"0123456789" // $10 .. $99
                  IF (xG2:=VAL(xG1+xG2)+1) <= LEN(aMatch) //check group 10..99
                     cRep += LEFT(cRep0,pos-1)+aMatch[xG2,1]
                     cRep0 := SUBSTR(cRep0,pos+3)
                  ELSE //group does not exist -> empty
                     cRep += LEFT(cRep0,pos-1)
                     cRep0 := SUBSTR(cRep0,pos+3)
                  ENDIF
               ELSE // check group 1..9
                  IF (xG1:=VAL(xG1)+1) <= LEN(aMatch) //group exists
                     cRep += LEFT(cRep0,pos-1)+aMatch[xG1,1]
                     cRep0 := SUBSTR(cRep0,pos+2)
                  ELSE //group does not exist -> ''
                     cRep += LEFT(cRep0,pos-1)
                     cRep0 := SUBSTR(cRep0,pos+2)
                  ENDIF
               ENDIF
            ELSE // '$x' -> copy as is
               cRep += LEFT(cRep0,pos+1)
               cRep0 := SUBSTR(cRep0,pos+2)
            ENDIF
         ENDDO
         cRep += cRep0
         cRet += LEFT(cString,aMatch[1,2]-1)+cRep
         cString := SUBSTR(cString,aMatch[1,3]+1)
         nCount--
      ELSE
         // occurrence before <nStart>: copy it unchanged and move past it,
         // otherwise the same match would be found again on the next pass
         cRet += LEFT(cString,aMatch[1,3])
         cString := SUBSTR(cString,aMatch[1,3]+1)
      ENDIF
   ENDDO
   cRet += cString

   RETURN cRet
/* end FUNCTION hb_RegExStrTran() */

Qatan

May 22, 2014, 3:20:57 AM

to harbou...@googlegroups.com

Hello Adam,

>...

>or try extend version:

>...

Thanks for your nice input.

Really interesting. I searched for it, and it seems that you wrote this function in 2012 and that it was added to xHarbour... am I right?

Now just tell me, what is the difference and/or advantage of it over hb_regExReplace()?

And if it has an advantage... why has it not been added to Harbour? Any special reason?

Thanks for your help

Qatan

May 22, 2014, 5:08:33 AM

to harbou...@googlegroups.com

Hello Rolf,

>show the regex solution, and please a very easy benchmark (measure 100 times or so),
>would interest me ...

It does the job and in very good time!

My simple benchmark with a TXT file (5 MB) gave the following results:

RegExStrTran: 237.40 seconds

nataq.......: 240.30 seconds

RegExReplace: 912.30 seconds

Computer used for the test: average Intel® Core™ i3-2310M CPU @ 2.10GHz with 4 GB RAM and 32-bit Windows 7 Professional

------8<------
PROCEDURE Main()
   LOCAL nTime
   LOCAL cFile1 := MEMOREAD( 'test1.txt' )
   LOCAL cFile2 := MEMOREAD( 'test2.txt' )
   LOCAL cFile3 := MEMOREAD( 'test3.txt' )

   ? 'Start...'

   nTime := SECONDS()
   cFile1 := nataq( cFile1, 'QataN', ' Q A T A N ' )
   ? 'nataq', SECONDS() - nTime

   nTime := SECONDS()
   cFile2 := hb_regExReplace( 'QataN', cFile2, ' Q A T A N ', .F. )
   ? 'regEx', SECONDS() - nTime

   nTime := SECONDS()
   cFile3 := hb_regExStrTran( cFile3, 'QataN', ' Q A T A N ',,, .F. )
   ? 'StrTran', SECONDS() - nTime

   MEMOWRIT( 'test1.txt', cFile1 )
   MEMOWRIT( 'test2.txt', cFile2 )
   MEMOWRIT( 'test3.txt', cFile3 )

   RETURN

FUNCTION nataq( cSource, cRepl, cTrans )
   LOCAL cTarget := ""
   LOCAL nPos

   DO WHILE LEN( cSource ) > 0
      IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0 // no more fun
         cTarget += cSource
         EXIT
      ENDIF
      cTarget += LEFT( cSource, nPos - 1 ) + cTrans
      cSource := SUBSTR( cSource, nPos + LEN( cRepl ) )
   ENDDO

   RETURN cTarget
------>8------

It was slightly slower (less than 3 seconds) compared to RegExStrTran but your solution is much simpler / cleaner...

Can we introduce your solution to Harbour?

hb_StrTran( <cString>, <cSubString>, [<cReplace>], [<nStart>], [<nCount>], [<lCaseSensitive>] ) --> cNewString

...but case-insensitive by default...

Also hb_regExStrTran() seems a very good solution (maybe with more power and flexibility?)

Why is it not added to Harbour? Any special reason? Maybe it would be enough, although your solution is smart!

Regards,

Qatan

AL67

May 22, 2014, 5:49:46 AM

to harbou...@googlegroups.com

My function can use BACKREFERENCES with the special token: $

For example, I want to change all numbers like 123,45 to 123.45 in a string, but leave the other commas unchanged:

MyString := "Sample, ,string. Number1 12,4 also, number2, 67,89"

HB_RegExStrTran(MyString,"(\d),(\d)", "$1.$2")

result: "Sample, ,string. Number1 12.4 also, number2, 67.89"

or

HB_RegExStrTran("ABC 1 DEF 23 GHI 4","\d","digit:$&") -> "ABC digit:1 DEF digit:2digit:3 GHI digit:4"

Backreferences are the power of regular expressions.

Adam
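
Where the groups behind '$1' and '$2' come from can be sketched with hb_RegEx(), using the ( cRegex, cString ) argument order given later in this thread. This is only an illustration for a single occurrence: the sample string and variable names are made up, and the final StrTran() step is a stand-in, not what HB_RegExStrTran() does internally.

```harbour
PROCEDURE Main()
   LOCAL cString := "Number1 12,4 also"
   // element 1 is the whole match, elements 2 and 3 are the two groups
   LOCAL aRet := hb_RegEx( "(\d),(\d)", cString )

   IF Len( aRet ) == 3
      ? aRet[ 1 ]   // whole match: 2,4
      ? aRet[ 2 ]   // group 1:     2
      ? aRet[ 3 ]   // group 2:     4
      // rebuild the matched text with a dot between the two groups
      ? StrTran( cString, aRet[ 1 ], aRet[ 2 ] + "." + aRet[ 3 ] )   // Number1 12.4 also
   ENDIF

   RETURN
```

The comma after "also" and any other comma that is not between two digits never matches the pattern, which is why it stays untouched.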

Qatan

May 22, 2014, 5:57:24 AM

to harbou...@googlegroups.com

Hello Adam,

>...

>Backreference is power of regular expression.

That’s very interesting... why not in Harbour already?

Thanks for sharing such nice job.

Regards,

Qatan

elch

May 22, 2014, 7:46:29 AM

to harbou...@googlegroups.com

Hi Qatan,

RegExStrTran: 237.40 seconds

nataq.......: 240.30 seconds

RegExReplace: 912.30 seconds

so nataQ ;) could win the *easy* game -- if we pull out of the DO WHILE loop the .. + LEN( cRepl ),

do it only one time at start and use then in the loop: .. + nLen

An easy game, because from what I know about regex, you can do really *crazy* search[ and replace ] with reg[ular]ex[pressions].

You may google for PCRE and have a look into hbregex.c

I'm completely inexperienced in what Harbour can do -- about grouping, back-referencing etc ...

And we have to distinguish whether they are implemented high at 'prg-level' or low-level ..

best regards

Rolf

Qatan

May 22, 2014, 10:39:53 AM

to harbou...@googlegroups.com

Hello Rolf,

>...

>

so nataQ ;) ...

That’s funny! I didn’t notice the name before. I thought nataq meant something in your language :-)

>...

could win the *easy* game -- if we pull out of the DO WHILE loop the .. + LEN( cRepl ),

>do it only one time at start and use then in the loop: .. + nLen

Well... I do not know how to do that... can you please do the modification and post to us?

>Easy game, because for what i know about regex is, that you can do real *crazy* search[ and replace ] with reg[ular]ex[pressions].

>You may google for PCRE and may have a look into hbregex.c

>I'm completely unexperienced, what Harbour can do -- about grouping, back-referencing etc ...

>And we have to distinguish, if they are implemented high at 'prg-level' or low-level ..

I don’t have experience either but feel like regex is very powerful! But for my specific need nataq does the job very well.

Thanks for all your help and interest.

Regards,

Qatan

elch

May 22, 2014, 12:17:22 PM

to harbou...@googlegroups.com

Hi Qatan,

>...

could win the *easy* game -- if we pull out of the DO WHILE loop the .. + LEN( cRepl ),

>do it only one time at start and use then in the loop: .. + nLen

strategy: spare function calls where possible ..
LEN() in Harbour is very fast, but a variable decrease faster, so:

---
FUNCTION nataQ( cSource, cRepl, cTrans )
LOCAL cTarget := ""
LOCAL nLen := LEN( cSource )
LOCAL nRepl := LEN( cRepl )
LOCAL nPos

IF nRepl < 1 /* secure an exception */
    cTarget := cSource
ELSE
    DO WHILE nLen > 0
      IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0 /* no more fun */
        cTarget += cSource
        EXIT
      ENDIF
      cTarget += LEFT( cSource, nPos - 1 ) + cTrans
      cSource := SUBSTR( cSource, nPos + nRepl )
      nLen--
    ENDDO
ENDIF
RETURN cTarget
---

BTW, I'm sure you noticed that cTrans can be longer, shorter or even "" for removing cRepl.
And the exception cRepl == "" is now 'caught' ( it would lead to an endless loop )

best regards
Rolf

Qatan

May 22, 2014, 4:21:34 PM

to harbou...@googlegroups.com

Hello Rolf,

>

strategy: spare function calls where possible ..
>LEN() in Harbour is very fast, but a variable decrease faster, so

>...

Nice! I am using it now.

I made new tests:

hb_regExReplace: 250.28s

hb_regExStrTran: 112.39s

nataQ (1st ver): 111.44s

nataQ (2nd ver): 110.31s

Same computer but with smaller TXT file (it was too long before).

So, as you can see the new version is faster.

Thanks for all your help and care.

Regards,

Qatan

Pete

May 23, 2014, 4:21:16 AM

to harbou...@googlegroups.com

Hi Qatan
if you have time, could you try the sample below to see if it makes any sense?

8<--------------------------------------cut

FUNCTION Main()
/* compile with xhb.lib */

LOCAL cString := '("do some functions dream of a lightning fast replacement?")' + hb_EoL() +;
                 "aBc aBcaBcaBcaBcaBcaBcaBc xcr aBcaBcaBcaBc aBcaBcaBcaBcaBcaBcaBc aBcaBcaBcaBc" + hb_EoL() +;
                 "ABC ABCABCABCABCABCABCABC ABCABCABCABC ABCABCABCABCABCABCABC ABCABCABCABC" + hb_EoL() +;
                 "abc abcabcabcabcabcabcabc abcabcabcabc abcabcabcabcabcabcabc abcabcabcabc " + hb_EoL() +;
                 "ABC123 abc123 12abc3"

LOCAL cFind := "AbC"
LOCAL cReplace := "123"
LOCAL nTime, t1, t2, nI

nTime := Seconds()
FOR nI := 1 TO 100000 // a decent 'one hundred thousands' loop
    QuickRepl( cString, cFind, cReplace )
NEXT
? QuickRepl( cString, cFind, cReplace )
t1 := Seconds() - nTime
?

nTime := Seconds()
FOR nI := 1 TO 100000
    nataQ( cString, cFind, cReplace )
NEXT
? nataQ( cString, cFind, cReplace )
t2 := Seconds() - nTime

?
? "QuickRepl() spent about ->", t1 , "seconds"
? "nataQ()     spent about ->", t2 , "seconds"

wait
RETURN

FUNCTION QuickRepl( cString, cFind, cReplace )
/*beware the wolf.. (potential deadloop inside!)*/
LOCAL cSubString := HB_AtX( cFind, cString, .F. )

WHILE ! Empty( cSubString )
    cString := StrTran( cString, cSubString, cReplace )
    cSubString := HB_AtX( cFind, cString, .F. )
END

RETURN cString

FUNCTION nataQ( cSource, cRepl, cTrans )
LOCAL cTarget := ""
LOCAL nLen := LEN( cSource )
LOCAL nRepl := LEN( cRepl )
LOCAL nPos

IF nRepl < 1 /* secure an exception */
    cTarget := cSource
ELSE
    DO WHILE nLen > 0
      IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0 /* no more fun */

        cTarget += cSource
        EXIT
      ENDIF
      cTarget += LEFT( cSource, nPos - 1 ) + cTrans
      cSource := SUBSTR( cSource, nPos + nRepl )
      nLen--
    ENDDO
ENDIF
RETURN cTarget

cut----------------------------------------->8

---
Pete

elch

May 23, 2014, 6:44:29 AM

to harbou...@googlegroups.com

Hi Pete,

well done! A convincingly fast solution.

hb_AtX() is new to me, but with it we can work around the case-sensitivity of StrTran() ...

Many thanks for the tip!

best regards

Rolf

elch

May 23, 2014, 8:14:01 AM

to harbou...@googlegroups.com

Hi again Pete,

nevertheless thanks for tip with hb_AtX() !,
but there is a hidden 'trapdoor':

try to replace:

"tat" with "Tet"

in string:

"you will see the pitfall in Tatat"

nataQ: "... Tetat"

QuickRepl: " .. TeTet"

best regards

Rolf
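
The difference can be reproduced with a small sketch that puts the two functions from this thread side by side. Main() is illustrative; the expected outputs in the comments are the ones reported above.

```harbour
PROCEDURE Main()
   LOCAL cText := "you will see the pitfall in Tatat"

   ? nataQ( cText, "tat", "Tet" )       // ... Tetat
   ? QuickRepl( cText, "tat", "Tet" )   // ... TeTet

   RETURN

FUNCTION QuickRepl( cString, cFind, cReplace )
   // hb_AtX() returns the first substring matching cFind, ignoring case
   LOCAL cSubString := hb_AtX( cFind, cString, .F. )

   DO WHILE ! Empty( cSubString )
      // pass 1 turns "Tat" into "Tet", leaving "Tetat"
      // pass 2 rescans the result and finds a new "tat" built from the
      // last "t" of the replacement plus the remaining "at"
      cString := StrTran( cString, cSubString, cReplace )
      cSubString := hb_AtX( cFind, cString, .F. )
   ENDDO

   RETURN cString

FUNCTION nataQ( cSource, cRepl, cTrans )
   LOCAL cTarget := ""
   LOCAL nRepl := Len( cRepl )
   LOCAL nPos

   IF nRepl < 1
      RETURN cSource
   ENDIF
   DO WHILE ( nPos := hb_AtI( cRepl, cSource ) ) > 0
      // text that has been copied to cTarget is never searched again
      cTarget += Left( cSource, nPos - 1 ) + cTrans
      cSource := SubStr( cSource, nPos + nRepl )
   ENDDO

   RETURN cTarget + cSource
```

nataQ() only ever searches the part of the source it has not consumed yet, while QuickRepl() searches its own output again on every pass.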

Qatan

May 23, 2014, 9:30:30 AM

to harbou...@googlegroups.com

Hello Pete,

>...

>...QuickRepl( cString, cFind, cReplace )

>...

Follows the result I got testing my simple way with a big TXT file (5MB):

nataQ()........... 105s

nataQ2().......... 104s

hb_regExReplace(). 258s

hb_regExStrTran(). 121s

QuickRepl()....... 0.11s (!)

Your solution is really fast and seems to work fine.

Thanks for suggesting it!

Qatan

May 23, 2014, 9:30:30 AM

to harbou...@googlegroups.com

Hello Rolf,

>nevertheless thanks for tip with hb_AtX() !,
>but there is a hidden 'trapdoor':

Good point... but is there any way to use it without the ‘trapdoor’?

I just thought about some solution because it’s amazing how fast this one goes...

Thanks for your care and for taking your precious time to test it!

Qatan

May 23, 2014, 11:05:20 AM

to harbou...@googlegroups.com

Hello Pete,

>...

>cSubString := HB_AtX( cFind, cString, .F. )

>...

Just one question about hb_AtX()... according to xHarbour documentation it returns the first substring contained in <cString> that matches the regular expression <cRegEx>. If no match is found, the return value is NIL... so we need to add a protection in this case, right?

Qatan

May 23, 2014, 11:20:53 AM

to harbou...@googlegroups.com

>nataQ: "... Tetat"

>QuickRepl: " .. TeTet"

Sorry Rolf, but why does it do that?

I do not understand...

Qatan

Pete

May 23, 2014, 12:30:13 PM

to harbou...@googlegroups.com

Hi Rolf,

On Friday, May 23, 2014 3:14:01 PM UTC+3, elch wrote:

but there is a hidden 'trapdoor':

(sigh) always there is one. 'trapdoor'. everywhere. (perhaps that keeps game excitement)

try to replace:
"tat" with "Tet"

in string:

"you will see the pitfall in Tatat"

nataQ: "... Tetat"

QuickRepl: " .. TeTet"

I'm not sure which one is correct. It depends on how you see it and what the intended result is.

If you want to replace every "tat" with "Tet" then nataQ does only half the job, since "Tetat" still contains a "tat".
If you want to replace the first 'tat' in every 'Tatat' then you probably could search for 'Tatat' s and manipulate them (perhaps with an other function?)
(Not to mention that instead of 'tat', you should better choose to search for 'ta' since the last 't' seems redundant..)

On the other hand, QuickRepl, (one could claim that) it returns a rather unwanted ('dumb') capitalized result. (second capital 'T')
I suppose that this is due to insensitive search that HB_AtX is forced to do. (the .F. on the 3rd arg.) but IMO is an expected result.
Anyway the case is a bit puzzling.. ;->

P.S. The pitfall I see with QuickRepl as it is now is a possible dead-loop, which could show up in some circumstances but can easily be overcome.

regards,

---
Pete

Pete

May 23, 2014, 12:53:09 PM

to harbou...@googlegroups.com

On Friday, May 23, 2014 7:30:13 PM UTC+3, Pete wrote:

(Not to mention that instead of 'tat', you should better choose to search for 'ta' since the last 't' seems redundant..)

That's a wrong remark! Please Ignore..
(got really puzzled with those tat's and tet's) ;->

---
Pete

Pete

May 23, 2014, 1:15:39 PM

to harbou...@googlegroups.com

Hi Qatan

On Friday, May 23, 2014 6:05:20 PM UTC+3, Qatan wrote:

Just one question about hb_AtX()... according to xHarbour documentation it returns the first substring contained in <cString> that matches the regular expression <cRegEx>. If no match is found, the return value is NIL... so we need to add a protection in this case, right?

I don't understand; Protection from what?
Isn't WHILE ! Empty( cSubString ) enough in the case of a returned NIL ?

(unless you mean something else which you might want to clarify since I can't guess.)

regards,

---
Pete

elch

May 23, 2014, 2:34:34 PM

to harbou...@googlegroups.com

Hi again Pete,

I'm not sure which one is the correct. it depends on how you see it or what is the intended result.

If you want to replace every "tat" with "Tet" then nataQ does a half work, since "Tetat" still contains a "tat".
If you want to replace the first 'tat' in every 'Tatat' then you probably could search for 'Tatat' s and manipulate them (perhaps with an other function?)

my example with 'tat' to 'tet' ;-))

is certainly deliberately constructed: I tried to make the example as easy and short as possible.

The idea behind it is that an earlier pass of the replace loop generates a new cFind which wasn't there before.

So the inner 't' of 'tatat' belongs either to the first 'tat', which is replaced, or to the remaining 'tat' in the word

-- in my example it is an overlapping letter, and in this situation your really fast QuickRepl() behaves 'differently'.

P.S. The pitfall i see with QuickRepl as it is now, is a possible dead-loop which could show up in some circumstances, but can easily overcome it.

yes: if cFind is part of cReplace, we will get a practically infinite loop that runs until memory is full ...

best regards

Rolf

Qatan

May 24, 2014, 3:33:56 PM

to harbou...@googlegroups.com

Hello Pete,

>I don't understand; Protection from what?
>Isn't WHILE ! Empty( cSubString ) enough in the case of a returned NIL ?
(unless you mean something else which you might want to clarify since I can't guess.)

Do you mean that this would work: EMPTY( NIL ) returns .T.?

I tested and you are right... I thought it would give an error...

Never mind. I learned one more!

Thanks for your help and care.

Regards,

Qatan

May 25, 2014, 2:36:22 AM

to harbou...@googlegroups.com

Hello all,

>(sigh) always there is one. 'trapdoor'. everywhere. (perhaps that keeps game excitement)

Now I understood why QuickRepl() does that...

>Anyway the case is a bit puzzling.. ;->

For me also... so what to do?

>P.S. The pitfall i see with QuickRepl as it is now, is a possible dead-loop which could show up in some circumstances, but can easily overcome it.

That’s IMHO dangerous... how to overcome it?

It’s nice to exchange with you all. I learn a lot.

Regards,

Qatan

May 25, 2014, 2:36:23 AM

to harbou...@googlegroups.com

Hello all,

>yes: if cFind is part of cReplace, we will get a nearly infinite loop up to memory is filled ...

Nice point, but how to protect against such a situation (and, if possible, without losing much performance)?

You all are smart! (or maybe I am getting a bit rusty or should I say LAZY?) ;-)

Qatan

elch

May 25, 2014, 4:16:17 AM

to harbou...@googlegroups.com

Hi Qatan,

re-inventing the wheel: heureka !, it must be round ;-)

I like to call my technique 'the string eater' - because what is eaten can't be chewed again -- usually ;-)

Maybe QuickRepl() can be fixed with a bunch of extra logic -- but would it still be faster afterwards?

---

So your easy task is to take src/rtl/strtran.c,

duplicate ! it, and add codepage-dependent case-insensitive behaviour

- but please as a duplicate, because it would rob the original StrTran() of quite a remarkable amount of speed.

Then you have the fastest possible solution.
And please, contribute it !

best regards

Rolf
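
Why the 'string eater' is not affected by the dead-loop case (cFind contained in cReplace) can be seen in a short sketch. It reuses the second nataQ() version from this thread; the sample strings are made up for illustration.

```harbour
PROCEDURE Main()
   // "-" is part of "--": a loop that rescans its own output would keep
   // finding fresh "-" characters, the eater sees each original one once
   ? nataQ( "a-b-c", "-", "--" )   // a--b--c

   RETURN

FUNCTION nataQ( cSource, cRepl, cTrans )
   LOCAL cTarget := ""
   LOCAL nLen := Len( cSource )
   LOCAL nRepl := Len( cRepl )
   LOCAL nPos

   IF nRepl < 1   /* secure an exception */
      cTarget := cSource
   ELSE
      DO WHILE nLen > 0
         IF ( nPos := hb_AtI( cRepl, cSource ) ) == 0
            cTarget += cSource
            EXIT
         ENDIF
         // the replacement goes to cTarget only, never back into cSource
         cTarget += Left( cSource, nPos - 1 ) + cTrans
         cSource := SubStr( cSource, nPos + nRepl )
         nLen--
      ENDDO
   ENDIF

   RETURN cTarget
```

Each pass shortens cSource by at least Len( cRepl ) characters, so the loop ends after at most as many passes as the source has characters.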

Qatan

May 25, 2014, 4:28:27 AM

to harbou...@googlegroups.com

Hello Rolf,

>I like to call my technic 'the string eater' - because what is eaten can't be chewed again -- usually ;-)

>Maybe there is a bunch of logic possible, that QuickRepl() can be fixed -- but if that is then afterwards still faster ?

I like your technique too, and I agree that QuickRepl() isn’t a good solution right now because it is unsafe, or will not be faster once fixed...

>So your easy task is to take src/rtl/strtran.c,

>duplicate ! it, and add a codepage depending case insensitive behavior

...hmmm, at least I was obedient and opened it to see what it looks like, but I have to confess that I felt like an old and inflexible “redneck” trying to speak Chinese... too much for me to start with C at this point... sorry.

I think I will stay with nataQ() for its simplicity and good performance. After all, it does the job!

Qatan

Charly 9000

Sep 3, 2021, 1:42:46 PM

to Harbour Users

Hi friends,

My problem is the following: I need to search in a text for the string

[size=18] and replace with -> style="font-size:18px;"

18 is the variable part, so the expression would be

[size=xx]

According to your tests, what would be the fastest technique for this replacement?

Thank you

Carles.

Antonio Linares

Sep 4, 2021, 4:36:03 AM

to Harbour Users

Dear Charly,

function Main()
   local aTokens := hb_regexAll( '\[size=[1-9]+\]', "before [size=18] after",,,,,.F. )
   local aToken

   if Len( aTokens ) > 0
      for each aToken in aTokens
         ? aToken[ 1 ][ 1 ]  // matched text; aToken[ 1 ] is { cText, nStart, nEnd }
      next
   endif
return nil

Antonio Linares

Sep 4, 2021, 4:50:27 AM

to Harbour Users

https://regex101.com/r/Vef046/1

Charly 9000

Sep 5, 2021, 5:41:13 AM

to Harbour Users

Dear Antonio,

In your example this function returns an array of array. Each element is made up of {token, start, end} -> {"[size=18]", 8, 16}

But is there a possibility to know the variable value, in this case 18? (without having to use at (), substr (), ...).

Ok, you can create a little function, but I don't know if we can extract this value -> [1-9] +

Regards.

Charly

Antonio Linares

Sep 6, 2021, 6:51:41 AM

to Harbour Users

Charly,

Using the above excellent function hb_RegExStrTran()

function Main()

? hb_RegExStrTran( "before [size=18] after", '\[size=[1-9]+\]', "[size=555]" )

return nil

regards,

Charly 9000

Sep 6, 2021, 8:03:28 AM

to Harbour Users

Antonio,

Thanks for the tip. The goal was to create a function that builds my own markup for HTML, similar to bb_code, in my application. As I commented, I had the problem of getting the variable value out of the expression. In the end I also built a function that works properly. If someone wants to use it, I copy it here:

function main()
   local cTxt := "And before [size=36] after [/size] [size=54]Ciao.[/size]"

   ? bb_code_size( cTxt )

   cTxt := "And before [color=red] after [/color] [color=green]Ciao.[/color]"
   ? bb_code_color( cTxt )

   cTxt := "And before [size=36][color=red] after [/color][/size] [size=54][color=green]Ciao.[/color][/size]"
   cTxt := bb_code_size( cTxt )
   cTxt := bb_code_color( cTxt )
   ? cTxt
return nil

function bb_code_size( cTxt )
   // [0-9] rather than [1-9], so that sizes such as 10 or 20 also match
   cTxt := RegExStrTran( cTxt, '\[size=[0-9]+\]', '<span style="font-size:$px;">' )
   cTxt := StrTran( cTxt, '[/size]', '</span>')
return cTxt

function bb_code_color( cTxt )
   cTxt := RegExStrTran( cTxt, '\[color=[a-z]+\]', '<span style="color:$;">' )
   cTxt := StrTran( cTxt, '[/color]', '</span>')
return cTxt

function RegExStrTran( cTxt, cPatron, cTag, cSeparator, cWillcard )
   local aTokens, cToken, nStart, nEnd, cBlockA, cBlockB, nOffset, cValue

   hb_default( @cSeparator, '=' )
   hb_default( @cWillcard, '$' )

   // take the first match, replace it, then search the new text again
   while Len( aTokens := hb_regexAll( cPatron, cTxt,,,1,,.F. ) ) > 0
      cToken := aTokens[1][1][1]
      nStart := aTokens[1][1][2]
      nEnd := aTokens[1][1][3]
      cBlockA := Left( cTxt, nStart - 1 )
      cBlockB := Substr( cTxt, nEnd + 1 )
      // the value sits between the separator and the closing bracket
      nOffset := At( cSeparator, cToken ) + 1
      cValue := Substr( cToken, nOffset, len( cToken ) - nOffset )
      cTxt := cBlockA + StrTran( cTag, cWillcard, cValue ) + cBlockB
   end
return cTxt

Thanks to all

Regards.

Charly.

AL67

Sep 15, 2021, 5:06:37 AM

to Harbour Users

On Friday, 3 September 2021 at 19:42:46 UTC+2, Charly 9000 wrote:

Hi friends,

My problem is the following: I need to search in a text for the string

[size=18] and replace with -> style="font-size:18px;"

18 is the variable part, so the expression would be

[size=xx]

Use a regular expression backreference:

aRet := HB_RegEx( cString , "(.*)\[size=(\d+)\](.*)" ) // any text (gr1) + [size= + digits (gr2) + ] + any text (gr3)

aRet -> {cAllFound,cGroup1,cGroup2,cGroup3}

cNewString := aRet[2] + 'style="font-size:' +aRet[3] + 'px;"' + aRet[4]

Or use my function HB_RegExStrTran() (source in this thread)

cNewString := HB_RegExStrTran( cString , "\[size=(\d+)\]" , 'style="font-size:$1px;"') //change all found

Adam

Charly 9000

Oct 12, 2021, 12:33:44 PM

to Harbour Users

Hi Adam,

Thanks for your response. I really didn't know how to do this with HB_RegEx(), thanks.

But either I don't understand it or I am doing something wrong, I'm sorry.

This code does not work. Where is the error?

function main()

local cString := "Hello [size=18], how are you ?"

local aRet := HB_RegEx( cString , "(.*)\[size=(\d+)\](.*)" )

? aRet

return nil

This code returns an empty array.

Thanks.

Charly.

AL67

Oct 14, 2021, 5:18:08 AM

to Harbour Users

On Tuesday, 12 October 2021 at 18:33:44 UTC+2, Charly 9000 wrote:

Hi Adam,

Thanks for your response. I really didn't know how to do this with HB_RegEx(), thanks.
But I don't understand it or I do something wrong, I'm sorry

This code does not work. Where this error?

function main()

local cString := "Hello [size=18], how are you ?"
local aRet := HB_RegEx( cString , "(.*)\[size=(\d+)\](.*)" )

? aRet

aRet is of type array. If HB_RegEx() finds no match, the array is empty {}.

If a match is found, the 1st element is the matched string.

If the regex includes groups, the 2nd and following elements are the matched groups.

So in the example:

aRet := { "Hello [size=18], how are you ?" , "Hello " , "18" , ", how are you ?"}

Adam

Charly 9000

Oct 20, 2021, 1:38:57 AM

to Harbour Users

Adam

? valtype(aRet) // Return A

? len(aRet) // Return 0

Thanks.

Charly

AL67

Oct 20, 2021, 4:39:45 AM

to Harbour Users

Hi.

Correct syntax:

aResult := hb_RegEx( cRegex, cString, [lCase], [lNewLine] )

So example:

function main()

local cString := "Hello [size=18], how are you ?"

local aRet := HB_RegEx( "(.*)\[size=(\d+)\](.*)" , cString )

? aRet[1]

? aRet[2]

? aRet[3]

? aRet[4]

return

Adam
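
Putting the corrected argument order together with the replacement asked for in the original question gives roughly the following. This is a sketch for a string with a single [size=..] tag; the helper name SizeToStyle() is made up for illustration and is not part of Harbour.

```harbour
PROCEDURE Main()
   ? SizeToStyle( "Hello [size=18], how are you ?" )
   // Hello style="font-size:18px;", how are you ?

   RETURN

// hypothetical helper: rewrites one [size=NN] tag using the regex groups
FUNCTION SizeToStyle( cString )
   // regex first, string second
   LOCAL aRet := hb_RegEx( "(.*)\[size=(\d+)\](.*)", cString )

   IF Len( aRet ) == 4
      // aRet[ 2 ] = text before the tag, aRet[ 3 ] = the digits,
      // aRet[ 4 ] = text after the tag
      RETURN aRet[ 2 ] + 'style="font-size:' + aRet[ 3 ] + 'px;"' + aRet[ 4 ]
   ENDIF

   RETURN cString   // no tag found: return the text unchanged
```

Because (.*) is greedy, only the last tag would be rewritten in a text with several tags; for that case HB_RegExStrTran() with '$1', as shown earlier in this thread, is the better fit.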

Charly 9000

Oct 20, 2021, 4:50:58 AM

to Harbour Users

Adam,

Yes now :-)

I understand how it works

Thank you.

C.
