Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

regex alternative to nested functions

164 views
Skip to first unread message

sarah

unread,
Aug 28, 2003, 9:12:24 AM8/28/03
to
I am trying to write a formula that takes the contents of a cell,
removes a bunch of characters and prints it in another cell.

For example,

Take: Jim's Roofing..."The best roofing in town!"

Remove certain characters: ' . " !

And return: Jims Roofing The best roofing in town

There are a lot of characters I want to remove and some characters that
I want to replace with other text.

Basically, I have too many SUBSTITUTE fuctions to use in one formula.
VB is not my native language, so I am sorry if this is a stupid
question. Some languages allow you to use regular expressions so that
instead of having a separate SUBSTITUTE function for each character I
want to remove, I can use one.

Is this possible in VB too?

Daniel.M

unread,
Aug 28, 2003, 10:46:47 AM8/28/03
to
Hi Sarah,

You don't need RegEx for what you're doing.
I wrote MultiSubstitute a while back.
If ByChars argument is shorter than ToBeReplacedChars, the 'extra'
characters in ToBeReplacedChars will be eliminated.

Example:
=MultiSubstitute("Hello Mommy","elm","ab") will produce: "Habbo Moy"

Notice the "m" are eliminated because, being at the end of elm, they don't
have any corresponding letter in "ab".

' ===========================
'MultiSubstitute
'In an InputStr, replace all chars belonging to ToBeReplacedChars
'by the corresponding chars in ByChars
'ToBeReplacedChars and ByChars should be of the same length
'or else it removes the found chars
'Returns the string with the characters replaced (or removed)
'By Daniel M.
Function MultiSubstitute(InputStr As String, _
ToBeReplacedChars As String, ByChars As String) As String

Dim s As String, i As Integer
Dim anOffset As Integer, len_Input As Integer, len_ByChars As Integer

len_Input = Len(InputStr)
len_ByChars = Len(ByChars)
For i = 1 To len_Input
s = mid(InputStr, i, 1)
anOffset = InStr(ToBeReplacedChars, s)
If anOffset > 0 Then
If anOffset <= len_ByChars Then
MultiSubstitute = MultiSubstitute & _
mid(ByChars, anOffset, 1)
End If
Else
MultiSubstitute = MultiSubstitute & s
End If
Next

End Function
' ===========================

Regards,

Daniel M.


"sarah" <sarah...@excelforum.com> wrote in message
news:sarah...@excelforum.com...

Harlan Grove

unread,
Aug 28, 2003, 2:24:56 PM8/28/03
to
"Daniel.M" wrote...

>You don't need RegEx for what you're doing.

No. But she doesn't *need* anything beyond what Excel worksheet functions
provide. As written in her example,

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE("Jim's Roofing...""The best roofing
in town!""","'",""),".",""),"""",""),"!","")

would suffice. As for your example,

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE("Hello Mommy","e","a"),"l","b"),"m","")

would suffice. Granted nested SUBSTITUTE calls don't nest well, but if there are
8 or fewer characters to replace, nested SUBSTITUTE calls would be faster than
udfs.

I suspect the OP's using an overly simplified example. I'd guess only letters
and whitespace should remain. While it'd be possible to feed your UDF a string
of all the other characters as 2nd argument and "" as 3rd argument, that's be
much more difficult than deleting all characters in the class [^A-Za-z \t].
Regexps may not be necessary, strictly speaking, but they wipe the floor with
all other text matching approaches in terms of economy of expression.

A udf is also required to make use of the RegExp object provided by VBScript. If
the OP has Internet Explorer 5.0 or later installed, then VBScript 5 or higher
was installed with it. If not, it's a free download from Microsoft's web site.

http://msdn.microsoft.com/library/default.asp?url=/downloads/list/webdev.asp

Once installed, the udf below could be used in a formula like

=Subst("Jim's Roofing...""The best roofing in town!""","[^A-Za-z \t]+","")

Note that this doesn't give a space between the words Roofing and The. To
replace multiple chars with one space and delete single chars, try

=Subst(Subst("Jim's Roofing...""The best roofing in town!""",
"[^A-Za-z \t]{2,}"," "),"[^A-Za-z \t]","")

This wrinkle would pose nontrivial problems using MultiSubstitute if *single*
unwanted chars should be deleted while *multiple* unwanted chars should be
replaced by a single space. There's a place for transliteration, but it's not a
replacement for regular expressions.


'Similar to Excel's SUBSTITUTE but using VBScript's perl-like regexps
'Required:
' orig_text - string to search through,
' match_pat - regexp to find,
' replace_pat - replacement pattern
'Optional:
' instance - which matched substring to replace or 0 for all (default)
'-----------------------------------------
Function Subst(orig_text As String, _
match_pat As String, _
replace_pat As String, _
Optional instance As Variant) As Variant
'-----------------------------------------
Dim regex As Object, matches As Object, m As Object

If IsMissing(instance) Then
instance = 0#

ElseIf TypeName(instance) <> "Double" Then
Subst = CVErr(xlErrValue) 'invalid instance type
instance = -1#

ElseIf CDbl(instance) <= 0.5 Then
Subst = CVErr(xlErrNum) 'invalid instance value
instance = -1#

Else
instance = Int(instance + 0.5)

End If

If instance = -1# Then Exit Function 'do nothing quickly

Set regex = CreateObject("vbscript.regexp")
regex.Pattern = match_pat
regex.Global = True

If instance = 0# Then
Subst = regex.Replace(orig_text, replace_pat)

Else
Set matches = regex.Execute(orig_text)

If instance > matches.Count Then
Subst = orig_text 'matchnum out of bounds - do nothing

Else
Set m = matches.Item(instance - 1)
Subst = Left(orig_text, m.FirstIndex) & _
regex.Replace(m.Value, replace_pat) & _
Right(orig_text, Len(orig_text) - m.FirstIndex - m.Length)

End If

End If

End Function

--
Never attach files.
Snip unnecessary quoted text.
Never multipost (though crossposting is usually OK).
Don't change subject lines because it corrupts Google newsgroup archives.

Daniel.M

unread,
Aug 28, 2003, 5:22:40 PM8/28/03
to
Hi Harlan,

> "Daniel.M" wrote...
> >You don't need RegEx for what you're doing.
>
> No. But she doesn't *need* anything beyond what Excel worksheet functions

> provide. ...
>
> snip
>
> ... Granted nested SUBSTITUTE calls don't nest well, but if there are


> 8 or fewer characters to replace, nested SUBSTITUTE calls would be faster
than
> udfs.
>
> I suspect the OP's using an overly simplified example. I'd guess only
> letters and whitespace should remain.

I discarted the RegExp quite fast. Maybe too fast. :-)
But my take on this was the OP may need to replace/substitute more than 8
different characters. I think MultiSubstitute is adequate for the OP's
request as stated.

As for regular expressions, the link and Subst() UDF you provided will
enable the OP to use those powerful tools if she needs/wants it. I would
also suggest she downloads the help file related to Windows Scripting
Documentation (for v.5.6, it's SCRIPT56.CHM) if she wants to be sure about
the pattern metacharacters the regular expression engine is using.

Regards,

Daniel M.

Harlan Grove

unread,
Aug 28, 2003, 7:53:41 PM8/28/03
to
"Daniel.M" wrote...

>Hi Harlan,
>>"Daniel.M" wrote...
>>>You don't need RegEx for what you're doing.
>>
>>No. But she doesn't *need* anything beyond what Excel worksheet functions
>>provide. ...
..

>>I suspect the OP's using an overly simplified example. I'd guess only
>>letters and whitespace should remain.
>
>I discarted the RegExp quite fast. Maybe too fast. :-)
>But my take on this was the OP may need to replace/substitute more than 8
>different characters. I think MultiSubstitute is adequate for the OP's
>request as stated.
..

Then show how you'd use it to convert the OP's sample text,

Jim's Roofing..."The best roofing in town!"

into the OP's expected result,

Jims Roofing The best roofing in town

noting that in the result there's no space between Jim and s but there's one
space between Roofing and The. I realize it could be done using SUBSTITUTE to
replace the ellipsis with a space, then using MultiSubstitute to delete
remaining punctuation. MultiSubstitute calls alone won't do it.

Perhaps a better rule would be to delete the single quotes/apostrophes, change
all other punctuation to spaces, then pass that result through TRIM. But that
may not handle hyphens between words as intended. So maybe the ideal would be to
delete any punctuation between letters or after the last letter when there's no
intervening whitespace, then replace all other sequences of punctuation with
single spaces. In RegExp terms,

=Subst(Subst(A1,"\b[^A-Za-z \t]\b|[^A-Za-z \t]+$",""),"[^A-Za-z \t]+"," ")

Transliteration (including character deletion) doesn't function in a context of
surrounding characters. Maybe the OP doesn't need that functionality.

Tushar Mehta

unread,
Aug 28, 2003, 10:39:08 PM8/28/03
to
Take a look at my post (under the name tusharm) in
http://test.mrexcel.com/phpbb2/viewtopic.php?t=52283&start=10 for how
to make the Windows Scripting Host RegExp capability available as XL
user defined functions (UDFs).

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

In article <sarah...@excelforum.com>, sarah...@excelforum.com
says...

Daniel.M

unread,
Aug 29, 2003, 1:15:57 AM8/29/03
to
Hi Harlan,

> Then show how you'd use it to convert the OP's sample text,
> Jim's Roofing..."The best roofing in town!"
>
> into the OP's expected result,
> Jims Roofing The best roofing in town

> noting that in the result there's no space between Jim and s but there's
> one space between Roofing and The. I realize it could be done
> using SUBSTITUTE to replace the ellipsis with a space, then
> using MultiSubstitute to delete remaining punctuation.
> MultiSubstitute calls alone won't do it.

Ok. I now see what you meant.
I didn't pick up the 'special case' about treatment of words to prevent
'word collisions'.

As with string processing in general, there are a lot of cases involved. it
would be appreciated to know the OP's choices here.

Regards,

Daniel M.

Tushar Mehta

unread,
Aug 29, 2003, 6:43:04 AM8/29/03
to
Aarrgh! That URL should have been
http://www.mrexcel.com/board2/viewtopic.php?t=52283&start=10

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

In article <MPG.19b886dbb...@msnews.microsoft.com>,
ng_p...@bigfoot.com says...

0 new messages