I'm having some problems with the following regsub expression. Given a list
of words, e.g. "the a an of and", I wish to remove these words from a
sentence. How should I do it?
For example, "the quick brown fox and the lazy dog" will become "quick brown fox
lazy dog" after the substitution.
I tried
regsub -all -nocase "( (the|a|an|of) )" $text $replaced
but it did not work for all expressions. Words at the beginning and at the
end of the sentence would not be replaced. Also, if two of the words
appeared side by side, like "and the" in the above sentence, only one word
gets replaced.
Thanx for any help.
Regards,
Grafstrom
Try this:
set re {(?:^|\s)(?:the|a|an|of)(?:$|\s)}
regsub -all -nocase $re $text { } newText
set newText [string trim $newText]
You were requiring the word to be substituted to have a space both
before and after; this pattern instead requires (either a space or string
start) before the word, and (either a space or string end) afterwards. The
selected words are replaced by a space, which may leave a space at the
start and another one at the end - hence the trimming.
Punctuation and line breaks may also have to be taken care of ...
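As a quick check of the pattern described above, here is a small sketch run
on a made-up sentence (the sample text is an assumption, not from the
original question). It exercises a listed word at the start of the string,
mixed case via -nocase, and isolated words in the middle; it does not cover
two listed words in a row.

```tcl
# Sample sentence, invented for illustration.
set text "The cat sat on a mat of straw"

# Word must be preceded by string start or whitespace, and followed by
# string end or whitespace.
set re {(?:^|\s)(?:the|a|an|of)(?:$|\s)}
regsub -all -nocase $re $text { } newText

# Replacing with a space can leave a stray space at either end.
set newText [string trim $newText]
puts $newText   ;# cat sat on mat straw
```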
Hope this helps
Miguel
Let's say $text = "fox and the lazy dog"
Using the RE you suggested, only "and" will be removed, and the resultant
$text becomes "fox the lazy dog". How may the RE be changed such that both
"and" and "the" are removed?
Regards,
Grafstrom
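One possible way to handle adjacent words, offered here as a sketch rather
than something posted in the thread: do not consume the trailing space.
Tcl's advanced regular expressions support positive lookahead, so the final
group can become (?=\s|$), which only checks for a space or string end. The
next match can then start on that same space. The sketch assumes "and" is
added to the word list and uses an empty replacement string.

```tcl
set text "the quick brown fox and the lazy dog"

# The leading space (or string start) is consumed; the trailing space is
# only tested by the lookahead, so it remains available to the next match.
set re {(?:^|\s)(?:the|a|an|of|and)(?=\s|$)}
regsub -all -nocase -- $re $text {} newText

# A word removed at the very start leaves one leading space behind.
set newText [string trim $newText]
puts $newText   ;# quick brown fox lazy dog
```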
Use the special escape characters \m and \M to anchor the beginning and
ending of a word:
regsub -all -- {\m()\M} $text {} result
You may need an additional step to collapse duplicate spaces:
regsub -all -- {\s+} $result { } result
To play with regexp and help you find the "magical expression", you can
use my little VisualREGEXP tool (http://laurent.riesterer.free.fr/regexp)
Laurent.
I just forgot the pattern to match ...
regsub -all -- {\m(an?|the|of)\M} $text {} result
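The two steps can be wrapped up so the pattern is built from the word list
itself. This is only a sketch: removeWords is a hypothetical helper name,
"and" is included in the list as in the original question, and the words
are assumed to be plain alphabetic strings that need no regexp escaping.

```tcl
# removeWords is a hypothetical helper, not part of the thread.
# Assumes each entry in $words is a plain alphabetic word.
proc removeWords {words text} {
    # Build \m(?:the|a|an|of|and)\M from the list.
    set re "\\m(?:[join $words |])\\M"
    # Step 1: delete each listed word where it appears as a whole word.
    regsub -all -nocase -- $re $text {} result
    # Step 2: collapse leftover runs of whitespace to one space, then trim.
    regsub -all -- {\s+} $result { } result
    return [string trim $result]
}

puts [removeWords {the a an of and} "the quick brown fox and the lazy dog"]
```

Because \m and \M only match at word boundaries, "a" inside "lazy" and "an"
inside "and" are left alone; only whole words from the list are removed.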
Laurent Riesterer points out the special escape characters to anchor at
the beginning and ending of a word; it can be combined with a trick to
remove the extra white space in one fell swoop:
% set RE {(\s)?\s*\m(?:the|of|a|and?)\M\s*}
(\s)?\s*\m(?:the|of|a|and?)\M\s*
% set text "the quick brown fox and the lazy dog"
the quick brown fox and the lazy dog
% regsub -all -nocase -- $RE $text {\1} replacement
3
% set replacement