regular expressions and operations other than replace

Mike Wexler

Oct 17, 1986, 6:38:47 PM

Since I have switched from vi to EMACS, there is one thing that I missed
more than anything else. The ability to perform an operation on all
the lines that met a particular criteria (specified by a regular expression).
For instance in vi, I could type in "/[A-Z][a-z]*/d" to delete all lines
that met the specified criteria or I could type in
"/\([A-Za-z][A-Za-z]*(\(.*\))\)/s//\1\2". How would I do similar operations
in EMACS?

--
Mike Wexler
(trwrb|scgvaxd)!felix!peregrine!mike
(714)855-3923

Bill Wohler

Oct 18, 1986, 5:23:03 PM

mike,

i had the same problem you did when i switched to gnuemacs,
but, since gnuemacs is fully extensible, i wrote my first
lisp function (no laughter from the peanut gallery, please).
in addition, emacs offers a richer regular expression set
than vi does. check it out!

anyway, here's the function that will get you started. load
it in and bind it to a key if you wish. currently you have to
set the region around the section of text that you want to
massage and then set your point at the beginning of the
region. therefore, if you wanted to do the whole file, you
could use C-x h to select the whole buffer.

it works fine, but could be more elegant. for instance, it
should work on the region even if you aren't at the
beginning of it. if any of you can improve on it, please be
sure to share your findings with us. thanks!

--bw
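----- 8< re-replace-region follows -----
(defun re-replace-region ()
  "Replace regexp matches between point and mark.
Point must be at the beginning of the region."
  (interactive)
  (let ((old (read-string "Replace string: "))
        (new (read-string "with: "))
        ;; a marker keeps tracking the region end as the text changes
        (end (copy-marker (mark))))
    ;; stop quietly once no match is left before END
    (while (re-search-forward old end t)
      ;; insert NEW literally, without case conversion
      (replace-match new t t))))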

e...@lewey.uucp

Oct 18, 1986, 8:31:26 PM

> Since I have switched from vi to EMACS, there is one thing that I missed
> more than anything else. The ability to perform an operation on all
> the lines that met a particular criteria (specified by a regular expression).
> For instance in vi, I could type in "/[A-Z][a-z]*/d" to delete all lines
> that met the specified criteria or I could type in
> "/\([A-Za-z][A-Za-z]*(\(.*\))\)/s//\1\2". How would I do similar operations
> in EMACS?

There are several answers to this question. The easiest way to do
exactly what you want is to pipe the file through sed: set the mark at
the top, move to the bottom of the file, then use (in GNUmacs) the
"shell-command-on-region" command -- ^U ESC-|, passing the appropriate
arguments to sed. The buffer will be replaced by the results of the sed.

However, this isn't really in the "spirit of EMACS". When I have to do some
global operation, I usually use a keyboard macro. Start the macro with ^X-(,
do a single instance of the operation, then ^X-) to terminate the macro
definition. Repeat it a few times with ^X-e to make sure it works right,
then run it a bunch of times with ^U 999 ^X-e. If you have a search at the
beginning of the macro, it will stop executing as soon as the search fails.

For more interesting problems, there's the "grep" command. This allows you
to search through many files for anything that grep can find. After
execution, each call to the "next-error" command (usually ^X-`) puts the cursor
on the next line containing the grep expression, in whatever file. If
that line is one that needs to be changed, you can easily generate a
keyboard macro that does the operation then moves to the next instance.

Your last example can be performed almost verbatim with the GNUmacs
"query-replace-regexp" command, which I bind to ESC-Q.

--
Ed Post {hplabs,voder,pyramid}!lewey!evp
American Information Technology
10201 Torre Ave. Cupertino CA 95014
(408)252-8713

John Robinson

Oct 20, 1986, 10:34:46 AM

Then again, you could use this built-in function in the loop:

replace-regexp:
Replace things after point matching REGEXP with TO-STRING.
Preserve case in each match if case-replace and case-fold-search
are non-nil and REGEXP has no uppercase letters.
Third arg DELIMITED (prefix arg if interactive) non-nil means replace
only matches surrounded by word boundaries.
In TO-STRING, \& means insert what matched REGEXP,
and \<n> means insert what matched <n>th \(...\) in REGEXP.
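
A minimal sketch of the same idea from Lisp, assuming a current GNU Emacs. The function name `swap-colon-fields` and the `key:value` sample format are made up for illustration. It uses `re-search-forward` with `replace-match`, which is the usual way to do a regexp replacement inside a program.

```elisp
;; Sketch for a current GNU Emacs; `swap-colon-fields' is a made-up name.
(defun swap-colon-fields ()
  "Rewrite each `key:value' after point as `value:key'."
  (let ((case-fold-search nil))          ; match lowercase letters only
    (while (re-search-forward "\\([a-z]+\\):\\([a-z]+\\)" nil t)
      ;; In the replacement, \2 and \1 stand for the second and first
      ;; \(...\) groups; \& would stand for the whole match.
      (replace-match "\\2:\\1" t))))
```

Typed interactively at the replace-regexp prompts, the same replacement would be `\2:\1`; the doubled backslashes are only needed inside a Lisp string.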

/jr

Paul Rubin

Oct 20, 1986, 3:48:20 PM

In article <4...@lewey.UUCP> e...@lewey.UUCP (Ed Post) writes:
>> Since I have switched from vi to EMACS, there is one thing that I missed
>> more than anything else. The ability to perform an operation on all
>> the lines that met a particular criteria (specified by a regular expression).
>> For instance in vi, I could type in "/[A-Z][a-z]*/d" to delete all lines
>> that met the specified criteria or I could type in
>> "/\([A-Za-z][A-Za-z]*(\(.*\))\)/s//\1\2". How would I do similar operations
>> in EMACS?
>
>There are several answers to this question. The easiest way to do

>exactly what you want is to pipe the file through sed...

Ugh!! The first thing to try when figuring out things like this is the
Apropos command. You can also get more detailed documentation from the
Emacs Info file: type C-h I, then look through the Command Index and
Concept Index nodes til you find what you want. In this case, you get
the following descriptions. You can also bring in the Lisp source
for these commands to see how they work or to modify them to do anything
you want on matching (non-matching, etc.) lines.

File: emacs Node: Other Repeating Search, Prev: Replace, Up: Search

Other Search-and-Loop Commands
==============================

Here are some other commands that find matches for a regular expression.
They all operate from point to the end of the buffer.

`M-x list-matching-lines'
Print each line that follows point and contains a match for the
specified regexp. A numeric argument specifies the number of context
lines to print before and after each matching line; the default is
none.

`M-x count-matches'
Print the number of matches following point for the specified regexp.

`M-x delete-non-matching-lines'
Delete each line that follows point and does not contain a match for
the specified regexp.

`M-x delete-matching-lines'
Delete each line that follows point and contains a match for the
specified regexp.
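
A short sketch of calling these commands from Lisp, assuming a current GNU Emacs, where `how-many` and `flush-lines` are the primary names behind count-matches and delete-matching-lines. The function name `demo-line-commands` and the sample text are invented for the example.

```elisp
;; Hypothetical demo function; the sample text is made up.
(defun demo-line-commands ()
  "Return (COUNT . TEXT) after counting and deleting capitalised lines."
  (with-temp-buffer
    (insert "Alpha\nbeta\nGamma\ndelta\n")
    (let ((case-fold-search nil)         ; make [A-Z] match capitals only
          count)
      (goto-char (point-min))            ; the commands act from point onward
      (setq count (how-many "^[A-Z]"))   ; count-matches
      (goto-char (point-min))
      (flush-lines "^[A-Z]")             ; delete-matching-lines
      (cons count (buffer-string)))))
```

Calling `(demo-line-commands)` should return `(2 . "beta\ndelta\n")`. Swapping `flush-lines` for `keep-lines` gives the delete-non-matching-lines behaviour instead.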

Walter Underwood

Oct 21, 1986, 3:19:36 PM

> Since I have switched from vi to EMACS, there is one thing that I missed
> more than anything else. The ability to perform an operation on all
> the lines that met a particular criteria (specified by a regular expression).
>

> Mike Wexler

Just use keyboard macros. Start the macro, do a regex search, do the action,
and close the macro. Give it a very large arg (type ^U ten times) and
then execute the macro. Voila! Your buffer has been munged!

I had the same problem when converting from vi, but I find that keyboard
macros are actually more powerful than regular expressions, and easier to
specify. They are also much easier to teach to a novice.

wunderwood
wunder@hplabs

Chris Torek

Oct 21, 1986, 5:01:48 PM

(Warning: the following article will tell you more than you ever
wanted to know about playing with regular expressions.)

In article <11...@peregrine.UUCP> someone writes:
>Since I have switched from vi to EMACS, there is one thing that I missed
>more than anything else. The ability to perform an operation on all
>the lines that met a particular criteria (specified by a regular expression).
>For instance in vi, I could type in "/[A-Z][a-z]*/d" to delete all lines
>that met the specified criteria or I could type in
>"/\([A-Za-z][A-Za-z]*(\(.*\))\)/s//\1\2". How would I do similar operations
>in EMACS?

(You left out the `g': `g/[A-Z][a-z]*/d'.) Some of these operations
are best done by writing MLisp or elisp code, but note that a global
delete operation is trivial due to the way regular expressions work,
with the addition that Emacs can match newlines explicitly. Simply
add `.*' at the front of your R.E., and add `.*<^J>' at the end:

<ESC>x
: re-replace-string<RET>
Old pattern: .*[A-Z][a-z]*.*<^Q><^J><RET>
New string: <RET>

(Note that this should be done after moving to the top of the
buffer, since Emacs's replace operations work from wherever you
are now to the end of the buffer.) Since `.' matches any character
but newline, and `{class}*' matches the longest possible sequence
of {class}, this will always match full lines containing at least
one [A-Z].

The pattern can be simplified as well. The [a-z]* part is unnecessary,
as it matches zero or more `a's, `b's, ..., `z's. Yet the implied
`.*' in vi's global, or the explicit one in Emacs, subsumes this:

Old pattern: .*[A-Z].*<^Q><^J><RET>
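
For comparison, here is a rough GNU Emacs Lisp sketch of the same whole-line replacement (the commands above are typed interactively; the function name `delete-lines-with-capital` is invented). One assumption worth flagging: in GNU Emacs `case-fold-search` is normally on, and then `[A-Z]` also matches lowercase letters, so the sketch turns it off.

```elisp
;; Hypothetical helper; deletes whole lines by replacing them with nothing.
(defun delete-lines-with-capital ()
  "Delete every line from point onward that contains an uppercase letter."
  (let ((case-fold-search nil))          ; otherwise [A-Z] also matches a-z
    ;; `.' never matches a newline, so `.*[A-Z].*\n' covers exactly one
    ;; full line, including its terminating newline.
    (while (re-search-forward ".*[A-Z].*\n" nil t)
      (replace-match "" t t))))
```

Because the pattern ends in a newline, a final line that lacks one is left alone even if it contains an uppercase letter.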

There is one final possible optimisation that is very useful when
dealing with large files. Emacs's search code runs faster when it
can do an `anchored search'. (I am not using `anchored' in quite
the same sense as Snobol here. There may be a better term, but I
cannot think of it offhand.) By this I mean that a first character
that is considered `literal' speeds the matching operation.

For example, searching for `[A-Z][A-Z]*' is slow, but searching
for `A[A-Z]*' is fast. The reason is that a literal match (the
first `A' here) is a common case, and has been optimised by having
the search code first find one `A' before trying the full-blown
regular expression match operation.

But look at this: our original pattern is required to match a full
line! It must start at the beginning of a line, find one character
in [A-Z], match the rest of the line, then pick up a newline.
So we should be able to `anchor' it to the beginning of a line.
What begins a line? Well, `^' in a regular expression should do
this. We could use the pattern

^.*[A-Z].*<^J>

Unfortunately, this does not run any faster. Peeking at the innards
of the regular expression matcher shows why: `^' is not considered
a literal character. Curses! (No, not the library.) But lo! there
is another way to denote the beginning of a line. Every line begins
after the previous line ends, and every previous line ends with a
newline! We can use instead the pattern

<^J>.*[A-Z].*<^J>

But---oops!---we forgot something. The very first line does not have
a previous line. Now what can we do?

When all else fails, cheat: Add a blank line at the top of the file.
Now we have a previous line, and can use our modified pattern:

Old pattern: <^Q><^J>.*[A-Z].*<^Q><^J><RET>
New string: <RET>

Whoops, that seems to have deleted all the newlines as well. That
anchor we added came from the previous line, so we must put it back:

New string: <^Q><^J><RET>

But this is not necessary. Since we know all about how .* matches
everything it can, we simply notice that that final newline on the
original pattern is not necessary. If we leave it out, Emacs will
not match the newline between the line we wanted to delete and the
next. But that is all right: If we have Emacs leave that newline
behind, it will make up for the newline we stole from the previous
line. Thus the final pattern is:

Old pattern: <^Q><^J>.*[A-Z].*<RET>
New string: <RET>

Of course, when we are all done we have to clean up: we stuck an
extra blank line at the top of the buffer so that we could cheat.

The ultimate sequence of commands, then, is

ESC-< (top of buffer)
^O (add that extra blank line)
ESC-x re-replace-string (do the replace)
^Q ^J .*[A-Z].* RET (type in the old pattern)
RET (specify a blank new string)
^D (delete that extra blank line)

And lo! Emacs deletes every line containing an uppercase letter.
Not only that, it even does it faster than vi! :-)
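
The same sequence can be sketched in GNU Emacs Lisp. This is only an illustration under assumptions: the function name is invented, `case-fold-search` is switched off so that `[A-Z]` means capitals only, and no claim is made that the leading newline speeds up the search in any particular Emacs.

```elisp
;; Hypothetical helper mirroring the keystrokes above.
(defun delete-lines-with-capital-anchored ()
  "Delete every line of the buffer that contains an uppercase letter."
  (let ((case-fold-search nil))
    (save-excursion
      (goto-char (point-min))
      (insert "\n")                      ; the extra blank line (^O)
      (goto-char (point-min))
      ;; Each match eats the newline before the line and leaves the one
      ;; after it, so the line count stays consistent.
      (while (re-search-forward "\n.*[A-Z].*" nil t)
        (replace-match "" t t))
      (goto-char (point-min))
      (when (eq (char-after) ?\n)        ; remove the extra blank line (^D)
        (delete-char 1)))))
```

The check before `delete-char` covers the corner case where every line was deleted and the buffer had no final newline, which would otherwise leave nothing to delete.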

(Actually, chances are that typing

ESC-< ^@ ESC-> ESC-x filter-region egrep -v "[A-Z]" RET

is just as fast, and easier to remember. We can use a wrench
as a hammer, but having the hammer too is nice.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: ch...@mimsy.umd.edu

pre...@uicsl.uucp

Oct 23, 1986, 11:24:00 AM

All the responses to the question about using regular
expressions to select lines seem to be non-responsive.
While phr gave a useful summary from the manual that
points out that there IS a command to do the particular
example the questioner used, that was just an example,
not the whole question.

There ought to be a function to map another function
over the lines of the file, preferably with an re
filter. On the other hand, it really wouldn't be hard to
write one. You just need a function that takes a
string (the regular expression) and a function (to be
applied to matching lines) and loops over the buffer
looking for matching lines and applying the function.
I haven't memorized GNU function names, so I won't try
to write the function here.
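
One possible shape for such a function in GNU Emacs Lisp, offered as a sketch only. The name `map-over-matching-lines` and its calling convention (the function is called with point at the start of each matching line) are assumptions, not an existing Emacs command.

```elisp
;; Hypothetical function; not part of Emacs.
(defun map-over-matching-lines (regexp function)
  "Call FUNCTION at the start of each line containing a match for REGEXP.
Lines are scanned from point to the end of the buffer."
  (save-excursion
    (while (and (not (eobp))
                (re-search-forward regexp nil t))
      (goto-char (match-beginning 0))
      ;; Remember where the next line starts with a marker, so FUNCTION
      ;; may insert or delete text on the current line.
      (let ((next (copy-marker (line-beginning-position 2))))
        (beginning-of-line)
        (funcall function)
        (goto-char next)
        (set-marker next nil)))))
```

For example, `(map-over-matching-lines "todo" (lambda () (insert "* ")))` would prefix each line containing "todo" with a star, and passing a function that deletes the current line gives the delete-matching-lines behaviour.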

You could also have a map-over-re-strings function that
stepped through the matching REs in the buffer and applied
the function to each.
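
A sketch of that second idea in GNU Emacs Lisp. The name `map-over-regexp-matches` is invented, and passing the matched text to the function as a string is just one design choice; a variant could instead leave point at the match and let the function edit the buffer.

```elisp
;; Hypothetical function; not part of Emacs.
(defun map-over-regexp-matches (regexp function)
  "Call FUNCTION with the text of each match for REGEXP after point.
Return the list of results in buffer order."
  (let ((results nil)
        (done nil))
    (save-excursion
      (while (and (not done)
                  (re-search-forward regexp nil t))
        (let ((empty (= (match-beginning 0) (match-end 0))))
          (push (funcall function (match-string 0)) results)
          ;; Step past an empty match so the loop always advances.
          (when empty
            (if (eobp)
                (setq done t)
              (forward-char 1))))))
    (nreverse results)))
```

With the buffer text `a1 b22 c333` and point at the top, `(map-over-regexp-matches "[0-9]+" #'string-to-number)` collects the numbers 1, 22 and 333.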

For non-GNU users, you can delete matching lines in Unipress
by using hard newlines. That is, turn
newline.*pattern.*newline into newline. You can get a
newline in your patterns by quoting control-j.

--
scott preece
PRE...@GSWD-VMS.ARPA

Ian Merritt

Oct 24, 1986, 7:54:30 PM

There are some kinds of replacement functions for which I would really
like the good ol' MIT-TECO (or even a reasonable subset) minibuffer. I
realize this wouldn't be of much value for the newcomers to the EMACS
world, but for many of us who have been using EMACS since ITS/TOPS-20,
that was a really quick escape mechanism for certain transformations of
medium complexity which probably could be performed with regex or other
scenarios, but not as quickly. How long it has been since I have seen
something like:

jsfoo$.,.+4uxsbar$xi$$

------------------------------

It has been so long that I am not even sure I remember the command set
quite correctly, but I found it quite useful back then and there have
been times recently when I would have found it much faster than a ^X(
macro or other method.

Oh well...

<>IHM<>
--

uucp: ihnp4!nrcvax!ihm

Mike Wexler

Oct 29, 1986, 6:36:24 PM

In article <4300002@uicsl> pre...@uicsl.UUCP writes:
>All the responses to the question about using regular
>expressions to select lines seem to be non-responsive.

Enclosed is a summary of the responses I got via e-mail. Some of them
are quite helpful. BTW I have edited these.

>scott preece
>PRE...@GSWD-VMS.ARPA
-------------------------------------------------------------------------------
From: felix!hplabs!wee...@brahms.berkeley.edu (Matthew P Wiener)

(while (looking-at "regular expression") (action-1) (action-2) ...)

You can execute this without programming, by using ESC ESC to do a one-time
evaluation. If you don't know what the names associated to certain key
actions are and can't find them in the manual, look at C-h c to get the
name. Sometimes you need specific arguments for an action, C-h k or C-h f
will give you the details. Have fun.
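
A concrete instance of that loop, as a sketch for a current GNU Emacs (where one-time evaluation is on M-: rather than ESC ESC). The function name and the choice of `#` comment lines are made up for the example.

```elisp
;; Hypothetical example of the (while (looking-at ...) ...) pattern.
(defun delete-leading-comment-lines ()
  "Delete the run of consecutive lines at point that begin with `#'."
  ;; `looking-at' only tests the text at point, so the loop ends at the
  ;; first line that does not start with `#'.
  (while (looking-at "#.*\n?")
    (delete-region (match-beginning 0) (match-end 0))))
```

Since `looking-at` never searches ahead, this handles one consecutive run of lines starting at point; to reach matches scattered through the buffer, the loop condition would need `re-search-forward` instead.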

ucbvax!brahms!weemba Matthew P Wiener/UCB Math Dept/Berkeley CA 94720
-----------------------------------------------------------------------------
From: felix!hplabs!ucbvax!p...@ernie.berkeley.edu (Paul Rubin)

From the Emacs Info file (found by looking for "delete-matching-lines"
in the Command Index):

File: emacs Node: Other Repeating Search, Prev: Replace, Up: Search

Other Search-and-Loop Commands
==============================

Here are some other commands that find matches for a regular expression.
They all operate from point to the end of the buffer.

`M-x list-matching-lines'
Print each line that follows point and contains a match for the
specified regexp. A numeric argument specifies the number of context
lines to print before and after each matching line; the default is
none.

`M-x count-matches'
Print the number of matches following point for the specified regexp.

`M-x delete-non-matching-lines'
Delete each line that follows point and does not contain a match for
the specified regexp.

`M-x delete-matching-lines'
Delete each line that follows point and contains a match for the
specified regexp.

From: felix!trwrb!trwspp!spp2!urban (Mike Urban)

Easy. Filter the buffer through a "sed" command. As with many
Unix techniques, I haven't decided whether this method is wonderfully
elegant, or an awful kludge. It *is*, however, very useful. If you
find yourself doing it a lot, you can even have some key bound to a function
that prompts you for the sed command and does the work.
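
Such a function might look like the following in GNU Emacs Lisp. It is a sketch under assumptions: the name `sed-buffer` and the key in the comment are invented, and a `sed` program is assumed to be on the search path. It calls `sed` directly through `call-process-region`, so the script needs no shell quoting.

```elisp
;; Hypothetical command; assumes a `sed' executable is installed.
(defun sed-buffer (script)
  "Replace the contents of the current buffer with the output of sed SCRIPT."
  (interactive "sSed script: ")
  ;; DELETE = t removes the original text; BUFFER = t inserts sed's
  ;; output into the current buffer in its place.
  (call-process-region (point-min) (point-max) "sed" t t nil "-e" script))

;; Example binding (the key is an arbitrary choice):
;; (global-set-key (kbd "C-c s") #'sed-buffer)
```

Answering the prompt with `/[A-Z]/d` deletes every line that contains an uppercase letter.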

Mike Urban
-------------------------------------------------------------------------------
From: Bob Chassell <ccicpg!seismo!harvard!lmi-angel!bob>

The easiest thing to do is write a keyboard macro and then apply it
generally. Look up the online documentation with control-h i, get into
the manual and then look for regexps and for keyboard macros. (It
looks like there are a lot of nodes in the table of contents for the
manual but you can search for regexp and macro.)

Bob Chassell