The command
echo "foo" | sed 's/(foo|bar)/foobar/'
prints "foo", not "foobar". How does one do `or' with sed? And why
don't grep and sed use the same regular expressions? Grep seems to
understand the above or construction.
Both grep and sed use basic regular expressions.
() and | are features of extended regular expressions (EREs, as
used by grep -E, or by awk, though awk has its own flavour of them),
not of basic regexps. Some sed implementations support \|, but that's
not standard. Note that () is only for grouping (and back-references;
use \(\) in BREs), so it isn't necessary here.
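A small sketch of the difference, assuming a sed that accepts -E to switch to EREs (GNU sed and BSD sed do, and POSIX.1-2024 standardizes it). The function names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helpers; each reads its argument as one input line.

# BRE (sed's default): ( ) and | are literal characters, so the pattern
# only matches the exact text "(foo|bar)".
bre_subst() {
  printf '%s\n' "$1" | sed 's/(foo|bar)/foobar/'
}

# ERE (sed -E): ( ) groups and | is alternation.
ere_subst() {
  printf '%s\n' "$1" | sed -E 's/(foo|bar)/foobar/'
}

bre_subst 'foo'        # prints: foo (no match)
bre_subst '(foo|bar)'  # prints: foobar (literal match)
ere_subst 'foo'        # prints: foobar
ere_subst 'bar'        # prints: foobar
```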
You can use awk instead:
echo foo | awk '{gsub(/foo|bar/, "foobar");print}'
Or you can use perl regexps, which are yet another flavour of
regexps:
echo foo | perl -pe 's/foo|bar/foobar/'
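One difference worth noting in the awk version: gsub() replaces every match on the line, while sed's s/// without the g flag (and perl's s/// without /g) replaces only the first; awk's sub() is the closer equivalent of the original sed command. A sketch comparing them on a line containing both words, assuming sed -E is available (helper names are hypothetical):

```shell
#!/bin/sh
# Hypothetical helpers comparing first-match and all-match replacement.
first_sed() { printf '%s\n' "$1" | sed -E 's/foo|bar/foobar/'; }
first_awk() { printf '%s\n' "$1" | awk '{sub(/foo|bar/, "foobar"); print}'; }
all_awk()   { printf '%s\n' "$1" | awk '{gsub(/foo|bar/, "foobar"); print}'; }

first_sed 'foo bar'  # prints: foobar bar
first_awk 'foo bar'  # prints: foobar bar
all_awk 'foo bar'    # prints: foobar foobar
```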
--
Stéphane
> Grep seems to understand the above or construction.
No, it doesn't. Only egrep(1) does, unless you are using a
relabelled GNU grep.
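To check this on your own system, a sketch comparing plain grep (BRE) with grep -E, the POSIX spelling of egrep. The helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers; each reads its argument as one input line.
# Plain grep uses BREs: ( ) and | are literal, so "foo" does not match.
bre_grep() { printf '%s\n' "$1" | grep '(foo|bar)'; }
# grep -E uses EREs: | is alternation, so "foo" matches.
ere_grep() { printf '%s\n' "$1" | grep -E '(foo|bar)'; }

bre_grep 'foo' || echo 'no match'  # prints: no match
ere_grep 'foo'                     # prints: foo
```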
> And why doesn't grep and sed use the same regular expressions?
They (mostly) do. What you mean is: why don't they use the
same RE _syntax_?
Grep(1) was created as a standalone software tool from the
RE component of ed(1) by Ken Thompson (q.v. Thompson's
Algorithm) for Unix v4 (1973); sed(1) was authored by Lee M.
McMahon and shipped with Unix v7 (1979). POSIX (and other standards)
and RFCs were then mostly a promise for the future.
> > echo foo | awk '{gsub(/foo|bar/, "foobar");print}'
> > echo foo | perl -pe 's/foo|bar/foobar/'
As efficient as awk(1) is, it has a hundred times the parsing
overhead of sed(1) (excluding the building of the DFAs), and
perl(1) has up to ten times even that:
"Timing Trials, or, the Trials of Timing: Experiments with
Scripting and User-Interface Languages"
http://cm.bell-labs.com/cm/cs/who/bwk/interps/pap.html
Actually, it is quite easy to simulate in sed the logical OR of
ERE alternation, and logical AND as well, using branches:
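# simulate OR across an arbitrary number of expressions
# (not tested)
/foo/b do
/bar/b do
# neither matched: skip the substitution
b dont
:do
# an empty regex reuses the last one applied (/foo/ or /bar/)
s//foobar/
:dont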
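The OR technique can be tried by wrapping the script in a shell function. A sketch, assuming a POSIX sed; `sed_or` is a made-up name. It relies on the empty regex in `s//foobar/` meaning "the last regex applied", which here is whichever of /foo/ and /bar/ matched:

```shell
#!/bin/sh
# Hypothetical wrapper: feed one line through the branch-based OR script.
sed_or() {
  printf '%s\n' "$1" | sed '
/foo/b do
/bar/b do
b dont
:do
s//foobar/
:dont
'
}

sed_or 'foo'    # prints: foobar
sed_or 'a bar'  # prints: a foobar
sed_or 'baz'    # prints: baz (neither matched, line unchanged)
```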
# simulate AND across an arbitrary number of expressions
# (not tested)
/foo/!b dont
/bar/!b dont
# both matched; the empty regex reuses the last one applied (/bar/)
s//foobar/
:dont
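The AND technique, wrapped the same way. A sketch assuming a POSIX sed; `sed_and` is a made-up name. Because /bar/ is the last regex tested, the empty regex replaces "bar", not "foo":

```shell
#!/bin/sh
# Hypothetical wrapper: substitute only when the line has both foo and bar.
sed_and() {
  printf '%s\n' "$1" | sed '
/foo/!b dont
/bar/!b dont
s//foobar/
:dont
'
}

sed_and 'foo bar'  # prints: foo foobar (the empty regex reuses /bar/)
sed_and 'foo'      # prints: foo (no bar, unchanged)
sed_and 'bar'      # prints: bar (no foo, unchanged)
```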
The above is the general solution; in applicable cases you may
use "/foo.*bar/{...}" and/or "/bar.*foo/{...}", or the "t" command
of sed(1) (RTFM). Also, De Morgan's laws, which state for example
that a || b is equivalent to !(!a && !b), are of course always
applicable to the control logic of such code.
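A sketch of those alternatives as shell functions, assuming a POSIX sed (all function names are hypothetical). The last one writes OR in the negated form !(!foo && !bar), reusing the `!b` style of the AND script:

```shell
#!/bin/sh
# Hypothetical helpers; each reads its argument as one input line.

# OR with t: if the first s/// succeeded, t (no label) jumps to the end
# of the script, so the second s/// is skipped.
or_with_t() {
  printf '%s\n' "$1" | sed -e 's/foo/foobar/' -e 't' -e 's/bar/foobar/'
}

# AND with ordered addresses: one address per order in which foo and
# bar can appear; t stops a second substitution on the same line.
and_with_order() {
  printf '%s\n' "$1" |
    sed -e '/foo.*bar/s/bar/foobar/' -e 't' -e '/bar.*foo/s/bar/foobar/'
}

# OR via De Morgan: a || b == !(!a && !b). Only when neither /foo/ nor
# /bar/ matches do we branch (b, no label) past the substitution.
or_de_morgan() {
  printf '%s\n' "$1" | sed '
/foo/!{
/bar/!b
}
s//foobar/
'
}

or_with_t 'bar'           # prints: foobar
and_with_order 'bar foo'  # prints: foobar foo
or_de_morgan 'baz'        # prints: baz
```

Note that, like the branch-based script, these prefer "foo" whenever it occurs, so `or_with_t 'bar foo'` gives "bar foobar", whereas an ERE alternation would replace the leftmost match instead.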
=Brian