
backreferences when using sed in a backquoted command


jond...@gmail.com

Jul 21, 2008, 2:22:56 PM
Can someone explain why the two following commands have different
outputs, and how to change the second so that it behaves the same as
the first (I'm using a cygwin bash shell)?

$ echo dogdog | sed -r s/\(.*\)\\1/cat/
cat

$ a=`echo dogdog | sed -r s/\(.*\)\\1/cat/`; echo $a
dogdog

Dan Stromberg

Jul 21, 2008, 4:28:37 PM

I believe this is because the backquotes send what's between them through
another level of shell processing (which includes quote removal), so you
need to add another layer of backslashes. The number of backslashes you
need for a given number of levels of stripping follows the pattern
0, 1, 3, 7, 15... IOW, powers of 2 minus 1.

$ a=`echo dogdog | sed -r s/\\(.*\)\\\\1/cat/`; echo $a
cat
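A minimal sketch of "adding another layer" mechanically, assuming bash and GNU sed (for -r). The helper bq_escape is hypothetical, not a standard tool: it puts a backslash in front of every \, ` and $, which undoes exactly what backquote processing removes. A command that works at the prompt then keeps working once wrapped in backquotes.

```bash
#!/usr/bin/env bash
# bq_escape is a hypothetical helper: it prefixes every \, ` and $ with a
# backslash, i.e. it adds one layer of escaping for backquotes.
bq_escape() {
  sed 's/[\\`$]/\\&/g'
}

# The command exactly as it works at the prompt (single-quoted here only so
# the text is stored unchanged in the variable).
top='echo dogdog | sed -r s/\(.*\)\\1/cat/'

cmd=$(printf '%s\n' "$top" | bq_escape)
printf '%s\n' "$cmd"
# prints: echo dogdog | sed -r s/\\(.*\\)\\\\1/cat/

# Run it inside backquotes; eval parses the string as if it had been typed.
eval "a=\`$cmd\`"
printf '%s\n' "$a"
# prints: cat
```

Doubling every backslash is the conservative choice: as noted later in the thread, fewer backslashes sometimes also work, but only by accident of how a backslash before an ordinary character is treated.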

Dave B

Jul 21, 2008, 5:40:47 PM
jond...@gmail.com wrote:

I think Cygwin's bash can handle quoted strings like any other bash, so
just quote the sed script:

$ echo dogdog | sed -r 's/(.*)\1/cat/'
cat

and

$ a=$(echo dogdog | sed -r 's/(.*)\1/cat/'); echo "$a"
cat
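A small sketch (bash and GNU sed assumed, for -r) of why $(...) is the easier fix: its contents are parsed exactly like a command typed at the prompt, with no extra round of backslash removal. Even the original unquoted script from the question works there unchanged, unlike in the backquoted version.

```bash
#!/usr/bin/env bash
# Inside $(...) there is no extra round of backslash removal, so the
# original unquoted form from the question works as-is...
a=$(echo dogdog | sed -r s/\(.*\)\\1/cat/)
printf '%s\n' "$a"    # prints: cat

# ...although the single-quoted form is still the safer choice, since it
# also keeps the shell from globbing the * in the script.
b=$(echo dogdog | sed -r 's/(.*)\1/cat/')
printf '%s\n' "$b"    # prints: cat
```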

--
echo 0|sed 's909=oO#3u)o19;s0#0ooo)].O0;s()(0bu}=(;s#}#.1m"?0^2{#;
s)")9v2@3%"9$);so%op]t(p$e#!o;sz(z^+.z;su+ur!z"au;sxzxd?_{h)cx;:b;
s/\(\(.\).\)\(\(..\)*\)\(\(.\).\)\(\(..\)*#.*\6.*\2.*\)/\5\3\1\7/;
tb'|awk '{while((i+=2)<=length($1)-18)a=a substr($1,i,1);print a}'

jond...@gmail.com

Jul 21, 2008, 5:46:18 PM

Thanks! It's strange that a variable number of backslashes works. Any
number from 3 to 6 of them seems to work for the backreference, and
either 1 or 2 works for the parentheses in the grouping operators. Is
this due to some kind of ambiguity in the lexical analysis when the
backquotes feed it to the other shell?

Also, if the escape backslashes are stripped during that lexical
analysis, do you know why one backslash still suffices for the parens
(why isn't it stripped as well)?

Michael Paoli

Jul 22, 2008, 10:41:32 AM

Going through it a bit at a time ...

sed -r
Can probably drop the -r option, with a slight change to the sed script
(writing the grouping as \( \) in a basic regular expression), unless
there's good reason to use the -r option.

It's generally good defensive programming to use the -e option, so
there's no ambiguity to sed (or to a reader of the program) that the
option argument to -e is the sed expression to be executed.

Unless one wants the sed script subject to globbing, parameter
substitution, etc., it's generally prudent to quote it, typically the
whole thing in single quotes; otherwise one may not get the results and
behavior one expects.

So, applying what I suggested, let's try a bit of a rewrite, from:

echo dogdog | sed -r s/\(.*\)\\1/cat/

to:

echo dogdog | sed -e 's/\(.*\)\1/cat/'

It's a bit easier to see what the above does; most notably, the part
within single quotes is passed literally to sed.
In the prior case:

echo dogdog | sed -r s/\(.*\)\\1/cat/

the sed script is first interpreted by the shell, resulting in:
s/(.*)\1/cat/
as the argument passed to sed ... provided the glob pattern in fact
matches nothing and the default shell globbing behavior is in effect.
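In the same spirit as the echo trick used elsewhere in this thread, here is a minimal sketch for seeing exactly which arguments the shell hands to sed. show_args is a hypothetical helper, and the <...> wrapping is just one way to make argument boundaries visible. The outputs shown assume no directory named s exists in the current directory, so the * is not globbed.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints each argument it receives,
# one per line, wrapped in <> so quoting and globbing effects are visible.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Unquoted: the shell removes backslashes before the command runs.
show_args -r s/\(.*\)\\1/cat/
# prints: <-r>
#         <s/(.*)\1/cat/>

# Single-quoted: the text reaches the command exactly as written.
show_args -e 's/\(.*\)\1/cat/'
# prints: <-e>
#         <s/\(.*\)\1/cat/>
```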
So, we could get some rather unexpected behavior in the earlier case,
e.g.:

$ echo dogdog | sed -r s/\(.*\)\\1/cat/
cat
$ mkdir -p 's/(.at)\1/cat'
$ echo dogdog | sed -r s/\(.*\)\\1/cat/
dogdog
$ echo ratrat | sed -r s/\(.*\)\\1/cat/
cat
$ ls -d s/*/cat/
s/(.at)\1/cat/
To better see what happens here:
$ set -xv; echo dogdog | sed -r s/\(.*\)\\1/cat/; set +xv
+ echo dogdog
+ sed -r 's/(.at)\1/cat/'
dogdog
+ set +xv
$ rmdir 's/(.at)\1/cat' 's/(.at)\1' s
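A sketch that reproduces the globbing surprise above without touching the current directory, assuming bash, mktemp and GNU sed (for -r). The unquoted script is globbed into the directory name before sed runs, while the single-quoted one reaches sed intact.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory so nothing in the current one is touched.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir -p 's/(.at)\1/cat'

# Unquoted: the shell globs s/(.*)\1/cat/ into s/(.at)\1/cat/ first,
# so sed looks for a repeated ".at" and leaves "dogdog" alone.
unquoted=$(echo dogdog | sed -r s/\(.*\)\\1/cat/)
# Quoted: no globbing, sed gets the intended script.
quoted=$(echo dogdog | sed -r 's/(.*)\1/cat/')
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
# prints: unquoted: dogdog
#         quoted:   cat

# Clean up the throwaway directory.
cd / && rm -rf "$tmp"
```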

Now, back to the original question:


> $ echo dogdog | sed -r s/\(.*\)\\1/cat/

vs.


> $ a=`echo dogdog | sed -r s/\(.*\)\\1/cat/`; echo $a

Within the backquoted command substitution, the shell continues to
interpret and parse the contents. In the backquote form, a backslash
keeps its literal meaning except before $, `, or \, so \\ becomes \
while \( and \) pass through unchanged. So (again, if * doesn't match
as a shell glob and default globbing is in place):
`echo dogdog | sed -r s/\(.*\)\\1/cat/`
becomes the to-be-executed shell command:

echo dogdog | sed -r s/\(.*\)\1/cat/

which is then interpreted again by the shell, with the sed script
becoming:
s/(.*)1/cat/
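One way to check what the inner shell receives (a sketch, assuming bash): put the command text in single quotes inside the backquotes and print it. The backquotes still turn \\ into \ before the inner shell parses anything, but the single quotes stop the inner shell from removing any more backslashes, so printf shows the intermediate command text.

```bash
#!/usr/bin/env bash
# Backquote processing applies even inside the single quotes here, but the
# inner shell then treats the quoted text literally and just prints it.
inner=`printf '%s\n' 'echo dogdog | sed -r s/\(.*\)\\1/cat/'`
printf '%s\n' "$inner"
# prints: echo dogdog | sed -r s/\(.*\)\1/cat/
```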
Working backwards: if one wants sed to receive, as option and literal
script:
-r s/(.*)\1/cat/
then for the shell, that could be:
-r 's/(.*)\1/cat/'
or:
-r s/\(.\*\)\\1/cat/
and going back one more layer (inside the backquotes):
-r 's/(.*)\1/cat/'
or:
-r s/\(.\*\)\\\\1/cat/
So, back to our original, but with the new sed script, and presuming we
don't want the shell to glob the * here:
$ a=`echo dogdog | sed -r 's/(.*)\1/cat/'`; echo $a
cat
$ a=`echo dogdog | sed -r s/\(.\*\)\\\\1/cat/`; echo $a
cat
or, as I might suggest:
$ a=`echo dogdog | sed -e 's/\(.*\)\1/cat/'`; echo $a
cat
We can also see fairly easily what happens with the original:
set -xv; a=`echo dogdog | sed -r s/\(.*\)\\1/cat/`; echo $a


echo dogdog | sed -r s/\(.*\)\1/cat/

++ echo dogdog
++ sed -r 's/(.*)1/cat/'
+ a=dogdog
+ echo dogdog
dogdog
Notice that the original \\1 is reduced to \1 by the backquotes and then
stripped down to 1 by the inner shell, long before sed executes it as
part of the sed script.

Maxwell Lol

Jul 22, 2008, 4:03:53 PM
jond...@gmail.com writes:

> Can someone explain why the two following commands have different
> outputs, and how to change the second so that it behaves the same as
> the first (I'm using a cygwin bash shell)?
>
> $ echo dogdog | sed -r s/\(.*\)\\1/cat/

You have to learn that what you type is not what sed (or whatever) sees.
It's what the shell sees.

To debug this, use echo. This lets you see what the sed program sees:

echo sed -r s/\(.*\)\\1/cat/

and this returns

sed -r s/(.*)\1/cat/

but if you type
echo `echo sed -r s/\(.*\)\\1/cat/`
you get
sed -r s/(.*)1/cat/

So one \ gets past the shell, but inside `...` you effectively have two
rounds of shell parsing: the backquotes turn \\ into \, and the second
round then strips that remaining \ from \1, so sed never sees it.

So when you try

echo `echo sed -r s/\(.*\)\\\1/cat/`
This returns
sed -r s/(.*)\1/cat/

And that is what you want.
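To check the observation earlier in the thread that anywhere from 3 to 6 backslashes before the 1 work inside backquotes, here is a small sketch (bash and GNU sed assumed, and no directory named s in the current directory). eval is used only to run each backquoted command built from a string; it parses the string just as if it had been typed.

```bash
#!/usr/bin/env bash
results=
for n in 1 2 3 4 5 6 7; do
  # A run of n backslashes.
  bs=$(printf '%*s' "$n" '' | tr ' ' '\\')
  # The exact text one would type at the prompt, with n backslashes before 1.
  cmd='a=`echo dogdog | sed -r s/\(.*\)'"$bs"'1/cat/`'
  eval "$cmd"
  printf '%d backslash(es): %s\n' "$n" "$a"
  results="$results$n=$a "
done
```

With 3 or 4 backslashes the backquotes leave \\1 for the inner shell, and with 5 or 6 they leave \\\1; either way the inner shell reduces it to \1, which sed reads as a backreference. With 7, sed receives \\1 (a literal backslash followed by 1), and with 1 or 2 the inner shell ends up stripping the backslash entirely.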
