
Parenthesis brackets in a regular expression


Mark Hobley

Aug 31, 2008, 3:18:20 PM
I was expecting to be able to strip whitespace from the beginning and
end of a line using a regular expression within the stream editor as
follows:

foostring=`echo "$foostring" | sed 's/(^[ \t]*)|([ \t]*$)//g'`

This does not appear to work. I am just curious why. The parenthesis
brackets seem to be a problem. I try the following:

foostring=`echo "$foostring" | sed 's/(^[ \t]*)//g'`

This does not strip any characters, but removing the brackets strips
whitespace from the beginning of the line.

foostring=`echo "$foostring" | sed 's/^[ \t]*//g'` # this works

I try to strip the beginning and ending whitespace without brackets:

foostring=`echo "$foostring" | sed 's/^[ \t]*|[ \t]*$//g'`

This does not work. However, the following does:

foostring=`echo "$foostring" | sed 's/^[ \t]*//;s/[ \t]*$//'` # this works

What do you make of all this?

Mark.

--
Mark Hobley,
393 Quinton Road West,
Quinton, BIRMINGHAM.
B32 1QE.

guen...@gmail.com

Aug 31, 2008, 6:22:11 PM
On Aug 31, 1:18 pm, markhob...@hotpop.donottypethisbit.com (Mark Hobley) wrote:
> I was expecting to be able to strip whitespace from the beginning
> and end of a line using a regular expression within the stream
> editor as follows:
>
> foostring=`echo "$foostring" | sed 's/(^[ \t]*)|([ \t]*$)//g'`
>
> This does not appear to work. I am just curious why. The
> parenthesis brackets seem to be a problem.

sed uses BREs (Basic Regular Expressions), not EREs (Extended Regular
Expressions). There are several differences between BREs and EREs;
the relevant ones here are that in the former, parens are only special
if they are escaped, alternation ("|") isn't supported, and '^' and '$'
are only required to behave as anchors when at the very beginning and
end (respectively) of the entire pattern.

So, what you specified may be treated as matching the string
consisting of literal open paren, a literal circumflex, zero or more
spaces and tabs, a literal close paren, a literal vertical bar, etc...
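A minimal sketch of that reading, assuming a sed that treats the
unescaped characters as literals in a BRE (GNU sed does; other
implementations may differ). The sample strings and variable names are
made up for illustration, and a plain space stands in for "[ \t]".

```shell
# The original expression, with only a space in the bracket expression.
expr='s/(^[ ]*)|([ ]*$)//g'

# A padded line contains no literal paren, circumflex, bar or dollar sign,
# so the pattern never matches and the line comes back untouched.
padded='  foo bar  '
unchanged=$(printf '%s\n' "$padded" | sed "$expr")

# A line that spells those characters out literally does match.
literal=$(printf '%s\n' '(^  )|(  $)x' | sed "$expr")

printf '[%s]\n[%s]\n' "$unchanged" "$literal"
```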

On the systems that I'm familiar with, the 'sed' manpage documents
this behavior.

Oh, and I've been assuming that when you write "[ \t]" that you mean
"literal space and tab characters between square brackets". The
meaning of a literal "\t" in a sed expression is left undefined by the
standard and is not portable.
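One way to sidestep the "\t" question, offered as an aside rather than
as part of the advice above, is the POSIX character class [[:blank:]],
which matches space and tab without either being typed into the script.
A small sketch, with illustrative variable names:

```shell
# Build a test value containing real tabs; printf does define \t.
foostring=$(printf ' \tfoo bar\t ')

# [[:blank:]] covers space and tab, so no tab has to appear in the script.
foostring=$(printf '%s\n' "$foostring" |
    sed -e 's/^[[:blank:]]*//' -e 's/[[:blank:]]*$//')

printf '[%s]\n' "$foostring"
```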


<elided>

> This does not work. However, the following does:
>
> foostring=`echo "$foostring" | sed 's/^[ \t]*//;s/[ \t]*$//'`
> # this works
>
> What do you make of all this?

That the portable way to do what you want is either the last sed
command you gave above, or the equivalent version using two -e
options: sed -e 's/^[ \t]*//' -e 's/[ \t]*$//'

(Again, using real tabs and not literal "\t")
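Typing a real tab inside a quoted script is easy to get wrong in
editors that expand tabs. A minimal sketch of one workaround, assuming
a POSIX shell: let printf produce the tab and splice it into the two
-e expressions. The names tab and trim_ws are made up for this example.

```shell
# printf defines \t, so this variable holds one real tab character.
tab=$(printf '\t')

# trim_ws (hypothetical helper): strip leading and trailing spaces and tabs
# from each line of its argument, using the two-expression form.
trim_ws() {
    printf '%s\n' "$1" |
        sed -e "s/^[ ${tab}]*//" -e "s/[ ${tab}]*\$//"
}

foostring="${tab}  foo bar ${tab} "
foostring=$(trim_ws "$foostring")
printf '[%s]\n' "$foostring"
```

Inner whitespace is left alone because each expression is anchored to
one end of the line.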


Philip Guenther

Wayne C. Morris

Aug 31, 2008, 9:05:47 PM
In article <sc7ro5-...@neptune.markhobley.yi.org>,
markh...@hotpop.donottypethisbit.com (Mark Hobley) wrote:

> I was expecting to be able to strip whitespace from the beginning and
> end of a line using a regular expression within the stream editor as
> follows:
>
> foostring=`echo "$foostring" | sed 's/(^[ \t]*)|([ \t]*$)//g'`
>
> This does not appear to work. I am just curious why. The parenthesis
> brackets seem to be a problem.

It's not the parentheses, it's the "|" in your regular expression. By
default sed uses basic regular expressions, in which "|" is an ordinary
character.

Use the -E option to tell sed to use extended regular expressions.
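A sketch of what that looks like, assuming a sed that accepts -E (BSD
sed and current GNU sed do; GNU sed also spells it -r). Under EREs the
unescaped parentheses group and "|" alternates, so the original
expression behaves as intended. [[:blank:]] stands in for the
space-and-tab bracket here to avoid relying on "\t".

```shell
# Test value with real tabs, produced by printf.
foostring=$(printf '\t  foo bar \t')

# Same shape as the original expression, but interpreted as an ERE.
foostring=$(printf '%s\n' "$foostring" |
    sed -E 's/(^[[:blank:]]*)|([[:blank:]]*$)//g')

printf '[%s]\n' "$foostring"
```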

Scott Lurndal

Sep 3, 2008, 2:41:40 PM

This should also work:

sed -e "s/^[ \t]*\(.*\)[ \t]*$/\1/"

(real tabs)

scott

guen...@gmail.com

Sep 4, 2008, 12:20:13 AM
On Sep 3, 11:41 am, sc...@slp53.sl.home (Scott Lurndal) wrote:
> "guent...@gmail.com" <guent...@gmail.com> writes:
...

> >That the portable way to do what you want is either the last sed
> >command you gave above, or the equivalent version using two -e
> >options: sed -e 's/^[ \t]*//' -e 's/[ \t]*$//'
...

> This should also work:
>
> sed -e "s/^[ \t]*\(.*\)[ \t]*$/\1/"

Nope, that fails to trim trailing whitespace: the ".*" always matches
to the end of the line and the "[ \t]*" never matches anything. To do
it with one expression you have to force the capturing subexpression
to end with a non-whitespace character, if it matches anything at all,
as in:
sed -e "s/^[ \t]*\(\(.*[^ \t]\)*\)[ \t]*$/\1/"

(\t == a real tab)

...but I would rather do the two expression version, because it's
clearer what the intent is.
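To make the difference concrete, here is a small sketch that runs both
single-expression forms on the same padded input. The variable names
are illustrative, and the tab comes from printf rather than being typed
into the script.

```shell
tab=$(printf '\t')
padded="  foo bar ${tab} "

# Greedy form: \(.*\) swallows the trailing whitespace, so it survives in \1.
greedy=$(printf '%s\n' "$padded" |
    sed -e "s/^[ ${tab}]*\(.*\)[ ${tab}]*\$/\1/")

# Anchored form: the capture must end in a non-whitespace character.
anchored=$(printf '%s\n' "$padded" |
    sed -e "s/^[ ${tab}]*\(\(.*[^ ${tab}]\)*\)[ ${tab}]*\$/\1/")

printf '[%s]\n[%s]\n' "$greedy" "$anchored"
```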


Philip Guenther
