Simple regexp : \s or \S

moo...@yahoo.co.uk

Jun 26, 2007, 8:10:51 AM

Hi,

I have a small regexp issue that I hope someone can clarify. I am
attempting to use a regular expression that identifies lines that
contain only comments.

e.g.
set a { # Comment Only (with some spaces)}
set b { Stuff # Comment at end of line}

If I use

% regexp {^\s*#} $a
1
% regexp {^\s*#} $b
0

The question is, why does the following not work?

% regexp {\S*#} $b
1

man re_syntax implies that \S* should be equivalent to ^\s*.

Clarification greatly appreciated.

Steven

p.s. I assume that it would be easier using the string commands?
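
For the p.s., a minimal sketch of a string-command version, assuming "comment only" means that the first non-whitespace character on the line is #. The proc name isCommentOnly is made up for illustration and is not from the thread.

```tcl
# Hypothetical helper: true when the first non-whitespace character is '#'.
proc isCommentOnly {line} {
    # string trimleft strips leading whitespace; string index then
    # returns the first remaining character ("" for an empty string).
    expr {[string index [string trimleft $line] 0] eq "#"}
}

set a { # Comment Only (with some spaces)}
set b { Stuff # Comment at end of line}

puts [isCommentOnly $a]
puts [isCommentOnly $b]
```

Whether this is easier than {^\s*#} is a matter of taste; it avoids regexp quoting but takes two commands instead of one.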

Jonathan Bromley

Jun 26, 2007, 8:41:58 AM

On Tue, 26 Jun 2007 05:10:51 -0700, moo...@yahoo.co.uk wrote:

>man re_syntax implies that \S* should be equivalent to ^\s*.

I suspect you're getting confused between ^ used to negate
a character class, and ^ used to mean "start of string".

The RE [^abc] means "any character that's not in the set [abc]".
So [^\s] is indeed the same as \S - it means "any character
that's not whitespace". But ^\s means something quite different -
it matches a single whitespace character at the start of the string.
If your string begins with something other than whitespace, it
simply won't match.
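
A small sketch of the two meanings of ^ described above, using a made-up test string s:

```tcl
set s "x y"

# [^\s] is a negated class: one non-whitespace character, anywhere.
# It finds the same match as \S.
puts [regexp -inline {[^\s]} $s]
puts [regexp -inline {\S} $s]

# ^\s is an anchor followed by \s: whitespace at the very start only.
puts [regexp {^\s} $s]       ;# string starts with "x"
puts [regexp {^\s} " x y"]   ;# string starts with a space
```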

>The question is, why does the following not work
>% regexp {\S*#} $b

The "negated" form you were looking for should probably be

regexp {^[^\S]*#}

but, since \S is itself a shorthand for [^\s], this is illegal;
in any case, it's kinda silly :-)
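
As a rough illustration of why {\S*#} returns 1 for $b: the pattern is not anchored, and \S* is allowed to match zero characters, so a # anywhere in the string is enough. The sketch below captures the matched text to show this; the variable names match and msg are only for the example.

```tcl
set b { Stuff # Comment at end of line}

# Unanchored: \S* may match zero characters, so the pattern only needs
# a '#' somewhere in the string.
regexp {\S*#} $b match
puts "matched: '$match'"

# Anchored: only whitespace may sit between the start of the string and '#'.
puts [regexp {^\s*#} $b]

# \S inside a bracket expression is described above as illegal; catch
# reports whether this interpreter accepts or rejects the pattern.
if {[catch {regexp {^[^\S]*#} $b} msg]} {
    puts "rejected: $msg"
} else {
    puts "accepted, result: $msg"
}
```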
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan...@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

moo...@yahoo.co.uk

Jun 26, 2007, 10:04:46 AM

On 26 Jun, 14:41, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:

> On Tue, 26 Jun 2007 05:10:51 -0700, moo...@yahoo.co.uk wrote:
> >man re_syntax implies that \S* should be equivalent to ^\s*.
>
> I suspect you're getting confused between ^ used to negate
> a character class, and ^ used to mean "start of string".
>
> The RE [^abc] means "any character that's not in the set [abc]".
> So [^\s] is indeed the same as \S - it means "any character
> that's not whitespace". But ^\s means something quite different -
> it matches a single whitespace character at the start of the string.
> If your string begins with something other than whitespace, it
> simply won't match.
>
> >The question is, why does the following not work
> >% regexp {\S*#} $b
>
> The "negated" form you were looking for should probably be
>
> regexp {^[^\S]*#}
>
> but, since \S is itself a shorthand for [^\s], this is illegal;
> in any case, it's kinda silly :-)

Thanks Jonathan,

You are correct in that this was the source of my confusion. Having
re-read the man page, it starts to make sense.

Steven