regex question

Jim Green

Dec 22, 2009, 12:49:35 AM

to begi...@perl.org

Hi,

I have a text file with lines like this

· Experience in C/C++ realtime system programming

· Experience in ACE, FIX

Could anybody tell me which regex to use to get rid of the dot and the
leading spaces before each Line?

Thanks for any help!

Jim

Uri Guttman

Dec 22, 2009, 12:53:33 AM

to Jim Green, begi...@perl.org

>>>>> "JG" == Jim Green <student.no...@gmail.com> writes:

JG> I have a text file with lines like this

JG> · Experience in C/C++ realtime system programming

JG> · Experience in ACE, FIX

JG> Could anybody tell me which regex to use to get rid of the dot and the
JG> leading spaces before each Line?

what have you tried? do you have any code at all to show? this list
isn't for coding for you but to help perl beginners. this is a fairly
easy problem so why don't you do a basic s/// op on a line, anchor it to
the beginning and try to match what you don't want and replace it with
a null string. i wrote it in english so you have to translate that to
perl.

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
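Translated into Perl, Uri's description might look something like the sketch below. It assumes the bullet is an ASCII period that may be surrounded by whitespace; the helper name `strip_bullet` is hypothetical, and the /x modifier is used only so that each piece of the pattern can carry a comment.

```perl
use strict;
use warnings;

# Hypothetical helper name; the thread does not define one.
sub strip_bullet {
    my ($line) = @_;
    $line =~ s/
        ^       # anchor at the beginning of the string
        \s*     # any leading whitespace
        \.      # a literal period (the "dot")
        \s*     # any whitespace after it
    //x;        # empty replacement: the matched prefix becomes a null string
    return $line;
}

print strip_bullet("  . Experience in ACE, FIX\n");
```

Lines that do not start with a period are returned unchanged, because the anchored pattern simply fails to match.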

Jim Gibson

Dec 22, 2009, 1:16:13 AM

to begi...@perl.org

At 11:49 PM -0600 12/21/09, Jim Green wrote:
>Hi,
>
>I have a text file with lines like this
>
>
>· Experience in C/C++ realtime system programming
>
>· Experience in ACE, FIX
>
>
>Could anybody tell me which regex to use to get rid of the dot and the
>leading spaces before each Line?

s/^\.\s*//;
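This substitution matches a literal ASCII period. The sample lines in the question, however, appear to use a bullet character, possibly U+00B7 MIDDLE DOT (an assumption based on how it rendered in the quoted replies), and `\.` will not match that. A quick check, with the bullet written as `\x{B7}` so the script itself stays ASCII:

```perl
use strict;
use warnings;

my @samples = (
    ". Experience in ACE, FIX",        # ASCII period bullet
    "\x{B7} Experience in ACE, FIX",   # assumed MIDDLE DOT bullet
);

my @results;
for my $line (@samples) {
    (my $copy = $line) =~ s/^\.\s*//;  # the substitution from this reply
    push @results, $copy;
}
# $results[0] has its bullet removed; $results[1] is unchanged,
# because \. only matches the ASCII character "."
```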

Jim Green

Dec 22, 2009, 1:04:01 AM

to Uri Guttman, begi...@perl.org

2009/12/21 Uri Guttman <u...@stemsystems.com>:

>>>>>> "JG" == Jim Green <student.no...@gmail.com> writes:
>
> JG> I have a text file with lines like this
>
>
> JG> · Experience in C/C++ realtime system programming
>
> JG> · Experience in ACE, FIX
>
>
> JG> Could anybody tell me which regex to use to get rid of the dot and the
> JG> leading spaces before each Line?
>
> what have you tried? do you have any code at all to show? this list
> isn't for coding for you but to help perl beginners. this is a fairly
> easy problem so why don't you do a basic s/// op on a line, anchor it to
> the beginning and try to match what you don't want and replace it with
> a null string. i wrote it in english so you have to translate that to
> perl.

I am reading "Learning Perl" but have not reached the regex chapters yet.
I googled and learned how to delete preceding white space, but now
the problem is that there is an odd dot at the beginning...
I don't know how to write the pattern in s/pattern//, so a regex for
that would help. Sorry if it sounds stupid. Anyway, thanks for the help!

Jim
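One detail worth knowing before reaching those chapters: in a regex an unescaped `.` is a metacharacter that matches any single character (except a newline), so the period has to be written as `\.` to mean a literal dot. A minimal illustration, using a made-up sample line:

```perl
use strict;
use warnings;

my $line = "No bullet here";

# Unescaped ".": matches ANY single character, so this deletes the "N".
(my $unescaped = $line) =~ s/^.\s*//;

# Escaped "\.": matches only a literal period, so nothing is removed.
(my $escaped = $line) =~ s/^\.\s*//;

print "$unescaped\n";
print "$escaped\n";
```

The first line printed is "o bullet here" and the second is the unchanged "No bullet here".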

Jim Green

Dec 22, 2009, 1:23:18 AM

to begi...@perl.org

2009/12/22 Jim Gibson <jimsg...@gmail.com>:
> s/^\.\s*//;

Thanks Jim, I will figure it out as I read "Learning Perl".

Parag Kalra

Dec 22, 2009, 7:45:58 AM

to Jim Green, begi...@perl.org

You can try the following:

$_ =~ s/^\.(\s)+//g;

Cheers,
Parag

On Tue, Dec 22, 2009 at 10:59 AM, Jim Green <zhang.z...@gmail.com> wrote:

> Hi,
>
> I have a text file with lines like this
>
> · Experience in C/C++ realtime system programming
>
> · Experience in ACE, FIX
>
> Could anybody tell me which regex to use to get rid of the dot and the
> leading spaces before each Line?
>
> Thanks for any help!
>
> Jim

Philip Potter

Dec 22, 2009, 9:47:07 AM

to Parag Kalra, Jim Green, begi...@perl.org

2009/12/22 Parag Kalra <parag...@gmail.com>:

> You can try following:
>
> $_ =~ s/^\.(\s)+//g;

This isn't quite right.

There are two ways in which you might use this substitution: either $_
will contain a single line, or it will contain multiple lines. The
single line case might look something like this:

while (<>) {
    s/^\.(\s)+//g;
    print;
}

In this case, however, the /g modifier is redundant and confusing,
since you only want the regex to match once at the start of each line.
Take the /g modifier off.
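To see that the /g is redundant in the line-at-a-time case, the sketch below runs Parag's pattern with and without /g over two sample lines held in an array (an assumption made so the example does not need an input file) and compares the results.

```perl
use strict;
use warnings;

# Two sample lines standing in for what while (<>) would read one at a time.
my @lines = (
    ". Experience in C/C++ realtime system programming\n",
    ". Experience in ACE, FIX\n",
);

my (@with_g, @without_g);
for my $line (@lines) {
    (my $g_copy     = $line) =~ s/^\.(\s)+//g;  # pattern as posted, with /g
    (my $plain_copy = $line) =~ s/^\.(\s)+//;   # same pattern, /g removed
    push @with_g,    $g_copy;
    push @without_g, $plain_copy;
}
# ^ (without /m) can match only at the start of the string, so /g never
# finds a second match and both arrays end up identical.
```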

The multiple line case might look like this:

$_ = do {local $/; <>;}; # but better to use File::Slurp
s/^\.(\s)+//g;
print;

This code is broken. The problem is that ^ matches start-of-string,
not start-of-line. Therefore, even with the /g modifier the regex can
only match at the beginning of the string, and so will only match
once.

To change ^ to match start-of-line, you need the /m modifier:

s/^\.(\s)+//gm;
print;
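The difference between /g and /gm can be checked directly on a two-line string (the sample text is assumed, standing in for a slurped file):

```perl
use strict;
use warnings;

# A whole "file" in one string, as the slurp above would produce.
my $text = ". Experience in C/C++ realtime system programming\n"
         . ". Experience in ACE, FIX\n";

(my $without_m = $text) =~ s/^\.(\s)+//g;   # ^ = start of string only
(my $with_m    = $text) =~ s/^\.(\s)+//gm;  # ^ = start of every line

print $without_m;   # only the first line loses its bullet
print $with_m;      # both lines lose their bullet
```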

Perl Best Practices makes the interesting recommendation that all
regexes should use the /m modifier, because having ^ and $ match line
boundaries seems to be what most programmers expect and seems to be
useful much more often. If you still need to match start and end of
string you can use \A and \z.
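A short check of that advice, with a sample string assumed for illustration: under /m, `^` matches at the start of each line, while `\A` keeps its start-of-string meaning.

```perl
use strict;
use warnings;

my $text = ". first\n. second\n";

# With /m, ^ matches at the start of every line ...
my $caret_count = () = $text =~ /^\./gm;
# ... while \A still matches only at the very start of the string.
my $a_count     = () = $text =~ /\A\./gm;

print "$caret_count $a_count\n";   # 2 1
```

The `() =` assignment forces list context so that the scalar on the left receives the number of matches.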

Phil

Parag Kalra

Dec 22, 2009, 10:13:57 AM

to Philip Potter, Jim Green, begi...@perl.org

Thanks Philip for sharing this excellent piece of information.

Cheers,
Parag

Shawn H Corey

Dec 22, 2009, 10:04:49 AM

to Philip Potter, Parag Kalra, Jim Green, begi...@perl.org

I should point out that this may not work. The text in question seems
to be UTF-8. If so, this should be added before any input:

use utf8;
binmode ARGV, ':encoding(utf8)';

--
Just my 0.00000002 million dollars worth,
Shawn

Programming is as much about organization and communication
as it is about coding.

I like Perl; it's the only language where you can bless your
thingy.
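A sketch of the UTF-8 route, assuming the bullet is U+00B7 MIDDLE DOT and the file is UTF-8 encoded (neither is confirmed in the thread). It uses the core Encode module on bytes built in memory, so it does not depend on how ARGV is opened; note that ARGV is only opened by the first `<>` read, so a `binmode ARGV` issued before that may not apply to the file, and the core `open` pragma is one alternative worth checking. Writing the bullet as `\x{B7}` keeps the script ASCII-only, so `use utf8` (which matters only for literal non-ASCII characters in the source code) is not needed here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Simulated raw input: UTF-8 bytes, as they would arrive from a file
# read without any decoding layer.
my $raw = encode('UTF-8', "\x{B7} Experience in ACE, FIX\n");

# Without decoding, the bullet is two bytes (0xC2 0xB7), so a pattern
# for the single character \x{B7} does not match at the start.
(my $undecoded = $raw) =~ s/^\x{B7}\s*//;

# After decoding, the bullet is one character and the pattern matches.
my $text = decode('UTF-8', $raw);
(my $cleaned = $text) =~ s/^\x{B7}\s*//;

print encode('UTF-8', $cleaned);   # re-encode before printing
```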

John W. Krahn

Dec 22, 2009, 12:22:20 PM

to Perl Beginners

Parag Kalra wrote:
> You can try following:
>
> $_ =~ s/^\.(\s)+//g;

You are using capturing parentheses but you are not using the results of
that capture anywhere so why use them? You are using capturing
parentheses in LIST context so even if there are multiple whitespace
characters you are only capturing ONE.
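The point about what the capture holds is easy to check. Strictly speaking, what limits the capture to ONE character is the + quantifier sitting outside the group: a repeated capturing group keeps only what its last repetition matched. A small sketch, with a sample line assumed to have three spaces after the period:

```perl
use strict;
use warnings;

my $line = ".   Experience in ACE, FIX";   # three spaces after the period

# Quantifier outside the group: $1 keeps only the final repetition.
my $captured = $line =~ /^\.(\s)+/ ? $1 : undef;

# Quantifier inside the group: $1 holds the whole whitespace run.
my $whole_run = $line =~ /^\.(\s+)/ ? $1 : undef;

print length($captured), " ", length($whole_run), "\n";   # 1 3
```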

If you are using the parentheses for grouping then it would be more
efficient to use non-capturing parentheses instead, but grouping only
makes sense if you have a GROUP of characters to match, but you only
have ONE.

The use of the /g option is superfluous because the pattern is anchored
to the beginning of the string and there is only ONE beginning of string
in any string.
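Applying all three suggestions leaves `s/^\.\s+//`. A quick comparison on two sample lines (one using a tab after the period, assumed for illustration) checks that it produces the same results as the posted version:

```perl
use strict;
use warnings;

my @lines = (
    ". Experience in C/C++ realtime system programming",
    ".\tExperience in ACE, FIX",
);

my (@as_posted, @simplified);
for my $line (@lines) {
    (my $old = $line) =~ s/^\.(\s)+//g;  # capturing group and /g, as posted
    (my $new = $line) =~ s/^\.\s+//;     # no group, no /g
    push @as_posted,  $old;
    push @simplified, $new;
}
```

Like the original, this requires at least one whitespace character after the period; `\s*` would also accept a period followed directly by text.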

John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway