
NEED: Fast, Fast string trim()


Kevin Hawley

Dec 15, 1997, 3:00:00 AM

Hi, I need a trim() function that will take a string and remove the
white space from the front and the rear of the string, but preserve
white space within the string. I will be processing lots of strings,
so I need something efficient, anybody have a good suggestion?
TIA

Here is what I have so far: (I'm hoping for a one line solution)

$string=" White Space ";
printf ("Before++%s++ After++%s++\n",$string, &trim ($string));

$string="NO_White_Space";
printf ("Before++%s++ After++%s++\n",$string, &trim ($string));

sub trim {
    my ($s) = @_;
    $s = &rtrim($s);
    $s = &ltrim($s);
    return $s;
}

sub ltrim {
    my ($str) = @_;
    # Anchor at the start so only leading white space is removed.
    if ($str =~ /^\s+(.*)/s) {
        return $1;
    } else {
        return $str;
    }
}

sub rtrim {
    my ($str) = @_;
    my $ret = $str;
    # Reverse the string so trailing white space becomes leading white space.
    $str = reverse($str);
    if ($str =~ /^\s+(.*)/s) {
        $ret = reverse($1);
    }
    return $ret;
}

Mike Stok

Dec 15, 1997, 3:00:00 AM

In article <34953A3A...@ARCO.com>, Kevin Hawley <cch...@ARCO.com> wrote:
>Hi, I need a trim() function that will take a string and remove the
> white space from the front and the rear of the string, but preserve
> white space within the string. I will be processing lots of strings,
> so I need something efficient, anybody have a good suggestion?

You might want to look at the FAQ at http://www.perl.com and find the
appropriate Q & A.

If you're really interested in finding out where time's going then you can
use the Benchmark module which comes with recent distributions to time
various solutions. I usually consider

$string =~ s/^\s+//;
$string =~ s/\s+$//;

to be short enough to use inline and avoid the overhead of subroutine
calls, and clear enough to mean something to a maintainer in the future
(assuming they know perl basics.)
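For cases where a trimmed copy is wanted and the cost of a subroutine call is acceptable, the same two substitutions can be wrapped up. This is only a minimal sketch: the name trim and the test strings are borrowed from the original question, and the wrapper itself is an assumption, not something posted in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a trimmed copy and leaves the
# caller's string untouched.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;    # drop leading white space
    $s =~ s/\s+$//;    # drop trailing white space
    return $s;
}

for my $string (" White Space ", "NO_White_Space") {
    printf "Before++%s++ After++%s++\n", $string, trim($string);
}
```

Using the two substitutions inline, as suggested above, avoids the call overhead entirely; the wrapper only buys convenience.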

Hope this helps,

Mike
--
mi...@stok.co.uk | The "`Stok' disclaimers" apply.
http://www.stok.co.uk/~mike/ | PGP fingerprint FE 56 4D 7D 42 1A 4A 9C
http://www.tiac.net/users/stok/ | 65 F3 3F 1D 27 22 B7 41
st...@colltech.com | Collective Technologies (work)

Ken Holm

Dec 15, 1997, 3:00:00 AM
to Kevin Hawley

Kevin Hawley wrote:
>
> Hi, I need a trim() function that will take a string and remove the
> white space from the front and the rear of the string, but preserve
> white space within the string. I will be processing lots of strings,
> so I need something efficient, anybody have a good suggestion?
> TIA
>
> Here is what I have so far: (I'm hoping for a one line solution)
>
> $string=" White Space ";

No need for trim().

$string =~ s/^\s+|\s+$//g;

perl5 -e '$A = " THIS is a test . "; $A =~ s/^\s+|\s+$//g; print "[$A]\n\n"'

yields

[THIS is a test .]

-K
--
Kennneth A Holm | META 3 - Webmaster |webm...@meta3.com
PO Box 1508 |----------------------------------|(601)948.3399 x 227
Jackson, MS 39215|PGP Key finger webm...@meta3.com|(601)948.5999 (fax)

Mark S. Reibert

Dec 15, 1997, 3:00:00 AM

Mike Stok wrote:

> In article <34953A3A...@ARCO.com>, Kevin Hawley <cch...@ARCO.com> wrote:
> >Hi, I need a trim() function that will take a string and remove the
> > white space from the front and the rear of the string, but preserve
> > white space within the string. I will be processing lots of strings,
> > so I need something efficient, anybody have a good suggestion?
>

> If you're really interested in finding out where time's going then you can
> use the Benchmark module which comes with recent distributions to time
> various solutions. I usually consider
>
> $string =~ s/^\s+//;
> $string =~ s/\s+$//;
>
> to be short enough to use inline and avoid the overhead of subroutine
> calls, and clear enough to mean something to a maintainer in the future
> (assuming they know perl basics.)

I usually shorten this to

$string =~ s/^\s*(.*?)\s*$/$1/;

I don't know if this executes any faster than your option, which is admittedly
easier to understand, but it offers an alternative for those of us who like nice,
short, cryptic one-liners!

Mark Reibert

-----------------------------
Mark S. Reibert, Ph.D.

Mystech Associates, Inc.
3233 East Brookwood Court
Phoenix, Arizona 85044

Tel: (602) 732-3752
Fax: (602) 706-5120
E-mail: rei...@mystech.com
-----------------------------

Matthew Cravit

Dec 15, 1997, 3:00:00 AM

In article <34956B42...@mystech.com>,
Mark S. Reibert <rei...@mystech.com> wrote:
>Mike Stok wrote:

>> $string =~ s/^\s+//;
>> $string =~ s/\s+$//;
>

>I usually shorten this to
>
>$string =~ s/^\s*(.*?)\s*$/$1/;
>
>I don't know if this executes any faster than your option,

Well, let's find out. :)

use Benchmark;

$stringone = "This is a test" x 100;
$stringtwo = "This is a test" x 100;

sub TwoExp {
$_[0] =~ s/^\s+//;
$_[0] =~ s/\s+$//;
$_[0];
}

sub OneExp {
$_[0] =~ s/^\s*(.*?)\s*$/$1/;
}

timethese(1000, {
'OneExp' => sub { &OneExp($stringone); },
'TwoExp' => sub { &TwoExp($stringtwo); },
});

produces the following results:

Benchmark: timing 1000 iterations of OneExp, TwoExp...
OneExp: 4 secs ( 3.76 usr 0.00 sys = 3.76 cpu)
TwoExp: 1 secs ( 0.99 usr 0.00 sys = 0.99 cpu)

So, your version executes almost 4 times slower than the other. I suspect
this is due to the backreference ($1), but I'm not sure. For what it's worth,
I've also tried optimizing the pair of s/// expressions to one like this:

$foo =~ s/(^\s+)|(\s+$)//;

and found that to also be slower than executing the two expressions one at a
time.
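One caveat about the benchmark as displayed here: the test strings carry no leading or trailing white space, and the subs edit $_[0] in place, so any trimming would happen only on the first iteration. The following is a hedged variant, not the code that produced the timings above. The padded input, the lower-case sub names, and the AltExp entry are assumptions added for illustration. Each sub works on a fresh copy, and the alternation form uses /g, since without it only one end is trimmed.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Assumed test input: padded on both sides so every pattern has work to do.
my $padded = (" " x 10) . ("This is a test " x 100) . (" " x 10);

# Each sub copies its argument, so every iteration trims a fresh string.
sub two_exp { my ($s) = @_; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s }
sub one_exp { my ($s) = @_; $s =~ s/^\s*(.*?)\s*$/$1/;      return $s }
sub alt_exp { my ($s) = @_; $s =~ s/^\s+|\s+$//g;           return $s }

timethese(1000, {
    'OneExp' => sub { one_exp($padded) },
    'TwoExp' => sub { two_exp($padded) },
    'AltExp' => sub { alt_exp($padded) },
});
```

Timings depend on the Perl version and the machine, so no expected figures are given here.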

/MC

--
Matthew Cravit, N9VWG | Experience is what allows you to
E-mail: mcr...@best.com (home) | recognize a mistake the second
mcr...@taos.com (work) | time you make it.

Mark S. Reibert

Dec 15, 1997, 3:00:00 AM

> Benchmark: timing 1000 iterations of OneExp, TwoExp...
> OneExp: 4 secs ( 3.76 usr 0.00 sys = 3.76 cpu)
> TwoExp: 1 secs ( 0.99 usr 0.00 sys = 0.99 cpu)
>
> So, your version executes almost 4 times slower than the other.

Nice work! It looks like I'll be using the two-op version from now on!

Chipmunk

Dec 15, 1997, 3:00:00 AM

Matthew Cravit wrote:
>
> In article <34956B42...@mystech.com>,
> Mark S. Reibert <rei...@mystech.com> wrote:
> >Mike Stok wrote:
>
> >> $string =~ s/^\s+//;
> >> $string =~ s/\s+$//;
> >
> >I usually shorten this to
> >
> >$string =~ s/^\s*(.*?)\s*$/$1/;
> >
> >I don't know if this executes any faster than your option,
>
> Benchmark: timing 1000 iterations of OneExp, TwoExp...
> OneExp: 4 secs ( 3.76 usr 0.00 sys = 3.76 cpu)
> TwoExp: 1 secs ( 0.99 usr 0.00 sys = 0.99 cpu)
>
> So, your version executes almost 4 times slower than the other. I suspect
> this is due to the backreference ($1), but I'm not sure.

I'd bet that the nongreedy quantifier plays a key part as well.

After each time (.*?) matches an additional character, the regexp engine
satisfies the nongreediness by seeing if the rest of the regexp can match
at that point. That means it must leave and reenter the parentheses for
every character in the test string.

Chipmunk

Aaron Harsh

Dec 15, 1997, 3:00:00 AM

Matthew Cravit wrote in message <673se4$307$1...@shell3.ba.best.com>...

>In article <34956B42...@mystech.com>,
>Mark S. Reibert <rei...@mystech.com> wrote:
>>Mike Stok wrote:
>
>>> $string =~ s/^\s+//;
>>> $string =~ s/\s+$//;
>>
>>I usually shorten this to
>>
>>$string =~ s/^\s*(.*?)\s*$/$1/;
> ...

> Benchmark: timing 1000 iterations of OneExp, TwoExp...
> OneExp: 4 secs ( 3.76 usr 0.00 sys = 3.76 cpu)
> TwoExp: 1 secs ( 0.99 usr 0.00 sys = 0.99 cpu)
>
>So, your version executes almost 4 times slower than the other.

Someone else pointed out the one-liner is slowed down by the nongreedy
quantifier. Here's a one-liner that's 4 times faster on Matthew's test (but
slower on smaller strings), and keeps the backreference:

$string =~ s/^\s*(|.*\S)?\s*$/$1/;

This looks even more cryptic than the original one-liner, so Mark should be
happy :-)
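As a quick sanity check that the greedy one-liner trims the same way as the nongreedy one, the two can be run side by side. This is a small sketch only: the sub names and the sample inputs are made up for illustration, and all inputs are single-line strings (without /s, the dot does not match a newline in either pattern).

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two one-liners from the thread.
sub trim_lazy   { my ($s) = @_; $s =~ s/^\s*(.*?)\s*$/$1/;    return $s }
sub trim_greedy { my ($s) = @_; $s =~ s/^\s*(|.*\S)?\s*$/$1/; return $s }

my @cases = ("  a b  ", "a", "   ", "", "\tx y\t ");
for my $in (@cases) {
    my ($lazy, $greedy) = (trim_lazy($in), trim_greedy($in));
    printf "lazy=[%s] greedy=[%s] %s\n", $lazy, $greedy,
        $lazy eq $greedy ? "same" : "DIFFERENT";
}
```

The empty alternative inside the parentheses is what lets the greedy version cope with strings that are empty or all white space, where .*\S has nothing to match.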

Aaron Harsh
a...@rtk.com

Mark S. Reibert

Dec 16, 1997, 3:00:00 AM

Aaron Harsh wrote:

> Someone else pointed out the one-liner is slowed down by the nongreedy
> quantifier. Here's a one-liner that's 4 times faster on Matthew's test (but
> slower on smaller strings), and keeps the backreference:
>
> $string =~ s/^\s*(|.*\S)?\s*$/$1/;
>
> This looks even more cryptic than the original one-liner, so Mark should be
> happy :-)

I like it! This is consistent with the minimal-match slowness idea, since you
are using a maximal match here.

Andrew Johnson

Dec 16, 1997, 3:00:00 AM

Tony Bowden wrote:
>
> Aaron Harsh (a...@rtk.com) wrote:
> : $string =~ s/^\s*(|.*\S)?\s*$/$1/;

> : This looks even more cryptic than the original one-liner, so Mark should be
> : happy :-)
>
> Excellent ...
>
> Now, what's the fastest way of removing all unwanted whitespace, which
> means trimming all leading and trailing spaces, _and_ trimming multiple
> whitespace down to single spaces? Can this be done easily in one pass?

here's a quickie one pass version (quickie in terms of thinking it up, not
running time) ---should be easy to improve upon:

$_=' blah blah blah blah ';
s/(^\s+|\s+$)|(\s+)/$1?'':' '/eg;
print;

regards
andrew

Aaron Harsh

Dec 16, 1997, 3:00:00 AM

Andrew Johnson wrote in message <3496BE6C...@gpu.srv.ualberta.ca>...


>Tony Bowden wrote:
>> Now, what's the fastest way of removing all unwanted whitespace, which
>> means trimming all leading and trailing spaces, _and_ trimming multiple
>> whitespace down to single spaces? Can this be done easily in one pass?
>
>here's a quickie one pass version (quickie in terms of thinking it up, not
>running time) ---should be easy to improve upon:
>
>$_=' blah blah blah blah ';
>s/(^\s+|\s+$)|(\s+)/$1?'':' '/eg;
>print;


Is it cheating to do some of it outside a regex? Here's my entry, which is
about four times faster than Andrew's (to run, not to type :-):

$_ = join(" ", /\S+/g);

Aaron Harsh

Dec 16, 1997, 3:00:00 AM

Aaron Harsh wrote in message <676mfm$hl3$1...@brokaw.wa.com>...

>Is it cheating to do some of it outside a regex? Here's my entry, which is
>about four times faster than Andrew's (to run, not to type :-):
>
>$_ = join(" ", /\S+/g);


Hmm.. This can be optimized for the likely case of lots of words with
single-spaces between them, which doubles the speed on the original test:

$_ = join(" ", /\S+(?: \S+)*/g);

and has the side effect of making the line a little less legible (I'm not
going to make a judgement about whether this is a good thing or a bad thing
:-) Probably about as good as responding to your own posts).

Aaron Harsh
a...@rtk.com

Andrew Johnson

Dec 16, 1997, 3:00:00 AM

Aaron Harsh wrote:
[snip]

>
> Is it cheating to do some of it outside a regex? Here's my entry, which is
> about four times faster than Andrew's (to run, not to type :-):
>
> $_ = join(" ", /\S+/g);

cheating?? guess it depends on what the poster meant
by doing it in 'one-pass' ... it's certainly an improvement...
and in the same vein, here's a slight adjustment:

$_=join(' ',split);
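To see that the match-based and split-based forms give the same result, here is a small sketch that applies both to a copy of a string instead of $_. The sub names and the sample input are made up for illustration. It relies on the documented special case where a split pattern of a single space string skips leading white space and splits on runs of white space, which is also what a bare split does.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.
sub squeeze_match { my ($s) = @_; return join " ", $s =~ /\S+/g }
sub squeeze_split { my ($s) = @_; return join " ", split " ", $s }

my $in = "  blah   blah\tblah \n blah  ";
print "[", squeeze_match($in), "]\n";    # [blah blah blah blah]
print "[", squeeze_split($in), "]\n";    # [blah blah blah blah]
```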

regards
andrew

Charles DeRykus

Dec 17, 1997, 3:00:00 AM

In article <88228944...@sparc.tibus.net>,

Tony Bowden <to...@crux.blackstar.co.uk> wrote:
> Aaron Harsh (a...@rtk.com) wrote:
> : $string =~ s/^\s*(|.*\S)?\s*$/$1/;
> : This looks even more cryptic than the original one-liner, so Mark should be
> : happy :-)
>
> Excellent ...
>
> Now, what's the fastest way of removing all unwanted whitespace, which
> means trimming all leading and trailing spaces, _and_ trimming multiple
> whitespace down to single spaces? Can this be done easily in one pass?
>

This is fast but, alas, not very cryptic :)


join " ",split(" ",$string);


Cheers,
--
Charles DeRykus

Chipmunk

Dec 17, 1997, 3:00:00 AM

Aaron Harsh wrote:
>
> Aaron Harsh wrote in message <676mfm$hl3$1...@brokaw.wa.com>...
> >Is it cheating to do some of it outside a regex? Here's my entry, which is
> >about four times faster than Andrew's (to run, not to type :-):
> >
> >$_ = join(" ", /\S+/g);
>
> Hmm.. This can be optimized for the likely case of lots of words with
> single-spaces between them, which doubles the speed on the original test:
>
> $_ = join(" ", /\S+(?: \S+)*/g);
>
> and has the side effect of making the line a little less legible

This appears to be at least as fast as either of your solutions:

$_ = join(" ", split);

Chipmunk

Matz Kindahl

Dec 17, 1997, 3:00:00 AM

Chipmunk <r...@coos.dartmouth.edu> writes:

Maybe I've misunderstood something, but if you don't have to use
regular expressions, I'd write something along the following lines.

$_ = " $_ "; tr/ \n\r\t/ /s; substr($_,0,1) = substr($_,-1,1) = "";

I believe it's faster than splitting into a list and then joining the
list.
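The steps of that line can be spelled out as a sketch. The sub name squeeze_tr and the sample input are assumptions for illustration, and the two substr assignments are written as separate statements here for clarity, unlike the chained form above.

```perl
use strict;
use warnings;

# Hypothetical helper: works on a copy instead of $_.
sub squeeze_tr {
    my ($s) = @_;
    $s = " $s ";               # pad so both ends are known to hold white space
    $s =~ tr/ \n\r\t/ /s;      # turn each listed character into a space and
                               # squeeze every run down to a single space
    substr($s, -1, 1) = "";    # remove the one remaining trailing space
    substr($s, 0, 1)  = "";    # remove the one remaining leading space
    return $s;
}

print "[", squeeze_tr("  blah   blah\tblah \n blah  "), "]\n";   # [blah blah blah blah]
```

One difference worth keeping in mind: the tr list names only space, newline, carriage return, and tab, so a form feed, which \s and split treat as white space, is left alone by this version.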

--
Mats Kindahl ! mat...@docs.uu.se
Department of Computer Systems ! mat...@acm.org
Box 325 ! Tel +46 18 18 10 66
S-751 05 Uppsala ! Fax +46 18 55 02 25
SWEDEN ! URL http://www.docs.uu.se/~matkin/

PGP Key fingerprint = 92 5C FC 39 32 A8 7F 91 01 56 A0 D3 9C A9 6C 81
PGP key available under finger mat...@kay.docs.uu.se

"People do strange things when you give them money."
-- Simple Minds

Matz Kindahl

Dec 18, 1997, 3:00:00 AM

to...@crux.blackstar.co.uk (Tony Bowden) writes:

> Re: removing 'unwanted' space:
>
> Summary of options to date:
>
> 1) $_ = join(" ", /\S+/g);
> 2) $_ = join(" ", /\S+(?: \S+)*/g); #optimized for single spaces between words
> 3) $_ = join(" ", split);
> 4) $_ = " $_ "; tr/ \n\r\t/ /s; substr($_,0,1) = substr($_,-1,1) = "";
>
> For short strings:
> Benchmark: timing 50000 iterations of Exp1, Exp2, Exp3, Exp4...
> Exp1: 2 secs ( 2.23 usr 0.00 sys = 2.23 cpu)
> Exp2: 2 secs ( 2.22 usr 0.00 sys = 2.22 cpu)
> Exp3: 0 secs ( 1.82 usr 0.00 sys = 1.82 cpu)
> Exp4: 0 secs ( 1.72 usr 0.00 sys = 1.72 cpu)
>
> For long strings:
> Benchmark: timing 5000 iterations of Exp1, Exp2, Exp3, Exp4...
> Exp1: 12 secs (10.92 usr 0.00 sys = 10.92 cpu)
> Exp2: 11 secs (10.91 usr 0.00 sys = 10.91 cpu)
> Exp3: 9 secs ( 8.84 usr 0.00 sys = 8.84 cpu)
> Exp4: 1 secs ( 0.96 usr 0.00 sys = 0.96 cpu)
>
> So option 4 is better by quite a long shot...
>
> Any advance?

Yup!

I tried the following version; seems to be ~25% faster than my
previous version.

5) $_ = " $_ "; tr/ \n\r\t/ /s; $_ = substr($_,1,-1)

--
Mats Kindahl ! mat...@docs.uu.se
Department of Computer Systems !

Matz Kindahl

Dec 18, 1997, 3:00:00 AM

to...@crux.blackstar.co.uk (Tony Bowden) writes:

> Matz Kindahl (mat...@Owein.DoCS.UU.SE) wrote:
> : > Any advance?
>
> : I tried the following version; seems to be ~25% faster than my


> : previous version.
> : 5) $_ = " $_ "; tr/ \n\r\t/ /s; $_ = substr($_,1,-1)
>

> On short strings:
> Benchmark: timing 50000 iterations of Exp1, Exp2...
> Exp1: 2 secs ( 1.69 usr 0.00 sys = 1.69 cpu)
> Exp2: 1 secs ( 1.21 usr 0.00 sys = 1.21 cpu)

Hmmm.... ~28% reduction. ;/

> On long strings:
> Benchmark: timing 50000 iterations of Exp1, Exp2...
> Exp1: 10 secs ( 9.44 usr 0.00 sys = 9.44 cpu)
> Exp2: 10 secs ( 9.25 usr 0.00 sys = 9.25 cpu)

Hmmmm.... ~2% reduction. :(

> A nice advance, but unfortunately not that noticeable on long strings.

Seems my test string wasn't long enough. :(

Well, at least I can comfort myself that it was certainly more
readable. ;)

> Still the best to date though ;)
>
> Thanks

You're welcome. :)

Regards,
