Regex Match very slow...


Tom

May 24, 2016, 8:11:49 AM

Hi,

I have the following code:

$Tel_File = gc '.\out.tel'

foreach ($r in $array_ref) {
    Write-Host "Looking for:" $r
    if ($Tel_File -match ".*_AC_.*$r.*")
    {
        Write-Host "$r Connect to AC"
    }
    elseif ($Tel_File -match ".*VCC(\d+)?_.*$r.*")
    {
        Write-Host "$r Connect to VCC"
    }
}

The $Tel_File is a little over 2000 lines long and looks like:

'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39

Some lines are much longer..

The foreach can run over well over 100 entries, each a value like "C170" ($r). It's the regex matching that appears to be very slow...

How can I get this to run faster?

Thank you for any help in advance!

-Tom

Jürgen Exner

May 24, 2016, 8:26:55 PM

On Tue, 24 May 2016 05:11:48 -0700 (PDT), Tom <lyn...@gmail.com> wrote
in microsoft.public.windows.powershell:

>I have the following code:
>
>$Tel_File = gc '.\out.tel'
>
>foreach ($r in $array_ref) {
> Write-Host "Looking for:" $r
> if ($Tel_File -match ".*_AC_.*$r.*")
> {
> write-host "$r Connect to AC"
> }
> elseif ($Tel_File -match ".*VCC(\d+)?_.*$r.*")
> {
> write-host "$r Connect to VCC"
> }
>}
>
>The $Tel_File is a little over 2000 lines long and looks like:
>
>'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39
>
>Some lines are much longer..
>
>The foreach can run over well over 100 entries, each a value like "C170" ($r). It's the regex matching that appears to be very slow...

Well, yes, evaluating REs is complex and expensive.

>How can I get this to run faster?

There are 3 steps that come to my mind.

First, the leading and trailing '.*' don't do anything useful, because
-match already finds the pattern anywhere in the string. They only make
the RE more complex and more expensive to execute.
So get rid of them.
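
A minimal sketch of that first step follows. The sample lines (apart from the first, which is from the original post), the values in $array_ref, and the function name Get-NetConnection are made up for illustration. The [regex]::Escape call is an extra precaution that is not part of the advice above.

```powershell
# Hypothetical stand-ins for the contents of out.tel and for $array_ref.
$Tel_File = @(
    "'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39"
    "'N_AC_IN' ; C12:F1.1 R7:F2.2"
    "'VCC3_CORE' ; C99:F3.1 U45:F132.40"
)
$array_ref = 'C12', 'C99', 'C170'

# Hypothetical helper: emits one string per reference that is found.
function Get-NetConnection {
    param([string[]]$Lines, [string[]]$Refs)
    foreach ($r in $Refs) {
        # Escape $r so characters such as '.' are matched literally.
        $ref = [regex]::Escape($r)
        # -match is unanchored, so no leading or trailing '.*' is needed.
        if ($Lines -match "_AC_.*$ref") {
            "$r Connect to AC"
        }
        elseif ($Lines -match "VCC(\d+)?_.*$ref") {
            "$r Connect to VCC"
        }
    }
}

Get-NetConnection -Lines $Tel_File -Refs $array_ref
```

With this sample data, C12 is reported as connected to AC and C99 as connected to VCC, while C170 produces no output because its only line names neither kind of net.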

Second, you are running this expensive combined RE match 2000x100x2
times. This can be optimized.
First filter for those elements which match just, e.g., "_AC_", and then
filter this much smaller result list again for the matches from $r.
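
One way this pre-filtering could look is sketched below. The sample data and the function name Get-PrefilteredConnection are hypothetical. The sketch also assumes that the net name always comes first on a line, as in the sample from the original post, so that checking a pre-filtered line for $r alone gives the same answer as the combined pattern.

```powershell
# Hypothetical stand-ins for the contents of out.tel and for $array_ref.
$Tel_File = @(
    "'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39"
    "'N_AC_IN' ; C12:F1.1 R7:F2.2"
    "'VCC3_CORE' ; C99:F3.1 U45:F132.40"
)
$array_ref = 'C12', 'C99', 'C170'

# Hypothetical helper: emits one string per reference that is found.
function Get-PrefilteredConnection {
    param([string[]]$Lines, [string[]]$Refs)
    # Scan the full file only twice, once per net type, outside the loop.
    # @() keeps the result an array even if zero or one line matches.
    $acLines  = @($Lines -match '_AC_')
    $vccLines = @($Lines -match 'VCC(\d+)?_')
    foreach ($r in $Refs) {
        $ref = [regex]::Escape($r)
        # Only the short pre-filtered lists are searched per reference.
        if ($acLines -match $ref) {
            "$r Connect to AC"
        }
        elseif ($vccLines -match $ref) {
            "$r Connect to VCC"
        }
    }
}

Get-PrefilteredConnection -Lines $Tel_File -Refs $array_ref
```

The two full scans of the file happen once, and each of the 100 or so references is then checked only against the few lines that name an AC or VCC net.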

And third, with one exception you are not really using any RE features;
you are doing a simple string compare. So why kick off the
expensive RE engine when a simple substring compare will do?
Please see https://technet.microsoft.com/en-us/library/ee692804.aspx,
section "Checking For Strings Within Strings", for a cheap and fast way
to test whether $b is a substring of $a.
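
As a rough sketch of the substring approach, the following uses the .NET String.Contains method, which may or may not be the technique the linked article describes. The sample data and the function names Test-AnyLineContains and Get-SubstringConnection are hypothetical, and the VCC filter keeps its regex because the optional digits are the one real RE feature in use.

```powershell
# Hypothetical stand-ins for the contents of out.tel and for $array_ref.
$Tel_File = @(
    "'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39"
    "'N_AC_IN' ; C12:F1.1 R7:F2.2"
    "'VCC3_CORE' ; C99:F3.1 U45:F132.40"
)
$array_ref = 'C12', 'C99', 'C170'

# Hypothetical helper: $true if any line contains $Text as a plain substring.
function Test-AnyLineContains {
    param([string[]]$Lines, [string]$Text)
    foreach ($line in $Lines) {
        if ($line.Contains($Text)) { return $true }
    }
    return $false
}

# Hypothetical helper: emits one string per reference that is found.
function Get-SubstringConnection {
    param([string[]]$Lines, [string[]]$Refs)
    # Plain substring test for the fixed text '_AC_'.
    $acLines  = @($Lines | Where-Object { $_.Contains('_AC_') })
    # The optional digits after VCC still need a regex.
    $vccLines = @($Lines -match 'VCC(\d+)?_')
    foreach ($r in $Refs) {
        if (Test-AnyLineContains -Lines $acLines -Text $r) {
            "$r Connect to AC"
        }
        elseif (Test-AnyLineContains -Lines $vccLines -Text $r) {
            "$r Connect to VCC"
        }
    }
}

Get-SubstringConnection -Lines $Tel_File -Refs $array_ref
```

Note that String.Contains is case-sensitive, whereas -match ignores case by default, so the two are only interchangeable if the case in the file and in $array_ref agrees.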

jue

Tom

May 25, 2016, 8:14:18 AM

Awesome, thank you very much. Just removing the leading and trailing .* gave a huge performance boost. I'll try the other suggestions as well.

-Tom
