
simple -replace question


Leo Tohill

Nov 7, 2009, 9:06:02 PM

Why does

"one" -replace ".*", "two"
return
"twotwo"
?

Where is the -replace operator fully documented? All I can find is the
mention in about_comparison_operators, and it is lacking.


Al Fansome

Nov 7, 2009, 10:50:59 PM

My guess is that ".*" first matches "one", and that gets replaced by "two".
Then, the empty string after "one" also gets replaced by "two", because
".*" matches the empty string as well as all other strings, so you get
"twotwo". If you try

"one" -replace ".+", "two"

You get "two".
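
A minimal sketch of that comparison, assuming PowerShell on top of the .NET regex engine. The variable names are made up for illustration; the point is only that ".*" can succeed on zero characters while ".+" cannot.

```powershell
# '.*' can match zero characters, so it also matches the empty
# position left at the end of the input; '.+' needs at least one.
$star = 'one' -replace '.*', 'two'    # twotwo
$plus = 'one' -replace '.+', 'two'    # two

# On an empty input, '.*' still finds one zero-length match,
# while '.+' finds nothing and leaves the string unchanged.
$emptyStar = '' -replace '.*', 'two'  # two
$emptyPlus = '' -replace '.+', 'two'  # empty string

"$star | $plus | $emptyStar | '$emptyPlus'"
```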

Al Fansome

Nov 7, 2009, 11:09:29 PM

Of interest, if you use the regex functions in Python,

re.sub(".*", "two", "one")

returns

'two'

Martin Zugec

Nov 8, 2009, 9:07:35 AM

Well, .NET itself returns twotwo (even though it doesn't make too much sense
to me):
[regex]::Replace("one", ".*", "two")
twotwo

But I agree with Al Fansome - first replacement is because of empty string:

[regex]::Replace("", ".*", "two")
two

Martin

"Al Fansome" <al_fa...@hotmail.com> wrote in message
news:mJGdnRaCWr7k2WvX...@supernews.com...

RichS [MVP]

Nov 8, 2009, 9:30:01 AM

The important part of -replace to note is that it uses regular expressions to
define how the replace works.

For a simple replace
PS> "one" -replace "one", "two"
two

PS> "one" -replace "e", "i"
oni

we get exactly what you would expect

Your replace
PS> "one" -replace ".*", "two"
twotwo

is using a regular expression as the characters to replace. So we need to
work out what ".*" means as a regular expression

. means any character

* means zero or more matches, matching as much as possible

so if we did this
PS> "one" -replace ".", "two"
twotwotwo

we would replace each character whereas

PS> "one" -replace "..", "two"
twoe

replaces the first two characters and

PS> "one" -replace "...", "two"
two

replaces all 3 characters in one go.

So by doing

"one" -replace ".*", "two"

it looks like in effect you are doing a replace on the first 2 characters
followed by a replace on the last character to give your result

Hope this helps
--
Richard Siddaway
All scripts are supplied "as is" and with no warranty
PowerShell MVP
Blog: http://richardsiddaway.spaces.live.com/
PowerShell User Group: http://www.get-psuguk.org.uk

Larry__Weiss

Nov 8, 2009, 1:33:49 PM

I still can't find any place (online or in book form) that fully documents
PowerShell in an authoritative way. There does not seem to be a published
formal specification for the PowerShell language.

Discussing PowerShell reminds me of the parable of the blind men and an elephant.

http://en.wikipedia.org/wiki/Blind_men_and_an_elephant

- Larry

Martin Zugec

Nov 8, 2009, 2:54:10 PM

Based on my experience, Posh RegEx is just a shortcut to .NET RegEx.

Martin

"Larry__Weiss" <l...@airmail.net> wrote in message
news:exvzGIKY...@TK2MSFTNGP05.phx.gbl...

Larry__Weiss

Nov 8, 2009, 3:04:15 PM

Yes, I was hoping that could be relied upon.

I see a .NET Framework Developer's Guide section on RegEx at
http://msdn.microsoft.com/en-us/library/hs600312.aspx

But does Microsoft publish a more formal specification?

- Larry

Leo Tohill

Nov 8, 2009, 4:04:02 PM

Rich,

I'm with you on all that except the final part. You say: "So by doing
"one" -replace ".*", "two" it looks like in effect you are doing a replace
on the first 2 characters followed by a replace on the last character to
give your result."

Are you saying that the length of the regex literal determines the length of
the match? That would be wrong.

Also note that "onexxx" -replace ".*", "two" still returns "twotwo", so the
rule would have to be something like "replace the first two characters and
then also replace all the rest of the characters", which is also a completely
arbitrary thing to do. Unless there's a better explanation, I think maybe
this is a bug.

In breaking news, I found that using "^.*" as the regex gives the expected
result of "two". I can't think of any reason the line-begin assertion should
make a diff here, but it does.
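
One possible explanation, offered as a sketch rather than a definitive account: without the Multiline option, ^ can only match at index 0, so a second, zero-length match at the end of the string is not possible once the pattern is anchored. Listing the matches with [regex]::Matches makes the difference visible (the variable names are illustrative only).

```powershell
# Unanchored: one match for "one", plus a zero-length match at index 3.
$plain = [regex]::Matches('one', '.*')

# Anchored: '^' cannot match at index 3, so only the first match remains.
$anchored = [regex]::Matches('one', '^.*')

$plain    | ForEach-Object { "plain:    index=$($_.Index) length=$($_.Length)" }
$anchored | ForEach-Object { "anchored: index=$($_.Index) length=$($_.Length)" }
```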

- leo

tojo2000

Nov 8, 2009, 6:25:16 PM
On Nov 8, 1:04 pm, Leo Tohill <LeoToh...@discussions.microsoft.com>
wrote:

That's definitely a bug. There is no reason why the following
shouldn't all return the same result:

"one" -replace ".*", "two"    # Replace zero or more of any character appearing anywhere with the string "two"
"one" -replace ".+", "two"    # Replace one or more of any character appearing anywhere with the string "two"
"one" -replace "^.*$", "two"  # Replace zero or more of any character, starting at the beginning, with the string "two"

All of these work fine except for the ".*" condition.

Bob Landau

Nov 8, 2009, 6:45:01 PM

Rich had the answer; he was just too close to see it.

Here is what he said

>>is using a regular expression as the characters to replace. So we need to
>>work out what ".*" means as a regular expression

>> . means any character

>>* means zero or more matches, matching as much as possible

The key word there is "zero". To make that obvious, let's try this with an empty string:

'' -replace '.*', 'two'

You need to be careful when using the regex '.*' with -match. I'm sure
there's a place for it, but I can't think of one when it comes to -replace.

so if you were instead to use '.+' which means

1) any character except the end of line
2) one or more times

you'll get the behavior you want.

But even this will get you into a #$^ load of trouble: both * and + are
greedy, so be very careful that whatever precedes them won't generate false positives.

bob

Larry__Weiss

Nov 8, 2009, 7:22:35 PM

PS C:> "`none" -replace ".*", "two"
two
twotwo

gives us some hints about how .* is interpreted.

. cannot match a newline, so the number and order of matches are revealed: the
newline in "`none" is not replaced by the ".*" regex, but apparently two
zero-character matches are seen, along with one three-character match that is
replaced by "two".

I'm not absolutely sure that is the standard way to define what should have
happened, but .NET is doing it that way.
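
A small sketch that makes those matches visible, assuming the same .NET behavior: instead of a fixed word, the replacement wraps each match in brackets using $&, so a zero-length match shows up as "[]". The variable name $marked is only for illustration.

```powershell
# Single quotes keep '$&' away from PowerShell so the regex engine sees it.
# Each match is wrapped in brackets; the newline itself is never matched.
$marked = "`none" -replace '.*', '[$&]'

# Expected shape: empty match, newline, "one", empty match.
$marked -eq "[]`n[one][]"
```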

- Larry

Larry__Weiss

Nov 8, 2009, 7:31:25 PM

PowerShell is certainly giving .NET a thorough testing!

Never before has it been so easy to invoke .NET methods.

"So easy even a command-line man can do it!"

- Larry

Leo Tohill

Nov 8, 2009, 7:46:01 PM

Ok, I get it:

.* produces two matches, the zero-length match and the full string match.
The replacement string replaces each of the matches.

BTW, my goal is to prefix each string in a string array with a blank. Is
there any easier way than this?

$x = "one", "two", "three" #test data
$x -replace ".+", " $&"

Leo Tohill

Nov 8, 2009, 7:51:01 PM

Ok, I get it. .* produces two matches, the zero-length match and the full
match. The replace string replaces each of these.

What I'm really doing is prefixing each string in an array with a blank. Is
there an easier way than:

$x -replace ".+", " $&"

?

Larry__Weiss

Nov 8, 2009, 7:57:16 PM

$x -replace '^', ' '

- Larry

Larry__Weiss

Nov 8, 2009, 9:36:33 PM

The plot thickens! Consider:

PS C:> $regex = [regex]'.*'
PS C:> $regex.Matches("one")

Groups : {one}
Success : True
Captures : {one}
Index : 0
Length : 3
Value : one

Groups : {}
Success : True
Captures : {}
Index : 3
Length : 0
Value :

PS C:> $regex.Matches("`none")

Groups : {}
Success : True
Captures : {}
Index : 0
Length : 0
Value :

Groups : {one}
Success : True
Captures : {one}
Index : 1
Length : 3
Value : one

Groups : {}
Success : True
Captures : {}
Index : 4
Length : 0
Value :

- Larry


Larry__Weiss wrote:
> PowerShell is certainly giving .NET a thorough testing!
> Never before has it been so easy to invoke .NET methods.
> "So easy even a command-line man can do it!"
>
>

Bob Landau

Nov 8, 2009, 10:44:06 PM

this will add a space to every word in an array of words
(\b\w+\b)

if all you want is to add a space to the first word here it is
(gc <file>) -replace '(^\b\w+\b.*)', ' $1'

you will need to adjust this to match how you are defining a "word"

here is a reference
http://msdn.microsoft.com/en-us/library/az24scfc.aspx

bob

Leo Tohill

Nov 8, 2009, 11:57:01 PM

Really I was wondering if there was any completely different approach, not
using regex and perhaps not using -replace, that would be better. I think
you are confirming that this approach is fine - thanks for that.

I don't care about words, just the entire line, so it seems to me that
-replace ".+", " $&"

should work nicely.
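
For comparison, a hedged sketch of two regex-free ways to prefix each element with a blank, next to the -replace form. The variable names are illustrative. One behavioral difference worth knowing: -replace with ".+" leaves empty strings untouched, while the pipeline versions prefix them as well.

```powershell
$x = 'one', 'two', 'three'   # test data from the thread

# Plain string interpolation inside a pipeline; no regex involved.
$viaLoop = $x | ForEach-Object { " $_" }

# The -f format operator is another regex-free option.
$viaFormat = $x | ForEach-Object { ' {0}' -f $_ }

# The -replace form, single-quoted so '$&' is clearly passed
# through to the regex engine untouched.
$viaReplace = $x -replace '.+', ' $&'

$viaLoop -join ','
```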

Regards,

Leo

01MDM

Nov 9, 2009, 2:02:45 AM

If you want to replace entire line:

tojo2000

Nov 9, 2009, 4:41:21 PM

On Nov 8, 4:51 pm, Leo Tohill <LeoToh...@discussions.microsoft.com>
wrote:

That's why this is a bug, or at best undefined behavior. In a regular
expression, .* and .+ are both greedy by default, meaning they will
expand to fill up as much space as they can while still matching. In
both cases this means they expand to encompass the entire string. .*?
and .+? would give the non-greedy matches and match the shortest
string that still satisfies the expression, but even in that case .*
should only match zero characters if there are zero characters that
match.

It is using the Matches method of the System.Text.RegularExpressions
object, which is supposed to iterate through the string and return all
successful matches and their index in a MatchCollection object. What
is happening is that it is matching the entire string and then
returning to the string at the index past the last match, but instead
of realizing that it's run out of string to process, it's running one
last time on the string '' (empty), which is what's left over after
the first match because there is no more string left. In most cases
that would result in nothing more being returned, but in this case,
because .* also matches zero characters, it adds an empty result to
the MatchCollection before returning.
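
The scan described above can be approximated with a short loop. This is a rough emulation under stated assumptions, not the actual .NET implementation: Get-AllMatches is a hypothetical helper name, and the only real API used is the Regex.Match(String, Int32) overload. After a non-empty match the next attempt starts where the match ended; after a zero-length match the position is bumped by one so the loop can terminate.

```powershell
function Get-AllMatches {
    param([string]$Text, [string]$Pattern)

    $regex = [regex]$Pattern
    $pos = 0
    # "-le" rather than "-lt": the position just past the last
    # character is still tried, which is where '.*' matches empty.
    while ($pos -le $Text.Length) {
        $m = $regex.Match($Text, $pos)
        if (-not $m.Success) { break }
        $m   # emit the match to the pipeline
        if ($m.Length -eq 0) { $pos = $m.Index + 1 }
        else                 { $pos = $m.Index + $m.Length }
    }
}

Get-AllMatches -Text 'one' -Pattern '.*' |
    ForEach-Object { "index=$($_.Index) length=$($_.Length)" }
```

For "one" this yields a match at index 0 with length 3 and a second at index 3 with length 0, which agrees with the Matches output posted earlier in the thread; with ".+" the second attempt fails and only one match is produced.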

Larry__Weiss

Nov 9, 2009, 5:14:03 PM

tojo2000 wrote:
> That's why this is a bug, ...

>
> It is using the Matches method of the System.Text.RegularExpressions
> object, which is supposed to iterate through the string and return all
> successful matches and their index in a MatchCollection object. What
> is happening is that it is matching the entire string and then
> returning to the string at the index past the last match, but instead
> of realizing that it's run out of string to process, it's running one
> last time on the string '' (empty), which is what's left over after
> the first match because there is no more string left. In most cases
> that would result in nothing more being returned, but in this case,
> because .* also matches zero characters, it adds an empty result to
> the MatchCollection before returning.
>

That's exactly what I see by executing

It's got to be a bug in .NET that PowerShell does not make up for.

Where does one report .NET bugs?

- Larry
