
Search for multiple things in a string


tshad

Sep 13, 2005, 2:47:15 PM
Can you do a search for more than one string in another string?

Something like:

someString.IndexOf("something1","something2","something3",0)

or would you have to do something like:

if (someString.IndexOf("something1", 0) >= 0 ||
    someString.IndexOf("something2", 0) >= 0 ||
    someString.IndexOf("something3", 0) >= 0)
{
    // Do something
}

Thanks,

Tom


Nicholas Paldino [.NET/C# MVP]

Sep 13, 2005, 2:53:43 PM
Tom,

Your best bet would be to use a regular expression. You can use the
classes in the System.Text.RegularExpressions namespace to do this.

Hope this helps.


--
- Nicholas Paldino [.NET/C# MVP]
- m...@spam.guard.caspershouse.com

"tshad" <tschei...@ftsolutions.com> wrote in message
news:OOoiQOJu...@TK2MSFTNGP09.phx.gbl...

tshad

Sep 13, 2005, 2:58:07 PM
"Nicholas Paldino [.NET/C# MVP]" <m...@spam.guard.caspershouse.com> wrote in
message news:O5RhxRJ...@TK2MSFTNGP12.phx.gbl...

> Tom,
>
> Your best bet would be to use a regular expression. You can use the
> classes in the System.Text.RegularExpressions namespace to do this.

This would be preferable to the multiple if tests?

I don't know which is more efficient. Both would have to go back and test
for all the different items.

Thanks,

Tom

Jon Skeet [C# MVP]

Sep 13, 2005, 3:14:26 PM
tshad <tschei...@ftsolutions.com> wrote:
> > Your best bet would be to use a regular expression. You can use the
> > classes in the System.Text.RegularExpressions namespace to do this.
>
> This would be preferable to the multiple if tests?
>
> I don't know which is more efficient. Both would have to go back and test
> for all the different items.

Personally, I'd go for the "if" tests - possibly with a helper method
using a params string array to aid readability - unless the performance
is really a problem, in which case measuring that performance and that
of the regular expressions would be an absolute necessity.

Regular expressions are really powerful, but can be much harder to read
than a series of very simple string operations.
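
A minimal sketch of the helper idea mentioned above. The method name ContainsAny, the class name and the choice of an ordinal comparison are assumptions made for illustration; nothing here is a framework method.

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: returns true if 'source' contains at least one
    // of the candidate strings. The comparison is ordinal (case-sensitive,
    // culture-independent), which is an assumption about what is wanted.
    public static bool ContainsAny(string source, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (source.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        string someString = "prefix something2 suffix";

        // Reads close to the IndexOf(a, b, c) call the original question wished for.
        if (ContainsAny(someString, "something1", "something2", "something3"))
        {
            Console.WriteLine("Found at least one of the strings.");
        }
    }
}
```

The params array keeps the call site to one line while the candidates stay plain string literals, so nothing in them needs escaping.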

--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Oliver Sturm

Sep 14, 2005, 5:41:37 AM
Jon Skeet [C# MVP] wrote:

>Regular expressions are really powerful, but can be much harder to read
>than a series of very simple string operations.

But they really aren't in this case:

if (Regex.IsMatch(myString, @"something1|something2|something3"))
...

or even, in this special case:

if (Regex.IsMatch(myString, @"something[123]"))
...

I tend to think that regular expressions get hard to read when they are
used to do complicated stuff - and then the alternative is usually not a
"very simple string operation". Part of the reason, though, is that people
don't know it's possible to stretch regular expressions over multiple
lines and even use comments in them. I could rewrite the code above like
this:

string myRegex = @"
something1 | # something1 is our first option
something2 | # something2 would also be fine
something3 # last chance, something3";

if (Regex.IsMatch(myString, myRegex, RegexOptions.IgnorePatternWhitespace))
...

So it's really easy to pick apart the expression and comment the parts - I
don't think it's less readable than any other part of code. You have to
know the language of course, but that's the same for any other programming
language or construct out there.

But you're right about the performance question for simple cases like
this, of course.


Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)

tshad

Sep 14, 2005, 11:27:55 AM

"Oliver Sturm" <oli...@sturmnet.org> wrote in message
news:xn0e78dgy...@msnews.microsoft.com...

But it is nice to know the options.

BTW, what is the "@" for?

Thanks,

Tom

Oliver Sturm

Sep 14, 2005, 11:49:09 AM
tshad wrote:

>But it is nice to know the options.
>
>BTW, what is the "@" for?

It defines a verbatim literal string. See here (MSDN):
http://shrinkster.com/81i
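
A short sketch of what the "@" prefix changes, using made-up sample strings (the path and the class name are only for illustration). In a verbatim literal a backslash is an ordinary character, so it does not need doubling; a double quote is written by doubling it.

```csharp
using System;

public static class VerbatimDemo
{
    // The same file path written as a regular and as a verbatim literal.
    public const string RegularPath = "C:\\temp\\file.txt";
    public const string VerbatimPath = @"C:\temp\file.txt";

    // The same regular expression pattern, both ways.
    public const string RegularPattern = "jon\\.skeet";
    public const string VerbatimPattern = @"jon\.skeet";

    // Inside a verbatim literal, "" stands for one double quote.
    public const string Quoted = @"say ""hi""";

    public static void Main()
    {
        Console.WriteLine(RegularPath == VerbatimPath);       // True
        Console.WriteLine(RegularPattern == VerbatimPattern); // True
        Console.WriteLine(Quoted);                            // say "hi"
    }
}
```

This is why regular expression patterns are usually written with "@": the backslashes in the pattern stay single.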

Jon Skeet [C# MVP]

Sep 14, 2005, 12:45:00 PM
Oliver Sturm <oli...@sturmnet.org> wrote:
> Jon Skeet [C# MVP] wrote:
>
> >Regular expressions are really powerful, but can be much harder to read
> >than a series of very simple string operations.
>
> But they really aren't in this case:
>
> if (Regex.IsMatch(myString, @"something1|something2|something3"))
> ...
>
> or even, in this special case:
>
> if (Regex.IsMatch(myString, @"something[123]"))
> ...

Until, of course, something1 etc start having characters in them which need
escaping - how confident would you be that you'd get that right? It's
an extra thing to think about - and I'm sure the real strings aren't
actually "something1" etc.



> I tend to think that regular expressions get hard to read when they are
> used to do complicated stuff - and then the alternative is usually not a
> "very simple string operation". Part of the reason, though, is that people
> don't know it's possible to stretch regular expressions over multiple
> lines and even use comments in them. I could rewrite the code above like
> this:
>
> string myRegex = @"
> something1 | # something1 is our first option
> something2 | # something2 would also be fine
> something3 # last chance, something3";
>
> if (Regex.IsMatch(myString, myRegex, RegexOptions.IgnorePatternWhitespace))
> ...
>
> So it's really easy to pick apart the expression and comment the parts - I
> don't think it's less readable than any other part of code. You have to
> know the language of course, but that's the same for any other programming
> language or construct out there.

Well, I don't have to learn (or more importantly, remember) *any* extra
bits of language other than C# (which I already need to know) to get it
right with IndexOf, even if the strings I'm looking for contain things
like dots, stars etc. That isn't true for regular expressions.

Oliver Sturm

Sep 14, 2005, 1:42:14 PM
Jon Skeet [C# MVP] wrote:

>Until, of course, something1 etc start having characters in them which need
>escaping - how confident would you be that you'd get that right? It's
>an extra thing to think about - and I'm sure the real strings aren't
>actually "something1" etc.

Aren't you exaggerating a bit here? There are regex testers out there to
help you with building regular expressions and the Regex class itself
knows how to escape special chars - it's not that big a deal.
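
A minimal sketch of the escaping mentioned above, using Regex.Escape from System.Text.RegularExpressions. The helper name BuildAlternation and the sample literals are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: builds "a|b|c" from plain literals, escaping any
    // characters that have a special meaning in a regular expression.
    public static string BuildAlternation(params string[] literals)
    {
        string[] escaped = new string[literals.Length];
        for (int i = 0; i < literals.Length; i++)
        {
            escaped[i] = Regex.Escape(literals[i]);
        }
        return string.Join("|", escaped);
    }

    public static void Main()
    {
        string pattern = BuildAlternation("somewhere.com", "a+b");

        Console.WriteLine(pattern);                                       // somewhere\.com|a\+b
        Console.WriteLine(Regex.IsMatch("visit somewhere.com", pattern)); // True
        Console.WriteLine(Regex.IsMatch("visit somewhereXcom", pattern)); // False
    }
}
```

The escaping only helps if someone remembers to call it, which is the point being argued in this thread.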

>Well, I don't have to learn (or more importantly, remember) any extra
>bits of language other than C# (which I already need to know) to get it
>right with IndexOf, even if the strings I'm looking for contain things
>like dots, stars etc. That isn't true for regular expressions.

No, it isn't. But you won't get far in today's programming world if you
don't know the first thing about SQL or XML, for example, so I guess
you're not suggesting that one language is enough? I believe that Regular
Expressions are a powerful technology well worth learning - and it's
probably good advice to stay clear of them for anything but the simplest
applications if you're not willing to put in a bit of time to get to know
them.

About IndexOf, as I meant to say already, as long as the problems you're
trying to solve are the kind that can be solved with those simple string
functions (and without resulting in huge algorithms), you'll probably have
the performance argument on your side anyway.
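
Since the performance question keeps coming up without numbers, here is a rough sketch of how one might measure it with System.Diagnostics.Stopwatch. The input string, the iteration count and all names are assumptions, and a loop like this is only a crude indication, not a proper benchmark; results will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class SearchTiming
{
    // One Regex instance reused for every call, so construction is not timed.
    private static readonly Regex Pattern =
        new Regex("something1|something2|something3");

    public static bool WithIndexOf(string s)
    {
        return s.IndexOf("something1", StringComparison.Ordinal) >= 0
            || s.IndexOf("something2", StringComparison.Ordinal) >= 0
            || s.IndexOf("something3", StringComparison.Ordinal) >= 0;
    }

    public static bool WithRegex(string s)
    {
        return Pattern.IsMatch(s);
    }

    public static void Main()
    {
        // Assumed test input: the match sits at the very end of the string.
        string input = new string('x', 1000) + "something3";
        const int iterations = 100000;

        Stopwatch sw = Stopwatch.StartNew();
        int hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (WithIndexOf(input)) hits++;
        }
        sw.Stop();
        Console.WriteLine("IndexOf: {0} ms ({1} hits)", sw.ElapsedMilliseconds, hits);

        sw = Stopwatch.StartNew();
        hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (WithRegex(input)) hits++;
        }
        sw.Stop();
        Console.WriteLine("Regex:   {0} ms ({1} hits)", sw.ElapsedMilliseconds, hits);
    }
}
```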

Jon Skeet [C# MVP]

Sep 14, 2005, 5:51:10 PM
Oliver Sturm <oli...@sturmnet.org> wrote:
> >Until, of course, something1 etc start having characters in them which need
> >escaping - how confident would you be that you'd get that right? It's
> >an extra thing to think about - and I'm sure the real strings aren't
> >actually "something1" etc.
>
> Aren't you exaggerating a bit here? There are regex testers out there to
> help you with building regular expressions and the Regex class itself
> knows how to escape special chars - it's not that big a deal.

No, but it's still harder to remember than not having to remember
anything special at all, which is what you get with IndexOf.

In a hurry, I can very easily see someone changing a string literal
from one thing to another, not noticing that as it's a regular
expression, they need to escape part of their new string.

Now, where's the *advantage* of using regular expressions in this case?



> >Well, I don't have to learn (or more importantly, remember) any extra
> >bits of language other than C# (which I already need to know) to get it
> >right with IndexOf, even if the strings I'm looking for contain things
> >like dots, stars etc. That isn't true for regular expressions.
>
> No, it isn't. But you won't get far in today's programming world if you
> don't know the first thing about SQL or XML, for example, so I guess
> you're not suggesting that one language is enough?

No - but I'm suggesting that when one language works perfectly well for
the task at hand, and it's the same language that the rest of your code
is written in, it's easier to stick within that language.

> I believe that Regular Expressions are a powerful technology well
> worth learning - and it's probably good advice to stay clear of them
> for anything but the simplest applications if you're not willing to
> put in a bit of time to get to know them.

Regular expressions are absolutely worth learning for where they
provide extra value. In cases like this, where they're only really
providing extra things to remember (what you need to escape, or to call
Regex's own escaping mechanism) I don't think there's any value.

> About IndexOf, as I meant to say already, as long as the problems you're
> trying to solve are the kind that can be solved with those simple string
> functions (and without resulting in huge algorithms), you'll probably have
> the performance argument on your side anyway.

Well, I'm much keener on the readability argument than the performance
one - I suspect that the performance difference would rarely be of
overall significance.

Oliver Sturm

Sep 15, 2005, 4:46:20 AM
Jon Skeet [C# MVP] wrote:

>In a hurry, I can very easily see someone changing a string literal
>from one thing to another, not noticing that as it's a regular
>expression, they need to escape part of their new string.

In a hurry, all kinds of things can happen when making changes to source
code.

>Now, where's the advantage of using regular expressions in this case?

I wasn't saying there was one in the specific scenario the OP introduced.
I was using the example to show that regular expressions don't have to be
any more complicated than simple string operations.

>>About IndexOf, as I meant to say already, as long as the problems you're
>>trying to solve are the kind that can be solved with those simple string
>>functions (and without resulting in huge algorithms), you'll probably have
>>the performance argument on your side anyway.
>
>Well, I'm much keener on the readability argument than the performance
>one - I suspect that the performance difference would rarely be of
>overall significance.

As I'm trying to say all the time, as soon as an implementation reaches a
complexity that makes it worth thinking about regular expressions, I'm
sure an alternative solution based on simple string functions won't be
more readable any longer. I'd even go so far as to say that as soon as
more than one call to a simple string function is needed for a given
problem, most probably I'll find the regular expression solution more
readable. This is, after all, a subjective decision to make.

Jon Skeet [C# MVP]

Sep 15, 2005, 12:54:32 PM
Oliver Sturm <oli...@sturmnet.org> wrote:
> >In a hurry, I can very easily see someone changing a string literal
> >from one thing to another, not noticing that as it's a regular
> >expression, they need to escape part of their new string.
>
> In a hurry, all kinds of things can happen when making changes to source
> code.

Indeed - but why make it even easier to introduce bugs? Changing a
search from "somewhere" to "somewhere.com" *shouldn't* be something
which requires significant thought, in my view - but it does as soon as
you're using regular expressions.

> >Now, where's the advantage of using regular expressions in this case?
>
> I wasn't saying there was one in the specific scenario the OP introduced.
> I was using the example to show that regular expressions don't have to be
> any more complicated than simple string operations.

But there's *always* the added complexity of "do I have to escape this
or not". There are certainly times when the string operations become
more complicated than the corresponding regular expressions (otherwise
they really would be pointless - something I've never suggested), but I
don't believe that's the case here.



> >>About IndexOf, as I meant to say already, as long as the problems you're
> >>trying to solve are the kind that can be solved with those simple string
> >>functions (and without resulting in huge algorithms), you'll probably have
> >>the performance argument on your side anyway.
> >
> >Well, I'm much keener on the readability argument than the performance
> >one - I suspect that the performance difference would rarely be of
> >overall significance.
>
> As I'm trying to say all the time, as soon as an implementation reaches a
> complexity that makes it worth thinking about regular expressions, I'm
> sure an alternative solution based on simple string functions won't be
> more readable any longer.

Well, Nicholas certainly thought it worth thinking about regular
expressions in this case - do you? (The earlier part of your reply
suggests not, but the bit below suggests you do.)

> I'd even go so far as to say that as soon as
> more than one call to a simple string function is needed for a given
> problem, most probably I'll find the regular expression solution more
> readable. This is, after all, a subjective decision to make.

Whereas three calls to IndexOf is *definitely* more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.

Oliver Sturm

Sep 15, 2005, 2:37:17 PM
Jon Skeet [C# MVP] wrote:

>>In a hurry, all kinds of things can happen when making changes to source
>>code.
>
>Indeed - but why make it even easier to introduce bugs? Changing a
>search from "somewhere" to "somewhere.com" shouldn't be something
>which requires significant thought, in my view - but it does as soon as
>you're using regular expressions.

But in any proper real-world use case of regular expressions, there won't
be an expression saying "somewhere" to start with. If the pattern string
doesn't show any trace of wildcards or other recognizable regular
expression features, it should be safe to assume that regular expressions
aren't being used. If a string in some source code I don't know shows
signs of being a match pattern and there's nothing else that tells me
whether it's a regular expression or not, I'll have to look and find it
out, there's no way around that. To be safe in assuming that no string
could ever be a regular expression, regardless of whether it looks like
it, you would have to forbid them completely in your team at least.

>>As I'm trying to say all the time, as soon as an implementation reaches a
>>complexity that makes it worth thinking about regular expressions, I'm
>>sure an alternative solution based on simple string functions won't be
>>more readable any longer.
>
>Well, Nicholas certainly thought it worth thinking about regular
>expressions in this case - do you? (The earlier part of your reply
>suggests not, but the bit below suggests you do.)
>
>>I'd even go so far as to say that as soon as
>>more than one call to a simple string function is needed for a given
>>problem, most probably I'll find the regular expression solution more
>>readable. This is, after all, a subjective decision to make.
>
>Whereas three calls to IndexOf is definitely more readable than a
>regular expression which, depending on the strings involved may well
>need to involve escaping.

In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions. I don't know whether the
actual code that the OP is writing might justify regexes better. Anyway, I
was merely using the case to demonstrate the fact that regular expressions
don't have a readability problem, IMHO, or at least they don't need to
have one if used properly.

Jon Skeet [C# MVP]

Sep 15, 2005, 2:55:31 PM
Oliver Sturm <oli...@sturmnet.org> wrote:
> >Indeed - but why make it even easier to introduce bugs? Changing a
> >search from "somewhere" to "somewhere.com" shouldn't be something
> >which requires significant thought, in my view - but it does as soon as
> >you're using regular expressions.
>
> But in any proper real-world use case of regular expressions, there won't
> be an expression saying "somewhere" to start with. If the pattern string
> doesn't show any trace of wildcards or other recognizable regular
> expression features, it should be safe to assume that regular expressions
> aren't being used. If a string in some source code I don't know shows
> signs of being a match pattern and there's nothing else that tells me
> whether it's a regular expression or not, I'll have to look and find it
> out, there's no way around that. To be safe in assuming that no string
> could ever be a regular expression, regardless of whether it looks like
> it, you would have to forbid them completely in your team at least.

No - you just have to be careful when you're using regular expressions.
I prefer code which means I don't have to take as much care, because
being human, sooner or later I'll be careless. The fewer possibilities
I have for carelessness actually causing an error, the better.

I know I couldn't off the top of my head list all the characters which
need escaping for regular expressions - could you *and* every member of
your team?

> >Whereas three calls to IndexOf is definitely more readable than a
> >regular expression which, depending on the strings involved may well
> >need to involve escaping.
>
> In this case, as far as it's described by the sample we've seen, I
> wouldn't favor the usage of regular expressions.

Even though it's more than one call to a simple string function?

> I don't know whether the
> actual code that the OP is writing might justify regexes better. Anyway, I
> was merely using the case to demonstrate the fact that regular expressions
> don't have a readability problem, IMHO, or at least they don't need to
> have one if used properly.

They have a readability problem compared with simple operations - they
require more care than simple literals. To me, "more care required"
means "lower readability and maintainability", which is a problem.

I'm not saying they're hideously unreadable - just *less* readable.
That's enough for me.

--
Jon Skeet - <sk...@pobox.com>

http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet

Oliver Sturm

Sep 15, 2005, 3:38:36 PM
Jon Skeet [C# MVP] wrote:

>I know I couldn't off the top of my head list all the characters which
>need escaping for regular expressions - could you and every member of
>your team?

I think I might, they are not really as many as you think. But that's not
the point; I use a testing tool when I create a larger expression and I
most probably use it again when I make changes. I have comments on my
regular expressions telling me what they do, what sample input and output
is. The first thing that's important is just that someone has to recognize
a regular expression when he encounters it, you're right about that.

>>>Whereas three calls to IndexOf is definitely more readable than a
>>>regular expression which, depending on the strings involved may well
>>>need to involve escaping.
>>
>>In this case, as far as it's described by the sample we've seen, I
>>wouldn't favor the usage of regular expressions.
>
>Even though it's more than one call to a simple string function?

Probably... the number of calls is not really what counts, is it?
Sometimes, string parsing algorithms that don't make use of regular
expressions involve several nested loops, several temporary variables and
just a single call to a simple string function. Yet these beasts can be
horrible because it takes only a short while until even the author can't
reliably remember what the algorithm does.

I won't contest the fact that three lines of code, calling IndexOf three
times, are probably a better alternative to a regular expression.

>They have a readability problem compared with simple operations - they
>require more care than simple literals. To me, "more care required"
>means "lower readability and maintainability", which is a problem.

Well, let's agree to disagree. I'm still trying to make the point that the
comparison with simple string literals is a bad one, because the two won't
ever be equal alternatives in any real world problem situation. Use the
simple operations as long as it makes sense, but don't hesitate to look at
other solutions because you think someone else on the team might make a
mistake changing a string literal later on.

>I'm not saying they're hideously unreadable - just less readable.
>That's enough for me.

Jon, I'm with you most of the way. But there's a limit to the demand for
readability, as I see it. I'm not likely to turn down a useful technology
in cases where it is practically without alternatives because the solution
doesn't please me aesthetically.

Jon Skeet [C# MVP]

Sep 15, 2005, 3:52:43 PM
Oliver Sturm <oli...@sturmnet.org> wrote:
> >I know I couldn't off the top of my head list all the characters which
> >need escaping for regular expressions - could you and every member of
> >your team?
>
> I think I might, they are not really as many as you think. But that's not
> the point; I use a testing tool when I create a larger expression and I
> most probably use it again when I make changes. I have comments on my
> regular expressions telling me what they do, what sample input and output
> is. The first thing that's important is just that someone has to recognize
> a regular expression when he encounters it, you're right about that.

Absolutely - especially when your tests may well not catch the problem.
For instance, if you have a search for "jon.skeet", are you going to
write a test to make sure that "jonxskeet" doesn't match? Unless you
actually know what to avoid (in which case you're likely to have
written it correctly in the first place) the test may well not pick up
on a missed character which needs escaping.
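
A small sketch of the "jon.skeet" case described above, with hypothetical method names. It shows that an unescaped dot in a pattern also matches "jonxskeet", while the escaped pattern and a plain IndexOf do not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotDemo
{
    // Unescaped: '.' matches any character except a newline.
    public static bool MatchesUnescaped(string input)
    {
        return Regex.IsMatch(input, "jon.skeet");
    }

    // Escaped: '\.' matches only a literal dot.
    public static bool MatchesEscaped(string input)
    {
        return Regex.IsMatch(input, @"jon\.skeet");
    }

    // Plain substring search: the dot has no special meaning.
    public static bool ContainsLiteral(string input)
    {
        return input.IndexOf("jon.skeet", StringComparison.Ordinal) >= 0;
    }

    public static void Main()
    {
        Console.WriteLine(MatchesUnescaped("jonxskeet")); // True  (the missed escape)
        Console.WriteLine(MatchesEscaped("jonxskeet"));   // False
        Console.WriteLine(ContainsLiteral("jonxskeet"));  // False
        Console.WriteLine(MatchesUnescaped("jon.skeet")); // True, so a positive-only test passes
    }
}
```

A test that only checks the positive case passes for both patterns, so it would not catch the missing escape.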

> >>>Whereas three calls to IndexOf is definitely more readable than a
> >>>regular expression which, depending on the strings involved may well
> >>>need to involve escaping.
> >>
> >>In this case, as far as it's described by the sample we've seen, I
> >>wouldn't favor the usage of regular expressions.
> >
> >Even though it's more than one call to a simple string function?
>
> Probably... the number of calls is not really what counts, is it?

I was only going by what you'd said previously:

<quote>
I'd even go so far as to say that as soon as more than one call to a
simple string function is needed for a given problem, most probably
I'll find the regular expression solution more readable.
</quote>

> Sometimes, string parsing algorithms that don't make use of regular
> expressions involve several nested loops, several temporary variables and
> just a single call to a simple string function. Yet these beasts can be
> horrible because it takes only a short while until even the author can't
> reliably remember what the algorithm does.

Absolutely.

> I won't contest the fact that three lines of code, calling IndexOf three
> times, are probably a better alternative to a regular expression.

Goodo :)

> >They have a readability problem compared with simple operations - they
> >require more care than simple literals. To me, "more care required"
> >means "lower readability and maintainability", which is a problem.
>
> Well, let's agree to disagree. I'm still trying to make the point that the
> comparison with simple string literals is a bad one, because the two won't
> ever be equal alternatives in any real world problem situation.

I don't see how you can say that when using regular expressions was one
suggested solution, and using IndexOf was another suggested solution.

> Use the simple operations as long as it makes sense, but don't
> hesitate to look at other solutions because you think someone else on
> the team might make a mistake changing a string literal later on.

If the other solution is likely to be fundamentally simpler, I'm all
for that. It was this particular situation that I was commenting on,
and the general comment that regular expressions are often used as a
sledgehammer to crack a pretty flimsy nut.



> >I'm not saying they're hideously unreadable - just less readable.
> >That's enough for me.
>
> Jon, I'm with you most of the way. But there's a limit to the demand for
> readability, as I see it. I'm not likely to turn down a useful technology
> in cases where it is practically without alternatives because the solution
> doesn't please me aesthetically.

Me either - but where there *is* a practical alternative which is more
readable, I'll go for that. If you only have one solution, you *can't*
turn it down really, can you? (Unless you can forego the feature which
requires it, of course, which is unlikely.)

Oliver Sturm

Sep 15, 2005, 4:16:46 PM
Jon Skeet [C# MVP] wrote:

>>>Even though it's more than one call to a simple string function?
>>
>>Probably... the number of calls is not really what counts, is it?
>
>I was only going by what you'd said previously:
>
><quote>
>I'd even go so far as to say that as soon as more than one call to a
>simple string function is needed for a given problem, most probably
>I'll find the regular expression solution more readable.
></quote>

I know I said that and I know you were referring to it. But I meant one
call as in "one call at runtime", as opposed to "one line of code that
makes the call".

>>Well, let's agree to disagree. I'm still trying to make the point that the
>>comparison with simple string literals is a bad one, because the two won't
>>ever be equal alternatives in any real world problem situation.
>
>I don't see how you can say that when using regular expressions was one
>suggested solution, and using IndexOf was another suggested solution.

Sorry, I meant "simple string operations". And I meant that I wouldn't
consider using a regular expression if an IndexOf could do the job just as
well - the two are not equal alternatives because I wouldn't seriously
consider one of them.

>>Use the simple operations as long as it makes sense, but don't
>>hesitate to look at other solutions because you think someone else on
>>the team might make a mistake changing a string literal later on.
>
>If the other solution is likely to be fundamentally simpler, I'm all
>for that. It was this particular situation that I was commenting on,
>and the general comment that regular expressions are often used as a
>sledgehammer to crack a pretty flimsy nut.

You're right about that. Complex technologies tend to be misused more
often than simple ones, don't they?

>>Jon, I'm with you most of the way. But there's a limit to the demand for
>>readability, as I see it. I'm not likely to turn down a useful technology
>>in cases where it is practically without alternatives because the solution
>>doesn't please me aesthetically.
>
>Me either - but where there is a practical alternative which is more
>readable, I'll go for that. If you only have one solution, you can't
>turn it down really, can you? (Unless you can forego the feature which
>requires it, of course, which is unlikely.)

Well, usually someone will come forward with other solutions, however
far-fetched. One that can actually be quite a good alternative to more
complex regular expression scenarios is writing a parser - or rather,
using a compiler compiler to create one. But in my experience there's a
lot of room for nicely written regular expressions, somewhere between a
few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)

Jon Skeet [C# MVP]

Sep 15, 2005, 4:25:43 PM
Oliver Sturm <oli...@sturmnet.org> wrote:
> ><quote>
> >I'd even go so far as to say that as soon as more than one call to a
> >simple string function is needed for a given problem, most probably
> >I'll find the regular expression solution more readable.
> ></quote>
>
> I know I said that and I know you were referring to it. But I meant one
> call as in "one call at runtime", as opposed to "one line of code that
> makes the call".

Not quite with you there - in this case, there would be three calls at
runtime, and three lines of code.



> >>Well, let's agree to disagree. I'm still trying to make the point that the
> >>comparison with simple string literals is a bad one, because the two won't
> >>ever be equal alternatives in any real world problem situation.
> >
> >I don't see how you can say that when using regular expressions was one
> >suggested solution, and using IndexOf was another suggested solution.
>
> Sorry, I meant "simple string operations". And I meant that I wouldn't
> consider using a regular expression if an IndexOf could do the job just as
> well - the two are not equal alternatives because I wouldn't seriously
> consider one of them.

Right - but unfortunately (IMO) other people do.

> >If the other solution is likely to be fundamentally simpler, I'm all
> >for that. It was this particular situation that I was commenting on,
> >and the general comment that regular expressions are often used as a
> >sledgehammer to crack a pretty flimsy nut.
>
> You're right about that. Complex technologies tend to be misused more
> often than simple ones, don't they?

Absolutely...



> >Me either - but where there is a practical alternative which is more
> >readable, I'll go for that. If you only have one solution, you can't
> >turn it down really, can you? (Unless you can forego the feature which
> >requires it, of course, which is unlikely.)
>
> Well, usually someone will come forward with other solutions, however
> far-fetched. One that can actually be quite a good alternative to more
> complex regular expression scenarios is writing a parser - or rather,
> using a compiler compiler to create one. But in my experience there's a
> lot of room for nicely written regular expressions, somewhere between a
> few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)

Oh certainly. I'm really *not* trying to suggest that regular
expressions should never be used - just that they shouldn't be the
first port of call as soon as you need to do anything with a string :)

Oliver Sturm

Sep 16, 2005, 10:46:48 AM
Jon Skeet [C# MVP] wrote:

>>><quote>
>>>I'd even go so far as to say that as soon as more than one call to a
>>>simple string function is needed for a given problem, most probably
>>>I'll find the regular expression solution more readable.
>>></quote>
>>
>>I know I said that and I know you were referring to it. But I meant one
>>call as in "one call at runtime", as opposed to "one line of code that
>>makes the call".
>
>Not quite with you there - in this case, there would be three calls at
>runtime, and three lines of code.

And in this case I would be prepared to see things differently - I said
already that I don't believe in call counting. But the sentence you quoted
was meant more in the context of the problem I was describing, where
simple string functions are used as a part of a, possibly hugely
complicated, larger algorithm.

As soon as there are loops involved, which may or may not result in a
single line with such a call being executed multiple times, things start
getting complex very quickly in my experience. How often have you been
sitting there with the debugger running, counting characters in a string
to find that one-off problem somebody introduced? I'll take an enormously
unreadable regular expression over that task any day :-)

tshad

Sep 19, 2005, 7:11:36 PM
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.1d99259a5...@msnews.microsoft.com...
> tshad <tschei...@ftsolutions.com> wrote:
>> I also feel that Regular Expressions, being an object in asp.net (not
>> necessarily C#) makes it just as valid as C#.
>
> Regular expressions have nothing to do with ASP.NET - they're a part of
> "normal" .NET.

Actually, you're right.

But that was my point.

Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
So using Regex is not really like using another language (as C# is different
from VB.Net).

But the discussion was valid in that you use the best tool for the situation.

>
>> As far as readability, it has nothing to do with Regular Expressions
>> whether
>> it is readable or not, as Oliver mentions, but how you write it.
>
> No - I believe that searching for "jon.skeet" with IndexOf is clearer
> than searching for "jon\\.skeet" or @"jon\.skeet".

That's maybe true. But it would be clear to someone used to using both C#
and Regex.

Also, you have the same problem when dealing with web pages or getting a
file from the disk. You still use the escape character there (and, as you
say, it is a little confusing) - but you still do it.

>Which of them
> contains just the information which is actually of concern, and which
> contains information which is only present due to the technology used
> to do the searching?
>
>> You can also make some pretty unreadable C# code as well.
>
> Sure, but that's no reason to use regular expressions just to make
> things worse.

I agree with you that readability is important.

It used to be that people didn't like C and C++ for exactly the same reason
you point out. The code was not as clear as COBOL or Basic and that was the
complaint back then. I happened to be a Fortran programmer at that time and
was not interested in moving to C for that reason (not that Fortran was
better - readability wise).

The problem with C back then was that much of the code was really cryptic.
But it didn't have to be; that was just how people coded back then. Mainly,
it was important to make the most efficient code possible because of the
limited computing power, and efficient rarely equates to readable. And I am
not even talking about compiling and linking and all the options and cryptic
command lines.

>
>> Readability is a function of the programmer not the language (in most
>> cases).
>
> Yes, but it's the programmer's decision how to approach things -
> whether you do things the simple way or the complex way. You *could*
> implement the string search by manually iterating over all the
> characters in the string, perhaps even writing your own state machine
> to do it. The code could be pretty readable considering what it's doing
> - but it's *bound* to be more complex than using IndexOf.

I agree.

Just because you can - doesn't mean you should.

>
>> As was also mentioned you also need to know the language. For someone
>> not used to objects, abstract objects and interfaces are also hard to
>> read.
>
> Sure - but why introduce unnecessarily complexity? You're already
> writing C#, so you'd better know C# - but why add regular expressions
> into the mix when they're unnecessary?

But if you know both, and as I (and you) mentioned, regex is part of .net as
is C# - so it is already in the mix. But you're right, don't introduce any
more complexity than necessary. But if it's 6 of one ... it's really up to
the programmer. In the original case, that was what it was. You can't tell
me that you feel that the solution suggested for this case was even close to
being unreadable (if you are even a stone's throw from understanding Regular
Expressions).

I personally feel that both solutions are equally usable and readable (in
this situation).

I have also seen times when I just couldn't find an easy solution in C# or
VB and it was fairly easy in Regex.

I myself would usually opt for the C# or VB solutions first, but would have
no problem using Regex. As a matter of fact, I use Regex to strip commas
and $ from my textbox fields before writing it to SQL as it was the best
solution I could find. Such as:

SalaryMax.Text =
String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))

At the time, I couldn't seem to find as simple a solution as this in VB.Net
so I use this (not saying there isn't one).
>
>> I like seeing different options and make a choice. Sometimes I may use
>> something like Regex just so I am used to using it, as long as the
>> problem
>> warrants it.
>
> And that's the point - I don't think this problem *does* warrant it.

I agree that it isn't necessary here, but I don't think it is warranted or
unwarranted here. I think it's just as readable either way.
>
>> You don't use it - you lose it.
>
> So do you add a database when you just need to do a hashtable lookup,
> just in case you forget SQL? Do you use reflection to get at the value
> of a property, just in case you forget how to use that? I hope not.

Of course not. But as was mentioned there are times where Regex may be a
good solution and if you can do it either way, why not.

>
> It's very important to use appropriate technology, rather than using it
> for the sake of it. (It's one thing to experiment with technology for
> the sake of it as a learning tool, but I wouldn't do it in production
> code.)

Right. But Regex is not inappropriate technology. As you said, trying to
loop through each character when there is an easier way is a bit much.

But Regex is valid and is an appropriate method for handling strings, and if
you are as comfortable with one as the other then it isn't inappropriate.
It's all in how you use it. And I was not saying experiment with it. I was
saying use it for the sake of staying familiar with it. I don't want to
need to use it and have to figure it out when I need to use it.

As you said. Use the appropriate tool. If the appropriate tool is Regex,
it is going to be d... inconvenient to need it and not know how to use it.

Now I am not saying go out and learn every tool out there. But if it is a
valid tool in your particular environment, and it is available - why would
you not avail yourself of it?

Tom



tshad

Sep 19, 2005, 2:44:48 PM
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.1d93b7a5f...@msnews.microsoft.com...
Escaping?

You've mentioned that as being a problem a couple of times.

What do you mean by this?

Are you talking about stopping if you find the first one matching?

Thanks,

Tom

tshad

Sep 19, 2005, 2:52:26 PM

"Oliver Sturm" <oli...@sturmnet.org> wrote in message
news:xn0e7a65k...@msnews.microsoft.com...

I also feel that Regular Expressions, being an object in asp.net (not
necessarily C#) makes it just as valid as C#.

As far as readability, it has nothing to do with Regular Expressions whether
it is readable or not, as Oliver mentions, but how you write it.

You can also make some pretty unreadable C# code as well. Readability is a
function of the programmer not the language (in most cases). As was also
mentioned you also need to know the language. For someone not used to
objects, abstract objects and interfaces are also hard to read.

I like seeing different options and make a choice. Sometimes I may use
something like Regex just so I am used to using it, as long as the problem
warrants it.

You don't use it - you lose it.

Tom

Jon Skeet [C# MVP]

Sep 19, 2005, 2:53:59 PM
tshad <tschei...@ftsolutions.com> wrote:
> Escaping?
>
> You've mentioned that as being a problem a couple of times.
>
> What do you mean by this?
>
> Are you talking about stopping if you find the first one matching?

No - I'm talking about finding things like "jon.skeet" in a string.
Using IndexOf, that's no problem - no characters are interpreted in a
"special" way by IndexOf.

Regular expressions, however, treat "." as "any character", so to find
an actual dot, you need to escape it with a backslash - and from a C#
point of view that means either doubling the backslash or using a
verbatim string literal, i.e.


"jon\\.skeet"
or
@"jon\.skeet"
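
As a minimal sketch of that difference (the sample text is made up for
illustration, and Regex.Escape is shown only as one way to avoid escaping
by hand):

```csharp
using System;
using System.Text.RegularExpressions;

class EscapingDemo
{
    static void Main()
    {
        string text = "mail from jonxskeet"; // hypothetical input, contains no dot

        // IndexOf treats every character literally, so "." only matches a dot.
        Console.WriteLine(text.IndexOf("jon.skeet", StringComparison.Ordinal) >= 0); // False

        // Unescaped, "." means "any character", so this also matches "jonxskeet".
        Console.WriteLine(Regex.IsMatch(text, "jon.skeet")); // True

        // Escaped by hand, using a verbatim string literal.
        Console.WriteLine(Regex.IsMatch(text, @"jon\.skeet")); // False

        // Regex.Escape builds the escaped pattern from the literal text.
        Console.WriteLine(Regex.IsMatch(text, Regex.Escape("jon.skeet"))); // False
    }
}
```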

--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet

tshad

Sep 19, 2005, 3:06:43 PM
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.1d9919a0e...@msnews.microsoft.com...

> tshad <tschei...@ftsolutions.com> wrote:
>> Escaping?
>>
>> You've mentioned that as being a problem a couple of times.
>>
>> What do you mean by this?
>>
>> Are you talking about stopping if you find the first one matching?
>
> No - I'm talking about finding things like "jon.skeet" in a string.
> Using IndexOf, that's no problem - no characters are interpreted in a
> "special" way by IndexOf.
>
> Regular expressions, however, treat "." as "any character", so to find
> an actual dot, you need to escape it with a backslash - and from a C#
> point of view that means either doubling the backslash or using a
> verbatim string literal, i.e.
> "jon\\.skeet"
> or
> @"jon\.skeet"

Got ya.

I thought you were talking about escaping the function/call as you might in
a loop when you find what you are looking for.

Thanks,

Tom

Jon Skeet [C# MVP]

Sep 19, 2005, 3:45:05 PM
tshad <tschei...@ftsolutions.com> wrote:
> I also feel that Regular Expressions, being an object in asp.net (not
> necessarily C#) makes it just as valid as C#.

Regular expressions have nothing to do with ASP.NET - they're a part of
"normal" .NET.


> As far as readability, it has nothing to do with Regular Expressions whether
> it is readable or not, as Oliver mentions, but how you write it.

No - I believe that searching for "jon.skeet" with IndexOf is clearer
than searching for "jon\\.skeet" or @"jon\.skeet". Which of them
contains just the information which is actually of concern, and which
contains information which is only present due to the technology used
to do the searching?

> You can also make some pretty unreadable C# code as well.

Sure, but that's no reason to use regular expressions just to make
things worse.

> Readability is a function of the programmer not the language (in most
> cases).

Yes, but it's the programmer's decision how to approach things -
whether you do things the simple way or the complex way. You *could*
implement the string search by manually iterating over all the
characters in the string, perhaps even writing your own state machine
to do it. The code could be pretty readable considering what it's doing
- but it's *bound* to be more complex than using IndexOf.

> As was also mentioned you also need to know the language. For someone
> not used to objects, abstract objects and interfaces are also hard to
> read.

Sure - but why introduce unnecessary complexity? You're already
writing C#, so you'd better know C# - but why add regular expressions
into the mix when they're unnecessary?

> I like seeing different options and make a choice. Sometimes I may use
> something like Regex just so I am used to using it, as long as the problem
> warrants it.

And that's the point - I don't think this problem *does* warrant it.



> You don't use it - you lose it.

So do you add a database when you just need to do a hashtable lookup,
just in case you forget SQL? Do you use reflection to get at the value
of a property, just in case you forget how to use that? I hope not.

It's very important to use appropriate technology, rather than using it
for the sake of it. (It's one thing to experiment with technology for
the sake of it as a learning tool, but I wouldn't do it in production
code.)

--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet

Jon Skeet [C# MVP]

Sep 20, 2005, 2:27:08 AM
tshad <tschei...@ftsolutions.com> wrote:
> > Regular expressions have nothing to do with ASP.NET - they're a part of
> > "normal" .NET.
>
> Actually, you're right.
>
> But that was my point.
>
> Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
> So using Regex is not really like using another language (as C# is different
> from VB.Net).

It is - the regular expression *language* is a different language to
C#, in the same way that XPath is. That's why under "regular
expressions" in MSDN, there's a "language elements" section.

> But the discussion was valid in you use the best tool for the situation.

Indeed.

> >> As far as readability, it has nothing to do with Regular Expressions
> >> whether
> >> it is readable or not, as Oliver mentions, but how you write it.
> >
> > No - I believe that searching for "jon.skeet" with IndexOf is clearer
> > than searching for "jon\\.skeet" or @"jon\.skeet".
>
> That's maybe true. But it would be clear to someone used to using both C#
> and Regex.

But not as instantly clear, I believe. Can you really say that you find
the regex version doesn't take you *any* longer to understand than the
non-regex version?

> Also, you have the same problem when dealing with web pages or getting a
> file from the disk. You still use the escape character there (and as you
> say, is a little confusing) - but you still do it.

You have to know the C# escaping, but not the regular expression
escaping.

> >> You can also make some pretty unreadable C# code as well.
> >
> > Sure, but that's no reason to use regular expressions just to make
> > things worse.
>
> I agree with you that readability is important.
>
> It used to be that people didn't like C and C++ for exactly the same reason
> you point out. The code was not as clear as COBOL or Basic and that was the
> complaint back then. I happened to be a Fortran programmer at that time and
> was not interested to moving to C for that reason (not that Fortran was
> better - readability wise).
>
> The problem with C back that was that even though much of the code was
> really cryptic. But it didn't have to be, that was just how people coded
> back then. Mainly, it was important to make the most efficient code
> possible because of the limited computing power and efficient rarely equates
> to readable. And I am not even talking about compiling and linking and all
> the options and cryptic command lines.

To me, a lot of readability comes from decent naming and commenting,
which fortunately are available in pretty much any language. I'd
certainly agree that object orientation (and exceptions, automatic
memory management etc) makes it a lot easier to write readable code
though.

> > Yes, but it's the programmer's decision how to approach things -
> > whether you do things the simple way or the complex way. You *could*
> > implement the string search by manually iterating over all the
> > characters in the string, perhaps even writing your own state machine
> > to do it. The code could be pretty readable considering what it's doing
> > - but it's *bound* to be more complex than using IndexOf.
>
> I agree.
>
> Just because you can - doesn't mean you should.

Exactly.

> > Sure - but why introduce unnecessarily complexity? You're already
> > writing C#, so you'd better know C# - but why add regular expressions
> > into the mix when they're unnecessary?
>
> But if you know both and as I (and you) mentioned regex is part of .net as
> is C# - so it is already in the mix.

No, it's not. It's not already used in every single C# program, any
more than SQL is.

> But you're right, don't introduce any
> more complexity that necessary. But if it's 6 of one ... it's really up to
> the programmer.

In what way is it 6 of one or half a dozen of the other when one
solution requires knowing more than the other? I would expect *any* C#
programmer to know what String.IndexOf does. I wouldn't expect all C#
programmers to know by heart which regex language elements require
escaping - and if you don't know that off the top of your head, then
changing the code to search for a different string involves an extra
bit of brainpower.

> In the original case, that was what it was. You can't tell
> me that you feel that the solution suggested for this case was even close to
> being unreadable (if you are even a stones throw from understanding Regular
> Expressions).

It was *less* readable though - and would have been *significantly*
less readable if the string being searched for had included dots,
brackets etc.

> I personally feel that both solutions are equally usable and readable (in
> this situation).

I suspect not all programmers would though. Don't forget that the
person who writes the code is very often not the one to maintain it.
Can you guarantee that *everyone* who touches the code will find
regexes as readable as String.IndexOf?

> I have also seen times when I just couldn't find an easy solution in C# or
> VB and it was fairly easy in Regex.

Which is why I've said repeatedly that I'm not trying to suggest that
regexes are bad, or should never be used. I'm just saying that in this
case it's using a sledgehammer to crack a nut.

> I myself would usually opt for the C# or VB solutions first, but would have
> no problem using Regex. As a matter of fact, I use Regex to strip commas
> and $ from my textbox fields before writing it to SQL as it was the best
> solution I could find. Such as:
>
> SalaryMax.Text =
> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
>
> At the time, I couldn't seem to find as simple a solution as this in VB.Net
> so I use this (not saying there isn't one).

And of course there is:
SalaryMax.Text =
String.Format("{0:c}", CalculateYearly(WagesMax.Text.Replace("$", "")
.Replace(",", "")));

I know which version I'd rather read...
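
A self-contained sketch of the two approaches side by side, assuming a
made-up input string; the class and method names are hypothetical and
CalculateYearly is left out because its definition is not shown in the
thread:

```csharp
using System;
using System.Text.RegularExpressions;

class StripCurrencyDemo
{
    // Regex version: "$" has to be escaped because it is an anchor in a pattern.
    public static string StripWithRegex(string input)
    {
        return Regex.Replace(input, @"\$|,", "");
    }

    // Plain string version: both arguments are literal text, nothing to escape.
    public static string StripWithReplace(string input)
    {
        return input.Replace("$", "").Replace(",", "");
    }

    static void Main()
    {
        string wages = "$52,000.00"; // hypothetical textbox content
        Console.WriteLine(StripWithRegex(wages));   // 52000.00
        Console.WriteLine(StripWithReplace(wages)); // 52000.00
    }
}
```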

> > And that's the point - I don't think this problem *does* warrant it.
>
> I agree that is isn't necessary here, but I don't think it is warranted or
> unwarranted here. I think it's just as readable either way.

But I suspect you're more used to regular expressions than many other
programmers - and making the code less readable for other programmers
for no benefit is what makes it unwarranted here, even in the simple
case where there's nothing to escape.

> > So do you add a database when you just need to do a hashtable lookup,
> > just in case you forget SQL? Do you use reflection to get at the value
> > of a property, just in case you forget how to use that? I hope not.
>
> Of course not. But as was mentioned there are times where Regex may be a
> good solution and if you can do it either way, why not.

Because it's more complicated! You can't deny that there's more to
consider due to the escaping. There's more to know, more to consider,
and it doesn't get the job done any more cleanly.

> > It's very important to use appropriate technology, rather than using it
> > for the sake of it. (It's one thing to experiment with technology for
> > the sake of it as a learning tool, but I wouldn't do it in production
> > code.)
>
> Right. But Regex is not inappropriate technology. As you said, trying to
> loop through each character when there is an easier way is a bit much.

As is using the power of regular expressions when there is an easier
way - using IndexOf, which is *precisely* there to find one string
within another.

> But Regex is valid and is an appropriate method for handling strings and if
> you are as comfortable with one as the other than it isn't inappropriate.
> It's all in how you use it. And I was not saying experiment with it. I was
> saying using it for the sake of staying familier with it. I don't want to
> need to use it and have to figure it out when I need to use it.

Do you really think it would take you that long to refamiliarise
yourself with it? I don't see why it's a good idea to make some poor
maintenance engineer who hasn't used regular expressions before try to
figure out that *actually* you were just trying to find strings within
each other just so you can keep your skill set current.

> As you said. Use the appropriate tool. If the appropriate tool is Regex,
> it is going to be d... inconvenient to need it and not know how to use it.

I've never had a problem with reading the documentation when I've
needed to use regular expressions, without putting it in projects in
places where I *don't* need it.



> Now I am not saying go out and learn every tool out there. But if it is a
> valid tool in your particular environment, and it is available - why would
> you not avail yourself of it?

Because it makes things more complicated for no benefit. The reflection
example was a good one - that allows you to get a property value, so do
you think it's a good idea to write:

string x = (string) something.GetType()
.GetProperty("Name")
.GetValue(something, null);
or

string x = something.Name;

?

Maybe I should use the former. After all, I wouldn't want to forget how
to use reflection, would I?
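
To make the comparison concrete, here is a minimal runnable sketch showing
that both forms read the same value; the Person class and the value "Jon"
are made up for illustration:

```csharp
using System;

class Person
{
    public string Name { get; set; }
}

class ReflectionDemo
{
    static void Main()
    {
        Person something = new Person { Name = "Jon" };

        // Reflection: look the property up by name at run time.
        string viaReflection = (string) something.GetType()
                                                 .GetProperty("Name")
                                                 .GetValue(something, null);

        // Direct access: checked by the compiler, and far shorter.
        string direct = something.Name;

        Console.WriteLine(viaReflection == direct); // True
    }
}
```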

tshad

Sep 20, 2005, 12:49:08 PM
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.1d99bc169...@msnews.microsoft.com...

> tshad <tschei...@ftsolutions.com> wrote:
>> > Regular expressions have nothing to do with ASP.NET - they're a part of
>> > "normal" .NET.
>>
>> Actually, you're right.
>>
>> But that was my point.
>>
>> Regex is part of .net as is C# (although it doesn't have to be) or
>> VB.Net.
>> So using Regex is not really like using another language (as C# is
>> different
>> from VB.Net).
>
> It is - the regular expression *language* is a different language to
> C#, in the same way that XPath is. That's why under "regular
> expressions" in MSDN, there's a "language elements" section.

I think calling it a language is a stretch, although I know it is called a
language in places (it's all in what you define as a language). It really is
a text/string processor, as are IndexOf, Substring, Right, Replace etc. used
by various languages.

You don't build pages with it. It isn't procedural. It is a tool used by
the other languages. You don't use VB.Net in C# or vice versa, but both use
Regular expressions (as they both use Substring, Replace etc).

>
>> But the discussion was valid in you use the best tool for the situation.
>
> Indeed.
>
>> >> As far as readability, it has nothing to do with Regular Expressions
>> >> whether
>> >> it is readable or not, as Oliver mentions, but how you write it.
>> >
>> > No - I believe that searching for "jon.skeet" with IndexOf is clearer
>> > than searching for "jon\\.skeet" or @"jon\.skeet".
>>
>> That's maybe true. But it would be clear to someone used to using both
>> C#
>> and Regex.
>
> But not as instantly clear, I believe. Can you really say that you find
> the regex version doesn't take you *any* longer to understand than the
> non-regex version?

Depends on the C# code as well as the Regex code.

Again, are we talking about the best tool for the job or the most
readability? As was mentioned before, you set up loops and temporary
variables to do what you can do in a simple Regular Expression.

Again, I am not pushing Regular Expressions here, just that they are just as
valid as C# (or VB.Net) string handlers.

I do use them when convenient.

For example, I was creating a simple text search engine and wanted to modify
what the user put in and found it simpler to do the following than in VB or
C:

' The following replaces all multiple blanks with " ". It then takes
' out the anomalies, such as "and not and" and replaces them with "and"

keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
keywords = Regex.Replace(keywords, "( )", " or ")
keywords = Regex.Replace(keywords," or or "," ")
keywords = Regex.Replace(keywords,"or and or","and")
keywords = Regex.Replace(keywords,"or near or","near")
keywords = Regex.Replace(keywords,"and not or","and not")

Fairly straight forward and easy to follow.
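
For readers following along in C#, the same six steps can be sketched as
below. This is a straight port under the assumption that the steps run in
the order shown; the class name, method name and sample inputs are
hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class KeywordDemo
{
    // C# port of the VB.NET steps above, applied in the same order.
    public static string Normalize(string keywords)
    {
        // Collapse runs of whitespace to one space, then trim the ends.
        keywords = Regex.Replace(keywords, @"\s{2,}", " ").Trim();
        // Turn every remaining space into " or ".
        keywords = Regex.Replace(keywords, "( )", " or ");
        // Undo the cases where the user had already typed an operator.
        keywords = Regex.Replace(keywords, " or or ", " ");
        keywords = Regex.Replace(keywords, "or and or", "and");
        keywords = Regex.Replace(keywords, "or near or", "near");
        keywords = Regex.Replace(keywords, "and not or", "and not");
        return keywords;
    }

    static void Main()
    {
        Console.WriteLine(Normalize("cats    dogs"));      // cats or dogs
        Console.WriteLine(Normalize("cats and dogs"));     // cats and dogs
        Console.WriteLine(Normalize("cats and not dogs")); // cats and not dogs
    }
}
```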

>
>> Also, you have the same problem when dealing with web pages or getting a
>> file from the disk. You still use the escape character there (and as you
>> say, is a little confusing) - but you still do it.
>
> You have to know the C# escaping, but not the regular expression
> escaping.

But you do NEED to know the C# escaping (readability not high - unless you
understand it).

But writing objects and the objects themselves are not easily readable. But
you wouldn't advocate not writing them, would you?

>
>> > Yes, but it's the programmer's decision how to approach things -
>> > whether you do things the simple way or the complex way. You *could*
>> > implement the string search by manually iterating over all the
>> > characters in the string, perhaps even writing your own state machine
>> > to do it. The code could be pretty readable considering what it's doing
>> > - but it's *bound* to be more complex than using IndexOf.
>>
>> I agree.
>>
>> Just because you can - doesn't mean you should.
>
> Exactly.
>
>> > Sure - but why introduce unnecessarily complexity? You're already
>> > writing C#, so you'd better know C# - but why add regular expressions
>> > into the mix when they're unnecessary?
>>
>> But if you know both and as I (and you) mentioned regex is part of .net
>> as
>> is C# - so it is already in the mix.
>
> No, it's not. It's not already used in every single C# program, any
> more than SQL is.

Nor are all the objects you use.

But if you are using .Net, it is part of the mix.

>
>> But you're right, don't introduce any
>> more complexity that necessary. But if it's 6 of one ... it's really up
>> to
>> the programmer.
>
> In what way is it 6 of one or half a dozen of the other when one
> solution requires knowing more than the other? I would expect *any* C#
> programmer to know what String.IndexOf does. I wouldn't expect all C#
> programmers to know by heart which regex language elements require
> escaping - and if you don't know that off the top of your head, then
> changing the code to search for a different string involves an extra
> bit of brainpower.

Why? Ever heard of references or cheat sheets? And what is wrong with a
little extra brainpower - if you don't use it, you lose it :)

I don't know all of the possible combinations of calls to every Object, but
that doesn't preclude me from using them.

My position has always been, don't memorize. You will remember what you
use. But if you know how to get it (where to look), then you have
everything you need.

I happen to use .Net. Regex is part of .Net. I would be limiting myself if
I didn't use Regex in places where it is appropriate. If I happen to know a
good way in Regex to solve a problem, I am not going to use *extra brainpower*
to try to solve the problem in C#.

>
>> In the original case, that was what it was. You can't tell
>> me that you feel that the solution suggested for this case was even close
>> to
>> being unreadable (if you are even a stones throw from understanding
>> Regular
>> Expressions).
>
> It was *less* readable though - and would have been *significantly*
> less readable if the string being searched for had included dots,
> brackets etc.

But it didn't. But if it did, it is no different than having to deal with
escapes in C (less readable).

If you are talking about

if ((someString.IndexOf("something1",0) >= 0) ||
    (someString.IndexOf("something2",0) >= 0) ||
    (someString.IndexOf("something3",0) >= 0))
{
    // Do something
}

vs

if (Regex.IsMatch(myString, @"something1|something2|something3"))

If you know absolutely nothing about Regular expressions, I would agree that
this is less readable.

But I would also contend that IndexOf could be just as confusing. What is
the first 0 for? What about the 2nd? It is readable because you know C.

I would maintain that even if you knew nothing about Regex, you would
assume that you are doing a Match (can't tell that from the word "IndexOf")
and it probably has something to do with the words "something1",
"something2" and "something3". Now if you know C then I would assume you
would pick up that "|" is "or" (not so clear to a VB programmer). And that
would be to someone not familiar with regular expressions doing a quick
perusal.

So I am at a loss as to how this regular expression is more unreadable than
the C# counterpart. That is not to say that you couldn't make it more
unreadable - but you could do the same with C# if you wanted to.
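
For comparison, both checks can be put side by side in a runnable form.
The ContainsAny helper is hypothetical, written along the lines of the
params-array helper suggested earlier in the thread, and the sample strings
are made up:

```csharp
using System;
using System.Text.RegularExpressions;

class ContainsAnyDemo
{
    // Hypothetical helper: true if text contains at least one candidate.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison: every character is matched literally.
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }

    static void Main()
    {
        string someString = "prefix something2 suffix";

        // IndexOf-based check, wrapped in the helper.
        Console.WriteLine(ContainsAny(someString, "something1", "something2", "something3")); // True

        // Regex-based check; safe here only because the words need no escaping.
        Console.WriteLine(Regex.IsMatch(someString, "something1|something2|something3")); // True

        Console.WriteLine(ContainsAny("nothing here", "something1", "something2")); // False
    }
}
```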

>
>> I personally feel that both solutions are equally usable and readable (in
>> this situation).
>
> I suspect not all programmers would though. Don't forget that the
> person who writes the code is very often not the one to maintain it.
> Can you guarantee that *everyone* who touches the code will find
> regexes as readable as String.IndexOf?

As was said, you can make readable and unreadable C or Regex code. Are you
going to tell your programmers they "cannot" use Regex for the same reason?

Are you going to leave out some objects that programmers may not be familiar
with?

>
>> I have also seen times when I just couldn't find an easy solution in C#
>> or
>> VB and it was fairly easy in Regex.
>
> Which is why I've said repeatedly that I'm not trying to suggest that
> regexes are bad, or should never be used. I'm just saying that in this
> case it's using a sledgehammer to crack a nut.

And I don't in this case, as I think I've shown. Less typing, easy to read,
straight forward - in this case.


>
>> I myself would usually opt for the C# or VB solutions first, but would
>> have
>> no problem using Regex. As a matter of fact, I use Regex to strip commas
>> and $ from my textbox fields before writing it to SQL as it was the best
>> solution I could find. Such as:
>>
>> SalaryMax.Text =
>> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
>>
>> At the time, I couldn't seem to find as simple a solution as this in
>> VB.Net
>> so I use this (not saying there isn't one).
>
> And of course there is:
> SalaryMax.Text =
> String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
> .Replace(",", ""));
>
> I know which version I'd rather read...

I can read either (although, I didn't know you could string multiple
"Replace"s together).

>
>> > And that's the point - I don't think this problem *does* warrant it.
>>
>> I agree that is isn't necessary here, but I don't think it is warranted
>> or
>> unwarranted here. I think it's just as readable either way.
>
> But I suspect you're more used to regular expressions than many other
> programmers - and making the code less readable for other programmers
> for no benefit is what makes it unwarranted here, even in the simple
> case where there's nothing to escape.

First of all, I am not. I don't use it much at all, but I find it easy to
figure out and straightforward (but you can make it really complex). I use
it to validate phone numbers, credit card numbers, zip codes etc. These are
very well documented, and when there are a myriad of ways a user can input
these types of data, I prefer to use Regular expressions, which are all over
the place (easy to find), than try to come up with some complex set of loops
and temporary variables, which makes it far easier to make a mistake and is
much more unreadable than the Regex equivalent.
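
As one illustration of that kind of validation, here is a minimal sketch
for US zip codes. The accepted format (five digits, optionally followed by
a hyphen and four more) and the helper name are assumptions made for the
example, not something taken from the code discussed above.

```csharp
using System;
using System.Text.RegularExpressions;

class ZipDemo
{
    // Assumed format: 5 digits, optionally followed by "-" and 4 digits.
    // [0-9] and \z keep the match strict (ASCII digits, no trailing newline).
    public static bool IsUsZipCode(string input)
    {
        return Regex.IsMatch(input, @"^[0-9]{5}(-[0-9]{4})?\z");
    }

    static void Main()
    {
        Console.WriteLine(IsUsZipCode("98052"));      // True
        Console.WriteLine(IsUsZipCode("98052-6399")); // True
        Console.WriteLine(IsUsZipCode("9805"));       // False
        Console.WriteLine(IsUsZipCode("98052-63"));   // False
    }
}
```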


>
>> > So do you add a database when you just need to do a hashtable lookup,
>> > just in case you forget SQL? Do you use reflection to get at the value
>> > of a property, just in case you forget how to use that? I hope not.
>>
>> Of course not. But as was mentioned there are times where Regex may be a
>> good solution and if you can do it either way, why not.
>
> Because it's more complicated! You can't deny that there's more to
> consider due to the escaping. There's more to know, more to consider,
> and it doesn't get the job done any more cleanly.

Escaping seems to be your main complaint with it.

I have the same problem with C or VB when trying to remember when to use "\"
vs "/" in paths or do I need to add "\" in front of my slash or quote.
These are inherent problems with pretty much all of them.

>
>> > It's very important to use appropriate technology, rather than using it
>> > for the sake of it. (It's one thing to experiment with technology for
>> > the sake of it as a learning tool, but I wouldn't do it in production
>> > code.)
>>
>> Right. But Regex is not inappropriate technology. As you said, trying
>> to
>> loop through each character when there is an easier way is a bit much.
>
> As is using the power of regular expressions when there is an easier
> way - using IndexOf, which is *precisely* there to find one string
> within another.

I am not discounting IndexOf, I am just saying that both work fine and are
just as readable (in this case). In other cases, that may not be the case
(with either C or Regex).

>
>> But Regex is valid and is an appropriate method for handling strings and
>> if
>> you are as comfortable with one as the other than it isn't inappropriate.
>> It's all in how you use it. And I was not saying experiment with it. I
>> was
>> saying using it for the sake of staying familiar with it. I don't want
>> to
>> need to use it and have to figure it out when I need to use it.
>
> Do you really think it would take you that long to refamiliarise
> yourself with it? I don't see why it's a good idea to make some poor
> maintenance engineer who hasn't used regular expressions before try to
> figure out that *actually* you were just trying to find strings within
> each other just so you can keep your skill set current.

So you would prefer to code to the lowest common denominator.

I am not going to code to the level of a junior programmer. I prefer that
he learn to code to a higher level.

I am not saying that you shouldn't still write decent, readable, commented
code. But I am not going to limit myself because another programmer may not
be able to read well written code. If that were the case, I would not be
writing objects (abstract classes, interfaces, etc).

>
>> As you said. Use the appropriate tool. If the appropriate tool is
>> Regex,
>> it is going to be d... inconvenient to need it and not know how to use
>> it.
>
> I've never had a problem with reading the documentation when I've
> needed to use regular expressions, without putting it in projects in
> places where I *don't* need it.
>

"Need" is a personal question. I don't think it applies here. You prefer
IndexOf and I might prefer IsMatch.

>> Now I am not saying go out and learn every tool out there. But if it is
>> a
>> valid tool in your particular environment, and it is available - why
>> would
>> you not avail yourself of it?
>
> Because it makes things more complicated for no benefit. The reflection
> example was a good one - that allows you to get a property value, so do
> you think it's a good idea to write:
>
> string x = (string) something.GetType()
> .GetProperty("Name")
> .GetValue(something, null);
> or
>
> string x = something.Name;
>
> ?
>
> Maybe I should use the latter. After all, I wouldn't want to forget how
> to use reflection, would I?

Lost me on that one.

Tom

Jon Skeet [C# MVP]

Sep 20, 2005, 2:20:36 PM
tshad <tschei...@ftsolutions.com> wrote:
> > It is - the regular expression *language* is a different language to
> > C#, in the same way that XPath is. That's why under "regular
> > expressions" in MSDN, there's a "language elements" section.
>
> I think calling it a language is a stretch, although I know it is called a
> language in places(it's all in what you define as a language).

In plenty of places. It has a language with a defined syntax etc.

> It really is
> a text/string processor, as is: IndexOf, Substring, Right, Replace etc used
> by various languages.
>
> You don't build pages with it. It isn't procedural.

Neither of those are required for it to be a language.

> It is a tool used by the other languages.

Sure - so is XPath, but that's a language too.
(See http://www.w3.org/TR/xpath)

> You don't use VB.Net in C# or Vice versa but both use
> Regular expressions (as the both use Substring, Replace etc).

None of those state that regular expressions aren't a language.

> > But not as instantly clear, I believe. Can you really say that you find
> > the regex version doesn't take you *any* longer to understand than the
> > non-regex version?
>
> Depends on the C# code as well as the Regex code.

The C# code in question would be:

if (someVariable.IndexOf ("firstliteral") != -1 ||
someVariable.IndexOf ("secondliteral") != -1 ||
someVariable.IndexOf ("thirdliteral") != -1)

If I did it regularly, I'd write a short method which took a params
string array.
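
A minimal sketch of what such a helper might look like. The names StringSearch and ContainsAny are made up for illustration and are not part of the framework, and the choice of an ordinal comparison is an assumption (the bare IndexOf(string) calls above use the current culture instead).

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: true if 'text' contains at least one of 'candidates'.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (candidates == null) throw new ArgumentNullException("candidates");

        foreach (string candidate in candidates)
        {
            // Ordinal comparison: exact character match, no culture rules (an assumption).
            if (text.IndexOf(candidate, StringComparison.Ordinal) != -1)
            {
                return true;
            }
        }
        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        string someVariable = "xx secondliteral yy";

        // prints "True"
        Console.WriteLine(StringSearch.ContainsAny(
            someVariable, "firstliteral", "secondliteral", "thirdliteral"));

        // prints "False"
        Console.WriteLine(StringSearch.ContainsAny(someVariable, "nothing", "here"));
    }
}
```

The call site then reads as a single condition, and adding a fourth search string is just one more argument.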



> Again, are we talking about the best tool for the job or the most
> readability.

Unless there's another compelling argument in favour of one tool or
another, readability is a very important part of choosing the best
tool.

> As was mentioned before, you set up loops and temporary
> variables to do what you can do in a simple Regular Expression.
>
> Again, I am not pushing Regular Expressions here, just that they are just as
> valid as C# (or VB.Net) string handlers.

But you're effectively pushing them in the situation described by the
OP when you say that the solution using regular expressions is as
readable as the solution without.

> I do use them when convenient.
>
> For example, I was creating a simple text search engine and wanted to modify
> what the user put in and found it simpler to do the following than in VB or
> C:
>
> ' The following replaces all multiple blanks with " ". It then takes
> ' out the anomalies, such as "and not and" and replaces them with "and"
>
> keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
> keywords = Regex.Replace(keywords, "( )", " or ")
> keywords = Regex.Replace(keywords," or or "," ")
> keywords = Regex.Replace(keywords,"or and or","and")
> keywords = Regex.Replace(keywords,"or near or","near")
> keywords = Regex.Replace(keywords,"and not or","and not")
>
> Fairly straight forward and easy to follow.

Reasonably, although apart from the first regex, I'd suggest doing the
rest with straight calls to String.Replace. As an example of why I
think that would be more readable, what exactly does the second line do?
In some flavours of regular expressions, brackets form capturing
groups. Do they in .NET? I'd have to look it up. If it's really just
trying to replace the string "( )" with " or ", a call to
String.Replace would mean I didn't need to look anything up.
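
For what it's worth, parentheses do form capturing groups in .NET regular expressions, so the two readings of "( )" behave quite differently. A small sketch with a made-up input string:

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    public static void Main()
    {
        string keywords = "cats ( ) dogs"; // made-up input for illustration

        // As a regex, "( )" is a capturing group containing one space, so every
        // single space is replaced and the bracket characters are left alone.
        Console.WriteLine(Regex.Replace(keywords, "( )", " or "));
        // prints "cats or ( or ) or dogs"

        // String.Replace treats "( )" as three literal characters.
        Console.WriteLine(keywords.Replace("( )", " or "));
        // prints "cats  or  dogs" (two spaces either side of "or")
    }
}
```

With String.Replace the reader never has to ask which of the two was intended.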

> >> Also, you have the same problem when dealing with web pages or getting a
> >> file from the disk. You still use the escape character there (and as you
> >> say, is a little confusing) - but you still do it.
> >
> > You have to know the C# escaping, but not the regular expression
> > escaping.
>
> But you do NEED to know the C# escaping (readability not high - unless you
> understand it).

Yes, but I *already* need to know that in order to write C#. Choosing
to use String.IndexOf doesn't add to what I need to remember - choosing
regular expressions does. In addition, there aren't many things which
need escaping compared with those which need escaping in regular
expressions. In addition to *that*, whenever you need to escape in
regular expressions, you also need to escape in C# (or remember to use
verbatim string literals) - yet another piece of headache.

> > To me, a lot of readability comes from decent naming and commenting,
> > which fortunately are available in pretty much any language. I'd
> > certainly agree that object orientation (and exceptions, automatic
> > memory management etc) makes it a lot easier to write readable code
> > though.
>
> But writing objects and the objects themselves are not easily readable. But
> you would advocate not writing them, would you?

No, but I don't see how that's relevant.

> >> But if you know both and as I (and you) mentioned regex is part of .net
> >> as is C# - so it is already in the mix.
> >
> > No, it's not. It's not already used in every single C# program, any
> > more than SQL is.
>
> Nor are all the objects you use.
>
> But if you are using .Net, it is part of the mix.

It's not necessarily part of the mix I have to use. I suspect *very*
few programs don't do any string manipulation - knowing the string
methods well is *far* more fundamental to .NET programming than knowing
regular expressions.

> > In what way is it 6 of one or half a dozen of the other when one
> > solution requires knowing more than the other? I would expect *any* C#
> > programmer to know what String.IndexOf does. I wouldn't expect all C#
> > programmers to know by heart which regex language elements require
> > escaping - and if you don't know that off the top of your head, then
> > changing the code to search for a different string involves an extra
> > bit of brainpower.
>
> Why? Ever heard of references or cheat sheets? And what is wrong with a
> little extra brainpower - if you don't use it, you lose it :)

If you truly think that given two solutions which are otherwise equal,
the solution which is easiest to write, read and maintain doesn't win
hands down, we'll definitely never agree.

If you want to keep your hand in with respect to regular expressions,
do it in a test project, or with a regular expressions workbench. Keep
it out of code which needs to be read and maintained, probably by other
people who don't want to waste time because you wanted to keep your
skill set up to date.

> I don't know all of the possible combinations of calls to every Object, but
> that doesn't preclude me from using them.

Exactly - and you wouldn't go out of your way to use methods you don't
need, just to get into the habit of using them, would you?

> My position has always been, don't memorize. You will remember what you
> use. But if you know how to get it (where to look), then you have
> everything you need.

Absolutely - so why are you so keen on making people either memorise or
look up the characters which need escaping for regular expressions
every time they read or modify your code?

> I happen to use .Net. Regex is part of .Net. I would be limiting myself if
> I didn't use Regex in places where it is appropriate.

I seem to be having difficulty making myself clear on this point: I
have never stated and will never state that you shouldn't use regular
expressions where they're appropriate. But they are *not* appropriate
in this case, as they are a more complex and less readable way of
solving the problem.

Show me a problem where the regex way of solving it is simpler than
using simple string operations (and there are plenty of problems like
that) and I'll plump for the regex in a heartbeat.

> If I happen to know a good way in Regex to solve a problem, I am not
> going use *extra brainpower* to try to solve the problem in C#.

In what way is using the method which is designed for *precisely* the
task in hand (finding something in a string) using extra brainpower? If
you're not familiar with String.IndexOf, you've got *much* bigger
things to worry about than whether or not your regular expression
skills are getting rusty.

> > It was *less* readable though - and would have been *significantly*
> > less readable if the string being searched for had included dots,
> > brackets etc.
>
> But it didn't. But if it did, it is no different than having to deal with
> escapes in C (less readable)
>
> If you are talking about
>
> if ((someString.IndexOf("something1",0) >= 0) ||
> ((someString.IndexOf("something2",0) >= 0) ||
> ((someString.IndexOf("something3",0) >= 0))
> {
> Do something
> }
>
> vs
>
> if (Regex.IsMatch(myString, @"something1|something2|something3"))
>
> If you know absolutely nothing about Regular expressions, I would agree that
> this is less readable.
>
> But I would also contend that IndexOf could be just as confusing. What is
> the first 0 for? What about the 2nd? It is readable because you know C.

Well, for a start the 0s aren't necessary, and I wouldn't include them.



> I would maintain that if even if you knew nothing about Regex, you would
> assume that you are doing a Match (can't tell that from the word "IndexOf")
> and it probably has something to do with the words "something1",
> "something2" and "something3". Now if you know C than I would assume you
> would pick up that "|" is "or" (not so clear to a VB programmer). And that
> would be to someone not familiar with regular expressions doing a quick
> perusal

Okay - now suppose I need to change it from searching for "something1"
to "something.1" or "something[1]". How long does it take to change in
each version? How easy is it to read afterwards?
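
A rough sketch of what that change involves in each version. The input strings are made up, and the patterns are shortened to two alternatives to keep it brief.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    public static void Main()
    {
        // IndexOf: the new search string is typed exactly as it should be found.
        Console.WriteLine("see something[1] here".IndexOf("something[1]") != -1);
        // prints "True"

        // Regex edited naively: '.' matches any character, so "somethingX1" matches too.
        Console.WriteLine(Regex.IsMatch("somethingX1", @"something.1|something2"));
        // prints "True" (a false positive)

        // "[1]" is a character class matching just "1", so the literal text is missed.
        Console.WriteLine(Regex.IsMatch("see something[1] here", @"something[1]|something2"));
        // prints "False"

        // Escaped by hand, the regex finds the literal text again.
        Console.WriteLine(Regex.IsMatch("see something[1] here", @"something\[1\]|something2"));
        // prints "True"
    }
}
```

Neither of the naive regex edits throws an exception; both just quietly search for something other than what was asked for.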

> So I am at a loss as to how this regular expression is more unreadable than
> the C# counterpart. That is not to say that you couldn't make it more
> unreadable - but you could do the same with C# if you wanted to.

You could start by making the C# more readable, as I've shown...

However, the regex is already less readable:
1) It's got "|" as a "magic character" in there.
2) It's got all the strings concatenated, so it's harder to spot each
of them separately.

And that's before you need to actually *maintain* the code.

Furthermore, suppose you didn't just want to search for literals -
suppose one of the strings you wanted to search for was contained in a
variable. How sure are you that *no-one* on your team would use:

x+"|something2|something3"

as the regular expression?
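
To make the risk concrete, here is a sketch assuming x happens to hold "a.b" (a made-up value). Regex.Escape is the framework method for turning arbitrary text into a literal pattern, but someone has to remember to call it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    public static void Main()
    {
        string x = "a.b"; // hypothetical value arriving in a variable

        // Naive concatenation: the '.' inside x now means "any character".
        string naive = x + "|something2|something3";
        Console.WriteLine(Regex.IsMatch("axb", naive));   // prints "True" (a false positive)

        // Regex.Escape makes x a literal before it joins the pattern.
        string safe = Regex.Escape(x) + "|something2|something3";
        Console.WriteLine(Regex.IsMatch("axb", safe));    // prints "False"
        Console.WriteLine(Regex.IsMatch("a.b", safe));    // prints "True"

        // The IndexOf form never interprets x at all.
        Console.WriteLine("axb".IndexOf(x) != -1);        // prints "False"
    }
}
```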

> > I suspect not all programmers would though. Don't forget that the
> > person who writes the code is very often not the one to maintain it.
> > Can you guarantee that *everyone* who touches the code will find
> > regexes as readable as String.IndexOf?
>
> As was said, you can make readable and unreadable C or Regex code. Are you
> going to tell your programmers they "cannot" use Regex for the same reason?

I would tell programmers on my team not to use regular expressions
where the alternative is simpler and more readable, yes.



> Are you going to leave out some objects that programmers may not be familiar
> with?

Absolutely, where there are simpler and more familiar ways of solving
the same problem.

> > Which is why I've said repeatedly that I'm not trying to suggest that
> > regexes are bad, or should never be used. I'm just saying that in this
> > case it's using a sledgehammer to crack a nut.
>
> And I don't in this case, as I think I've shown. Less typing, easy to read,
> straight forward - in this case.

You've shown nothing of the kind - whereas I think I've given plenty of
examples of how using regular expressions makes the code less easily
maintainable, even if you consider it equally readable to start with
(which I don't).

> >> SalaryMax.Text =
> >> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
> >>
> >> At the time, I couldn't seem to find as simple a solution as this in
> >> VB.Net
> >> so I use this (not saying there isn't one).
> >
> > And of course there is:
> > SalaryMax.Text =
> > String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
> > .Replace(",", ""));
> >
> > I know which version I'd rather read...
>
> I can read either (although, I didn't know you could string multiple
> "Replace"s together).

Yes, I can read either too. The point is that in reading my version, I
didn't need to wade through various special characters, working out
exactly what each was there for. Of course, your version wasn't even valid
C#, as it didn't escape the backslashes and you didn't specify a
verbatim literal. I assume it was originally VB.NET. I wonder which
version would be easier to convert to valid C#? Mine, perhaps?
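
For comparison, a sketch of both versions as valid C#. A local string stands in for WagesMax.Text, and the CalculateYearly and String.Format calls are left out because they aren't defined in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    public static void Main()
    {
        string wages = "$1,234.50"; // stand-in for WagesMax.Text

        // Regex version with a verbatim literal: backslashes reach the regex engine as typed.
        string viaVerbatim = Regex.Replace(wages, @"\$|\,", "");

        // Same pattern in a regular literal: each backslash has to be doubled for C#.
        string viaRegular = Regex.Replace(wages, "\\$|\\,", "");

        // String.Replace version: nothing to escape at either level.
        string viaReplace = wages.Replace("$", "").Replace(",", "");

        Console.WriteLine(viaVerbatim); // prints "1234.50"
        Console.WriteLine(viaRegular);  // prints "1234.50"
        Console.WriteLine(viaReplace);  // prints "1234.50"
    }
}
```

The VB.NET-style literal "\$|\," from the original post does not compile in C#, because \$ and \, are not recognised C# escape sequences.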

> > But I suspect you're more used to regular expressions than many other
> > programmers - and making the code less readable for other programmers
> > for no benefit is what makes it unwarranted here, even in the simple
> > case where there's nothing to escape.
>
> First of all, I am not. I don't use it much at all, but I find it easy to
> figure out and straightforward (but you can make it really complex). I use
> it to validate phone numbers, credit card numbers, zip codes etc.

And in all of those cases, regular expressions are really useful.

> Which are very well documented and when there are a myriad of ways a
> user can input these types of data, I prefer to use Regular
> expressions which are all over the place (easy to find) than try to
> come up with some complex set of loops and temporary variables which
> make it far easier to make a mistake and much more unreadable than the
> Regex equivalent.

Where exactly are the complex loops and temporary variables in this
specific case? After all, you have been arguing for using regular
expressions in *this specific case*, haven't you?

> > Because it's more complicated! You can't deny that there's more to
> > consider due to the escaping. There's more to know, more to consider,
> > and it doesn't get the job done any more cleanly.
>
> Escaping seems to be your main complaint with it.

It's the main potential source of problems, yes. It's a potential
source of problems which simply doesn't exist when you use
String.IndexOf.



> I have the same problem with C or VB when trying to remember when to use "\"
> vs "/" in paths or do I need to add "\" in front of my slash or quote.
> These are inherent problems with pretty much all of them.

You already need to know that when writing C# though - my use of
String.IndexOf doesn't add to the volume of knowledge required.

> > As is using the power of regular expressions when there is an easier
> > way - using IndexOf, which is *precisely* there to find one string
> > within another.
>
> I am not discounting IndexOf, I am just saying that both work fine and are
> just as readable (in this case). In other cases, that may not be the case
> (with either C or Regex).

Just because they're as readable *to you* doesn't mean they're as
readable to everyone. How sure are you that the next engineer to read
this code will be familiar with regular expressions? How sure are you
that when you need to change it to look for a different string, you'll
check whether any of the characters need to be escaped? Why would you
even want to force that check on yourself?

> > Do you really think it would take you that long to refamiliarise
> > yourself with it? I don't see why it's a good idea to make some poor
> > maintenance engineer who hasn't used regular expressions before try to
> > figure out that *actually* you were just trying to find strings within
> > each other just so you can keep your skill set current.
>
> So you would prefer to code to the lowest common denominator.

When there's no good reason not to, absolutely.

> I am not going to code to the level of a junior programmer. I prefer that
> he learn to code to a higher level.

Learning to solve problems as simply as possible *is* learning to code
to a higher level.

> I am not saying that you shouldn't still write decent, readable, commented
> code. But I am not going to limit myself because another programmer may not
> be able to read well written code. If that were the case, I would not be
> writing objects (abstract classes, interfaces, etc).

If it's not the simplest code for the situation, it's not well written
IMO. If it introduces risk for no reward (the risk of maintenance
failing to notice that they might need to escape something, versus no
reward) then it's not well written.

> > I've never had a problem with reading the documentation when I've
> > needed to use regular expressions, without putting it in projects in
> > places where I *don't* need it.
>
> "Need" is a personal question. I don't think it applies here. You prefer
> IndexOf and I might prefer IsMatch.

I bet if I showed my code to a random sample of a hundred C# developers
and asked them to change it to search for "hello[there]", virtually all
of them would get it right. I also bet that if I showed your code to
them and asked them for the same change, some would fail to escape it
appropriately. Do you disagree?

> > Because it makes things more complicated for no benefit. The reflection
> > example was a good one - that allows you to get a property value, so do
> > you think it's a good idea to write:
> >
> > string x = (string) something.GetType()
> > .GetProperty("Name")
> > .GetValue(something, null);
> > or
> >
> > string x = something.Name;
> >
> > ?
> >
> > Maybe I should use the latter. After all, I wouldn't want to forget how
> > to use reflection, would I?
>
> Lost me on that one.

Both are ways of finding the value of a property. The first is harder
to maintain and harder to read, just like your use of regular
expressions in this instance. Now, which of the above snippets of code
would you use, and why?

tshad

Sep 20, 2005, 6:31:12 PM
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.1d9a634f1...@msnews.microsoft.com...

> tshad <tschei...@ftsolutions.com> wrote:
>> > It is - the regular expression *language* is a different language to
>> > C#, in the same way that XPath is. That's why under "regular
>> > expressions" in MSDN, there's a "language elements" section.
>>
>> I think calling it a language is a stretch, although I know it is called
>> a
>> language in places(it's all in what you define as a language).
>
> In plenty of places. It has a language with a defined syntax etc.

Yes, but so are dolphin sounds.

When I talk about a Programming Language - I am talking about a Procedural
Language (C, Fortran, VB, Pascal, etc.).


>
>> It really is
>> a text/string processor, as is: IndexOf, Substring, Right, Replace etc
>> used
>> by various languages.
>>
>> You don't build pages with it. It isn't procedural.
>
> Neither of those are required for it to be a language.
>
>> It is a tool used by the other languages.
>
> Sure - so is XPath, but that's a language too.
> (See http://www.w3.org/TR/xpath)
>
>> You don't use VB.Net in C# or Vice versa but both use
>> Regular expressions (as the both use Substring, Replace etc).
>
> None of those state that regular expressions aren't a language.
>
>> > But not as instantly clear, I believe. Can you really say that you find
>> > the regex version doesn't take you *any* longer to understand than the
>> > non-regex version?
>>
>> Depends on the C# code as well as the Regex code.
>
> The C# code in question would be:
>
> if (someVariable.IndexOf ("firstliteral") != -1 ||
> someVariable.IndexOf ("secondliteral") != -1 ||
> someVariable.IndexOf ("thirdliteral") != -1)
>

And the Regex version:

if (Regex.IsMatch(myString, @"something1|something2|something3"))

> If I did it regularly, I'd write a short method which took a params
> string array.
>
>> Again, are we talking about the best tool for the job or the most
>> readability.
>
> Unless there's another compelling argument in favour of one tool or
> another, readability is a very important part of choosing the best
> tool.

Again, why do I need a compelling reason? If I have the solution and it
happens to be Regex, I would use it, I wouldn't necessarily say to myself -
"Is there perhaps a more readable way to write this? I wonder if Jim will
be able to read this or not."

>
>> As was mentioned before, you set up loops and temporary
>> variables to do what you can do in a simple Regular Expression.
>>
>> Again, I am not pushing Regular Expressions here, just that they are just
>> a
>> valid as C# (or VB.Net) string handlers.
>
> But you're effectively pushing them in the situation described by the
> OP when you say that the solution using regular expressions is as
> readable as the solution without.

No.

No pushing. No more than you're pushing not using it.

>
>> I do use them when convenient.
>>
>> For example, I was creating a simple text search engine and wanted to
>> modify
>> what the user put in and found it simpler to do the following than in VB
>> or
>> C:
>>
>> ' The following replaces all multiple blanks with " ". It then takes
>> ' out the anomalies, such as "and not and" and replaces them with "and"
>>
>> keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
>> keywords = Regex.Replace(keywords, "( )", " or ")
>> keywords = Regex.Replace(keywords," or or "," ")
>> keywords = Regex.Replace(keywords,"or and or","and")
>> keywords = Regex.Replace(keywords,"or near or","near")
>> keywords = Regex.Replace(keywords,"and not or","and not")
>>
>> Fairly straight forward and easy to follow.
>
> Reasonably, although apart from the first regex, I'd suggest doing the
> rest with straight calls to String.Replace. As an example of why I
> think that would be more readable, what exactly does the second line do?

Actually, nothing. It is grouping a " ", which isn't necessary. I think I
used to have something else there and took it out and didn't realize I
didn't need the ().

> In some flavours of regular expressions, brackets form capturing
> groups. Do they in .NET? I'd have to look it up. If it's really just
> trying to replace the string "( )" with " or ", a call to
> String.Replace would mean I didn't need to look anything up.

Obviously, you didn't need to look this one up either - as you were correct.
It is just grouping a blank.

Just that you don't want to use Regex as it is not easily readable. Neither are
objects.

But the fact that a junior programmer might not understand objects as you do
would not prevent you from writing them, would it?

>
>> >> But if you know both and as I (and you) mentioned regex is part of
>> >> .net
>> >> as is C# - so it is already in the mix.
>> >
>> > No, it's not. It's not already used in every single C# program, any
>> > more than SQL is.
>>
>> Nor are all the objects you use.
>>
>> But if you are using .Net, it is part of the mix.
>
> It's not necessarily part of the mix I have to use.

You don't have to use lots of things. That doesn't make them invalid.
Neither does the fact that you use Foreach vs For {}. They are there and are
part of the mix as is Regex. I might agree with you more if Regex were some
component that you picked up and added. Or if Regex were some obscure
technique that few knew about. It has been around for quite a long time
and is just another gun in your arsenal. If I thought that MS were
deprecating it, I would also think twice about using it. But it is part of
.Net that all the languages can make use of and I would never tell a
programmer, who may be really comfortable with it and uses it responsibly
(not obscure cryptic non-commented code), that he should be using IndexOf
instead.

>I suspect *very*
> few programs don't do any string manipulation - knowing the string
> methods well is *far* more fundamental to .NET programming than knowing
> regular expressions.

I agree with part of that and think that regular expressions are just as
important to know. As we have been saying, it is here and many people use
it, so to not understand it is to limit yourself. You don't have to use it,
but you should at least understand the basics of how it works. What are you
going to do when someone uses a RegularExpressionValidator and you don't
understand what the expression is? The fact that it is not C# (neither is a
textbox, datagrid, etc), doesn't mean you shouldn't understand them. Whether
you use them is up to you.

As you point out, you are not the only programmer and many programmers like
to use Regex and that doesn't make them any lesser programmers. What are
you going to do when you run into their code?

I see code all the time (much of the time it is mine) and wonder why the
programmer didn't do it another way. There are many ways to skin a cat.
Sometimes it is just style, sometimes it is all they know. But if they
follow whatever standards are set up (and in your case maybe you forbid
Regex) then as long as the code is well written and clean - I have no
problem with it.


>
>> > In what way is it 6 of one or half a dozen of the other when one
>> > solution requires knowing more than the other? I would expect *any* C#
>> > programmer to know what String.IndexOf does. I wouldn't expect all C#
>> > programmers to know by heart which regex language elements require
>> > escaping - and if you don't know that off the top of your head, then
>> > changing the code to search for a different string involves an extra
>> > bit of brainpower.
>>
>> Why? Ever heard of references or cheat sheets? And what is wrong with a
>> little extra brainpower - if you don't use it, you lose it :)
>
> If you truly think that given two solutions which are otherwise equal,
> the solution which is easiest to write, read and maintain doesn't win
> hands down, we'll definitely never agree.
>

I agree there.

Which is easier to write is obviously your perception. I found my example
as easy as yours to write and just as readable.

> If you want to keep your hand in with respect to regular expressions,
> do it in a test project, or with a regular expressions workbench. Keep
> it out of code which needs to be read and maintained, probably by other
> people who don't want to waste time because you wanted to keep your
> skill set up to date.
>

Keep regular expressions out of my code?????

So now you are saying there is no use for it?

>> I don't know all of the possible combinations of calls to every Object,
>> but
>> that doesn't preclude me from using them.
>
> Exactly - and you wouldn't go out of your way to use methods you don't
> need, just to get into the habit of using them, would you?

Sure.

If it is valid. As I said there are many ways to skin ..., depending on the
situation I may do it one way and the next time another way. Gives me many
options. I don't do it willy nilly, as you seem to suggest, as a test
bench.


>
>> My position has always been, don't memorize. You will remember what you
>> use. But if you know how to get it (where to look), then you have
>> everything you need.
>
> Absolutely - so why are you so keen on making people either memorise or
> look up the characters which need escaping for regular expressions
> every time they read or modify your code?
>

I am not. I don't memorize. But I still use it.

>> I happen to use .Net. Regex is part of .Net. I would be limiting myself
>> if
>> I didn't use Regex in places where it is appropriate.
>
> I seem to be having difficulty making myself clear on this point: I
> have never stated and will never state that you shouldn't use regular
> expressions where they're appropriate. But they are *not* appropriate
> in this case, as they are a more complex and less readable way of
> solving the problem.

No, you are very clear. If you are so concerned with others being able to
read your code and problems with escape characters - why would you EVER want
them to use them? You can't have it both ways.

If they would have a hard time with a nothing expression like "if
(Regex.IsMatch(myString, @"something1|something2|something3"))" - they are
never going to get some of the other standard Regex solutions I
mentioned before.

As you said, the two solutions are equal. Your solution is that you MUST go
with IndexOf. Mine is you can use either.

>
> Show me a problem where the regex way of solving it is simpler than
> using simple string operations (and there are plenty of problems like
> that) and I'll plump for the regex in a heartbeat.
>
>> If I happen to know a good way in Regex to solve a problem, I am not
>> going use *extra brainpower* to try to solve the problem in C#.
>
> In what way is using the method which is designed for *precisely* the
> task in hand (finding something in a string) using extra brainpower?

I wasn't referring to this particular issue when I said this.

>If
> you're not familiar with String.IndexOf, you've got *much* bigger
> things to worry about than whether or not your regular expression
> skills are getting rusty.

I never said I was not familiar with IndexOf.

As a matter of fact, the original question was whether you could "do a
search for more than one string in another string".

****************************************************************
Can you do a search for more than one string in another string?

Something like:

someString.IndexOf("something1","something2","something3",0)

or would you have to do something like:

if ((someString.IndexOf("something1",0) >= 0) ||
    (someString.IndexOf("something2",0) >= 0) ||
    (someString.IndexOf("something3",0) >= 0))
{
    // Do something
}

***************************************************************************
IndexOf doesn't do it. This was the original question. You have to do
multiple calls as is said in the original question. Nicholas was correct in
his assessment. One Regex call would work.


>
>> > It was *less* readable though - and would have been *significantly*
>> > less readable if the string being searched for had included dots,
>> > brackets etc.
>>
>> But it didn't. But if it did, it is no different than having to deal
>> with
>> escapes in C (less readable)
>>
>> If you are talking about
>>
>> if ((someString.IndexOf("something1",0) >= 0) ||
>>     (someString.IndexOf("something2",0) >= 0) ||
>>     (someString.IndexOf("something3",0) >= 0))
>> {
>>     // Do something
>> }
>>
>> vs
>>
>> if (Regex.IsMatch(myString, @"something1|something2|something3"))
>>
>> If you know absolutely nothing about Regular expressions, I would agree
>> that
>> this is less readable.
>>
>> But I would also contend that IndexOf could be just as confusing. What
>> is
>> the first 0 for? What about the 2nd? It is readable because you know C.
>
> Well, for a start the 0s aren't necessary, and I wouldn't include them.

You're right.

>
>> I would maintain that if even if you knew nothing about Regex, you would
>> assume that you are doing a Match (can't tell that from the word
>> "IndexOf")
>> and it probably has something to do with the words "something1",
>> "something2" and "something3". Now if you know C then I would assume you
>> would pick up that "|" is "or" (not so clear to a VB programmer). And
>> that
>> would be to someone not familiar with regular expressions doing a quick
>> perusal
>
> Okay - now suppose I need to change it from searching for "something1"
> to "something.1" or "something[1]". How long does it take to change in
> each version? How easy is it to read afterwards?

That wasn't the question.

What if you wanted to change "something1" to "something\". Same problem.
And if escapes were a problem (if it were me) I would have a little sheet
that showed them at my desk within easy reach.


>
>> So I am at a loss as to how this regular expression is more unreadable
>> than
>> the C# counterpart. That is not to say that you couldn't make it more
>> unreadable - but you could do the same with C# if you wanted to.
>
> You could start by making the C# more readable, as I've shown...

As you can with Regular Expressions.

>
> However, the regex is already less readable:
> 1) It's got "|" as a "magic character" in there.

| = or (same as C)

> 2) It's got all the strings concatenated, so it's harder to spot each
> of them separately.

You are kidding, right?

>
> And that's before you need to actually *maintain* the code.
>
> Furthermore, suppose you didn't just want to search for literals -
> suppose one of the strings you wanted to search for was contained in a
> variable. How sure are you that *no-one* on your team would use:
>
> x+"|something2|something3"
>
> as the regular expression?
>


You are now leaving the original question. I never said that Regular
Expressions were better (or not better) in all cases.

>> > I suspect not all programmers would though. Don't forget that the
>> > person who writes the code is very often not the one to maintain it.
>> > Can you guarantee that *everyone* who touches the code will find
>> > regexes as readable as String.IndexOf?
>>
>> As was said, you can make readable and unreadable C or Regex code. Are
>> you
>> going to tell your programmers they "cannot" use Regex for the same
>> reason?
>
> I would tell programmers on my team not to use regular expressions
> where the alternative is simpler and more readable, yes.

Why use them at all? It isn't readable.

And if your programmers can't maintain the simple Regexs, they definitely
won't be able to handle the more complicated ones.


>
>> Are you going to leave out some objects that programmers may not be
>> familiar
>> with?
>
> Absolutely, where there are simpler and more familiar ways of solving
> the same problem.
>
>> > Which is why I've said repeatedly that I'm not trying to suggest that
>> > regexes are bad, or should never be used. I'm just saying that in this
>> > case it's using a sledgehammer to crack a nut.
>>
>> And I don't in this case, as I think I've shown. Less typing, easy to
>> read,
>> straight forward - in this case.
>
> You've shown nothing of the kind - whereas I think I've given plenty of
> examples of how using regular expressions make the code less easily
> maintainable, even if you consider it equally readable to start with
> (which I don't).

Not in this specific case. I was never maintaining or pushing Regex for all
or any situations.

But I am not going to force my programmers to come to me to find out whether
or not Regex is the easiest way. That is up to the programmer. If
there is a problem with their code and I feel the programmer is way off base
in his coding we would talk about it (that would be the case with his C#, VB or
Regex code).

>
>> >> SalaryMax.Text =
>> >> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
>> >>
>> >> At the time, I couldn't seem to find as simple a solution as this in
>> >> VB.Net
>> >> so I use this (not saying there isn't one).
>> >
>> > And of course there is:
>> > SalaryMax.Text =
>> > String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
>> > .Replace(",", "")));
>> >
>> > I know which version I'd rather read...
>>
>> I can read either (although, I didn't know you could string multiple
>> "Replace"s together).
>
> Yes, I can read either too. The point is that in reading my version, I
> didn't need to wade through various special characters, understanding
> exactly what each was there for.

If you knew enough to know about Regex at all (which you said you would have
no problem with in some situations - so the programmers better be able to
read it), there should not be a problem with the 2 special characters which
is the same as C#. There is nothing obscure in this example - that I can
see.

>Of course, your version wasn't even valid
> C#, as it didn't escape the backslashes and you didn't specify a
> verbatim literal. I assume it was originally VB.NET. I wonder which
> version would be easier to convert to valid C#? Mine, perhaps?

Actually, it was VB.Net.

>
>> > But I suspect you're more used to regular expressions than many other
>> > programmers - and making the code less readable for other programmers
>> > for no benefit is what makes it unwarranted here, even in the simple
>> > case where there's nothing to escape.
>>
>> First of all, I am not. I don't use it much at all, but I find it easy
>> to
>> figure out and straightforward (but you can make it really complex). I
>> use
>> it to validate phone numbers, credit card numbers, zip codes etc.
>
> And in all of those cases, regular expressions are really useful.

But according to you, you shouldn't use them as some of the programmers may
not be able to maintain it. Definitely if they would have a problem with
our example.

Can't have it both ways. If you allow Regular Expressions, you shouldn't
have a problem if a programmer used the Regex or IndexOf in our example.
Anyone maintaining the "USEFUL" ones would have zero problems with this one.

>
>> Which are very well documented and when there are a myriad of ways a
>> user can input these types of data, I prefer to use Regular
>> expressions which are all over the place (easy to find) than try to
>> come up with some complex set of loops and temporary variables which
>> make it far easier to make a mistake and much more unreadable than the
>> Regex equivalent.
>
> Where exactly are the complex loops and temporary variables in this
> specific case? After all, you have been arguing for using regular
> expressions in *this specific case*, haven't you?
>

I was obviously talking about Regular Expressions in general here as I was
referring to the standard ones you can get anywhere dealing with (Phone
numbers, credit card etc). There would be none in this case, obviously.
But there may be in more complicated cases.

>> > Because it's more complicated! You can't deny that there's more to
>> > consider due to the escaping. There's more to know, more to consider,
>> > and it doesn't get the job done any more cleanly.
>>
>> Escaping seems to be your main complaint with it.
>
> It's the main potential source of problems, yes. It's a potential
> source of problems which simply doesn't exist when you use
> String.IndexOf.
>
>> I have the same problem with C or VB when trying to remember when to use
>> "\"
>> vs "/" in paths or do I need to add "\" in front of my slash or quote.
>> These are inherent problems with pretty much all of them.
>
> You already need to know that when writing C# though - my use of
> String.IndexOf doesn't add to the volume of knowledge required.
>

It is still an issue. Just as the Regular expressions are. And again, if
you are going to allow Regex at all, you would still need to know about the
escapes.

>> > As is using the power of regular expressions when there is an easier
>> > way - using IndexOf, which is *precisely* there to find one string
>> > within another.
>>
>> I am not discounting IndexOf, I am just saying that both work fine and
>> are
>> just as readable (in this case). In other cases, that may not be the
>> case
>> (with either C or Regex).
>
> Just because they're as readable *to you* doesn't mean they're as
> readable to everyone. How sure are you that the next engineer to read
> this code will be familiar with regular expressions? How sure are you
> that when you need to change it to look for a different string, you'll
> check whether any of the characters need to be escaped? Why would you
> even want to force that check on yourself?

Again - then don't allow them at all.


>
>> > Do you really think it would take you that long to refamiliarise
>> > yourself with it? I don't see why it's a good idea to make some poor
>> > maintenance engineer who hasn't used regular expressions before try to
>> > figure out that *actually* you were just trying to find strings within
>> > each other just so you can keep your skill set current.
>>
>> So you would prefer to code to the lowest common denominator.
>
> When there's no good reason not to, absolutely.

I guess that is where we disagree.


>
>> I am not going to code to the level of a junior programmer. I prefer
>> that
>> he learn to code to a higher level.
>
> Learning to solve problems as simply as possible *is* learning to code
> to a higher level.

No argument there.


>
>> I am not saying that you shouldn't still write decent, readable,
>> commented
>> code. But I am not going to limit myself because another programmer may
>> not
>> be able to read well written code. If that were the case, I would not be
>> writing objects (abstract classes, interfaces, etc).
>
> If it's not the simplest code for the situation, it's not well written
> IMO. If it introduces risk for no reward (the risk of maintenance
> failing to notice that they might need to escape something, versus no
> reward) then it's not well written.
>

I see no risk in the example we are talking about. At least, no more than
in the IndexOf solution (in this situation).

>> > I've never had a problem with reading the documentation when I've
>> > needed to use regular expressions, without putting it in projects in
>> > places where I *don't* need it.
>>
>> "Need" is a personal question. I don't think it applies here. You
>> prefer
>> IndexOf and I might prefer IsMatch.
>
> I bet if I showed my code to a random sample of a hundred C# developers
> and asked them to change it to search for "hello[there]", virtually all
> of them would get it right. I also bet that if I showed your code to
> them and asked them for the same change, some would fail to escape it
> appropriately. Do you disagree?

No. But then the same developers would have a problem with the more
complicated expressions you claim are useful.


>
>> > Because it makes things more complicated for no benefit. The reflection
>> > example was a good one - that allows you to get a property value, so do
>> > you think it's a good idea to write:
>> >
>> > string x = (string) something.GetType()
>> > .GetProperty("Name")
>> > .GetValue(something, null);
>> > or
>> >
>> > string x = something.Name;
>> >
>> > ?
>> >
>> > Maybe I should use the latter. After all, I wouldn't want to forget how
>> > to use reflection, would I?
>>
>> Lost me on that one.
>
> Both are ways of finding the value of a property. The first is harder
> to maintain and harder to read, just like your use of regular
> expressions in this instance. Now, which of the above snippets of code
> would you use, and why?

Since I am not sure why you would use the first, I would do the 2nd.

But in our case, I would still use either - as I see the Regex version as
easy as the IndexOf.

Tom


Jon Skeet [C# MVP]

Sep 20, 2005, 8:07:26 PM
tshad <tschei...@ftsolutions.com> wrote:
> > In plenty of places. It has a language with a defined syntax etc.
>
> Yes, but so are dolphin sounds.
>
> When I talk about a Programming Language - I am talking about a Procedural
> Language (C, Fortran, VB, Pascal, etc.).

So you wouldn't regard LISP as a programming language, just because
it's functional rather than procedural?

Of course, you didn't even specify "programming language" before.

Regular expressions form a language in computing, and that language
needs to be learned before being used, just as any other language does,
whether it's C#, HTML, XPath or VB.NET.

> > The C# code in question would be:
> >
> > if (someVariable.IndexOf ("firstliteral") != -1 ||
> > someVariable.IndexOf ("secondliteral") != -1 ||
> > someVariable.IndexOf ("thirdliteral") != -1)
> >
>
> And the Regex version:
>
> if (Regex.IsMatch(myString, @"something1|something2|something3"))

Right. Immediately the IndexOf value is more readable, by more clearly
separating the three separate strings which are being searched on.
(Oliver Sturm's version is more readable than that



> > Unless there's another compelling argument in favour of one tool or
> > another, readability is a very important part of choosing the best
> > tool.
>
> Again, why do I need a compelling reason. If I have the solution and it
> happens to be Regex, I would use it, I wouldn't necessarily say to myself -
> "Is there perhaps a more readable way to write this? I wonder if Jim will
> be able to read this or not."

Then I'm afraid that's your problem. It sounds like you're basically
admitting that you're not that interested in readability. Personally, I
like writing code which is elegant but easy to maintain. Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.

Far more time is spent maintaining code than writing it in the first
place. Taking the attitude you take above just isn't cost-effective in
the long run.

> > But you're effectively pushing them in the situation described by the
> > OP when you say that the solution using regular expressions is as
> > readable as the solution without.
>
> No.
>
> No pushing. No more than your pushing not using it.

But I'll readily admit to pushing the (IMO simpler) solution, for this
particular situation. So are you actually admitting that you *are*
pushing the use of regular expressions here?

> >> ' The following replaces all multiple blanks with " ". It then takes
> >> ' out the anomalies, such as "and not and" and replaces them with "and"
> >>
> >> keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
> >> keywords = Regex.Replace(keywords, "( )", " or ")
> >> keywords = Regex.Replace(keywords," or or "," ")
> >> keywords = Regex.Replace(keywords,"or and or","and")
> >> keywords = Regex.Replace(keywords,"or near or","near")
> >> keywords = Regex.Replace(keywords,"and not or","and not")
> >>
> >> Fairly straight forward and easy to follow.
> >
> > Reasonably, although apart from the first regex, I'd suggest doing the
> > rest with straight calls to String.Replace. As an example of why I
> > think that would be more readable, what exactly does the second line do?
>
> Actually, nothing. It is grouping a " ", which isn't necessary. I think I
> used to have something else there and took it out and didn't realize I
> didn't need the ().

So again, the code could be made more readable even by just modifying
the existing regex replacement, let alone by replacing the regular
expressions with simple String.Replace calls. Had they been
String.Replace calls, the meaning of the second line would have been
unambiguous - you'd have had to write it the simple way to start with.

Note that your first replacement will replace two tabs with a single
space, but leave one tab alone, by the way. It would be better to
replace "\s+" with the space, IMO.
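
As a minimal sketch of that difference (the class and method names below are hypothetical, chosen only for illustration), the two patterns can be compared on input containing a lone tab:

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCollapse
{
    // Pattern from the quoted code: only runs of two or more whitespace
    // characters are replaced, so a single tab is left unchanged.
    public static string TwoOrMore(string input)
    {
        return Regex.Replace(input, @"\s{2,}", " ");
    }

    // Suggested alternative: every run of one or more whitespace
    // characters (including a lone tab) becomes a single space.
    public static string OneOrMore(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }
}
```

Given "a\t\tb\tc", TwoOrMore keeps the tab between b and c, while OneOrMore gives "a b c".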

> > In some flavours of regular expressions, brackets form capturing
> > groups. Do they in .NET? I'd have to look it up. If it's really just
> > trying to replace the string "( )" with " or ", a call to
> > String.Replace would mean I didn't need to look anything up.
>
> Obviously, you didn't need to look this one up either - as you were correct.
> It is just grouping a blank.

I would have had to look it up if you hadn't been answering the question
though. Why make the code harder to understand in the first place? If
you want to replace a space with " or ", just use
keywords = keywords.Replace (" ", " or ");
Much more straightforward.

> >> But writing objects and the objects themselves are not easily readable.
> >> But
> >> you would advocate not writing them, would you?
> >
> > No, but I don't see how that's relevant.
>
> Just that you don't want to Regex as it is not easily readable. Neither are
> Regex.

Eh?

> But the fact a junior programmer might not understand Objects as you do
> would not prevent you from writing them, would you?

When using C#, one has to use objects. I will almost always try to
implement the simplest solution to a problem, unless there is a
compelling reason to use a more complex solution. That way, anyone
reading the code has to learn relatively little "extra" stuff beyond
the language itself.

> >> But if you are using .Net, it is part of the mix.
> >
> > It's not necessarily part of the mix I have to use.
>
> You don't have to use lots of things. That doesn't make them invalid.
> Neither is the fact that you use Foreach vs For {}. They are there and are
> part of the mix as is Regex.

No, they really aren't. for and foreach are well-defined in the C#
language specification. If the program is in C# to start with, it is
reasonable to assume competency in C# on the part of the reader of the
code. It is *not* reasonable to assume competency in regular
expressions, and while that wouldn't prevent me from using regular
expressions where they provide value, they just *don't* here.

> I might agree with you more if Regex were some
> component that you picked up and added. Or if Regex were some obscure
> technique that few knew about. They have been around for quite a long time
> and is just another gun in your arsenal. If I thought that MS were
> deprecating it, I would also think twice about using it. But it is part of
> .Net that all the languages can make use of and I would never tell a
> programmer, who may be really comfortable with it and uses it responsibly
> (not obscure cryptic non-commented code), that he should be using IndexOf
> instead.

Clearly not, as you seem to be keen on using them instead of simple
string manipulations all over the place - if I saw anyone using regular
expressions rather than String.Replace in the way you've shown in other
code posts, that code would never get through code review.

> >I suspect *very*
> > few programs don't do any string manipulation - knowing the string
> > methods well is *far* more fundamental to .NET programming than knowing
> > regular expressions.
>
> I agree with part of that and think that regular expressions are just as
> important to know.

Why? I'm working on a fairly large project which hasn't needed to use
regular expressions and wouldn't have benefitted from them once. I
suspect many people could say the same thing. I suspect very few if any
of them could say the same thing about the basic string manipulation
methods - and yet you were surprised to see that one could call Replace
on the result of another Replace method call, which I'd consider a far
more "basic" level of understanding than knowledge of regular
expressions.

> As we have been saying, it is here and many people use it, so to not
> understand it is to limit yourself.

It's one thing to understand the general power of regular expressions,
so you would know when they may be applicable - it's another thing to
use them when they serve no purpose beyond what can be more simply
achieved with the simple String methods.

> You don't have to use it, but you should at least understand the
> basics of how it works. What are you going to do when someone uses a
> RegularExpressionValidator and you don't understand what the
> expression is?

At that point, if I didn't understand the regular expression, I'd look
it up in the documentation. Do you know every part of regular
expression syntax off by heart?

> The fact that it is not C# (neither is a textbox, datagrid, etc),
> doesn't mean you shouldn't understand them. Whether you use them is up
> to you.
>
> As you point out, you are not the only programmer and many programmers like
> to use Regex and that doesn't make them any lesser programmers. What are
> you going to do when you run into their code?

If they're on my team, I'll tell them to refactor their code to only
use them when they're appropriate, frankly.



> I see code all the time (much of the time it is mine) and wonder why the
> programmer didn't do it another way. There are many ways to skin a cat.
> Sometimes it is just style, sometimes it is all they know. But if they
> follow whatever standards are setup (and in your case maybe you forbid
> Regex) then as long as the code is well written and clean - I have no
> problem with it.

If code uses regular expressions when they serve no purpose, it is
*not* well written and clean though - it is less maintainable than it
might be.

> > If you truly think that given two solutions which are otherwise equal,
> > the solution which is easiest to write, read and maintain doesn't win
> > hands down, we'll definitely never agree.
>
> I agree there.
>
> Which is easier to write is obviously your perception. I found my example,
> as easy as yours to write and just as readable.

And you believe that everyone else does? Again, bear in mind that
you're unlikely to be the only person ever to read your code.



> > If you want to keep your hand in with respect to regular expressions,
> > do it in a test project, or with a regular expressions workbench. Keep
> > it out of code which needs to be read and maintained, probably by other
> > people who don't want to waste time because you wanted to keep your
> > skill set up to date.
>
> Keep regular expressions out of my code?????
>
> So now you are saying there is no use for it?

Not at all - I'm saying that you shouldn't put regular expressions in
your code just for the sake of keeping your hand in. Use them where
they're applicable, and only there.



> >> I don't know all of the possible combinations of calls to every Object,
> >> but that doesn't preclude me from using them.
> >
> > Exactly - and you wouldn't go out of your way to use methods you don't
> > need, just to get into the habit of using them, would you?
>
> Sure.
>
> If it is valid. As I said there are many ways to skin ..., depending on the
> situation I may do it one way and the next time another way. Gives me many
> options. I don't do it willy nilly, as you seem to suggest, as a test
> bench.

But that's *exactly* what you've suggested you should do with regular
expressions - use them even when there's no real purpose in doing so,
just so that you remember what they look like.

> > Absolutely - so why are you so keen on making people either memorise or
> > look up the characters which need escaping for regular expressions
> > every time they read or modify your code?
>
> I am not. I don't memorize. But I still use it.

Okay, so you don't memorise it, which means you *do* have to look up
which characters require escaping. I think you've just admitted that
your code is less maintainable than mine.



> > I seem to be having difficulty making myself clear on this point: I
> > have never stated and will never state that you shouldn't use regular
> > expressions where they're appropriate. But they are *not* appropriate
> > in this case, as they are a more complex and less readable way of
> > solving the problem.
>
> No you are very clear. If you are so concerned with others being able to
> read your code and problems with escape characters - why would you EVER want
> them to use them. You can't have it both ways.

I would use them when the solution which uses regular expressions is
clearer than the solution which doesn't use them. It seems a pretty
simple policy to me.

> If they would have a hard time with a nothing expression like "if
> (Regex.IsMatch(myString, @"something1|something2|something3"))" - they are
> never going to get some of the other standard Regex solutions I
> mentioned before.

Those maintaining the code could no doubt understand it after looking
at it for a little while, just like they could work out your other
regular expressions after looking at them and consulting the
documentation - but why are you trying to make their jobs harder? Why
are you not concerned that the code you're writing is costing your
company money by making it harder to maintain than it needs to be?

> As you said, the two solutions are equal. Your solution is that you MUST go
> with IndexOf. Mine is you can use either.

Well, they're equal in terms of their semantics. They're definitely not
equal in terms of maintainability, and as that's important to me, I
don't see what's wrong with saying that I'm very strongly in favour of
avoiding the less readable/maintainable code.

> > Show me a problem where the regex way of solving it is simpler than
> > using simple string operations (and there are plenty of problems like
> > that) and I'll plump for the regex in a heartbeat.
> >
> >> If I happen to know a good way in Regex to solve a problem, I am not
> >> going use *extra brainpower* to try to solve the problem in C#.
> >
> > In what way is using the method which is designed for *precisely* the
> > task in hand (finding something in a string) using extra brainpower?
>
> I wasn't referring to this particular issue when I said this.

It would have been nice if you'd indicated that. Do you agree then that
it doesn't actually take any more brainpower to come up with
String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
brainpower when it comes to maintaining the IndexOf solution?

> >If
> > you're not familiar with String.IndexOf, you've got *much* bigger
> > things to worry about than whether or not your regular expression
> > skills are getting rusty.
>
> I never said I was not familiar with IndexOf.
>
> As a matter of fact, the original question was whether you could "do a
> search for more than one string in another string".

And of course the answer is "yes, by calling IndexOf multiple times".

> ****************************************************************
> Can you do a search for more than one string in another string?
>
> Something like:
>
> someString.IndexOf("something1","something2","something3",0)
>
> or would you have to do something like:
>
> if ((someString.IndexOf("something1",0) >= 0) ||
>     (someString.IndexOf("something2",0) >= 0) ||
>     (someString.IndexOf("something3",0) >= 0))
> {
>     // Do something
> }
> ***************************************************************************
> IndexOf doesn't do it. This was the original question. You have to do
> multiple calls as is said in the original question. Nicholas was correct in
> his assessment. One Regex call would work.

Yes, as would a single call to a method which called IndexOf on the
string multiple times. I disagree with you - Nicholas wasn't correct in
his assessment, as he claimed that the "best bet" would be to use a
regular expression. Using regular expressions is just *not* the best
bet here - it requires more effort, as I've described repeatedly.
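
A minimal sketch of such a single-call helper, assuming a hypothetical name ContainsAny (it is not part of the framework) and an ordinal comparison, might look like this:

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: returns true if 'text' contains at least one
    // of the candidate strings. The params array lets callers list the
    // candidates as separate, easy-to-spot arguments.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison: a plain character-by-character search,
            // with no culture rules and no pattern syntax involved.
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site then reads as one test, for example StringSearch.ContainsAny(someString, "something1", "something2", "something3"), and none of the search strings ever needs regex escaping.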

> > Okay - now suppose I need to change it from searching for "something1"
> > to "something.1" or "something[1]". How long does it take to change in
> > each version? How easy is it to read afterwards?
>
> That wasn't the question.

Are you suggesting that maintainability isn't something that should be
considered? Do you *really* want to look for "something1",
"something2" and "something3" or were they (as I suspect) just
examples, and the real values could easily have dots, brackets etc in?

> What if you wanted to change "something1" to "something\". Same problem.

Well, it's half the problem with IndexOf that it is with regular
expressions. With regular expressions, you'd need to know that not only
does backslash need escaping in C#, it also needs escaping in regular
expressions.

IndexOf: "something\\" or @"something\"
Regex: "something\\\\" or @"something\\"

Once again, the IndexOf version is easier to understand - there's less
to mentally unescape to work out what's actually being asked for.
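
To make the two levels of escaping concrete, here is a small sketch (the class and method names are hypothetical) that searches for the text something followed by a single backslash in both styles:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashSearch
{
    // Plain search: only the C# string-literal escape applies, so "\\"
    // in the source is one backslash in the search string.
    public static bool WithIndexOf(string text)
    {
        return text.IndexOf("something\\", StringComparison.Ordinal) >= 0;
    }

    // Regex search: the backslash must be escaped for the regex engine.
    // The verbatim literal (@"...") avoids escaping it again for C#, so
    // the pattern seen by the engine is something\\ (one literal backslash).
    public static bool WithRegex(string text)
    {
        return Regex.IsMatch(text, @"something\\");
    }
}
```

Both methods find the same text; the difference is only in how many layers of escaping the reader has to undo.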

> And if escapes were a problem (if it were me) I would have a little sheet
> that showed them at my desk within easy reach.

Whereas by needing to know less (just the C# escapes) it's really easy
to memorise everything I need to know to solve this situation.

> >> So I am at a loss as to how this regular expression is more unreadable
> >> than
> >> the C# counterpart. That is not to say that you couldn't make it more
> >> unreadable - but you could do the same with C# if you wanted to.
> >
> > You could start by making the C# more readable, as I've shown...
>
> As you can with Regular Expressions.

Well, Oliver Sturm has shown a more readable version, but you seem to
be keen on the "put them all in the same line" version.

Neither is as readable as the String.IndexOf version, however.

> > However, the regex is already less readable:
> > 1) It's got "|" as a "magic character" in there.
>
> | = or (same as C)

Yup, but it's something that isn't used in string literals other than
for regular expressions. It's an extra thing to bear in mind
unnecessarily.

> > 2) It's got all the strings concatenated, so it's harder to spot each
> > of them separately.
>
> You are kidding, right?

Absolutely not! It's significantly easier to spot the three separate
values when they're three separate strings than when they're all mashed
together.

> > Furthermore, suppose you didn't just want to search for literals -
> > suppose one of the strings you wanted to search for was contained in a
> > variable. How sure are you that *no-one* on your team would use:
> >
> > x+"|something2|something3"
> >
> > as the regular expression?
>
> You are now leaving the original question. I never said that Regular
> Expressions was the better (or not better) in all cases.

While I'm leaving the exact original question, it's far from out of the
question that the original code would need to be changed at some point
to search for a variable. At that point, can you guarantee
that your team would get it right? They'd need to be on their guard
when using regular expressions - they wouldn't need to be on their
guard using IndexOf.
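
A rough sketch of that trap, assuming x holds a value such as "a.c" (the value, the class and the method names are hypothetical). Regex.Escape is the framework method that would have to be remembered:

```csharp
using System;
using System.Text.RegularExpressions;

static class VariableSearchDemo
{
    // Naive: splices the variable straight into the pattern.
    public static bool NaiveRegex(string text, string x)
    {
        return Regex.IsMatch(text, x + "|something2|something3");
    }

    // Safer: Regex.Escape turns any metacharacters in x into literals first.
    public static bool EscapedRegex(string text, string x)
    {
        return Regex.IsMatch(text, Regex.Escape(x) + "|something2|something3");
    }

    // IndexOf always treats x as plain text, so there is nothing to remember.
    public static bool WithIndexOf(string text, string x)
    {
        return text.IndexOf(x, StringComparison.Ordinal) >= 0
            || text.IndexOf("something2", StringComparison.Ordinal) >= 0
            || text.IndexOf("something3", StringComparison.Ordinal) >= 0;
    }

    static void Main()
    {
        string x = "a.c";          // hypothetical value held in the variable
        string text = "xx abc xx"; // does not contain the text a.c
        Console.WriteLine(NaiveRegex(text, x));   // the "." matches any character
        Console.WriteLine(EscapedRegex(text, x));
        Console.WriteLine(WithIndexOf(text, x));
    }
}
```

With these inputs the naive version reports a match that the other two do not, because the dot in x is read as "any character".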

> > I would tell programmers on my team not to use regular expressions
> > where the alternative is simpler and more readable, yes.
>
> Why use them at all? It isn't readable.

They aren't as readable *in this case*. In other, more complicated
situations, the version which only used IndexOf would be harder to read
than the regular expression version.

Using a regular expression is like getting a car compared with walking
somewhere - it's absolutely the right thing to do when you're going on
a long journey, but in this case you're advocating getting in a car
just to travel to the next room. It's simpler to walk.

> And if your programmers can't maintain the simple Regexs, they definitely
> won't be able to handle the more complicated ones.

You seem to fail to grasp the "make it as simple as possible" concept.
It's not a case of maintenance engineers being idiots - it's about
presenting them with fewer possible risks. Why leave them a trap to
fall into when you can write simpler code which is easier to change
later on?

> > You've shown nothing of the kind - whereas I think I've given plenty of
> > examples of how using regular expressions make the code less easily
> > maintainable, even if you consider it equally readable to start with
> > (which I don't).
>
> Not in this specific case. I was never maintaining or pushing Regex for all
> or any situations.

But you're pushing for regular expressions in *this* situation, or at
least saying it's just as good as using IndexOf. You've also shown in
your other code that you use regular expressions unnecessarily for
replacement, making a simple two-step replacement into a complicated
single-step replacement where the number of characters which *aren't*
just plain text is greater than the number of characters which are.



> But I am not going to force my programmers to come to me to find out whether
> or not Regex is the easiest way. That is up to the programmer. If
> there is a problem with their code and I feel the programmer is way off base
> in his coding we would talk about it (that would be the case with his C#, VB or
> Regex code).

Using regular expressions in this case *is* a problem with their code,
IMO. It's just asking for trouble later on.

> > Yes, I can read either too. The point is that in reading my version, I
> > didn't need to wade through various special characters, understanding
> > exactly what each was there for.
>
> If you knew enough to know about Regex at all (which you said you would have
> no problem with in some situations - so the programmers better be able to
> read it), there should not be a problem with the 2 special characters which
> are the same as in C#. There is nothing obscure in this example - that I can
> see.

Of course there is - to work out what's going on, you've got to
mentally unescape the dollar and the comma, but *not* mentally unescape
the |. All that rather than just "replace dollar with space, replace
comma with space" in a simple form with no hidden meanings to anything.
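
As a sketch of the two shapes being compared: the regex pattern below is only a guess at the single-step version under discussion (the original VB.NET line isn't quoted here), and the class and method names are made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    // Single regex step. Assumed pattern: an escaped dollar, an unescaped
    // "|" meaning "or", and an escaped comma.
    public static string WithRegex(string s)
    {
        return Regex.Replace(s, @"\$|\,", " ");
    }

    // Two plain steps. String.Replace returns a new string, so calls chain.
    public static string WithStringReplace(string s)
    {
        return s.Replace("$", " ").Replace(",", " ");
    }

    static void Main()
    {
        string input = "$1,234";
        // Brackets make the leading space visible in the output.
        Console.WriteLine("[" + WithRegex(input) + "]");
        Console.WriteLine("[" + WithStringReplace(input) + "]");
    }
}
```

Both produce the same result; the second needs no knowledge of which characters are special.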

> >Of course, your version wasn't even valid
> > C#, as it didn't escape the backslashes and you didn't specify a
> > verbatim literal. I assume it was originally VB.NET. I wonder which
> > version would be easier to convert to valid C#? Mine, perhaps?
>
> Actually, it was VB.Net.

Right. So in the C#, you'd either have to have more escapes, or make
them verbatim literals. More stuff to get right. Note how no escaping
at all is required in my version.

> > And in all of those cases, regular expressions are really useful.
>
> But according to you, you shouldn't use them as some of the programmers may
> not be able to maintain it.

<sigh> If you actually believe that, you haven't been reading what I've
been writing.

> Definitely if they would have a problem with our example.
>
> Can't have it both ways. If you allow Regular Expressions, you shouldn't
> have a problem if a programmer used the Regex or IndexOf in our example.
> Anyone maintaining the "USEFUL" ones would have zero problems with this one.

How very black and white of you. Do you really have no concept of
someone being able to understand something, but having a harder time
understanding it one way than the other?

> >> Which are very well documented and when there are a myriad of ways a
> >> user can input these types of data, I prefer to use Regular
> >> expressions which are all over the place (easy to find) than try to
> >> come up with some complex set of loops and temporary variables which
> >> make it far easier to make a mistake and much more unreadable than the
> >> Regex equivalent.
> >
> > Where exactly are the complex loops and temporary variables in this
> > specific case? After all, you have been arguing for using regular
> > expressions in *this specific case*, haven't you?
>
> I was obviously talking about Regular Expressions in general here as I was
> referring to the standard ones you can get anywhere dealing with (Phone
> numbers, credit card etc). There would be none in this case, obviously.
> But there may be in more complicated cases.

Yes - the complicated cases where I've already said that regular
expressions are useful!

> > You already need to know that when writing C# though - my use of
> > String.IndexOf doesn't add to the volume of knowledge required.
>
> It is still an issue.

Yes, it's still going to be harder to search for "some\thing" than
"something". However, it's *not* going to be harder to search for
"some.thing", or "(something)", or "[something]", or "some,thing", or
"some*thing" or "some+thing" etc. Furthermore, there's still going to
be less to remember when you *are* faced with searching for
"some\thing" than there would be using regular expressions.

> Just as the Regular expressions are. And again, if
> you are going to allow Regex at all, you would still need to know about the
> escapes.

You'd need to know about the escapes where regular expressions are
used. The fewer places they're used, the fewer times someone will need
to look them up in the documentation.



> > Just because they're as readable *to you* doesn't mean they're as
> > readable to everyone. How sure are you that the next engineer to read
> > this code will be familiar with regular expressions? How sure are you
> > that when you need to change it to look for a different string, you'll
> > check whether any of the characters need to be escaped? Why would you
> > even want to force that check on yourself?
>
> Again - then don't allow them at all.

No, just allow them where they make sense. Note that if you only use
them where they're going to be doing something fairly involved, it's
much less likely that an engineer will forget that he's actually
dealing with a regular expression than with a simple string.

> > When there's no good reason not to, absolutely.
>
> I guess that is where we disagree.

It certainly sounds like it.

> >> I am not going to code to the level of a junior programmer. I prefer
> >> that
> >> he learn to code to a higher level.
> >
> > Learning to solve problems as simply as possible *is* learning to code
> > to a higher level.
>
> No argument there.

But regular expressions are by their very nature more complicated than
a simple String.IndexOf call. If they weren't they wouldn't be as
powerful as they are.

> > If it's not the simplest code for the situation, it's not well written
> > IMO. If it introduces risk for no reward (the risk of maintenance
> > failing to notice that they might need to escape something, versus no
> > reward) then it's not well written.
>
> I see no risk in the example we are talking about. At least, no more than
> in the IndexOf solution (in this situation).

You don't think there's any risk that someone will forget one of the
regular expression characters which needs escaping? There is no string
you could need to search for which needs *less* escaping in regular
expressions than with String.IndexOf, but there are *lots* of strings
which need more escaping - thus there's more overall risk.



> > I bet if I showed my code to a random sample of a hundred C# developers
> > and asked them to change it to search for "hello[there]", virtually all
> > of them would get it right. I also bet that if I showed your code to
> > them and asked them for the same change, some would fail to escape it
> > appropriately. Do you disagree?
>
> No. But then the same developers would have a problem with the more
> complicated expressions you claim are useful.

Actually, the fact that they were presented with a complicated
expression would immediately make them wary, I suspect. Problems tend
to creep in when something *looks* simpler than it actually is - as is
the case here.
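
The "hello[there]" change can be sketched like this (class and method names are hypothetical). The unescaped pattern compiles and runs without complaint, which is exactly why it is easy to miss:

```csharp
using System;
using System.Text.RegularExpressions;

static class BracketDemo
{
    // Literal pasted in unescaped: [there] becomes a character class,
    // i.e. "hello" followed by ONE of the characters t, h, e, r.
    public static bool Unescaped(string text)
    {
        return Regex.IsMatch(text, @"something1|hello[there]|something3");
    }

    // Brackets escaped so they are matched literally.
    public static bool Escaped(string text)
    {
        return Regex.IsMatch(text, @"something1|hello\[there\]|something3");
    }

    // Plain search for the new string: nothing to escape.
    public static bool WithIndexOf(string text)
    {
        return text.IndexOf("hello[there]", StringComparison.Ordinal) >= 0;
    }

    static void Main()
    {
        Console.WriteLine(Unescaped("say hello[there]"));   // misses the intended text
        Console.WriteLine(Unescaped("say hellot"));         // matches something unintended
        Console.WriteLine(Escaped("say hello[there]"));
        Console.WriteLine(WithIndexOf("say hello[there]"));
    }
}
```

The unescaped version fails on the very string it was changed to look for, and succeeds on "hellot" instead.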

> > Both are ways of finding the value of a property. The first is harder
> > to maintain and harder to read, just like your use of regular
> > expressions in this instance. Now, which of the above snippets of code
> > would you use, and why?
>
> Since I am not sure why you would use the first, I would do the 2nd.

You'd use the first to keep up your knowledge of reflection, of course.
After all, if you don't use it, you lose it, right? That's your
argument for using regular expressions where they're completely
unnecessary and provide no benefit, after all.

> But in our case, I would still use either - as I see the Regex version as
> easy as the IndexOf.

I think we'll have to agree to disagree. You seem to be unable to grasp
the idea that there are more potential pitfalls and more knowledge
required for the regular expression version than for the IndexOf
version.

tshad

Sep 26, 2005, 9:24:35 PM
I'm back.

Was a little busy and didn't have time to respond.

"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message

news:MPG.1d9ab49a4...@msnews.microsoft.com...


> tshad <tschei...@ftsolutions.com> wrote:
>> > In plenty of places. It has a language with a defined syntax etc.
>>
>> Yes, but so are dolphin sounds.
>>
>> When I talk about a Programming Language - I am talking about a
>> Procedural
>> Language (C, Fortran, VB, Pascal, etc.).
>
> So you wouldn't regard LISP as a programming language, just because
> it's functional rather than procedural?
>

I don't know much about LISP, but Mathematics is also a language, but not
the same way as English and German are.

> Of course, you didn't even specify "programming language" before.

True.

But I did specify that it depends on how you define it.

>
> Regular expressions form a language in computing, and that language
> needs to be learned before being used, just as any other language does,
> whether it's C#, HTML, XPath or VB.NET.
>

OK

>> > The C# code in question would be:
>> >
>> > if (someVariable.IndexOf ("firstliteral") != -1 ||
>> > someVariable.IndexOf ("secondliteral") != -1 ||
>> > someVariable.IndexOf ("thirdliteral") != -1)
>> >
>>
>> And the Regex version:
>>
>> if (Regex.IsMatch(myString, @"something1|something2|something3"))
>
> Right. Immediately the IndexOf value is more readable, by more clearly
> separating the three separate strings which are being searched on.
> (Oliver Sturm's version is more readable than that


I assume you mean "if (Regex.IsMatch(myString, @"something[123]"))".

But actually they are both Oliver's.

I don't agree there. I think the Regex is just as readable, as long as you
have a bit of Regular Expression understanding, obviously. I also think
that if you understood C and didn't understand Regex - you would get what it
is saying (IsMatch is pretty much a giveaway). Much more so than if you didn't
understand C and saw the IndexOf - which doesn't really tell you what it
is doing. IsMatch is a much more understandable term than IndexOf.

>
>> > Unless there's another compelling argument in favour of one tool or
>> > another, readability is a very important part of choosing the best
>> > tool.
>>
>> Again, why do I need a compelling reason? If I have the solution and it
>> happens to be Regex, I would use it, I wouldn't necessarily say to
>> myself -
>> "Is there perhaps a more readable way to write this? I wonder if Jim
>> will
>> be able to read this or not."
>
> Then I'm afraid that's your problem. It sounds like you're basically
> admitting that you're not that interested in readability. Personally, I
> like writing code which is elegant but easy to maintain. Having *a*
> solution which happens to work isn't enough when there are obviously
> others available which could well be simpler.
>

I never said that.

I never said readability is not an issue, but I am not going to write "Cat
in the Hat" instead of a novel so that the programmers with the simplest of
experience can read it. But I am not going to write cryptic code either so
they can't read it.

I assume there are company standards to program by and I would follow those.

>Having *a*
> solution which happens to work isn't enough when there are obviously
> others available which could well be simpler.

I am not writing simple code, I am writing code to handle a problem. I
prefer to write good code not simple code. Sometimes they are synonymous,
sometimes they aren't.

But in our case, I still see them as equally readable.

> Far more time is spent maintaining code than writing it in the first
> place. Taking the attitude you take above just isn't cost-effective in
> the long run.

Don't agree there.


>
>> > But you're effectively pushing them in the situation described by the
>> > OP when you say that the solution using regular expressions is as
>> > readable as the solution without.
>>
>> No.
>>
>> No pushing. No more than your pushing not using it.
>
> But I'll readily admit to pushing the (IMO simpler) solution, for this
> particular situation. So are you actually admitting that you *are*
> pushing the use of regular expressions here?
>

In your opinion (as you say).

And you obviously are not listening. I am not pushing either side. I have
been saying over and over that in this situation, they are the same (IMO).
I am not pushing Regex nor am I ruling them out. You, however, can't make up
your mind. One minute you say that something as simple as the example we
are using is too complex for a programmer, and the next you say that you
would use Regex in other situations (which would have to be more
complicated). That makes no sense.

>> >> ' The following replaces all multiple blanks with " ". It then takes
>> >> ' out the anomalies, such as "and not and" and replaces them with
>> >> "and"
>> >>
>> >> keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
>> >> keywords = Regex.Replace(keywords, "( )", " or ")
>> >> keywords = Regex.Replace(keywords," or or "," ")
>> >> keywords = Regex.Replace(keywords,"or and or","and")
>> >> keywords = Regex.Replace(keywords,"or near or","near")
>> >> keywords = Regex.Replace(keywords,"and not or","and not")
>> >>
>> >> Fairly straight forward and easy to follow.
>> >
>> > Reasonably, although apart from the first regex, I'd suggest doing the
>> > rest with straight calls to String.Replace. As an example of why I
>> > think that would be more readable, what exactly does the second line do?
>>
>> Actually, nothing. It is grouping a " ", which isn't necessary. I think
>> I
>> used to have something else there and took it out and didn't realize I
>> didn't need the ().
>
> So again, the code could be made more readable even by just modifying
> the existing regex replacement, let alone by replacing the regular
> expressions with simple String.Replace calls. Had they been
> String.Replace calls, the meaning of the second line would have been
> unambiguous - you'd have had to write it the simple way to start with.
>

I am not saying there may not be other ways to write the code. As I said, I
often rewrite my own code later as I see a way I like better that I may not
have thought of at the time I wrote it. Many times it isn't better code,
just different.

> Note that your first replacement will replace two tabs with a single
> space, but leave one tab alone, by the way. It would be better to
> replace "\s+" with the space, IMO.

Probably true. I am not a Regex expert. That was what I came up with at
the time.
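
For reference, the difference between the two patterns can be seen with a small C# sketch (the thread's regexes were VB.NET; the class and method names here are made up):

```csharp
using System;
using System.Text.RegularExpressions;

static class WhitespaceDemo
{
    // Replaces only runs of TWO OR MORE whitespace characters.
    public static string CollapseTwoOrMore(string s)
    {
        return Regex.Replace(s, @"\s{2,}", " ");
    }

    // Replaces every run of one or more whitespace characters.
    public static string CollapseOneOrMore(string s)
    {
        return Regex.Replace(s, @"\s+", " ");
    }

    static void Main()
    {
        string input = "a\tb\t\tc"; // one tab, then two tabs
        // Show tabs as \t so the difference is visible when printed.
        Console.WriteLine(CollapseTwoOrMore(input).Replace("\t", "\\t"));
        Console.WriteLine(CollapseOneOrMore(input).Replace("\t", "\\t"));
    }
}
```

With "\s{2,}" the single tab between a and b survives; with "\s+" every run of whitespace becomes one space.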


>
>> > In some flavours of regular expressions, brackets form capturing
>> > groups. Do they in .NET? I'd have to look it up. If it's really just
>> > trying to replace the string "( )" with " or ", a call to
>> > String.Replace would mean I didn't need to look anything up.
>>
>> Obviously, you didn't need to look this one up either - as you were
>> correct.
>> It is just grouping a blank.
>
> I would have had to look it up if you hadn't been answering the question
> though. Why make the code harder to understand in the first place? If
> you want to replace a space with " or ", just use
> keywords = keywords.Replace (" ", " or ");
> Much more straightforward.
>

Even in C, which I have used for years, I have to look up parameters to make
sure I have the right parameters and have them in the right order.

As I said, the parens were probably a mistake; I may have made some changes
to the line and left the parens in. I agree yours is the correct one.

>> >> But writing objects and the objects themselves are not easily
>> >> readable.
>> >> But
>> >> you would advocate not writing them, would you?
>> >
>> > No, but I don't see how that's relevant.
>>
>> Just that you don't want to Regex as it is not easily readable. Neither
>> are
>> Regex.
>
> Eh?
>

Must have had a little brain fade there. Not sure what I was saying.

>> But the fact a junior programmer might not understand Objects as you do
>> would not prevent you from writing them, would you?
>
> When using C#, one has to use objects. I will almost always try to
> implement the simplest solution to a problem, unless there is a
> compelling reason to use a more complex solution. That way, anyone
> reading the code has to learn relatively little "extra" stuff beyond
> the language itself.

That isn't the point.

We are talking readability here. So don't write any objects. You can use
the ones you need to, but if you write objects and someone has to maintain
it, it could be a problem if he doesn't understand objects.

You can write the same code in straight C to do what objects do. We got
along fine before there were objects. So I think, based on your statements,
you should write the easier code that some very junior programmer might have
to read.

>
>> >> But if you are using .Net, it is part of the mix.
>> >
>> > It's not necessarily part of the mix I have to use.
>>
>> You don't have to use lots of things. That doesn't make them invalid.
>> Neither is the fact that you use Foreach vs For {}. They are there and
>> are
>> part of the mix as is Regex.
>
> No, they really aren't. for and foreach are well-defined in the C#
> language specification. If the program is in C# to start with, it is
> reasonable to assume competency in C# on the part of the reader of the
> code. It is *not* reasonable to assume competency in regular
> expressions, and while that wouldn't prevent me from using regular
> expressions where they provide value, they just *don't* here.

But I am not writing in C# only. I am writing in .Net.


>
>> I might agree with you more if Regex were some
>> component that you picked up and added. Or if Regex were some obscure
>> technique that few knew about. They have been around for quite a long
>> time
>> and are just another gun in your arsenal. If I thought that MS were
>> deprecating it, I would also think twice about using it. But it is part
>> of
>> .Net that all the languages can make use of and I would never tell a
>> programmer, who may be really comfortable with it and uses it responsibly
>> (not obscure cryptic non-commented code), that he should be using IndexOf
>> instead.
>
> Clearly not, as you seem to be keen on using them instead of simple
> string manipulations all over the place - if I saw anyone using regular
> expressions rather than String.Replace in the way you've shown in other
> code posts, that code would never get through code review.
>

Obviously, you micromanage more than I.

If you would have a problem with our examples, I don't think I would like to
work in your team.

In my area, if your code is reasonable and well written and it follows our
standards, it's fine.

>> >I suspect *very*
>> > few programs don't do any string manipulation - knowing the string
>> > methods well is *far* more fundamental to .NET programming than knowing
>> > regular expressions.
>>
>> I agree with part of that and think that regular expressions are just as
>> important to know.
>
> Why?

Because they are perfectly valid and as you said before there are some that
are useful (therefore, you should know them as someone might use them and
you may have to maintain it).

>I'm working on a fairly large project which hasn't needed to use
> regular expressions and wouldn't have benefitted from them once.

That's your style and position, but may not be someone else's.

>I suspect many people could say the same thing. I suspect very few if any
> of them could say the same thing about the basic string manipulation
> methods - and yet you were surprised to see that one could call Replace
> on the result of another Replace method call, which I'd consider a far
> more "basic" level of understanding than knowledge of regular
> expressions.
>
>> As we have been saying, it is here and many people use it, so to not
>> understand it is to limit yourself.
>
> It's one thing to understand the general power of regular expressions,
> so you would know when they may be applicable - it's another thing to
> use them when they serve no purpose beyond what can be more simply
> achieved with the simple String methods.
>
>> You don't have to use it, but you should at least understand the
>> basics of how it works. What are you going to do when someone uses a
>> RegularExpressionValidator and you don't understand what the
>> expression is?
>
> At that point, if I didn't understand the regular expression, I'd look
> it up in the documentation. Do you know every part of regular
> expression syntax off by heart?

According to your position, you should ban them altogether for ANY use,
since you can do anything in C# you can do in Regex.


>
>> The fact that it is not C# (neither is a textbox, datagrid, etc),
>> doesn't mean you shouldn't understand them. Whether you use them is up
>> to you.
>>
>> As you point out, you are not the only programmer and many programmers
>> like
>> to use Regex and that doesn't make them any lesser programmers. What are
>> you going to do when you run into their code?
>
> If they're on my team, I'll tell them to refactor their code to only
> use them when they're appropriate, frankly.

Appropriate as defined by you. Why allow them at all?


>
>> I see code all the time (much of the time it is mine) and wonder why the
>> programmer didn't do it another way. There are many ways to skin a cat.
>> Sometimes it is just style, sometimes it is all they know. But if they
>> follow whatever standards are setup (and in your case maybe you forbid
>> Regex) then as long as the code is well written and clean - I have no
>> problem with it.
>
> If code uses regular expressions when they serve no purpose, it is
> *not* well written and clean though - it is less maintainable than it
> might be.
>

They serve a purpose. They do the same as your string routines, so there is
a purpose. Both are string handling routines.

>> > If you truly think that given two solutions which are otherwise equal,
>> > the solution which is easiest to write, read and maintain doesn't win
>> > hands down, we'll definitely never agree.
>>
>> I agree there.
>>
>> Which is easier to write is obviously your perception. I found my
>> example,
>> as easy as yours to write and just as readable.
>
> And you believe that everyone else does? Again, bear in mind that
> you're unlikely to be the only person ever to read your code.

So you should never EVER use Regex. Someone else might read your code.

This is going in circles.

As I said, I would have a problem with someone who couldn't figure out what
the example we were using was doing.


>
>> > If you want to keep your hand in with respect to regular expressions,
>> > do it in a test project, or with a regular expressions workbench. Keep
>> > it out of code which needs to be read and maintained, probably by other
>> > people who don't want to waste time because you wanted to keep your
>> > skill set up to date.
>>
>> Keep regular expressions out of my code?????
>>
>> So now you are saying there is no use for it?
>
> Not at all - I'm saying that you shouldn't put regular expressions in
> your code just for the sake of keeping your hand in. Use them where
> they're applicable, and only there.

There either is a use or not. You can't say there is a use for it and then
browbeat a programmer because he happens to like to use it. Has a
programmer got to come to you each time he wants to use it to get your
permission?

I can see it if he writes some obscure cryptic Regular Expression - but come
on.

>
>> >> I don't know all of the possible combinations of calls to every
>> >> Object,
>> >> but that doesn't preclude me from using them.
>> >
>> > Exactly - and you wouldn't go out of your way to use methods you don't
>> > need, just to get into the habit of using them, would you?
>>
>> Sure.
>>
>> If it is valid. As I said there are many ways to skin ..., depending on
>> the
>> situation I may do it one way and the next time another way. Gives me
>> many
>> options. I don't do it willy nilly, as you seem to suggest, as a test
>> bench.
>
> But that's *exactly* what you've suggested you should do with regular
> expressions - use them even when there's no real purpose in doing so,
> just so that you remember what they look like.

Sure.

If they are both perfectly valid, I might. Depends on my mood (you should
really have a problem with that). :)


>
>> > Absolutely - so why are you so keen on making people either memorise or
>> > look up the characters which need escaping for regular expressions
>> > every time they read or modify your code?
>>
>> I am not. I don't memorize. But I still use it.
>
> Okay, so you don't memorise it, which means you *do* have to look up
> which characters require escaping. I think you've just admitted that
> your code is less maintainable than mine.

No.

I can maintain my car, but I might still have to look up specs on it.


>
>> > I seem to be having difficulty making myself clear on this point: I
>> > have never stated and will never state that you shouldn't use regular
>> > expressions where they're appropriate. But they are *not* appropriate
>> > in this case, as they are a more complex and less readable way of
>> > solving the problem.
>>
>> No you are very clear. If you are so concerned with others being able to
>> read your code and problems with escape characters - why would you EVER
>> want
>> them to use them. You can't have it both ways.
>
> I would use them when the solution which uses regular expressions is
> clearer than the solution which doesn't use them. It seems a pretty
> simple policy to me.

If they are not readable, you shouldn't use them at all. I personally think
they are both readable, in this case.


>
>> If they would have a hard time with a nothing expression like "if
>> (Regex.IsMatch(myString, @"something1|something2|something3"))" - they
>> are
>> never going to get some of the other standard Regex solutions I
>> mentioned before.
>
> Those maintaining the code could no doubt understand it after looking
> at it for a little while, just like they could work out your other
> regular expressions after looking at them and consulting the
> documentation - but why are you trying to make their jobs harder? Why
> are you not concerned that the code you're writing is costing your
> company money by making it harder to maintain than it needs to be?

Again, then you feel there is no place for Regex as you can do anything with
C# that you can do with Regex. As you say, it will always be harder to
read.

>
>> As you said, the two solutions are equal. Your solution is that you MUST
>> go
>> with IndexOf. Mine is you can use either.
>
> Well, they're equal in terms of their semantics. They're definitely not
> equal in terms of maintainability, and as that's important to me, I
> don't see what's wrong with saying that I'm very strongly in favour of
> avoiding the less readable/maintainable code.
>

I didn't say that.

>> > Show me a problem where the regex way of solving it is simpler than
>> > using simple string operations (and there are plenty of problems like
>> > that) and I'll plump for the regex in a heartbeat.
>> >
>> >> If I happen to know a good way in Regex to solve a problem, I am not
>> >> going use *extra brainpower* to try to solve the problem in C#.
>> >
>> > In what way is using the method which is designed for *precisely* the
>> > task in hand (finding something in a string) using extra brainpower?
>>
>> I wasn't referring to this particular issue when I said this.
>
> It would have been nice if you'd indicated that. Do you agree then that
> it doesn't actually take any more brainpower to come up with
> String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
> brainpower when it comes to maintaining the IndexOf solution?
>

In this case, no. In other cases, could be. Would have to look at it. I
never said that Regex is the best thing out there. I was just saying that
it is valid and can be readable - can also be cryptic (as can C#).

>> >If
>> > you're not familiar with String.IndexOf, you've got *much* bigger
>> > things to worry about than whether or not your regular expression
>> > skills are getting rusty.
>>
>> I never said I was not familiar with IndexOf.
>>
>> As a matter of fact, the original question was given whether you could
>> "do a
>> search for more that one string in another string".
>
> And of course the answer is "yes, by calling IndexOf multiple times".

That wasn't the question asked. That was the example that was given and the
question was can you do it in one statement.

So the answer is no, using IndexOf.


>
>> ****************************************************************
>> Can you do a search for more that one string in another string?
>>
>> Something like:
>>
>> someString.IndexOf("something1","something2","something3",0)
>>
>> or would you have to do something like:
>>
>> if ((someString.IndexOf("something1",0) >= 0) ||
>> (someString.IndexOf("something2",0) >= 0) ||
>> (someString.IndexOf("something3",0) >= 0))
>> {
>> Do something
>> }
>> ***************************************************************************
>> IndexOf doesn't do it. This was the original question. You have to do
>> multiple calls as is said in the original question. Nicholas was correct
>> in
>> his assessment. One Regex call would work.
>
> Yes, as would a single call to a method which called IndexOf on the
> string multiple times. I disagree with you - Nicholas wasn't correct in
> his assessment, as he claimed that the "best bet" would be to use a
> regular expression. Using regular expressions is just *not* the best
> bet here - it requires more effort, as I've described repeatedly.
>

No, he was correct in his answer to the question. The question was never
"Which is better?", but "Can you do it?". And you can write a method which calls
IndexOf multiple times. But then it isn't one line, is it?
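
For what it's worth, the helper-method idea raised early in the thread (a method taking a params string array) might look roughly like the sketch below. ContainsAny and StringSearch are made-up names, not framework methods; the helper itself takes several lines, but each call site is a single statement.

```csharp
using System;

static class StringSearch
{
    // Hypothetical helper: true if text contains at least one of the candidates.
    // Each candidate is treated as plain text, so nothing needs escaping.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }

    static void Main()
    {
        string someString = "xx something2 xx";
        if (ContainsAny(someString, "something1", "something2", "something3"))
        {
            Console.WriteLine("found one");
        }
    }
}
```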

>> > Okay - now suppose I need to change it from searching for "something1"
>> > to "something.1" or "something[1]". How long does it take to change in
>> > each version? How easy is it to read afterwards?
>>
>> That wasn't the question.
>
> Are you suggesting that maintainability isn't something that should be
> considered? Do you *really* want to look for "something1",
> "something2" and "something3" or were they (as I suspect) just
> examples, and the real values could easily have dots, brackets etc in?

I don't really remember what the context was originally. But I know they
didn't have dots and brackets in it.


>
>> What if you wanted to change "something1" to "something\". Same problem.
>
> Well, it's half the problem with IndexOf that it is with regular
> expressions. With regular expressions, you'd need to know that not only
> does backslash need escaping in C#, it also needs escaping in regular
> expressions.
>
> IndexOf: "something\\" or @"something\"
> Regex: "something\\\\" or @"something\\"
>
> Once again, the IndexOf version is easier to understand - there's less
> to mentally unescape to work out what's actually being asked for.
>

Splitting hairs, now. Both are the same, as far as I can see (here).

>> And if escapes were a problem (if it were me) I would have a little sheet
>> that showed them at my desk within easy reach.
>
> Whereas by needing to know less (just the C# escapes) it's really easy
> to memorise everything I need to know to solve this situation.
>

That's true, but then you would only know C#. And if that is your aim,
that's fine.

>> >> So I am at a loss as to how this regular expression is more unreadable
>> >> than
>> >> the C# counterpart. That is not to say that you couldn't make it more
>> >> unreadable - but you could do the same with C# if you wanted to.
>> >
>> > You could start by making the C# more readable, as I've shown...
>>
>> As you can with Regular Expressions.
>
> Well, Oliver Sturm has shown a more readable version, but you seem to
> be keen on the "put them all in the same line" version.
>
> Neither is as readable as the String.IndexOf version, however.
>
>> > However, the regex is already less readable:
>> > 1) It's got "|" as a "magic character" in there.
>>
>> | = or (same as C)
>
> Yup, but it's something that isn't used in string literals other than
> for regular expressions. It's an extra thing to bear in mind
> unnecessarily.
>

No room for it, huh?

>> > 2) It's got all the strings concatenated, so it's harder to spot each
>> > of them separately.
>>
>> You are kidding, right?
>
> Absolutely not! It's significantly easier to spot the three separate
> values when they're three separate strings than when they're all mashed
> together.
>
>> > Furthermore, suppose you didn't just want to search for literals -
>> > suppose one of the strings you wanted to search for was contained in a
>> > variable. How sure are you that *no-one* on your team would use:
>> >
>> > x+"|something2|something3"
>> >
>> > as the regular expression?
>>
>> You are now leaving the original question. I never said that Regular
>> Expressions was the better (or not better) in all cases.
>
> While I'm leaving the exact original question, it's far from out of the
> question that the original code would need to be changed to use a
> variable to be searched for at some time. At that point, can you guarantee
> that your team would get it right? They'd need to be on their guard
> when using regular expressions - they wouldn't need to be on their
> guard using IndexOf.
>

Right. No one makes mistakes with IndexOf.

>> > I would tell programmers on my team not to use regular expressions
>> > where the alternative is simpler and more readable, yes.
>>
>> Why use them at all? It isn't readable.
>
> They aren't as readable *in this case*. In other, more complicated
> situations, the version which only used IndexOf would be harder to read
> than the regular expression version.

But your problem was that it would be hard for other programmers to read.
If they can read your more complicated version, this one should be easy.


>
> Using a regular expression is like getting a car compared with walking
> somewhere - it's absolutely the right thing to do when you're going on
> a long journey, but in this case you're advocating getting in a car
> just to travel to the next room. It's simpler to walk.
>
>> And if your programmers can't maintain the simple Regexs, they definitely
>> won't be able to handle the more complicated ones.
>
> You seem to fail to grasp the "make it as simple as possible" concept.
> It's not a case of maintenance engineers being idiots - it's about
> presenting them with fewer possible risks. Why leave them a trap to
> fall into when you can write simpler code which is easier to change
> later on?
>

No.

I just find it as simple in this case, and you don't.

>> > You've shown nothing of the kind - whereas I think I've given plenty of
>> > examples of how using regular expressions make the code less easily
>> > maintainable, even if you consider it equally readable to start with
>> > (which I don't).
>>
>> Not in this specific case. I was never maintaining or pushing Regex for
>> all
>> or any situations.
>
> But you're pushing for regular expressions in *this* situation, or at
> least saying it's just as good as using IndexOf. You've also shown in
> your other code that you use regular expressions unnecessarily for
> replacement, making a simple two-step replacement into a complicated
> single-step replacement where the number of characters which *aren't*
> just plain text is greater than the number of characters which are.

No. Not pushing. But I think they are equivalent in this case. As you said
earlier, I am sure others would disagree. But I don't think that the
difference is significant enough, in this case, even if I were to agree on
which is easier, to preclude it.

Who?

The person who can understand Regex if complicated, but would be trashed
trying to figure out our little example.

Bit of a stretch there.


>> >> Which are very well documented, and when there are a myriad of ways a
>> >> user can input these types of data, I prefer to use Regular
>> >> expressions, which are all over the place (easy to find), than try to
>> >> come up with some complex set of loops and temporary variables which
>> >> make it far easier to make a mistake and much more unreadable than the
>> >> Regex equivalent.
>> >
>> > Where exactly are the complex loops and temporary variables in this
>> > specific case? After all, you have been arguing for using regular
>> > expressions in *this specific case*, haven't you?
>>
>> I was obviously talking about Regular Expressions in general here as I was
>> referring to the standard ones you can get anywhere dealing with (Phone
>> numbers, credit card etc). There would be none in this case, obviously.
>> But there may be in more complicated cases.
>
> Yes - the complicated cases where I've already said that regular
> expressions are useful!

Just make sure the programmer that can't handle the easy Regex doesn't see
that one.


>
>> > You already need to know that when writing C# though - my use of
>> > String.IndexOf doesn't add to the volume of knowledge required.
>>

Can't have that !!!!

Already dealt with.


>
>> > When there's no good reason not to, absolutely.
>>
>> I guess that is where we disagree.
>
> It certainly sounds like it.
>
>> >> I am not going to code to the level of a junior programmer. I prefer that
>> >> he learn to code to a higher level.
>> >
>> > Learning to solve problems as simply as possible *is* learning to code
>> > to a higher level.
>>
>> No argument there.
>
> But regular expressions are by their very nature more complicated than
> a simple String.IndexOf call. If they weren't they wouldn't be as
> powerful as they are.
>

Writing vanilla C# is less complicated than writing objects, but we still
write them.

Agreed.

Tom


Jon Skeet [C# MVP]

Sep 27, 2005, 2:55:32 AM
tshad <tschei...@ftsolutions.com> wrote:
> >> When I talk about a Programming Language - I am talking about a Procedural
> >> Language (C, Fortran, VB, Pascal, etc.).
> >
> > So you wouldn't regard LISP as a programming language, just because
> > it's functional rather than procedural?
>
> I don't know much about LISP, but Mathematics is also a language, but not
> the same way as English and German are.

Indeed.

> > Of course, you didn't even specify "programming language" before.
>
> True.
>
> But I did specify, that it depends on how you define it.

True.

> > Right. Immediately the IndexOf value is more readable, by more clearly
> > separating the three separate strings which are being searched on.
> > (Oliver Sturm's version is more readable than that
>
> I assume you mean "if (Regex.IsMatch(myString, @"something[123]"))".

No, I mean:

Yes, where the string itself is separated onto three lines.

> But actually they are both Oliver's.
>
> I don't agree there. I think the Regex is just as readable, as long as you
> have a bit of Regular Expression understanding, obviously. I also think
> that if you understood C and didn't understand Regex - you would get what it
> is saying (IsMatch is pretty much a giveaway). Much more than if you didn't
> understand C and saw the IndexOf - which doesn't really tell you what it
> is doing.

Yes it does, it's finding the index of one string within another.

> IsMatch is much more understandable term than IndexOf.

The name is as understandable, but the exact semantics are *much* more
obscure. The name doesn't suggest that you can't just put a "." in
there and expect it to only match a dot for instance, does it?
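To make the dot point concrete, here is a minimal sketch. The class and method names are made up for illustration, and the ordinal comparison is my choice rather than anything from the code under discussion.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo type; not part of any code posted in this thread.
public static class DotDemo
{
    // Plain substring search: every character of the needle is literal.
    public static bool ContainsLiteral(string text, string needle)
    {
        return text.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }

    // Regex search: an unescaped '.' matches any character except a newline.
    public static bool MatchesPattern(string text, string pattern)
    {
        return Regex.IsMatch(text, pattern);
    }
}
```

With these, DotDemo.ContainsLiteral("fooXbar", "foo.bar") is false, while DotDemo.MatchesPattern("fooXbar", "foo.bar") is true. The pattern has to be written as @"foo\.bar" before the regex version behaves like the literal search.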

> >> Again, why do I need a compelling reason? If I have the solution and it
> >> happens to be Regex, I would use it, I wouldn't necessarily say to myself -
> >> "Is there perhaps a more readable way to write this? I wonder if Jim will
> >> be able to read this or not."
> >
> > Then I'm afraid that's your problem. It sounds like you're basically
> > admitting that you're not that interested in readability. Personally, I
> > like writing code which is elegant but easy to maintain. Having *a*
> > solution which happens to work isn't enough when there are obviously
> > others available which could well be simpler.
>
> I never said that.

You said that when you have a solution, you won't consider whether
there's a more readable way of writing it. To me, that demonstrates that
you don't care very much about readability.



> I never said readability is not an issue, but I am not going to write "Cat
> in the Hat" instead of a novel so that the programmers with the simplest of
> experience can read it. But I am not going to write cryptic code either so
> they can't read it.

If the "Cat in the Hat" does the job as well as the novel and is easier
to read, why on earth would you want to write the novel?

> I assume there are company standards to program by and I would follow that.

There aren't usually company standards down to the level of when to use
regular expressions.

> >Having *a*
> > solution which happens to work isn't enough when there are obviously
> > others available which could well be simpler.
>
> I am not writing simple code, I am writing code to handle a problem. I
> prefer to write good code not simple code. Sometimes they are synonymous,
> sometimes they aren't.

I disagree - simple code that works (as well as the more complicated
code) is always good. Note that this is in terms of implementation, not
design - there's sometimes a very simple but inelegant design which
ends up costing a lot more work in the long run. That's a different
matter.

> But in our case, I still see them as equally readable.

You still haven't said whether you see them as equally readable *and
maintainable* to others though.

> > Far more time is spent maintaining code than writing it in the first
> > place. Taking the attitude you take above just isn't cost-effective in
> > the long run.
>
> Don't agree there.

With which bit? If you're going to disagree with the first sentence
quoted, we really don't have much basis for discussion. I thought it
was pretty much universally accepted these days that code almost always
spends more time in maintenance than in original coding. That's why I'm
always happy to spend a bit more time refactoring working code to make
it easier to maintain.

> >> No pushing. No more than your pushing not using it.
> >
> > But I'll readily admit to pushing the (IMO simpler) solution, for this
> > particular situation. So are you actually admitting that you *are*
> > pushing the use of regular expressions here?
> >
> In your opinion (as you say).
>
> And you obviously are not listening. I am not pushing either side. I have
> been saying over and over that in this situation, they are the same (IMO).

But that *is* pushing regular expressions from my point of view, where
they shouldn't be an option.

Consider an exaggerated equivalent situation. Suppose we were
discussing how to implement addition. Suppose I thought that just using
the expression x+y was the easiest way of doing things, and you thought
it was just as easy to write a remote web service which took two
integers. By *not* ruling out the more complex solution, you're
*effectively* pushing it - at least pushing it as an equally valid
option.

> I am not pushing Regex nor am I ruling them out. You however, can't make up
> your mind. One minute you say that something as simple as the example we
> are using is too complex for a programmer and then proceed to say that you
> would use Regex in other situations (which would have to be more
> complicated), makes no sense.

<sigh> I don't know whether you're intentionally missing the point or
whether I'm genuinely not getting through.

There is always risk associated with changing code. When writing code,
you should try to reduce the risk that future changes will incur. That
means making the code as simple as possible, and easy to change.

In some cases a regular expression will be a lot simpler to read and
change than the equivalent "primitive string manipulation" code. Those
cases would usually be where the string manipulation involves several
steps, often nested loops etc. There, the complexity of regular
expressions (which is still there) is less than the complexity of the
primitive solution.

In this case, however, the primitive solution is very simple and
understandable. Changing it to search for a different string or an
extra string (or even a string passed in as a parameter) is trivial.
Changing the regular expression is not.
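As a rough sketch of what "passed in as a parameter" costs on the regex side: each value has to go through Regex.Escape before it can be joined into an alternation. PatternBuilder and its methods are hypothetical names, and the sketch assumes at least one value is supplied.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, shown only to illustrate the escaping step.
public static class PatternBuilder
{
    // Joins the values into "a|b|c", escaping each one so that
    // characters such as '.', '[' or '|' are treated literally.
    public static string BuildAlternation(params string[] values)
    {
        return string.Join("|", values.Select(v => Regex.Escape(v)));
    }

    // True if any of the values occurs literally in the text.
    // Assumes values is non-empty; an empty pattern would match anything.
    public static bool MatchesAny(string text, params string[] values)
    {
        return Regex.IsMatch(text, BuildAlternation(values));
    }
}
```

Without the Regex.Escape call, a value like "something.1" would also match "something11". The IndexOf version needs no equivalent step.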

> > So again, the code could be made more readable even by just modifying
> > the existing regex replacement, let alone by replacing the regular
> > expressions with simple String.Replace calls. Had they been
> > String.Replace calls, the meaning of the second line would have been
> > unambiguous - you'd have had to write it the simple way to start with.
> >
> I am not saying there may not be other ways to write the code. As I said, I
> often rewrite my own code later as I see a way I like better that I may not
> have thought of at the time I wrote it. Many times it isn't better code,
> just different.

In this case though, it *would* be better - it would be simpler to
understand, and simpler to write in the first place.

For instance, I wouldn't have had to consider whether the brackets were
doing something clever or not. I had to look up .NET regular
expressions just to check the meaning in this case. Do you really
believe that a solution which *doesn't* involve that extra thought
isn't better?

> > Note that your first replacement will replace two tabs with a single
> > space, but leave one tab alone, by the way. It would be better to
> > replace "\s+" with the space, IMO.
>
> Probably true. I am not a Regex expert. That was what I came up with at
> the time.

And that's part of the risk - that someone doesn't put enough effort
into the regex to get the *actually* desired behaviour. Where the
alternative is a complex solution, it makes a lot of sense to put
significant effort into getting the regex right. When you could do the
same thing with a few string operations, it's just not worth it.

(For this first line, a regex is probably the best way to go - but you
need to think about it more closely.)
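A minimal sketch of the two steps as I'd split them: a regex for collapsing whitespace, then a plain String.Replace. The original code isn't reproduced here, so KeywordQuery and ToOrQuery are hypothetical names, the " or " separator is taken from the keywords example in this thread, and the Trim call is my own addition.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating "regex where it earns its keep,
// plain string methods everywhere else".
public static class KeywordQuery
{
    public static string ToOrQuery(string keywords)
    {
        // Step 1: collapse every run of whitespace (spaces, tabs, ...)
        // into a single space. "\s+" handles one or many characters.
        string normalized = Regex.Replace(keywords.Trim(), @"\s+", " ");

        // Step 2: a literal replacement needs no regex at all.
        return normalized.Replace(" ", " or ");
    }
}
```

For example, KeywordQuery.ToOrQuery("cat\t\tdog   bird") gives "cat or dog or bird".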

> > I have had to look it up if you hadn't been answering the question
> > though. Why make the code harder to understand in the first place? If
> > you want to replace a space with " or ", just use
> > keywords = keywords.Replace (" ", " or ");
> > Much more straightforward.
> >
> Even in C, which I have used for years, I have to look up parameters to make
> sure I have the right parameters and have them in the right order.

Usually intellisense can help you with that though - it *doesn't* start
explaining the details of regular expressions.

> As I said, the Parens were probably a mistake and I may have made some changes
> to the line and left the parens in. I agree yours is the correct one.

And if you weren't taking "use regular expressions" as your default
position, you wouldn't have made the mistake in the first place. The
first thing you should try to think of is the simplest one. You want to
manipulate a string, so ask yourself if there's anything in the string
class which does what you want.

> >> But the fact a junior programmer might not understand Objects as you do
> >> would not prevent you from writing them, would you?
> >
> > When using C#, one has to use objects. I will almost always try to
> > implement the simplest solution to a problem, unless there is a
> > compelling reason to use a more complex solution. That way, anyone
> > reading the code has to learn relatively little "extra" stuff beyond
> > the language itself.
>
> That isn't the point.

It may not be your point, but it's part of my point.

> We are talking readability here. So don't write any objects. You can use
> the ones you need to, but if you write objects and someone has to maintain
> it, it could be a problem if he doesn't understand objects.

I'm assuming that "the solution uses .NET" is a given - in other words,
any maintenance engineer should know C# and the basics of .NET. To me
"the basics" don't include regular expressions and memorising all the
details of them. *Some* familiarity can be hoped for, but not knowing
all the constructs - so anything which requires that people *do* know
the regex constructs in order to change things is at a disadvantage.

> You can write the same code in straight C to do what objects do. We got
> along fine before there were objects. So I think, based on your statements,
> you should write the easier code that some very junior programmer might have
> to read.

No, we didn't "get along fine" before there were objects. C code is
typically far harder to read than OO code - and where it's not, that's
often because it's effectively written in a semi-OO way, just using
naming to indicate which type of object is being used (just without
polymorphism etc).

> > No, they really aren't. for and foreach are well-defined in the C#
> > language specification. If the program is in C# to start with, it is
> > reasonable to assume competency in C# on the part of the reader of the
> > code. It is *not* reasonable to assume competency in regular
> > expressions, and while that wouldn't prevent me from using regular
> > expressions where they provide value, they just *don't* here.
>
> But I am not writing in C# only. I am writing in .Net.

So you would assume that everyone who is reading and maintaining your
code knows every class in the .NET framework? I don't.

> > Clearly not, as you seem to be keen on using them instead of simple
> > string manipulations all over the place - if I saw anyone using regular
> > expressions rather than String.Replace in the way you've shown in other
> > code posts, that code would never get through code review.
> >
> Obviously, you micro manage more than I.

Well, I code review, just as my peers code review. We almost always
find things which can be done better (which works even better when pair
programming). That doesn't indicate that we're not good developers -
just that an extra point of view is always helpful. It also stops us
from getting lazy and implementing something which is just "okay"
rather than as good as it should be.

> If you would have a problem with our examples, I don't think I would like to
> work in your team.

Likewise if you don't consider that finding the simplest way of
implementing a solution is worth doing, I wouldn't like to work on your
code.



> In my area, if your code is reasonable and well written and it follows our
> standards, it's fine.

Being more complex than it needs to be means that code *isn't*
reasonable and well-written, IMO.

> >> >I suspect *very*
> >> > few programs don't do any string manipulation - knowing the string
> >> > methods well is *far* more fundamental to .NET programming than knowing
> >> > regular expressions.
> >>
> >> I agree with part of that and think that regular expressions are just as
> >> important to know.
> >
> > Why?
>
> Because they are perfectly valid and as you said before there are some that
> are useful (therefore, you should know them as someone might use them and
> you may have to maintain it).

Occasionally they're useful. I haven't used a single one in the project
I've been working on for the last six months. On the other hand, I've
used string manipulation all over the place.

I would expect that the number of straight string manipulations in most
code should be *much* higher than the number of regular expressions
used - hence it's more important to thoroughly understand the string
methods than regexes.



> >I'm working on a fairly large project which hasn't needed to use
> > regular expressions and wouldn't have benefitted from them once.
>
> That's your style and position, but may not be someone else's.

Everyone else in the team certainly feels the same way.

> > At that point, if I didn't understand the regular expression, I'd look
> > it up in the documentation. Do you know every part of regular
> > expression syntax off by heart?
>
> According to your position, you should ban them altogether for ANY use,
> since you can do anything in C# you can do in Regex.

No, because - as I *keep* saying - there are things you can't do as
*simply* using straight string manipulation. Where it's simpler to use
regexes, I'd use them. Those situations come up occasionally, but not
with the frequency you seem to use regular expressions.

> > If they're on my team, I'll tell them to refactor their code to only
> > use them when they're appropriate, frankly.
>
> Appropriate as defined by you. Why allow them at all?

See the various places I've explained that both in this post and many
others.

> > If code uses regular expressions when they serve no purpose, it is
> > *not* well written and clean though - it is less maintainable than it
> > might be.
> >
> They serve a purpose. They do the same as your string routines, so there is
> a purpose. Both are string handling routines.

No, using regular expressions *instead* of the string handling routines
serves no purpose, just as using a web service to perform addition
would serve no purpose.

There's no advantage in using the regular expression here, and there
*is* a disadvantage.

> > And you believe that everyone else does? Again, bear in mind that
> > you're unlikely to be the only person ever to read your code.
>
> So you should never EVER use Regex. Someone else might read your code.
>
> This is going in circles.

Yes, because you seem unable to understand the position I've presented
several times.



> As I said, I would have a problem with someone who couldn't figure out what
> the example we were using was doing.

But would you have a problem with the same person if they forgot or
didn't check whether, say, '[' needed escaping? I'd find that a fairly
understandable mistake (although I'd hope that unit tests would show
the problem up).

> >> Keep regular expressions out of my code?????
> >>
> >> So now you are saying there is no use for it?
> >
> > Not at all - I'm saying that you shouldn't put regular expressions in
> > your code just for the sake of keeping your hand in. Use them where
> > they're applicable, and only there.
>
> There either is a use or not. You can't say there is a use for it and then
> brow beat a programmer because he happens to like to use it.

I certainly can when the programmer uses it where there's no good
reason. There's a time and place to use reflection, but I would
certainly brow-beat a programmer who decided to use it to get the value
of a property which could be done in a safer way (using normal property
access syntax).
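To illustrate the reflection comparison, here is a small sketch. Customer and PropertyAccessDemo are invented for the example and don't come from anyone's real code.

```csharp
using System;
using System.Reflection;

// Hypothetical type used only for this comparison.
public class Customer
{
    public string Name { get; set; }
}

public static class PropertyAccessDemo
{
    // Normal property access: the compiler checks the name and the type.
    public static string GetNameDirect(Customer customer)
    {
        return customer.Name;
    }

    // Reflection: the property name is just a string, so a typo or a
    // later rename is only discovered at run time.
    public static string GetNameViaReflection(Customer customer)
    {
        PropertyInfo property = typeof(Customer).GetProperty("Name");
        return (string)property.GetValue(customer, null);
    }
}
```

Both methods return the same value. The second one just gives up compile-time checking for no benefit, which is the same trade I'm objecting to with the regex.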

> Does a programmer have to come to you each time he wants to use it to get
> your permission?

In our team a programmer (including myself) has to get "permission"
every time they want to check anything in. It's called code review, and
it vastly improves the quality of the code.

> I can see it if he writes some obscure cryptic Regular Expression - but come
> on.

Cryptic such as "( )" where a straight " " would have been more
readable? Code review should have picked that up.

> > But that's *exactly* what you've suggested you should do with regular
> > expressions - use them even when there's no real purpose in doing so,
> > just so that you remember what they look like.
>
> Sure.
>
> If they are both perfectly valid, I might. Depends on my mood (you should
> really have a problem with that). :)

I certainly do. "Valid" to me involves the code being as simple as
possible.

> > Okay, so you don't memorise it, which means you *do* have to look up
> > which characters require escaping. I think you've just admitted that
> > your code is less maintainable than mine.
>
> No.
>
> I can maintain my car, but I might still have to look up specs on it.

But wouldn't it be easier to maintain something which *didn't* require
you to look up anything?

> > I would use them when the solution which uses regular expressions is
> > clearer than the solution which doesn't use them. It seems a pretty
> > simple policy to me.
>
> If they are not readable, you shouldn't use them at all. I personally think
> they are both readable, in this case.

Readability is not a black and white issue. Something is "more
readable" than something else - in this case, using string manipulation
is more readable (and maintainable, importantly) than using regular
expressions. In other cases, it isn't.

> > Those maintaining the code could no doubt understand it after looking
> > at it for a little while, just like they could work out your other
> > regular expressions after looking at them and consulting the
> > documentation - but why are you trying to make their jobs harder? Why
> > are you not concerned that the code you're writing is costing your
> > company money by making it harder to maintain than it needs to be?
>
> Again, then you feel there is no place for Regex as you can do anything with
> C# that you can do with Regex. As you say, it will always be harder to
> read.

Where did I say it will *always* be harder to read? Please don't put
words in my mouth, especially when I've expressly stated otherwise
elsewhere.

At times, regular expressions will be easier to understand than the
equivalent string manipulation solution. In this case, they're not.

> >> As you said, the two solutions are equal. Your solution is that you MUST go
> >> with IndexOf. Mine is you can use either.
> >
> > Well, they're equal in terms of their semantics. They're definitely not
> > equal in terms of maintainability, and as that's important to me, I
> > don't see what's wrong with saying that I'm very strongly in favour of
> > avoiding the less readable/maintainable code.
>
> I didn't say that.

Didn't say what?



> >> I wasn't referring to this particular issue when I said this.
> >
> > It would have been nice if you'd indicated that. Do you agree then that
> > it doesn't actually take any more brainpower to come up with
> > String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
> > brainpower when it comes to maintaining the IndexOf solution?
>
> In this case, no.

So you don't think that it would be harder to change the regex code to
look for "hello[there" than it would be to change the IndexOf code in
the same way?
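Here is a sketch of what happens with that particular string. As far as I know, .NET rejects "hello[there" as a pattern with an ArgumentException (an unterminated character set). BracketDemo and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo type; names are illustrative only.
public static class BracketDemo
{
    // IndexOf takes the text exactly as written.
    public static bool FindLiteral(string text)
    {
        return text.IndexOf("hello[there", StringComparison.Ordinal) >= 0;
    }

    // Reports whether the regex engine accepts the pattern at all.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown for malformed patterns such as an unclosed '['.
            return false;
        }
    }

    // The regex version only works once the '[' has been escaped.
    public static bool FindWithEscapedRegex(string text)
    {
        return Regex.IsMatch(text, @"hello\[there");
    }
}
```

So the IndexOf change is a one-word edit, while the regex change fails at run time unless the maintainer remembers to write @"hello\[there".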

> In other cases, could be. Would have to look at it. I
> never said that Regex is the best thing out there. I was just saying that
> it is valid and can be readable - can also be cryptic (as can C#).

And I've never argued with that. I've argued against it being *as*
readable and maintainable in *this* case.

> > And of course the answer is "yes, by calling IndexOf multiple times".
>
> That wasn't the question asked. That was the example that was given and the
> question was can you do it in one statement.
>
> So the answer is no, using IndexOf.

Okay. But the follow-on answer is "the best way to do it is to use
IndexOf repeatedly" possibly with "and you can always write your own
method to do this if you want".
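A minimal sketch of such a method, using a params string array. StringSearch and ContainsAny are names I've picked for the example, and the ordinal comparison is an assumption; the original snippet used the culture-sensitive IndexOf(string, int) overload.

```csharp
using System;

// Hypothetical helper; not a framework method.
public static class StringSearch
{
    // Returns true as soon as any candidate is found in the text.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site then becomes a single statement:
if (StringSearch.ContainsAny(someString, "something1", "something2", "something3")) { ... }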

> > Yes, as would a single call to a method which called IndexOf on the
> > string multiple times. I disagree with you - Nicholas wasn't correct in
> > his assessment, as he claimed that the "best bet" would be to use a
> > regular expression. Using regular expressions is just *not* the best
> > bet here - it requires more effort, as I've described repeatedly.
>
> No, he was correct in his answer to the question. The question was never
> "Which is better", but "can you do it".

His answer talked about the "best bet" - although the question didn't
ask about the best way, his answer did. I disagree with that answer.

> And you can do a method which called
> IndexOf multiple times. But then it isn't one line, is it?

You could put it in one line if you wanted to. It wouldn't be as easy
to read, but you could do it.



> > Are you suggesting that maintainability isn't something that should be
> > considered? Do you *really* want to look for "something1",
> > "something2" and "something3" or were they (as I suspect) just
> > examples, and the real values could easily have dots, brackets etc in?
>
> I don't really remember what the context was originally. But I know they
> didn't have dots and brackets in it.

And wouldn't ever?

> >> What if you wanted to change "something1" to "something\". Same problem.
> >
> > Well, it's half the problem with IndexOf that it is with regular
> > expressions. With regular expressions, you'd need to know that not only
> > does backslash need escaping in C#, it also needs escaping in regular
> > expressions.
> >
> > IndexOf: "something\\" or @"something\"
> > Regex: "something\\\\" or @"something\\"
> >
> > Once again, the IndexOf version is easier to understand - there's less
> > to mentally unescape to work out what's actually being asked for.
> >
> Splitting hairs, now. Both are the same, as far as I can see (here).

You don't think that having to count 4 backslashes is even slightly
harder than only counting 2? I can spot a double-backslash without
doing any double-checking. I'd always be careful when I needed four.
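For anyone who wants to check the four literals, here is a small sketch. BackslashDemo is a made-up name and the ordinal comparison is my choice.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo type showing the two layers of escaping.
public static class BackslashDemo
{
    // One layer of escaping: C# only. Both literals are "something" + one backslash.
    public static bool IndexOfLiteralsAgree()
    {
        return "something\\" == @"something\";
    }

    // Two layers: C# and then the regex engine. Both literals are
    // "something" + two backslashes, which the regex reads as one.
    public static bool RegexLiteralsAgree()
    {
        return "something\\\\" == @"something\\";
    }

    public static bool FindWithIndexOf(string text)
    {
        return text.IndexOf(@"something\", StringComparison.Ordinal) >= 0;
    }

    public static bool FindWithRegex(string text)
    {
        return Regex.IsMatch(text, @"something\\");
    }
}
```

Both searches find the same thing in a string like @"C:\something\else". The regex version just needs twice as many backslashes to say so.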

> >> And if escapes were a problem (if it were me) I would have a little sheet
> >> that showed them at my desk within easy reach.
> >
> > Whereas by needing to know less (just the C# escapes) it's really easy
> > to memorise everything I need to know to solve this situation.
>
> That's true, but then you would only know C#. And if that is your aim,
> that's fine.

My aim is to only *need* to know as little as possible. The rest is
available where necessary.



> > Yup, but it's something that isn't used in string literals other than
> > for regular expressions. It's an extra thing to bear in mind
> > unnecessarily.
>
> No room for it, huh?

Not when there's a simpler solution, no.



> > While I'm leaving the exact original question, it's far from out of the
> > question that the original code would need to be changed to use a
> > variable to be searched for at some time. At that point, can you guarantee
> > that your team would get it right? They'd need to be on their guard
> > when using regular expressions - they wouldn't need to be on their
> > guard using IndexOf.
>
> Right. No one makes mistakes with IndexOf.

More rarely than with regular expressions.



> > They aren't as readable *in this case*. In other, more complicated
> > situations, the version which only used IndexOf would be harder to read
> > than the regular expression version.
>
> But your problem was that it would be hard for other programmers to read.
> If they can read your more complicated version, this one should be easy.

<sigh> It's a matter of degree, as I keep saying. It's a case of how
much effort needs to be put in to understand something.

> > You seem to fail to grasp the "make it as simple as possible" concept.
> > It's not a case of maintenance engineers being idiots - it's about
> > presenting them with fewer possible risks. Why leave them a trap to
> > fall into when you can write simpler code which is easier to change
> > later on?
>
> No.
>
> I just find it as simple, in this case and you don't.

I would be willing to wager large amounts of money on others
(particularly junior programmers) finding it less simple though. I'm
absolutely certain that if thousands of programmers had to maintain the
IndexOf version and change it to look for "foo.bar", fewer would make a
mistake than thousands of equivalent programmers maintaining the
regular expression version.

Are you absolutely certain that the regular expression *wouldn't* prove
more bug-prone?
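
As an illustration of the "foo.bar" change, here is a small sketch. The class and method names are hypothetical, and the pattern "something1|foo.bar" is just an assumed example of such an edit. It shows the kind of slip being described: the unescaped dot still compiles and still matches the intended text, but it also matches text it should not.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; none of these names come from the thread.
public static class DotDemo
{
    public static bool IndexOfFinds(string text)
    {
        // The dot is just a dot here.
        return text.IndexOf("foo.bar", StringComparison.Ordinal) >= 0;
    }

    public static bool NaiveRegexFinds(string text)
    {
        // Naive edit: '.' is a metacharacter matching any character
        // except newline, so "fooXbar" matches as well.
        return Regex.IsMatch(text, "something1|foo.bar");
    }

    public static bool EscapedRegexFinds(string text)
    {
        // Regex.Escape turns "foo.bar" into the pattern foo\.bar
        return Regex.IsMatch(text, "something1|" + Regex.Escape("foo.bar"));
    }
}
```

The naive version fails silently, which is why it is easy to miss without a test that feeds it a near-miss such as "fooXbar".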

> > But you're pushing for regular expressions in *this* situation, or at
> > least saying it's just as good as using IndexOf. You've also shown in
> > your other code that you use regular expressions unnecessarily for
> > replacement, making a simple two-step replacement into a complicated
> > single-step replacement where the number of characters which *aren't*
> > just plain text is greater than the number of characters which are.
>
> No. Not pushing. But think they are equivalent in this case. As you said
> earlier, I am sure others would disagree. But I don't think that the
> difference is significant enough, in this case, even if I were to agree on
> which is easier, to preclude it.

To me, it's definitely significant. Using regular expressions here
introduces risk for no benefit.

> >> Can't have it both ways. If you allow Regular Expressions, you shouldn't
> >> have a problem if a programmer used the Regex or IndexOf in our example.
> >> Anyone maintaining the "USEFUL" ones would have zero problems with this
> >> one.
> >
> > How very black and white of you. Do you really have no concept of
> > someone being able to understand something, but having a harder time
> > understanding it one way than the other?
> >
> Who?
>
> The person who can understand Regex if complicated, but would be trashed
> trying to figure out our little example.
>
> Bit of a stretch there.

Again, you're being black and white. I'm not saying that people
*couldn't* understand the regular expression - although they're more
likely to make a simple mistake without thinking about it. I'm saying
that they'll need to put more effort into understanding it than a
straight IndexOf.

> > Yes - the complicated cases where I've already said that regular
> > expressions are useful!
>
> Just make sure the programmer that can't handle the easy Regex doesn't see
> that one.

I would hope that anyone maintaining a complex regular expression will
double-check what's going on. It's easy to conceive of someone
maintaining a simple one failing to do so.

> > No, just allow them where they make sense. Note that if you only use
> > them where they're going to be doing something fairly involved, it's
> > much less likely that an engineer will forget that he's actually
> > dealing with a regular expression than with a simple string.
>
> Already dealt with.

Where?

> > But regular expressions are by their very nature more complicated than
> > a simple String.IndexOf call. If they weren't they wouldn't be as
> > powerful as they are.
>
> Writing vanilla C# is less complicated than writing objects, but we still
> do them.

No, it's not less complicated. If you avoided using objects, the code
would be *much* harder to read and maintain.

tshad

Sep 28, 2005, 12:00:19 PM
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.1da2fd43f...@msnews.microsoft.com...

But you aren't getting the point.

You are talking readability (not all the possible permutations). OK, if you
want to put dots and {} and () and [] and \ in the string, we have a
different story. In this case, however, you cannot tell me that a
programmer can't see that you are looking for a match (IsMatch) and there
are OBVIOUSLY (to even a half way decent programmer) 3 strings separated by
a "|", so therefore this line (and this line only) says we are trying to
match one of the 3 strings.

Wouldn't you agree? Leave out all the extraneous possibilities.
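
For reference, the two forms being compared look roughly like this. This is a sketch only: the class and method names are hypothetical, and the ordinal comparison is an assumption, since the original post called IndexOf with just a start index.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; none of these names come from the thread.
public static class TwoWays
{
    // The multiple-IndexOf form from the original question.
    public static bool WithIndexOf(string someString)
    {
        return someString.IndexOf("something1", StringComparison.Ordinal) >= 0
            || someString.IndexOf("something2", StringComparison.Ordinal) >= 0
            || someString.IndexOf("something3", StringComparison.Ordinal) >= 0;
    }

    // The single-statement form: three alternatives separated by "|".
    public static bool WithRegex(string someString)
    {
        return Regex.IsMatch(someString, "something1|something2|something3");
    }
}
```

For these three literal values the two methods return the same answer for any input; the disagreement in the thread is about what happens when the values change.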

We aren't talking about a Regular Expression such as:

^((31(?!\ (Feb(ruary)?|Apr(il)?|June?|(Sept|Nov)(ember)?)))|((30|29)(?!\
Feb(ruary)?))|(29(?=\ Feb(ruary)?\
(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))|(0?[1-9])|1\d|2[0-8])\
(Jan(uary)?|Feb(ruary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\
((1[6-9]|[2-9]\d)\d{2})$

Description: This RE validates dates in the dd MMM yyyy format. Spaces
separate the values. Month value is either the full name of the month or the
3 letter abbreviation without a period. Days for the month are validated
for all months, including Feb in leap years. Years are 4 digit years.

(this isn't mine, just one I saw on the net)

In this case, I would probably agree with you. Of course, I would assume
you could probably do the same thing in C# (although not with IndexOf only -
I would expect), but I don't know if it would be easy or readable. Probably
would be more readable.

>> >> Again, why do I need a compelling reason. If I have the solution and
>> >> it
>> >> happens to be Regex, I would use it, I wouldn't necessarily say to
>> >> myself -
>> >> "Is there perhaps a more readable way to write this? I wonder if Jim
>> >> will
>> >> be able to read this or not."
>> >
>> > Then I'm afraid that's your problem. It sounds like you're basically
>> > admitting that you're not that interested in readability. Personally, I
>> > like writing code which is elegant but easy to maintain. Having *a*
>> > solution which happens to work isn't enough when there are obviously
>> > others available which could well be simpler.
>>
>> I never said that.
>
> You said that when you have a solution, you won't consider whether there's
> a more readable way of writing it. To me, that demonstrates that you
> don't care very much about readability.
>

What I said was (restated) - If I have a solution, I am not going to
think (as I try to write reasonable code, anyway and document it) " Wait a
minute, will Jim have a problem with it, will Mark have a problem with it.
We just hired Steve, a junior programmer, will he have a problem with it.
Wait a minute I'm not sure whether Greg is versed in this simple Regex
statement. Maybe I should find another solution - even though this is valid
and 10 of my programmers can read it, someone may have a little problem with
it. Surely, I can spend a little more time to rewrite a solution that
clearly works and find a more readable one."

This is the mindset you go through?

Of course, readability is important, but let's not be excessive about it. I
am not going to write to a 3rd grade mentality when I am writing a business
letter. This does not mean it isn't readable, just not written to a grade
school level.

I don't expect a junior programmer to be able to understand everything I
code (which is why we have documentation). I agree yours is readable in
this case, but not any more readable than mine (in this case). If your
programmers would have a problem with this statement, I would maintain the
problem is not with the code but the programmers.

>> I never said readability is not an issue, but I am not going to write
>> "Cat
>> in the Hat" instead of a novel so that the programmers with the simplest
>> of
>> experience can read it. But I am not going to write cryptic code either
>> so
>> they can't read it.
>
> If the "Cat in the Hat" does the job as well as the novel and is easier
> to read, why on earth would you want to write the novel?
>
>> I assume there are company standards to program by and I would follow
>> that.
>
> There aren't usually company standards down to the level of when to use
> regular expressions.

In your case, there should be.

Otherwise, don't quibble over a simple Regex, when you clearly say that a
much more complicated one is fine.

I have no problem with your saying that in your company, you don't allow
Regex statements.

I do have a problem with your saying, "your company doesn't preclude Regex,
and you accept Regex in complicated cases. But I'd better not catch you
using it in a simple case.".


>
>> >Having *a*
>> > solution which happens to work isn't enough when there are obviously
>> > others available which could well be simpler.
>>
>> I am not writing simple code, I am writing code to handle a problem. I
>> prefer to write good code not simple code. Sometimes they are
>> synonymous,
>> sometimes they aren't.
>
> I disagree - simple code that works (as well as the more complicated
> code) is always good.

Except that I don't agree that this one is complicated.

>Note that this is in terms of implementation, not
> design - there's sometimes a very simple but inelegant design which
> ends up costing a lot more work in the long run. That's a different
> matter.
>
>> But in our case, I still see them as equally readable.
>
> You still haven't said whether you see them as equally readable *and
> maintainable* to others though.

I do.


>
>> > Far more time is spent maintaining code than writing it in the first
>> > place. Taking the attitude you take above just isn't cost-effective in
>> > the long run.
>>
>> Don't agree there.
>
> With which bit? If you're going to disagree with the first sentence
> quoted, we really don't have much basis for discussion. I thought it
> was pretty much universally accepted these days that code almost always
> spends more time in maintenance than in original coding. That's why I'm
> always happy to spend a bit more time refactoring working code to make
> it easier to maintain.
>
>> >> No pushing. No more than your pushing not using it.
>> >
>> > But I'll readily admit to pushing the (IMO simpler) solution, for this
>> > particular situation. So are you actually admitting that you *are*
>> > pushing the use of regular expressions here?
>> >
>> In your opinion (as you say).
>>
>> And you obviously are not listening. I am not pushing either side. I
>> have
>> been saying over and over that in this situation, they are the same
>> (IMO).
>
> But that *is* pushing regular expressions from my point of view, where
> they shouldn't be an option.

What?????

No, pushing Regex is saying Regex is better and you should use it.

Giving choices and options is not pushing either side.

You are the one pushing one side over the other, I am not.


>
> Consider an exaggerated equivalent situation. Suppose we were
> discussing how to implement addition. Suppose I thought that just using
> the expression x+y was the easiest way of doing things, and you thought
> it was just as easy to write a remote web service which took two
> integers. By *not* ruling out the more complex solution, you're
> *effectively* pushing it - at least pushing it as an equally valid
> option.
>

No, that just means I am still not pushing the Complex side.

I am not saying pushing a position is a bad thing. But even if I said
Complex is as good as not, that still doesn't push the position.

>> I am not pushing Regex nor am I ruling them out. You however, can't make
>> up
>> your mind. One minute you say that something as simple as the example we
>> are using is too complex for a programmer and then proceed to say that
>> you
>> would use Regex in other situations (which would have to be more
>> complicated), makes no sense.
>
> <sigh> I don't know whether you're intentionally missing the point or
> whether I'm genuinely not getting through.

No, you are getting through, I just disagree in this situation.


>
> There is always risk associated with changing code. When writing code,
> you should try to reduce the risk that future changes will incur. That
> means making the code as simple as possible, and easy to change.
>

I don't disagree.

I just disagree that that is the point here.

> In some cases a regular expression will be a lot simpler to read and
> change than the equivalent "primitive string manipulation" code. Those
> cases would usually be where the string manipulation involves several
> steps, often nested loops etc. There, the complexity of regular
> expressions (which is still there) is less than the complexity of the
> primitive solution.
>

Yes, but if a programmer can read the more complex Regex, he can most
certainly read this almost nothing line.

> In this case, however, the primitive solution is very simple and
> understandable. Changing it to search for a different string or an
> extra string (or even a string passed in as a parameter) is trivial.
> Changing the regular expression is not.
>

But you are changing exactly the same thing in this example, whether it is a
parameter or literal.

>> > So again, the code could be made more readable even by just modifying
>> > the existing regex replacement, let alone by replacing the regular
>> > expressions with simple String.Replace calls. Had they been
>> > String.Replace calls, the meaning of the second line would have been
>> > unambiguous - you'd have had to write it the simple way to start with.
>> >
>> I am not saying there may not be other ways to write the code. As I
>> said, I
>> often rewrite my own code later as I see a way I like better that I may
>> not
>> have thought of at the time I wrote it. Many times it isn't better code,
>> just different.
>
> In this case though, it *would* be better - it would be simpler to
> understand, and simpler to write in the first place.
>

Round and round

> For instance, I wouldn't have had to consider whether the brackets were
> doing something clever or not. I had to look up .NET regular
> expressions just to check the meaning in this case. Do you really
> believe that a solution which *doesn't* involve that extra thought
> isn't better?

I see no brackets in our example (unless you are talking about the second
option).


>
>> > Note that your first replacement will replace two tabs with a single
>> > space, but leave one tab alone, by the way. It would be better to
>> > replace "\s+" with the space, IMO.
>>
>> Probably true. I am not a Regex expert. That was what I came up with at
>> the time.
>
> And that's part of the risk - that someone doesn't put enough effort
> into the regex to get the *actually* desired behaviour. Where the
> alternative is a complex solution, it makes a lot of sense to put
> significant effort into getting the regex right. When you could do the
> same thing with a few string operations, it's just not worth it.
>

And if you feel that way, then you should do it that way. If someone
doesn't and feels that the Regex is just as easy, then it is feasible.
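
The keyword clean-up being discussed in the quoted text can be sketched as follows. The original code is not shown in this part of the thread, so this is an assumed reconstruction of the idea only: collapse any run of whitespace with "\s+" (the one step where a regex earns its keep), then use a plain Replace for the " or " step. The class name, the method name, and the Trim call are all assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only the "keywords" idea comes from the thread.
public static class KeywordDemo
{
    public static string ToOrQuery(string keywords)
    {
        // Step 1: collapse every run of spaces/tabs/newlines into one space.
        // Trim() is an extra assumption so that leading or trailing
        // whitespace does not produce a dangling " or ".
        string collapsed = Regex.Replace(keywords.Trim(), @"\s+", " ");

        // Step 2: a plain string replacement, no pattern syntax involved.
        return collapsed.Replace(" ", " or ");
    }
}
```

With "\s+" a single tab is handled the same way as two tabs, which is the case the quoted text points out.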

> (For this first line, a regex is probably the best way to go - but you
> need to think about it more closely.)
>
>> > I'd have had to look it up if you hadn't been answering the question
>> > though. Why make the code harder to understand in the first place? If
>> > you want to replace a space with " or ", just use
>> > keywords = keywords.Replace (" ", " or ");
>> > Much more straightforward.
>> >
>> Even in C, which I have used for years, I have to look up parameters to
>> make
>> sure I have the right parameters and have them in the right order.
>
> Usually intellisense can help you with that though - it *doesn't* start
> explaining the details of regular expressions though.

If you happen to be using VS, I suppose. Now we are talking tools.


>
>> As I said, the Parens were probably a mistake and I may have made some
>> changes
>> to the line and left the parens in. I agree yours is the correct one.
>
> And if you weren't taking "use regular expressions" as your default
> position, you wouldn't have made the mistake in the first place. The
> first thing you should try to think of is the simplest one. You want to
> manipulate a string, so ask yourself if there's anything in the string
> class which does what you want.
>

No, don't agree there.

I go with "is there a good solution". And it may be that I would come up
with the C string solution, but it may be that I come up with the Regex just
as easily. I would probably look at the above Regex that was so complicated
and check to see if this could be done easier with C (or VB if I was working
in that). But not in a simple example as this. I would probably go with my
first thought, if it worked and was viable. I would also probably go with
it as it was one line instead of three, especially if it was being used as a
function.

>> >> But the fact a junior programmer might not understand Objects as you
>> >> do
>> >> would not prevent you from writing them, would you?
>> >
>> > When using C#, one has to use objects. I will almost always try to
>> > implement the simplest solution to a problem, unless there is a
>> > compelling reason to use a more complex solution. That way, anyone
>> > reading the code has to learn relatively little "extra" stuff beyond
>> > the language itself.
>>
>> That isn't the point.
>
> It may not be your point, but it's part of my point.
>
>> We are talking readability here. So don't write any objects. You can
>> use
>> the ones you need to, but if you write objects and someone has to
>> maintain
>> it, it could be a problem if he doesn't understand objects.
>
> I'm assuming that "the solution uses .NET" is a given - in other words,
> any maintenance engineer should know C# and the basics of .NET. To me
> "the basics" don't include regular expressions and memorising all the
> details of them. *Some* familiarity can be hoped for, but not knowing
> all the constructs - so anything which requires that people *do* know
> the regex constructs in order to change things is at a disadvantage.
>

I agree here, but if there is some familiarity with them, I assume they can
see what this example says.

>> You can write the same code in straight C to do what objects do. We got
>> along fine before there were objects. So I think, based on your
>> statements,
>> you should write the easier code that some very junior programmer might
>> have
>> to read.
>
> No, we didn't "get along fine" before there were objects. C code is
> typically far harder to read than OO code - and where it's not, that's
> often because it's effectively written in a semi-OO way, just using
> naming to indicate which type of object is being used (just without
> polymorphism etc).
>

Not necessarily to a junior programmer.

>> > No, they really aren't. for and foreach are well-defined in the C#
>> > language specification. If the program is in C# to start with, it is
>> > reasonable to assume competency in C# on the part of the reader of the
>> > code. It is *not* reasonable to assume competency in regular
>> > expressions, and while that wouldn't prevent me from using regular
>> > expressions where they provide value, they just *don't* here.
>>
>> But I am not writing in C# only. I am writing in .Net.
>
> So you would assume that everyone who is reading and maintaining your
> code knows every class in the .NET framework? I don't.
>

Right. So they would have to look them up. But that doesn't preclude you
from using them.

>> > Clearly not, as you seem to be keen on using them instead of simple
>> > string manipulations all over the place - if I saw anyone using regular
>> > expressions rather than String.Replace in the way you've shown in other
>> > code posts, that code would never get through code review.
>> >
>> Obviously, you micro manage more than I.
>
> Well, I code review, just as my peers code review. We almost always
> find things which can be done better (which works even better when pair
> programming). That doesn't indicate that we're not good developers -
> just that an extra point of view is always helpful. It also stops us
> from getting lazy and implementing something which is just "okay"
> rather than as good as it should be.

Nothing wrong with that. And I would assume that there are disagreements
with what is "better". I never have a problem with others reading my code
and looking for better ways to program. I just don't necessarily agree that
the other persons way is better.

I have had a running disagreement with Joe Celko (don't know if you know who
he is). He wrote some good Sql Server books and is very knowledgeable in
the subject. Tons more than I am.

But I disagree with him on a couple of issues. One is that he is dead
set against Camel Case. Says it is harder to read. I disagree. He feels
that code is worse if you use it. I disagree.

I also don't agree with the C style of putting the left bracket at the end
of a line Kernighan and Ritchie style. Makes more sense to put it on the
next line as it is part of a block of code. Have had many discussions on
that point. Matter of style. But not necessarily better.

>
>> If you would have a problem with our examples, I don't think I would like
>> to
>> work in your team.
>
> Likewise if you don't consider that finding the simplest way of
> implementing a solution is worth doing, I wouldn't like to work on your
> code.

OK.

>
>> In my area, if your code is reasonable and well written and it follows
>> our
>> standards, it's fine.
>
> Being more complex than it needs to be means that code *isn't*
> reasonable and well-written, IMO.
>

I agree in some but not in this case.

Because what you are saying is that you really should have 10 programmers
program each piece of each project. That way you can pick which piece is
the simplest (less complex).

If you have 2 programmers program the same problem, you are going to get 2
different solutions. One will be simpler than the other. If you have 3
programmers one of the 3 will be simpler and so on and so on ...

If that is your aim.

>> >> >I suspect *very*
>> >> > few programs don't do any string manipulation - knowing the string
>> >> > methods well is *far* more fundamental to .NET programming than
>> >> > knowing
>> >> > regular expressions.
>> >>
>> >> I agree with part of that and think that regular expressions are just
>> >> as
>> >> important to know.
>> >
>> > Why?
>>
>> Because they are perfectly valid and as you said before there are some
>> that
>> are useful (therefore, you should know them as someone might use them and
>> you may have to maintain it).
>
> Occasionally they're useful. I haven't used a single one in the project
> I've been working on for the last six months. On the other hand, I've
> used string manipulation all over the place.

And no one says you have to use them.


>
> I would expect that the number of straight string manipulations in most
> code should be *much* higher than the number of regular expressions
> used - hence it's more important to thoroughly understand the string
> methods than regexes.

OK.


>
>> >I'm working on a fairly large project which hasn't needed to use
>> > regular expressions and wouldn't have benefitted from them once.
>>
>> That's your style and position, but may not be someone else's.
>
> Everyone else in the team certainly feels the same way.

Then you are lucky you work with that team.


>
>> > At that point, if I didn't understand the regular expression, I'd look
>> > it up in the documentation. Do you know every part of regular
>> > expression syntax off by heart?
>>
>> According to your position, you should ban them altogether for ANY use,
>> since you can do anything in C# you can do in Regex.
>
> No, because - as I *keep* saying - there are things you can't do as
> *simply* using straight string manipulation. Where it's simpler to use
> regexes, I'd use them. Those situations come up occasionally, but not
> with the frequency you seem to use regular expressions.
>

No your position was that "this" particular example was hard to read and may
not be maintainable by some programmer. I agree with you in other
situations, just not this one.

>> > If they're on my team, I'll tell them to refactor their code to only
>> > use them when they're appropriate, frankly.
>>
>> Appropriate as defined by you. Why allow them at all?
>
> See the various places I've explained that both in this post and many
> others.

The problem here is that you are not leaving "appropriate" up to the
programmer.


>
>> > If code uses regular expressions when they serve no purpose, it is
>> > *not* well written and clean though - it is less maintainable than it
>> > might be.
>> >
>> They serve a purpose. They do the same as your string routines, so there
>> is
>> a purpose. Both are string handling routines.
>
> No, using regular expressions *instead* of the string handling routines
> serves no purpose, just as using a web service to perform addition
> would serve no purpose.
>
> There's no advantage in using the regular expression here, and there
> *is* a disadvantage.
>
>> > And you believe that everyone else does? Again, bear in mind that
>> > you're unlikely to be the only person ever to read your code.
>>
>> So you should never EVER use Regex. Someone else might read your code.
>>
>> This is going in circles.
>
> Yes, because you seem unable to understand the position I've presented
> several times.
>

Of course, not because you seem unable to understand the position I've
presented several times.

>> As I said, I would have a problem with someone who couldn't figure out
>> what
>> the example we were using was doing.
>
> But would you have a problem with the same person if they forgot or
> didn't check whether, say, '[' needed escaping? I'd find that a fairly
> understandable mistake (although I'd hope that unit tests would show
> the problem up).
>

And mistakes are not made with C string code?
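
To show what the '[' case looks like in practice, here is a small sketch. The class and method names are hypothetical; "hello[there" is the example value used elsewhere in this thread. In .NET an unterminated '[' is rejected when the pattern is parsed, so the unescaped version fails at run time rather than at compile time.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; none of these names come from the thread.
public static class BracketDemo
{
    public static bool IndexOfFinds(string text)
    {
        // '[' has no special meaning to IndexOf.
        return text.IndexOf("hello[there", StringComparison.Ordinal) >= 0;
    }

    public static bool UnescapedPatternIsValid()
    {
        try
        {
            // '[' opens a character class that is never closed,
            // so the Regex constructor throws.
            new Regex("hello[there");
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static bool EscapedRegexFinds(string text)
    {
        // Regex.Escape produces the pattern hello\[there
        return Regex.IsMatch(text, Regex.Escape("hello[there"));
    }
}
```

This failure is louder than the dot case: it throws instead of quietly matching the wrong thing, but only once the code path is actually executed.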

>> >> Keep regular expressions out of my code?????
>> >>
>> >> So now you are saying there is no use for it?
>> >
>> > Not at all - I'm saying that you shouldn't put regular expressions in
>> > your code just for the sake of keeping your hand in. Use them where
>> > they're applicable, and only there.
>>
>> There either is a use or not. You can't say there is a use for it and
>> then
>> brow beat a programmer because he happens to like to use it.
>
> I certainly can when the programmer uses it where there's no good
> reason. There's a time and place to use reflection, but I would
> certainly brow-beat a programmer who decided to use it to get the value
> of a property which could be done in a safer way (using normal property
> access syntax).
>
>> Has a programmer got to come to you each time he wants to use it to get
>> your
>> permission.
>
> In our team a programmer (including myself) has to get "permission"
> every time they want to check anything in. It's called code review, and
> it vastly improves the quality of the code.
>
>> I can see it if he writes some obscure cryptic Regular Expression - but
>> come
>> on.
>
> Cryptic such as "( )" where a straight " " would have been more
> readable? Code review should have picked that up.
>

But now you are talking about Code review. The problem was that I probably
forgot to take out the Parens when I took out the other piece of code.
Would happen just as easily in C String handlers.

>> > But that's *exactly* what you've suggested you should do with regular
>> > expressions - use them even when there's no real purpose in doing so,
>> > just so that you remember what they look like.
>>
>> Sure.
>>
>> If they are both perfectly valid, I might. Depends on my mood (you
>> should
>> really have a problem with that). :)
>
> I certainly do. "Valid" to me involves the code being as simple as
> possible.
>
>> > Okay, so you don't memorise it, which means you *do* have to look up
>> > which characters require escaping. I think you've just admitted that
>> > your code is less maintainable than mine.
>>
>> No.
>>
>> I can maintain my car, but I might still have to look up specs on it.
>
> But wouldn't it be easier to maintain something which *didn't* require
> you to look up anything?
>

I've been writing C code for 15+ years and still have to look things up.
Just dense, I guess

>> > I would use them when the solution which uses regular expressions is
>> > clearer than the solution which doesn't use them. It seems a pretty
>> > simple policy to me.
>>
>> If they are not readable, you shouldn't use them at all. I personally
>> think
>> they are both readable, in this case.
>
> Readability is not a black and white issue. Something is "more
> readable" than something else - in this case, using string manipulation
> is more readable (and maintainable, importantly) than using regular
> expressions. In other cases, it isn't.
>

As I said, I think it is as readable as C in this case. I would agree with
you in others.

That I said you were wrong in saying you were strongly in favor of avoiding
the less readable/maintainable code.

>


>> >> I wasn't referring to this particular issue when I said this.
>> >
>> > It would have been nice if you'd indicated that. Do you agree then that
>> > it doesn't actually take any more brainpower to come up with
>> > String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
>> > brainpower when it comes to maintaining the IndexOf solution?
>>
>> In this case, no.
>
> So you don't think that it would be harder to change the regex code to
> look for "hello[there" than it would be to change the IndexOf code in
> the same way?
>
>> In other cases, could be. Would have to look at it. I
>> never said that Regex is the best thing out there. I was just saying
>> that
>> it is valid and can be readable - can also be cryptic (as can C#).
>
> And I've never argued with that. I've argued against it being *as*
> readable and maintainable in *this* case.
>
>> > And of course the answer is "yes, by calling IndexOf multiple times".
>>
>> That wasn't the question asked. That was the example that was given and
>> the
>> question was can you do it in one statement.
>>
>> So the answer is no, using IndexOf.
>
> Okay. But the follow-on answer is "the best way to do it is to use
> IndexOf repeatedly" possibly with "and you can always write your own
> method to do this if you want".

But that wasn't the original question. If you do that you are definitely
not doing it in one line (which was the question). The question was if
there was a way - not "what is the best way". That doesn't negate your
point of view on it. But that wasn't what was being asked. Obviously, the
multiple IndexOf lines were known as the question was "is there another way
in one line to do the same thing".

And there was.
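
For completeness, the "write your own method" alternative mentioned in the quoted text might look something like this. It is a sketch under assumptions: the names StringSearch and ContainsAny are hypothetical, and the ordinal comparison is a choice made here, not something taken from the thread.

```csharp
using System;

// Hypothetical helper; the name and signature are illustrative only.
public static class StringSearch
{
    // Returns true if text contains at least one of the candidates.
    // The params array lets the call site read as a single statement.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site then becomes one statement, for example StringSearch.ContainsAny(someString, "something1", "something2", "something3"), and none of the candidate strings need regex escaping.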


>
>> > Yes, as would a single call to a method which called IndexOf on the
>> > string multiple times. I disagree with you - Nicholas wasn't correct in
>> > his assessment, as he claimed that the "best bet" would be to use a
>> > regular expression. Using regular expressions is just *not* the best
>> > bet here - it requires more effort, as I've described repeatedly.
>>
>> No, he was correct in his answer to the question. The question was never
>> "Which is better", but can you do it .
>
> His answer talked about the "best bet" - although the question didn't
> ask about the best way, his answer did. I disagree with that answer.

That was obvious.

But he was answering the question. You weren't.

The question was "is there a way to do this in one command". And then an
example of what was being looked for.

He answered that question and said that was the best bet (IHO).

So far, you haven't answered that question.

Creating another subroutine to call - sort of does it. But wasn't what was
really being asked, according to the examples.

So his answer was valid.


>
>> And you can do a method which called
>> IndexOf multiple times. But then it isn't one line, is it?
>
> You could put it in one line if you wanted to. It wouldn't be as easy
> to read, but you could do it.
>

Ok.

It wasn't one statement, not one line.

>> > Are you suggesting that maintainability isn't something that should be
>> > considered? Do you *really* want to look for "something1",
>> > "something2" and "something3" or were they (as I suspect) just
>> > examples, and the real values could easily have dots, brackets etc in?
>>
>> I don't really remember what the context was originally. But I know they
>> didn't have dots and brackets in it.
>
> And wouldn't ever?

Probably not, but possible. But again, that wasn't the question.

You can make anything more complicated than it has to be. The question was
still answered with a Regular Expression. Not whether you can make it more
complicated.

> Are you absolutely certain that the regular expression *wouldn't* prove
> more bug-prone?
>
>> > But you're pushing for regular expressions in *this* situation, or at
>> > least saying it's just as good as using IndexOf. You've also shown in
>> > your other code that you use regular expressions unnecessarily for
>> > replacement, making a simple two-step replacement into a complicated
>> > single-step replacement where the number of characters which *aren't*
>> > just plain text is greater than the number of characters which are.
>>
>> No. Not pushing. But I think they are equivalent in this case. As you
>> said earlier, I am sure others would disagree. But I don't think that the
>> difference is significant enough, in this case, even if I were to agree
>> on which is easier, to preclude it.
>
> To me, it's definitely significant. Using regular expressions here
> introduces risk for no benefit.
>

Oh well.

Life is risky.

What????

So now you shouldn't use it because someone might not double-check it?

>> > No, just allow them where they make sense. Note that if you only use
>> > them where they're going to be doing something fairly involved, it's
>> > much less likely that an engineer will forget that he's actually
>> > dealing with a regular expression than with a simple string.
>>
>> Already dealt with.
>
> Where?

Ok. Maybe not.

I really don't see where an engineer is going to "forget" he is dealing with
a regular expression.


>
>> > But regular expressions are by their very nature more complicated than
>> > a simple String.IndexOf call. If they weren't they wouldn't be as
>> > powerful as they are.
>>
>> Writing vanilla C# is less complicated than writing objects, but we
>> still do them.
>
> No, it's not less complicated. If you avoided using objects, the code
> would be *much* harder to read and maintain.
>

I think we are running out of arguments :)

Tom


Jon Skeet [C# MVP]

Sep 28, 2005, 12:57:20 PM
tshad <tschei...@ftsolutions.com> wrote:
> > The name is as understandable, but the exact semantics are *much* more
> > obscure. The name doesn't suggest that you can't just put a dot in
> > there and expect it to only match a dot for instance, does it?
>
> But you aren't getting the point.
>
> You are talking readability (not all the possible permutations). OK, if you
> want to put dots and {} and () and [] and \ in the string, we have a
> different story. In this case, however, you cannot tell me that a
> programmer can't see that you are looking for a match (IsMatch) and there
> are OBVIOUSLY (to even a half way decent programmer) 3 strings separated by
> a "|", so therefore this line (and this line only) says we are trying to
> match one of the 3 strings.

Whereas I would say it isn't *as* obvious as three calls to IndexOf.
Yes, it wouldn't take more than a few seconds to work out what was
going on, but why add that time in the first place?

> Wouldn't you agree? Leave out all the extraneous possibilities.
>
> We aren't talking about a Regular Expression such as:
>
> ^((31(?!\ (Feb(ruary)?|Apr(il)?|June?|(Sept|Nov)(ember)?)))|((30|29)(?!\
> Feb(ruary)?))|(29(?=\ Feb(ruary)?\
> (((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))|(0?[1-9])|1\d|2[0-8])\
> (Jan(uary)?|Feb(ruary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\
> ((1[6-9]|[2-9]\d)\d{2})$
>
> Description: This RE validates dates in the dd MMM yyyy format. Spaces
> separate the values. Month value is either the full name of the month or the
> 3 letter abbreviation without a period. Days for the month are validated
> for all months, including Feb in leap years. Years are 4 digit years.
>
> (this isn't mine, just one I saw on the net)
>
> In this case, I would probably agree with you. Of course, I would assume
> you could probably do the same thing in C# (although not with IndexOf only -
> I would expect), but I don't know if it would be easy or readable. Probably
> would be more readable.

I'd almost certainly use DateTime.ParseExact instead, giving it a list
of appropriate formats which were acceptable :)
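
A minimal sketch of that approach, assuming day-month-year input like the regex above targets. The class name, method name and format list are made up for illustration, and it uses DateTime.TryParseExact (the non-throwing sibling of ParseExact) so that bad input gives false rather than an exception. It is not an exact equivalent of the regex: for instance it doesn't accept "Sept" and doesn't restrict the year range.

```csharp
using System;
using System.Globalization;

static class DateCheck
{
    // Hypothetical list of accepted formats: 1- or 2-digit day,
    // abbreviated or full month name, 4-digit year.
    static readonly string[] Formats =
    {
        "d MMM yyyy", "dd MMM yyyy", "d MMMM yyyy", "dd MMMM yyyy"
    };

    // Returns true when text is a real calendar date in one of the formats.
    // Month lengths and leap years are checked by DateTime itself.
    public static bool IsValidDate(string text)
    {
        DateTime parsed;
        return DateTime.TryParseExact(text, Formats,
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
    }
}
```

Supporting another layout would then mean adding a string to Formats rather than editing a pattern.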



> > You said that when you have a solution, you won't consider whether there's
> > a more readable way of writing it. To me, that demonstrates that you
> > don't care very much about readability.
> >
> What I said was (restated) - If I have a solution, I am not going to
> think (as I try to write reasonable code, anyway and document it) " Wait a
> minute, will Jim have a problem with it, will Mark have a problem with it.
> We just hired Steve, a junior programmer, will he have a problem with it.
> Wait a minute I'm not sure whether Greg is versed in this simple Regex
> statement. Maybe I should find another solution - even though this is valid
> and 10 of my programmers can read it, someone may have a little problem with
> it. Surely, I can spend a little more time to rewrite a solution that
> clearly works and find a more readable one."
>
> This is the mindset you go through?

Not with individuals, no. I *do* think: "Is this the simplest way of
getting the job done?"

> Of course, readability is important, but let's not be excessive about it. I
> am not going to write to a 3rd grade mentality when I am writing a business
> letter. This does not mean it isn't readable, just not written to a grade
> school level.
>
> I don't expect a junior programmer to be able to understand everything I
> code (which is why we have documentation). I agree yours is readable in
> this case, but not any more readable than mine (in this case). If your
> programmers would have a problem with this statement, I would maintain the
> problem is not with the code but the programmers.

There's a difference between "unable to understand" and "not as
*easily* able to understand" - between "very low risk" and "some risk".
When it comes to maintenance, I would class the regex solution as "some
risk" because of the possibility for error if you need escaping - and
one of the things about maintenance is that you just can't easily
predict what changes will be required.

> > There aren't usually company standards down to the level of when to use
> > regular expressions.
>
> In your case, there should be.

There don't need to be - just the "do things in the simplest way" would
cover this case.



> Otherwise, don't quibble over a simple Regex, when you clearly say that a
> much more complicated one is fine.

The above rule covers both situations.

> I have no problem with your saying that in your company, you don't allow
> Regex statements.
> I do have a problem with your saying, "your company doesn't preclude Regex,
> and you accept Regex in complicated cases. But I'd better not catch you
> using it in a simple case.".

Why? It clearly follows the "use the simplest code to get the job
done" rule.

I should add at this point that I've been talking to a few people about
this, and *all* of them have agreed so far that the regular expression
way just isn't the way to go here - that it's a risky and relatively
complex solution.

> > I disagree - simple code that works (as well as the more complicated
> > code) is always good.
>
> Except that I don't agree that this one is complicated.

I didn't say it was complicated - I said it was *more* complicated.



> > You still haven't said whether you see them as equally readable *and
> > maintainable* to others though.
>
> I do.

So you'd be happy to take the bet about which version would trip up
more coders? (Note that giving it in a "test" situation wouldn't be
entirely appropriate, unfortunately - people are already on their guard
for subtleties when you give them actual tests.)

> > But that *is* pushing regular expressions from my point of view, where
> > they shouldn't be an option.
>
> What?????
>
> No, pushing Regex is saying Regex is better and you should use it.
>
> Giving choices and options is not pushing either side.
>
> You are the one pushing one side over the other, I am not.

I'm certainly pushing one side - but I happen to think you count as
"pushing" when the solution you're talking about is to me obviously
more complicated.

> > Consider an exaggerated equivalent situation. Suppose we were
> > discussing how to implement addition. Suppose I thought that just using
> > the expression x+y was the easiest way of doing things, and you thought
> > it was just as easy to write a remote web service which took two
> > integers. By *not* ruling out the more complex solution, you're
> > *effectively* pushing it - at least pushing it as an equally valid
> > option.
>
> No, that just means I am still not pushing the Complex side.
>
> I am not saying pushing a position is a bad thing. But even if I said
> Complex is as good as not, that still doesn't push the position.

I think we'll have to agree to disagree about what counts as pushing
then. Fortunately it's not terribly important to the discussion.



> > There is always risk associated with changing code. When writing code,
> > you should try to reduce the risk that future changes will incur. That
> > means making the code as simple as possible, and easy to change.
>
> I don't disagree.

You did before. You wrote:

<quote>


If I have the solution and it happens to be Regex, I would use it, I
wouldn't necessarily say to myself - "Is there perhaps a more readable
way to write this? I wonder if Jim will be able to read this or not."

</quote>

That doesn't sit well with agreeing that you should make the code as
simple as possible - you're saying that sometimes you wouldn't even
bother thinking whether there might be a more readable way of writing
it.


> > In some cases a regular expression will be a lot simpler to read and
> > change than the equivalent "primitive string manipulation" code. Those
> > cases would usually be where the string manipulation involves several
> > steps, often nested loops etc. There, the complexity of regular
> > expressions (which is still there) is less than the complexity of the
> > primitive solution.
>
> Yes, but if a programmer can read the more complex Regex, he can most
> certainly read this almost nothing line.

Again, you're putting it all in black or white - either something being
readable or not. Life doesn't work like that - it's shades of grey.
Something can be "more readable" or "less readable" - "more
maintainable" or "less maintainable".

> > In this case, however, the primitive solution is very simple and
> > understandable. Changing it to search for a different string or an
> > extra string (or even a string passed in as a parameter) is trivial.
> > Changing the regular expression is not.
>
> But you are changing exactly the same thing in this example, whether it is a
> parameter or literal.

And that's exactly the kind of thing which happens during maintenance.



> > For instance, I wouldn't have had to consider whether the brackets were
> > doing something clever or not. I had to look up .NET regular
> > expressions just to check the meaning in this case. Do you really
> > believe that a solution which *doesn't* involve that extra thought
> > isn't better?
>
> I see no brackets in our example (unless you are talking about the second
> option).

This section of the thread was talking about your call to Regex.Replace
which used "( )" to replace a single space with something.

> > And that's part of the risk - that someone doesn't put enough effort
> > into the regex to get the *actually* desired behaviour. Where the
> > alternative is a complex solution, it makes a lot of sense to put
> > significant effort into getting the regex right. When you could do the
> > same thing with a few string operations, it's just not worth it.
>
> And if you feel that way, then you should do it that way. If someone
> doesn't and feels that the Regex is just as easy, then it is feasible.

We fundamentally disagree about whether the regex really is "just as
easy" though - and I find it hard to understand, given the arguments
I've used for maintenance.

Can you think of any examples where it would be *easier* to maintain
the regex version? I've given plenty of examples where it would be
easier to maintain the IndexOf version. There may be examples where it
would be easier to maintain the regex version, but I strongly suspect
that when you come up with them, you'll agree that the changes which
would be easier to cope with in the IndexOf version are more likely to
occur.



> > Usually intellisense can help you with that though - it *doesn't* start
> > explaining the details of regular expressions though.
>
> If you happen to be using VS, I suppose. Now we are talking tools.

What proportion of professional developers *don't* use VS when writing
C#? I suspect it's under 1% - vanishingly small. Anything which helps
development when using VS is therefore useful for almost everyone.

> > And if you weren't taking "use regular expressions" as your default
> > position, you wouldn't have made the mistake in the first place. The
> > first thing you should try to think of is the simplest one. You want to
> > manipulate a string, so ask yourself if there's anything in the string
> > class which does what you want.
>
> No, don't agree there.
>
> I go with "is there a good solution". And it may be that I would come up
> with the C string solution, but it may be that I come up with the Regex just
> as easily. I would probably look at the above Regex that was so complicated
> and check to see if this could be done easier with C (or VB if I was working
> in that).

The examples you've given show you thinking of regex as a *first* port
of call rather than *after* the simpler solutions have been found
wanting though. That's just a bad idea.

> But not in a simple example as this. I would probably go with my
> first thought, if it worked and was viable. I would also probably go with
> it as it was one line instead of three, especially if it was being used as a
> function.

Readability is about *so* much more than saving space. As I said
elsewhere, you could put the IndexOf solution all on one line too, if
you want. Heck, put the whole of each class in one line if you want -
but readability will go down rather than up.

> > I'm assuming that "the solution uses .NET" is a given - in other words,
> > any maintenance engineer should know C# and the basics of .NET. To me
> > "the basics" don't include regular expressions and memorising all the
> > details of them. *Some* familiarity can be hoped for, but not knowing
> > all the constructs - so anything which requires that people *do* know
> > the regex constructs in order to change things is at a disadvantage.
>
> I agree here, but if there is some familiarity with them, I assume they can
> see what this example says.

But they would probably have to look up which characters need escaping
when they had to change the string to include something involving
punctuation. You don't have to do that with IndexOf - so it's simpler.



> > No, we didn't "get along fine" before there were objects. C code is
> > typically far harder to read than OO code - and where it's not, that's
> > often because it's effectively written in a semi-OO way, just using
> > naming to indicate which type of object is being used (just without
> > polymorphism etc).
>
> Not necessarily to a junior programmer.

I think the cases where the C code is easier to read for someone who
knows as much C# as C are very, very few.

> > So you would assume that everyone who is reading and maintaining your
> > code knows every class in the .NET framework? I don't.
>
> Right. So they would have to look them up. But that doesn't preclude you
> from using them.

No - but I'd think twice about using a relatively obscure class like
Regex (obscure in that it's not something which I typically use all
over the place - by the looks of it, anyone who maintains your code
really *does* have to know about Regex) when a more common class
(String in this case) does the job just as well.

> > Well, I code review, just as my peers code review. We almost always
> > find things which can be done better (which works even better when pair
> > programming). That doesn't indicate that we're not good developers -
> > just that an extra point of view is always helpful. It also stops us
> > from getting lazy and implementing something which is just "okay"
> > rather than as good as it should be.
>
> Nothing wrong with that. And I would assume that there are disagreements
> with what is "better". I never have a problem with others reading my code
> and looking for better ways to program. I just don't necessarily agree that
> the other person's way is better.
>
> I have had a running disagreement with Joe Celko (don't know if you know who
> he is). He wrote some good Sql Server books and is very knowledgeable in
> the subject. Tons more than I am.
>
> But I disagree with him on a couple of issues. One is that he is dead
> set against Camel Case. Says it is harder to read. I disagree. He feels
> that code is worse if you use it. I disagree.
>
> I also don't agree with the C style of putting the left bracket at the end
> of a line Kernighan and Ritchie style. Makes more sense to put it on the
> next line as it is part of a block of code. Have had many discussions on
> that point. Matter of style. But not necessarily better.

Yes, bracing and naming is very hard to make absolute judgements on -
and I agree with you on both of these cases. There's little you can put
forward in the way of concrete examples of why one version is better.
That's not the case here though - I've given numerous examples of
situations where changing the code to do something which is on the face
of it very similar (eg changing from looking for "a_b" to "a.b")
requires significantly more work (looking up docs) with the Regex
version than with the IndexOf version.
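
To make the "a_b" to "a.b" case concrete, here is a small sketch. The class and method names are invented for the example, and the ordinal comparison is my own choice rather than anything from the code discussed in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapingDemo
{
    // Plain substring search: the dot is just a dot.
    public static bool FoundByIndexOf(string text)
    {
        return text.IndexOf("a.b", StringComparison.Ordinal) >= 0;
    }

    // Naive regex: the unescaped dot matches any character except newline.
    public static bool FoundByNaiveRegex(string text)
    {
        return Regex.IsMatch(text, "a.b");
    }

    // Escaped regex: Regex.Escape turns "a.b" into the pattern a\.b
    public static bool FoundByEscapedRegex(string text)
    {
        return Regex.IsMatch(text, Regex.Escape("a.b"));
    }
}
```

Given the input "a_b", the IndexOf and escaped-regex versions report no match, while the naive regex reports a match because "." matched the underscore.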

> >> In my area, if your code is reasonable and well written and it follows
> >> our standards, it's fine.
> >
> > Being more complex than it needs to be means that code *isn't*
> > reasonable and well-written, IMO.
>
> I agree in some but not in this case.
>
> Because what you are saying is that you really should have 10 programmers
> program each piece of each project. That way you can pick which piece is
> the simplest (less complex).

There's no need, so long as the single or paired programmer always
bears it in mind.



> If you have 2 programmers program the same problem, you are going to get 2
> different solutions. One will be simpler than the other. If you have 3
> programmers one of the 3 will be simpler and so on and so on ...
>
> If that is your aim.

That would suggest that every line of code I pair program raises an
alternative with my pair - it just doesn't happen. Almost always, we
agree on the simplest course of action.

> > Occasionally they're useful. I haven't used a single one in the project
> > I've been working on for the last six months. On the other hand, I've
> > used string manipulation all over the place.
>
> And no one says you have to use them.

But you should take from those stats the fact that people are *likely*
to run into regular expressions less often than straight string
manipulations - so people will be more familiar with the latter. When
two ways of doing something are equivalent other than in familiarity,
go for the more familiar way.

> >> >I'm working on a fairly large project which hasn't needed to use
> >> > regular expressions and wouldn't have benefitted from them once.
> >>
> >> That's your style and position, but may not be someone else's.
> >
> > Everyone else in the team certainly feels the same way.
>
> Then you are lucky you work with that team.

As I say, everyone else I've spoken to about this agrees that using
regular expressions in this case is overkill too.

> > No, because - as I *keep* saying - there are things you can't do as
> > *simply* using straight string manipulation. Where it's simpler to use
> > regexes, I'd use them. Those situations come up occasionally, but not
> > with the frequency you seem to use regular expressions.
>
> No, your position was that "this" particular example was hard to read and may
> not be maintainable by some programmer. I agree with you in other
> situations, just not this one.

Not *hard* to read, but *harder* to read (than the IndexOf version).
*Relatively* hard to read.

In this particular example, the IndexOf version is less risky than the
regex version in terms of future maintenance.

> >> > If they're on my team, I'll tell them to refactor their code to only
> >> > use them when they're appropriate, frankly.
> >>
> >> Appropriate as defined by you. Why allow them at all?
> >
> > See the various places I've explained that both in this post and many
> > others.
>
> The problem here is that you are not leaving "appropriate" up to the
> programmer.

I'm defining it as "simplest" - and you're the only person I've found
so far who *doesn't* think that using IndexOf is simpler.

> >> This is going in circles.
> >
> > Yes, because you seem unable to understand the position I've presented
> > several times.
>
> Of course, not because you seem unable to understand the position I've
> presented several times.

I understand that you're presenting the two implementations as equally
simple despite my numerous maintenance examples where maintaining the
regex version is harder than maintaining the IndexOf version. You
haven't come up with any counter-examples.

> > But would you have a problem with the same person if they forgot or
> > didn't check whether, say, '[' needed escaping? I'd find that a fairly
> > understandable mistake (although I'd hope that unit tests would show
> > the problem up).
>
> And mistakes are not made with C string code?

Sometimes - but more rarely, I believe. A change in what to search for
in the IndexOf case is really easy. It's a bit harder with the regex
version.



> > Cryptic such as "( )" where a straight " " would have been more
> > readable? Code review should have picked that up.
>
> But now you are talking about Code review. The problem was that I probably
> forgot to take out the Parens when I took out the other piece of code.
> Would happen just as easily in C String handlers.

No it wouldn't - because if you didn't have a regex in the first place,
you wouldn't have had the brackets in the first place. Similarly, if
you'd constantly been asking yourself "is there a simpler way of doing
this?" (and come up with the same answer that everyone else I've spoken
to has) then even if you originally had a regex, by the time you got
down to "I'm replacing a space with something else" you'd have changed
to using String.Replace.
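
A sketch of the two versions side by side, for comparison. The class and method names are hypothetical, and the replacement text is left as a parameter because the actual value isn't given here.

```csharp
using System.Text.RegularExpressions;

static class SpaceReplaceDemo
{
    // Regex version: the parentheses only create a capture group that
    // nothing uses, so the pattern behaves the same as a plain " ".
    public static string WithRegex(string text, string replacement)
    {
        return Regex.Replace(text, "( )", replacement);
    }

    // String version: no pattern syntax to check at all.
    public static string WithStringReplace(string text, string replacement)
    {
        return text.Replace(" ", replacement);
    }
}
```

The two agree for ordinary replacement text, but Regex.Replace also gives "$" a special meaning in the replacement string, which is one more thing a maintainer would have to look up.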

Got to go now - will answer the rest of the post later.

Jon Skeet [C# MVP]

Sep 28, 2005, 2:44:27 PM
[Continuing from where I left off]

tshad <tschei...@ftsolutions.com> wrote:
> > But wouldn't it be easier to maintain something which *didn't* require
> > you to look up anything?
>
> I've been writing C code for 15+ years and still have to look things up.
> Just dense, I guess

That didn't answer my question though - surely if there are things you
*don't* need to look up (and can reasonably expect others not to need
to look up), those are likely to be easier to read and maintain, right?



> > Readability is not a black and white issue. Something is "more
> > readable" than something else - in this case, using string manipulation
> > is more readable (and maintainable, importantly) than using regular
> > expressions. In other cases, it isn't.
>
> As I said, I think it is as readable as C in this case. I would agree with
> you in others.

Do you think it's as readable to *most* people, or just to you
personally, out of interest?



> >> > Well, they're equal in terms of their semantics. They're definitely not
> >> > equal in terms of maintainability, and as that's important to me, I
> >> > don't see what's wrong with saying that I'm very strongly in favour of
> >> > avoiding the less readable/maintainable code.
> >>
> >> I didn't say that.
> >
> > Didn't say what?
>
> That I said you were wrong in saying you were strongly in favor of avoiding
> the less readable/maintainable code.

Right.

> >> > And of course the answer is "yes, by calling IndexOf multiple times".
> >>
> >> That wasn't the question asked. That was the example that was given and
> >> the
> >> question was can you do it in one statement.
> >>
> >> So the answer is no, using IndexOf.
> >
> > Okay. But the follow-on answer is "the best way to do it is to use
> > IndexOf repeatedly" possibly with "and you can always write your own
> > method to do this if you want".
>
> But that wasn't the original question. If you do that you are definitely
> not doing it in one line (which was the question).

Of course you can do it in one line of code. It would just be a very
*long* line.

> The question was if there was a way - not "what is the best way".
> That doesn't negate your point of view on it. But that wasn't what
> was being asked. Obviously, the multiple IndexOf lines were known as
> the question was "is there another way in one line to do the same
> thing".

In that case, if you look at the response from Nicholas, it doesn't
even answer your question...

> And there was.

The problem is that it was portrayed as a *better* way, when I believe
it's a significantly *worse* way.

> > His answer talked about the "best bet" - although the question didn't
> > ask about the best way, his answer did. I disagree with that answer.
>
> That was obvious.
>
> But he was answering the question. You weren't.

Look at his post - given your restricted nature of what you view as the
question, he didn't actually answer it.



> The question was "is there a way to do this in one command". And then an
> example of what was being looked for.
>
> He answered that question and said that was the best bet (IHO).

Nope, he didn't answer the question of whether you could do it with
IndexOf.

> So far, you haven't answered that question.

Actually, I think I've stated several times that you can't do it with a
single call to IndexOf, which *does* answer that question.

> Creating another subroutine to call - sort of does it. But wasn't what was
> really being asked, according to the examples.
>
> So his answer was valid.

See above.

> >> And you can do a method which called
> >> IndexOf multiple times. But then it isn't one line, is it?
> >
> > You could put it in one line if you wanted to. It wouldn't be as easy
> > to read, but you could do it.
>
> Ok.
>
> It wasn't one statement, not one line.

It would still be one statement, in fact. Do you actually mean you're
after something which is a single method call? If so, that's a pretty
odd criterion to use for choice of implementation, IMO.
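
For reference, a rough sketch of the kind of helper I mean, so the call site becomes a single method call. The names StringSearch and ContainsAny are invented for the example, and the ordinal comparison is an assumption on my part (the original snippet just called IndexOf with a start index).

```csharp
using System;

static class StringSearch
{
    // Returns true if text contains at least one of the candidates.
    // Each candidate is treated as a literal string, so no escaping applies.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The test from the original post would then read: if (StringSearch.ContainsAny(someString, "something1", "something2", "something3")) { ... }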



> >> I don't really remember what the context was originally. But I know they
> >> didn't have dots and brackets in it.
> >
> > And wouldn't ever?
>
> Probably not, but possible. But again, that wasn't the question.

But it affects whether Nick's answer was actually correct - whether his
"best bet" statement was true or not.

(I'm hoping Nick's going to be at the MVP summit and I can ask him for
a bit of clarification on this point - I'll let you know if I get to
chat with him.)

> > I would be willing to wager large amounts of money on others
> > (particularly junior programmers) finding it less simple though. I'm
> > absolutely certain that if thousands of programmers had to maintain the
> > IndexOf version and change it to look for "foo.bar", fewer would make a
> > mistake than thousands of equivalent programmers maintaining the
> > regular expression version.
> >
> You can make anything more complicated than it has to be. The question was
> still answered with a Regular Expression. Not whether you can make it more
> complicated.

But the answer given stated not just that you *could* do it with a
regular expression, but that a regular expression was the "best bet".
That, to me, is false.



> > To me, it's definitely significant. Using regular expressions here
> > introduces risk for no benefit.
> >
> Oh well.
>
> Life is risky.

You're really happy to just shrug your shoulders and introduce risk for
*no* benefit? Crikey.

> > I would hope that anyone maintaining a complex regular expression will
> > double-check what's going on. It's easy to conceive of someone
> > maintaining a simple one failing to do so.
> >
> What????
>
> So now you shouldn't use it because someone might not double-check it?

Absolutely. Something which looks simple but has a hidden twist is
dangerous. It deserves a comment at least. Now, something which
requires a comment to decrease the risk is likely to be worse than
something which doesn't.



> >> > No, just allow them where they make sense. Note that if you only use
> >> > them where they're going to be doing something fairly involved, it's
> >> > much less likely that an engineer will forget that he's actually
> >> > dealing with a regular expression than with a simple string.
> >>
> >> Already dealt with.
> >
> > Where?
>
> Ok. Maybe not.
>
> I really don't see where an engineer is going to "forget" he is dealing with
> a regular expression.

You have a lot more faith in developers than I do then. It's very easy
to make simple mistakes when you're perhaps a bit pushed for time. It's
better to reduce the scope of the error in the first place.

Now, even if the engineer remembers, he's quite possibly going to have
to check to see what needs escaping. So even if you don't buy the risk
argument, I can't see how you'd deny that it's making life that little
bit harder for the maintenance team - and, as I keep stressing, for
*no* benefit.

> > No, it's not less complicated. If you avoided using objects, the code
> > would be *much* harder to read and maintain.
>
> I think we are running out of arguments :)

Possibly. I'm still struggling to see how you can view the regex as
*not* more complicated. It's inherent in the power of regular
expressions - in order to be able to express very complicated patterns,
some simple patterns have to be made a bit more complex. Because
IndexOf limits itself to straight substring searches, it doesn't have
to complicate things at all when all you want is a straight substring
search.
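
To show the extra step the regex route needs when the search strings are arbitrary text, here is a sketch that builds the alternation safely. AnyOfRegex and ContainsAny are hypothetical names, and this is only one way of doing it.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnyOfRegex
{
    // Builds "lit1|lit2|..." with every literal escaped first, so that
    // characters such as . [ ( | \ are matched literally.
    // Note: with no literals the pattern is empty and matches any text.
    public static bool ContainsAny(string text, params string[] literals)
    {
        string[] escaped = Array.ConvertAll<string, string>(literals, Regex.Escape);
        string pattern = string.Join("|", escaped);
        return Regex.IsMatch(text, pattern);
    }
}
```

Without the Regex.Escape call, a literal like "hello[there" is not even a valid pattern, and "x.y" would quietly match "xzy" as well.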

Jon Skeet [C# MVP]

Sep 28, 2005, 9:32:15 PM
Jon Skeet [C# MVP] <sk...@pobox.com> wrote:
> (I'm hoping Nick's going to be at the MVP summit and I can ask him for
> a bit of clarification on this point - I'll let you know if I get to
> chat with him.)

<snip>

Update: I've now met Nick, and we've talked about many things. We
managed to stay on this topic for about a minute before moving onto
something else - it was one of those conversations. I wouldn't like to
trust my memory of the very brief mention of it to say whether or not
he agreed with me on the maintenance point.

tshad

Oct 1, 2005, 1:29:50 AM
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.1da55475e...@msnews.microsoft.com...

> Jon Skeet [C# MVP] <sk...@pobox.com> wrote:
> > (I'm hoping Nick's going to be at the MVP summit and I can ask him for
> > a bit of clarification on this point - I'll let you know if I get to
> > chat with him.)
>
> <snip>
>
> Update: I've now met Nick, and we've talked about many things. We
> managed to stay on this topic for about a minute before moving onto
> something else - it was one of those conversations. I wouldn't like to
> trust my memory of the very brief mention of it to say whether or not
> he agreed with me on the maintenance point.

He probably did.

I haven't had time to finish up our discussion, but will try to get to it
this weekend.

Tom
