Something like:
someString.IndexOf("something1","something2","something3",0)
or would you have to do something like:
if ((someString.IndexOf("something1", 0) >= 0) ||
    (someString.IndexOf("something2", 0) >= 0) ||
    (someString.IndexOf("something3", 0) >= 0))
{
    // Do something
}
Thanks,
Tom
Your best bet would be to use a regular expression. You can use the
classes in the System.Text.RegularExpressions namespace to do this.
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- m...@spam.guard.caspershouse.com
"tshad" <tschei...@ftsolutions.com> wrote in message
news:OOoiQOJu...@TK2MSFTNGP09.phx.gbl...
This would be preferable to the multiple if tests?
I don't know which is more efficient. Both would have to go back and test
for all the different items.
Thanks,
Tom
Personally, I'd go for the "if" tests - possibly with a helper method
using a params string array to aid readability - unless the performance
is really a problem, in which case measuring that performance and that
of the regular expressions would be an absolute necessity.
Regular expressions are really powerful, but can be much harder to read
than a series of very simple string operations.
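A minimal sketch of the kind of helper method mentioned above, assuming a hypothetical class name StringSearch and method name ContainsAny (neither appears in the thread). It wraps the repeated IndexOf tests behind a params string array:

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper (name is an assumption, not from the thread):
    // returns true if 'source' contains at least one of 'candidates'.
    public static bool ContainsAny(string source, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison: no character in 'candidate' is treated
            // specially, and culture rules do not affect the result.
            if (source.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site then reads as one line, e.g. StringSearch.ContainsAny(someString, "something1", "something2", "something3"), while the search strings stay plain literals.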
--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
>Regular expressions are really powerful, but can be much harder to read
>than a series of very simple string operations.
But they really aren't in this case:
if (Regex.IsMatch(myString, @"something1|something2|something3"))
...
or even, in this special case:
if (Regex.IsMatch(myString, @"something[123]"))
...
I tend to think that regular expressions get hard to read when they are
used to do complicated stuff - and then the alternative is usually not a
"very simple string operation". Part of the reason, though, is that people
don't know it's possible to stretch regular expressions over multiple
lines and even use comments in them. I could rewrite the code above like
this:
string myRegex = @"
something1 | # something1 is our first option
something2 | # something2 would also be fine
something3 # last chance, something3";
if (Regex.IsMatch(myString, myRegex, RegexOptions.IgnorePatternWhitespace))
...
So it's really easy to pick apart the expression and comment the parts - I
don't think it's less readable than any other part of code. You have to
know the language of course, but that's the same for any other programming
language or construct out there.
But you're right about the performance question for simple cases like
this, of course.
Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
But it is nice to know the options.
BTW, what is the "@" for?
Thanks,
Tom
>But it is nice to know the options.
>
>BTW, what is the "@" for?
It defines a verbatim literal string. See here (MSDN):
http://shrinkster.com/81i
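As a small illustration of what a verbatim literal changes (the class and member names here are made up for the example), the two constants below hold exactly the same ten characters:

```csharp
using System;

public static class VerbatimDemo
{
    // Regular literal: the backslash starts an escape sequence,
    // so a single literal backslash has to be written as \\.
    public const string Regular = "jon\\.skeet";

    // Verbatim literal: backslashes are taken as-is.
    public const string Verbatim = @"jon\.skeet";

    // True, because both literals produce the same text.
    public static bool SameText()
    {
        return Regular == Verbatim;
    }
}
```

In a verbatim literal the only character that needs doubling is the double quote itself (written as ""), which is why it is handy for regular expression patterns and file paths.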
Until, of course, something1 etc start having characters in which need
escaping - how confident would you be that you'd get that right? It's
an extra thing to think about - and I'm sure the real strings aren't
actually "something1" etc.
> I tend to think that regular expressions get hard to read when they are
> used to do complicated stuff - and then the alternative is usually not a
> "very simple string operation". Part of the reason, though, is that people
> don't know it's possible to stretch regular expressions over multiple
> lines and even use comments in them. I could rewrite the code above like
> this:
>
> string myRegex = @"
> something1 | # something1 is our first option
> something2 | # something2 would also be fine
> something3 # last chance, something3";
>
> if (Regex.IsMatch(myString, myRegex, RegexOptions.IgnorePatternWhitespace))
> ...
>
> So it's really easy to pick apart the expression and comment the parts - I
> don't think it's less readable than any other part of code. You have to
> know the language of course, but that's the same for any other programming
> language or construct out there.
Well, I don't have to learn (or more importantly, remember) *any* extra
bits of language other than C# (which I already need to know) to get it
right with IndexOf, even if the strings I'm looking for contain things
like dots, stars etc. That isn't true for regular expressions.
>Until, of course, something1 etc start having characters in which need
>escaping - how confident would you be that you'd get that right? It's
>an extra thing to think about - and I'm sure the real strings aren't
>actually "something1" etc.
Aren't you exaggerating a bit here? There are regex testers out there to
help you with building regular expressions and the Regex class itself
knows how to escape special chars - it's not that big a deal.
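The escaping support referred to here is Regex.Escape. A rough sketch of how it could be combined with alternation, assuming a hypothetical helper named RegexSearch.ContainsAny (not something from the thread):

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSearch
{
    // Hypothetical helper: builds a pattern such as "a|b|c" from literal
    // strings, escaping each one so that characters like "." or "*"
    // are matched literally rather than as regex operators.
    public static bool ContainsAny(string source, params string[] literals)
    {
        string[] escaped = new string[literals.Length];
        for (int i = 0; i < literals.Length; i++)
        {
            escaped[i] = Regex.Escape(literals[i]);
        }

        string pattern = string.Join("|", escaped);
        return Regex.IsMatch(source, pattern);
    }
}
```

This removes the "did I escape it?" question at the call site, at the cost of an extra step that a plain IndexOf loop does not need at all.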
>Well, I don't have to learn (or more importantly, remember) any extra
>bits of language other than C# (which I already need to know) to get it
>right with IndexOf, even if the strings I'm looking for contain things
>like dots, stars etc. That isn't true for regular expressions.
No, it isn't. But you won't get far in today's programming world if you
don't know the first thing about SQL or XML, for example, so I guess
you're not suggesting that one language is enough? I believe that Regular
Expressions are a powerful technology well worth learning - and it's
probably good advice to stay clear of them for anything but the simplest
applications if you're not willing to put in a bit of time to get to know
them.
About IndexOf, as I meant to say already, as long as the problems you're
trying to solve are the kind that can be solved with those simple string
functions (and without resulting in huge algorithms), you'll probably have
the performance argument on your side anyway.
No, but it's still harder to remember than not having to remember
anything special at all, which is what you get with IndexOf.
In a hurry, I can very easily see someone changing a string literal
from one thing to another, not noticing that as it's a regular
expression, they need to escape part of their new string.
Now, where's the *advantage* of using regular expressions in this case?
> >Well, I don't have to learn (or more importantly, remember) any extra
> >bits of language other than C# (which I already need to know) to get it
> >right with IndexOf, even if the strings I'm looking for contain things
> >like dots, stars etc. That isn't true for regular expressions.
>
> No, it isn't. But you won't get far in today's programming world if you
> don't know the first thing about SQL or XML, for example, so I guess
> you're not suggesting that one language is enough?
No - but I'm suggesting that when one language works perfectly well for
the task at hand, and it's the same language that the rest of your code
is written in, it's easier to stick within that language.
> I believe that Regular Expressions are a powerful technology well
> worth learning - and it's probably good advice to stay clear of them
> for anything but the simplest applications if you're not willing to
> put in a bit of time to get to know them.
Regular expressions are absolutely worth learning for where they
provide extra value. In cases like this, where they're only really
providing extra things to remember (what you need to escape, or to call
Regex's own escaping mechanism) I don't think there's any value.
> About IndexOf, as I meant to say already, as long as the problems you're
> trying to solve are the kind that can be solved with those simple string
> functions (and without resulting in huge algorithms), you'll probably have
> the performance argument on your side anyway.
Well, I'm much keener on the readability argument than the performance
one - I suspect that the performance difference would rarely be of
overall significance.
>In a hurry, I can very easily see someone changing a string literal
>from one thing to another, not noticing that as it's a regular
>expression, they need to escape part of their new string.
In a hurry, all kinds of things can happen when making changes to source
code.
>Now, where's the advantage of using regular expressions in this case?
I wasn't saying there was one in the specific scenario the OP introduced.
I was using the example to show that regular expressions don't have to be
any more complicated than simple string operations.
>>About IndexOf, as I meant to say already, as long as the problems you're
>>trying to solve are the kind that can be solved with those simple string
>>functions (and without resulting in huge algorithms), you'll probably have
>>the performance argument on your side anyway.
>
>Well, I'm much keener on the readability argument than the performance
>one - I suspect that the performance difference would rarely be of
>overall significance.
As I'm trying to say all the time, as soon as an implementation reaches a
complexity that makes it worth thinking about regular expressions, I'm
sure an alternative solution based on simple string functions won't be
more readable any longer. I'd even go so far as to say that as soon as
more than one call to a simple string function is needed for a given
problem, most probably I'll find the regular expression solution more
readable. This is, after all, a subjective decision to make.
Indeed - but why make it even easier to introduce bugs? Changing a
search from "somewhere" to "somewhere.com" *shouldn't* be something
which requires significant thought, in my view - but it does as soon as
you're using regular expressions.
> >Now, where's the advantage of using regular expressions in this case?
>
> I wasn't saying there was one in the specific scenario the OP introduced.
> I was using the example to show that regular expressions don't have to be
> any more complicated than simple string operations.
But there's *always* the added complexity of "do I have to escape this
or not". There are certainly times when the string operations become
more complicated than the corresponding regular expressions (otherwise
they really would be pointless - something I've never suggested), but I
don't believe that's the case here.
> >>About IndexOf, as I meant to say already, as long as the problems you're
> >>trying to solve are the kind that can be solved with those simple string
> >>functions (and without resulting in huge algorithms), you'll probably have
> >>the performance argument on your side anyway.
> >
> >Well, I'm much keener on the readability argument than the performance
> >one - I suspect that the performance difference would rarely be of
> >overall significance.
>
> As I'm trying to say all the time, as soon as an implementation reaches a
> complexity that makes it worth thinking about regular expressions, I'm
> sure an alternative solution based on simple string functions won't be
> more readable any longer.
Well, Nicholas certainly thought it worth thinking about regular
expressions in this case - do you? (The earlier part of your reply
suggests not, but the bit below suggests you do.)
> I'd even go so far as to say that as soon as
> more than one call to a simple string function is needed for a given
> problem, most probably I'll find the regular expression solution more
> readable. This is, after all, a subjective decision to make.
Whereas three calls to IndexOf is *definitely* more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.
>>In a hurry, all kinds of things can happen when making changes to source
>>code.
>
>Indeed - but why make it even easier to introduce bugs? Changing a
>search from "somewhere" to "somewhere.com" shouldn't be something
>which requires significant thought, in my view - but it does as soon as
>you're using regular expressions.
But in any proper real-world use case of regular expressions, there won't
be an expression saying "somewhere" to start with. If the pattern string
doesn't show any trace of wildcards or other recognizable regular
expression features, it should be safe to assume that regular expressions
aren't being used. If a string in some source code I don't know shows
signs of being a match pattern and there's nothing else that tells me
whether it's a regular expression or not, I'll have to look and find it
out, there's no way around that. To be safe in assuming that no string
could ever be a regular expression, regardless of whether it looks like
it, you would have to forbid them completely in your team at least.
>>As I'm trying to say all the time, as soon as an implementation reaches a
>>complexity that makes it worth thinking about regular expressions, I'm
>>sure an alternative solution based on simple string functions won't be
>>more readable any longer.
>
>Well, Nicholas certainly thought it worth thinking about regular
>expressions in this case - do you? (The earlier part of your reply
>suggests not, but the bit below suggests you do.)
>
>>I'd even go so far as to say that as soon as
>>more than one call to a simple string function is needed for a given
>>problem, most probably I'll find the regular expression solution more
>>readable. This is, after all, a subjective decision to make.
>
>Whereas three calls to IndexOf is definitely more readable than a
>regular expression which, depending on the strings involved may well
>need to involve escaping.
In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions. I don't know whether the
actual code that the OP is writing might justify regexes better. Anyway, I
was merely using the case to demonstrate the fact that regular expressions
don't have a readability problem, IMHO, or at least they don't need to
have one if used properly.
No - you just have to be careful when you're using regular expressions.
I prefer code which means I don't have to take as much care, because
being human, sooner or later I'll be careless. The fewer possibilities
I have for carelessness actually causing an error, the better.
I know I couldn't off the top of my head list all the characters which
need escaping for regular expressions - could you *and* every member of
your team?
> >Whereas three calls to IndexOf is definitely more readable than a
> >regular expression which, depending on the strings involved may well
> >need to involve escaping.
>
> In this case, as far as it's described by the sample we've seen, I
> wouldn't favor the usage of regular expressions.
Even though it's more than one call to a simple string function?
> I don't know whether the
> actual code that the OP is writing might justify regexes better. Anyway, I
> was merely using the case to demonstrate the fact that regular expressions
> don't have a readability problem, IMHO, or at least they don't need to
> have one if used properly.
They have a readability problem compared with simple operations - they
require more care than simple literals. To me, "more care required"
means "lower readability and maintainability", which is a problem.
I'm not saying they're hideously unreadable - just *less* readable.
That's enough for me.
--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
>I know I couldn't off the top of my head list all the characters which
>need escaping for regular expressions - could you and every member of
>your team?
I think I might, they are not really as many as you think. But that's not
the point; I use a testing tool when I create a larger expression and I
most probably use it again when I make changes. I have comments on my
regular expressions telling me what they do, what sample input and output
is. The first thing that's important is just that someone has to recognize
a regular expression when he encounters it, you're right about that.
>>>Whereas three calls to IndexOf is definitely more readable than a
>>>regular expression which, depending on the strings involved may well
>>>need to involve escaping.
>>
>>In this case, as far as it's described by the sample we've seen, I
>>wouldn't favor the usage of regular expressions.
>
>Even though it's more than one call to a simple string function?
Probably... the number of calls is not really what counts, is it?
Sometimes, string parsing algorithms that don't make use of regular
expressions involve several nested loops, several temporary variables and
just a single call to a simple string function. Yet these beasts can be
horrible because it takes only a short while until even the author can't
reliably remember what the algorithm does.
I won't contest the fact that three lines of code, calling IndexOf three
times, are probably a better alternative to a regular expression.
>They have a readability problem compared with simple operations - they
>require more care than simple literals. To me, "more care required"
>means "lower readability and maintainability", which is a problem.
Well, let's agree to disagree. I'm still trying to make the point that the
comparison with simple string literals is a bad one, because the two won't
ever be equal alternatives in any real world problem situation. Use the
simple operations as long as it makes sense, but don't hesitate to look at
other solutions because you think someone else on the team might make a
mistake changing a string literal later on.
>I'm not saying they're hideously unreadable - just less readable.
>That's enough for me.
Jon, I'm with you most of the way. But there's a limit to the demand for
readability, as I see it. I'm not likely to turn down a useful technology
in cases where it is practically without alternatives because the solution
doesn't please me aesthetically.
Absolutely - especially when your tests may well not catch the problem.
For instance, if you have a search for "jon.skeet", are you going to
write a test to make sure that "jonxskeet" doesn't match? Unless you
actually know what to avoid (in which case you're likely to have
written it correctly in the first place) the test may well not pick up
on a missed character which needs escaping.
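To make the "jonxskeet" point concrete, here is a small sketch (the class and method names are invented for illustration) comparing the unescaped pattern, the escaped pattern, and IndexOf:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotPitfall
{
    // Unescaped: "." matches any single character except a newline,
    // so this also accepts input such as "jonxskeet".
    public static bool UnescapedMatches(string input)
    {
        return Regex.IsMatch(input, "jon.skeet");
    }

    // Escaped: only a literal dot matches.
    public static bool EscapedMatches(string input)
    {
        return Regex.IsMatch(input, @"jon\.skeet");
    }

    // IndexOf never interprets the search string.
    public static bool IndexOfMatches(string input)
    {
        return input.IndexOf("jon.skeet", StringComparison.Ordinal) >= 0;
    }
}
```

All three accept "jon.skeet", so a test that only checks the positive case passes for the buggy pattern too; only a negative case like "jonxskeet" tells them apart.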
> >>>Whereas three calls to IndexOf is definitely more readable than a
> >>>regular expression which, depending on the strings involved may well
> >>>need to involve escaping.
> >>
> >>In this case, as far as it's described by the sample we've seen, I
> >>wouldn't favor the usage of regular expressions.
> >
> >Even though it's more than one call to a simple string function?
>
> Probably... the number of calls is not really what counts, is it?
I was only going by what you'd said previously:
<quote>
I'd even go so far as to say that as soon as more than one call to a
simple string function is needed for a given problem, most probably
I'll find the regular expression solution more readable.
</quote>
> Sometimes, string parsing algorithms that don't make use of regular
> expressions involve several nested loops, several temporary variables and
> just a single call to a simple string function. Yet these beasts can be
> horrible because it takes only a short while until even the author can't
> reliably remember what the algorithm does.
Absolutely.
> I won't contest the fact that three lines of code, calling IndexOf three
> times, are probably a better alternative to a regular expression.
Goodo :)
> >They have a readability problem compared with simple operations - they
> >require more care than simple literals. To me, "more care required"
> >means "lower readability and maintainability", which is a problem.
>
> Well, let's agree to disagree. I'm still trying to make the point that the
> comparison with simple string literals is a bad one, because the two won't
> ever be equal alternatives in any real world problem situation.
I don't see how you can say that when using regular expressions was one
suggested solution, and using IndexOf was another suggested solution.
> Use the simple operations as long as it makes sense, but don't
> hesitate to look at other solutions because you think someone else on
> the team might make a mistake changing a string literal later on.
If the other solution is likely to be fundamentally simpler, I'm all
for that. It was this particular situation that I was commenting on,
and the general comment that regular expressions are often used as a
sledgehammer to crack a pretty flimsy nut.
> >I'm not saying they're hideously unreadable - just less readable.
> >That's enough for me.
>
> Jon, I'm with you most of the way. But there's a limit to the demand for
> readability, as I see it. I'm not likely to turn down a useful technology
> in cases where it is practically without alternatives because the solution
> doesn't please me aesthetically.
Me either - but where there *is* a practical alternative which is more
readable, I'll go for that. If you only have one solution, you *can't*
turn it down really, can you? (Unless you can forego the feature which
requires it, of course, which is unlikely.)
>>>Even though it's more than one call to a simple string function?
>>
>>Probably... the number of calls is not really what counts, is it?
>
>I was only going by what you'd said previously:
>
><quote>
>I'd even go so far as to say that as soon as more than one call to a
>simple string function is needed for a given problem, most probably
>I'll find the regular expression solution more readable.
></quote>
I know I said that and I know you were referring to it. But I meant one
call as in "one call at runtime", as opposed to "one line of code that
makes the call".
>>Well, let's agree to disagree. I'm still trying to make the point that the
>>comparison with simple string literals is a bad one, because the two won't
>>ever be equal alternatives in any real world problem situation.
>
>I don't see how you can say that when using regular expressions was one
>suggested solution, and using IndexOf was another suggested solution.
Sorry, I meant "simple string operations". And I meant that I wouldn't
consider using a regular expression if an IndexOf could do the job just as
well - the two are no equal alternatives because I wouldn't seriously
consider one of them.
>>Use the simple operations as long as it makes sense, but don't
>>hesitate to look at other solutions because you think someone else on
>>the team might make a mistake changing a string literal later on.
>
>If the other solution is likely to be fundamentally simpler, I'm all
>for that. It was this particular situation that I was commenting on,
>and the general comment that regular expressions are often used as a
>sledgehammer to crack a pretty flimsy nut.
You're right about that. Complex technologies tend to be misused more
often than simple ones, don't they?
>>Jon, I'm with you most of the way. But there's a limit to the demand for
>>readability, as I see it. I'm not likely to turn down a useful technology
>>in cases where it is practically without alternatives because the solution
>>doesn't please me aesthetically.
>
>Me either - but where there is a practical alternative which is more
>readable, I'll go for that. If you only have one solution, you can't
>turn it down really, can you? (Unless you can forego the feature which
>requires it, of course, which is unlikely.)
Well, usually someone will come forward with other solutions, however
far-fetched. One that can actually be quite a good alternative to more
complex regular expression scenarios is writing a parser - or rather,
using a compiler compiler to create one. But in my experience there's a
lot of room for nicely written regular expressions, somewhere between a
few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)
Not quite with you there - in this case, there would be three calls at
runtime, and three lines of code.
> >>Well, let's agree to disagree. I'm still trying to make the point that the
> >>comparison with simple string literals is a bad one, because the two won't
> >>ever be equal alternatives in any real world problem situation.
> >
> >I don't see how you can say that when using regular expressions was one
> >suggested solution, and using IndexOf was another suggested solution.
>
> Sorry, I meant "simple string operations". And I meant that I wouldn't
> consider using a regular expression if an IndexOf could do the job just as
> well - the two are no equal alternatives because I wouldn't seriously
> consider one of them.
Right - but unfortunately (IMO) other people do.
> >If the other solution is likely to be fundamentally simpler, I'm all
> >for that. It was this particular situation that I was commenting on,
> >and the general comment that regular expressions are often used as a
> >sledgehammer to crack a pretty flimsy nut.
>
> You're right about that. Complex technologies tend to be misused more
> often than simple ones, don't they?
Absolutely...
> >Me either - but where there is a practical alternative which is more
> >readable, I'll go for that. If you only have one solution, you can't
> >turn it down really, can you? (Unless you can forego the feature which
> >requires it, of course, which is unlikely.)
>
> Well, usually someone will come forward with other solutions, however
> far-fetched. One that can actually be quite a good alternative to more
> complex regular expression scenarios is writing a parser - or rather,
> using a compiler compiler to create one. But in my experience there's a
> lot of room for nicely written regular expressions, somewhere between a
> few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)
Oh certainly. I'm really *not* trying to suggest that regular
expressions should never be used - just that they shouldn't be the
first port of call as soon as you need to do anything with a string :)
>>><quote>
>>>I'd even go so far as to say that as soon as more than one call to a
>>>simple string function is needed for a given problem, most probably
>>>I'll find the regular expression solution more readable.
>>></quote>
>>
>>I know I said that and I know you were referring to it. But I meant one
>>call as in "one call at runtime", as opposed to "one line of code that
>>makes the call".
>
>Not quite with you there - in this case, there would be three calls at
>runtime, and three lines of code.
And in this case I would be prepared to see things differently - I said
already that I don't believe in call counting. But the sentence you quoted
was meant more in the context of the problem I was describing, where
simple string functions are used as a part of a, possibly hugely
complicated, larger algorithm.
As soon as there are loops involved, which may or may not result in a
single line with such a call being executed multiple times, things start
getting complex very quickly in my experience. How often have you been
sitting there with the debugger running, counting characters in a string
to find that one-off problem somebody introduced? I'll take an enormously
unreadable regular expression over that task any day :-)
Actually, you're right.
But that was my point.
Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
So using Regex is not really like using another language (as C# is different
from VB.Net).
But the discussion was valid in that you use the best tool for the situation.
>
>> As far as readability, it has nothing to do with Regular Expressions
>> whether
>> it is readable or not, as Oliver mentions, but how you write it.
>
> No - I believe that searching for "jon.skeet" with IndexOf is clearer
> than searching for "jon\\.skeet" or @"jon\.skeet".
That's maybe true. But it would be clear to someone used to using both C#
and Regex.
Also, you have the same problem when dealing with web pages or getting a
file from the disk. You still use the escape character there (and, as you
say, it is a little confusing) - but you still do it.
>Which of them
> contains just the information which is actually of concern, and which
> contains information which is only present due to the technology used
> to do the searching?
>
>> You can also make some pretty unreadable C# code as well.
>
> Sure, but that's no reason to use regular expressions just to make
> things worse.
I agree with you that readability is important.
It used to be that people didn't like C and C++ for exactly the same reason
you point out. The code was not as clear as COBOL or Basic and that was the
complaint back then. I happened to be a Fortran programmer at that time and
was not interested in moving to C for that reason (not that Fortran was
better - readability wise).
The problem with C back then was that much of the code was really cryptic.
But it didn't have to be; that was just how people coded back then. Mainly,
it was important to make the most efficient code possible because of the
limited computing power, and efficient rarely equates to readable. And I am
not even talking about compiling and linking and all the options and cryptic
command lines.
>
>> Readability is a function of the programmer not the language (in most
>> cases).
>
> Yes, but it's the programmer's decision how to approach things -
> whether you do things the simple way or the complex way. You *could*
> implement the string search by manually iterating over all the
> characters in the string, perhaps even writing your own state machine
> to do it. The code could be pretty readable considering what it's doing
> - but it's *bound* to be more complex than using IndexOf.
I agree.
Just because you can - doesn't mean you should.
>
>> As was also mentioned you also need to know the language. For someone
>> not used to objects, abstract objects and interfaces are also hard to
>> read.
>
> Sure - but why introduce unnecessary complexity? You're already
> writing C#, so you'd better know C# - but why add regular expressions
> into the mix when they're unnecessary?
But if you know both, and as I (and you) mentioned, regex is part of .net as
is C# - so it is already in the mix. But you're right, don't introduce any
more complexity than necessary. But if it's 6 of one ... it's really up to
the programmer. In the original case, that was what it was. You can't tell
me that you feel that the solution suggested for this case was even close to
being unreadable (if you are even a stone's throw from understanding Regular
Expressions).
I personally feel that both solutions are equally usable and readable (in
this situation).
I have also seen times when I just couldn't find an easy solution in C# or
VB and it was fairly easy in Regex.
I myself would usually opt for the C# or VB solutions first, but would have
no problem using Regex. As a matter of fact, I use Regex to strip commas
and $ from my textbox fields before writing it to SQL as it was the best
solution I could find. Such as:
SalaryMax.Text =
String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
At the time, I couldn't seem to find as simple a solution as this in VB.Net
so I use this (not saying there isn't one).
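For comparison, a C# sketch of the same stripping step done both ways. The post above is VB.Net; the class and method names here are hypothetical, and CalculateYearly and the text boxes are left out because their definitions are not shown in the thread:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyText
{
    // Plain string version: two Replace calls, nothing to escape.
    public static string StripWithReplace(string text)
    {
        return text.Replace("$", "").Replace(",", "");
    }

    // Regex version: inside a character class neither "$" nor ","
    // needs a backslash.
    public static string StripWithRegex(string text)
    {
        return Regex.Replace(text, "[$,]", "");
    }
}
```

For two fixed characters either form is short; the regex version starts to pay off mainly if the set of characters to strip grows or becomes a range.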
>
>> I like seeing different options and make a choice. Sometimes I may use
>> something like Regex just so I am used to using it, as long as the
>> problem
>> warrants it.
>
> And that's the point - I don't think this problem *does* warrant it.
I agree that it isn't necessary here, but I don't think it is warranted or
unwarranted here. I think it's just as readable either way.
>
>> You don't use it - you lose it.
>
> So do you add a database when you just need to do a hashtable lookup,
> just in case you forget SQL? Do you use reflection to get at the value
> of a property, just in case you forget how to use that? I hope not.
Of course not. But as was mentioned there are times where Regex may be a
good solution and if you can do it either way, why not.
>
> It's very important to use appropriate technology, rather than using it
> for the sake of it. (It's one thing to experiment with technology for
> the sake of it as a learning tool, but I wouldn't do it in production
> code.)
Right. But Regex is not inappropriate technology. As you said, trying to
loop through each character when there is an easier way is a bit much.
But Regex is valid and is an appropriate method for handling strings and if
you are as comfortable with one as the other then it isn't inappropriate.
It's all in how you use it. And I was not saying experiment with it. I was
saying use it for the sake of staying familiar with it. I don't want to
need to use it and have to figure it out when I need to use it.
As you said, use the appropriate tool. If the appropriate tool is Regex,
it is going to be d... inconvenient to need it and not know how to use it.
Now I am not saying go out and learn every tool out there. But if it is a
valid tool in your particular environment, and it is available - why would
you not avail yourself of it?
Tom
> --
> Jon Skeet - <sk...@pobox.com>
> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
You've mentioned that as being a problem a couple of times.
What do you mean by this?
Are you talking about stopping if you find the first one matching?
Thanks,
Tom
I also feel that Regular Expressions, being an object in asp.net (not
necessarily C#) makes it just as valid as C#.
As far as readability, it has nothing to do with Regular Expressions whether
it is readable or not, as Oliver mentions, but how you write it.
You can also make some pretty unreadable C# code as well. Readability is a
function of the programmer not the language (in most cases). As was also
mentioned you also need to know the language. For someone not used to
objects, abstract objects and interfaces are also hard to read.
I like seeing different options and making a choice. Sometimes I may use
something like Regex just so I am used to using it, as long as the problem
warrants it.
You don't use it - you lose it.
Tom
No - I'm talking about finding things like "jon.skeet" in a string.
Using IndexOf, that's no problem - no characters are interpreted in a
"special" way by IndexOf.
Regular expressions, however, treat "." as "any character", so to find
an actual dot, you need to escape it with a backslash - and from a C#
point of view that means either doubling the backslash or using a
verbatim string literal, i.e.
"jon\\.skeet"
or
@"jon\.skeet"
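To make the difference concrete, here is a minimal sketch. The sample string and the class and method names are made up for illustration; only IndexOf, Regex.IsMatch and Regex.Escape are real framework calls.

```csharp
using System;
using System.Text.RegularExpressions;

static class DotEscapingDemo
{
    // Hypothetical input: contains "jonXskeet" but NOT the literal "jon.skeet".
    public const string Sample = "mail jonXskeet today";

    // IndexOf treats every character literally, so the dot is just a dot.
    public static bool FoundByIndexOf(string text)
    {
        return text.IndexOf("jon.skeet", StringComparison.Ordinal) >= 0;
    }

    // Unescaped pattern: "." means "any character", so "jonXskeet" matches too.
    public static bool FoundByUnescapedRegex(string text)
    {
        return Regex.IsMatch(text, "jon.skeet");
    }

    // Escaped by hand with a verbatim string literal.
    public static bool FoundByEscapedRegex(string text)
    {
        return Regex.IsMatch(text, @"jon\.skeet");
    }

    // Regex.Escape does the escaping for you when the search text is arbitrary.
    public static bool FoundByRegexEscape(string text)
    {
        return Regex.IsMatch(text, Regex.Escape("jon.skeet"));
    }
}
```

With the sample above, only the unescaped pattern reports a match, which is exactly the kind of false positive the escaping is there to prevent.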
--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
Got ya.
I thought you were talking about escaping the function/call as you might in
a loop when you find what you are looking for.
Thanks,
Tom
Regular expressions have nothing to do with ASP.NET - they're a part of
"normal" .NET.
> As far as readability, it has nothing to do with Regular Expressions whether
> it is readable or not, as Oliver mentions, but how you write it.
No - I believe that searching for "jon.skeet" with IndexOf is clearer
than searching for "jon\\.skeet" or @"jon\.skeet". Which of them
contains just the information which is actually of concern, and which
contains information which is only present due to the technology used
to do the searching?
> You can also make some pretty unreadable C# code as well.
Sure, but that's no reason to use regular expressions just to make
things worse.
> Readability is a function of the programmer not the language (in most
> cases).
Yes, but it's the programmer's decision how to approach things -
whether you do things the simple way or the complex way. You *could*
implement the string search by manually iterating over all the
characters in the string, perhaps even writing your own state machine
to do it. The code could be pretty readable considering what it's doing
- but it's *bound* to be more complex than using IndexOf.
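As a rough illustration of that point, here is a naive hand-written search next to the built-in call. It is only a sketch: ManualIndexOf and BuiltInIndexOf are hypothetical names, and the manual version does a plain ordinal character comparison with no state machine.

```csharp
using System;

static class ManualSearchDemo
{
    // Naive search: returns the first index of needle in haystack, or -1.
    public static int ManualIndexOf(string haystack, string needle)
    {
        for (int start = 0; start <= haystack.Length - needle.Length; start++)
        {
            int matched = 0;
            while (matched < needle.Length &&
                   haystack[start + matched] == needle[matched])
            {
                matched++;
            }
            if (matched == needle.Length)
            {
                return start;
            }
        }
        return -1;
    }

    // The same job done by the framework in one line.
    public static int BuiltInIndexOf(string haystack, string needle)
    {
        return haystack.IndexOf(needle, StringComparison.Ordinal);
    }
}
```

Both methods give the same answer for ordinary input; the difference is how much code a maintainer has to read to be sure of that.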
> As was also mentioned you also need to know the language. For someone
> not used to objects, abstract objects and interfaces are also hard to
> read.
Sure - but why introduce unnecessary complexity? You're already
writing C#, so you'd better know C# - but why add regular expressions
into the mix when they're unnecessary?
> I like seeing different options and making a choice. Sometimes I may use
> something like Regex just so I am used to using it, as long as the problem
> warrants it.
And that's the point - I don't think this problem *does* warrant it.
> You don't use it - you lose it.
So do you add a database when you just need to do a hashtable lookup,
just in case you forget SQL? Do you use reflection to get at the value
of a property, just in case you forget how to use that? I hope not.
It's very important to use appropriate technology, rather than using it
for the sake of it. (It's one thing to experiment with technology for
the sake of it as a learning tool, but I wouldn't do it in production
code.)
--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
It is - the regular expression *language* is a different language to
C#, in the same way that XPath is. That's why under "regular
expressions" in MSDN, there's a "language elements" section.
> But the discussion was valid in that you use the best tool for the situation.
Indeed.
> >> As far as readability, it has nothing to do with Regular Expressions
> >> whether
> >> it is readable or not, as Oliver mentions, but how you write it.
> >
> > No - I believe that searching for "jon.skeet" with IndexOf is clearer
> > than searching for "jon\\.skeet" or @"jon\.skeet".
>
> That's maybe true. But it would be clear to someone used to using both C#
> and Regex.
But not as instantly clear, I believe. Can you really say that you find
the regex version doesn't take you *any* longer to understand than the
non-regex version?
> Also, you have the same problem when dealing with web pages or getting a
> file from the disk. You still use the escape character there (and as you
> say, is a little confusing) - but you still do it.
You have to know the C# escaping, but not the regular expression
escaping.
> >> You can also make some pretty unreadable C# code as well.
> >
> > Sure, but that's no reason to use regular expressions just to make
> > things worse.
>
> I agree with you that readability is important.
>
> It used to be that people didn't like C and C++ for exactly the same reason
> you point out. The code was not as clear as COBOL or Basic and that was the
> complaint back then. I happened to be a Fortran programmer at that time and
> was not interested in moving to C for that reason (not that Fortran was
> better - readability wise).
>
> The problem with C back then was that much of the code was
> really cryptic. But it didn't have to be; that was just how people coded
> back then. Mainly, it was important to make the most efficient code
> possible because of the limited computing power and efficient rarely equates
> to readable. And I am not even talking about compiling and linking and all
> the options and cryptic command lines.
To me, a lot of readability comes from decent naming and commenting,
which fortunately are available in pretty much any language. I'd
certainly agree that object orientation (and exceptions, automatic
memory management etc) makes it a lot easier to write readable code
though.
> > Yes, but it's the programmer's decision how to approach things -
> > whether you do things the simple way or the complex way. You *could*
> > implement the string search by manually iterating over all the
> > characters in the string, perhaps even writing your own state machine
> > to do it. The code could be pretty readable considering what it's doing
> > - but it's *bound* to be more complex than using IndexOf.
>
> I agree.
>
> Just because you can - doesn't mean you should.
Exactly.
> > Sure - but why introduce unnecessary complexity? You're already
> > writing C#, so you'd better know C# - but why add regular expressions
> > into the mix when they're unnecessary?
>
> But if you know both and as I (and you) mentioned regex is part of .net as
> is C# - so it is already in the mix.
No, it's not. It's not already used in every single C# program, any
more than SQL is.
> But you're right, don't introduce any
> more complexity than necessary. But if it's 6 of one ... it's really up to
> the programmer.
In what way is it 6 of one or half a dozen of the other when one
solution requires knowing more than the other? I would expect *any* C#
programmer to know what String.IndexOf does. I wouldn't expect all C#
programmers to know by heart which regex language elements require
escaping - and if you don't know that off the top of your head, then
changing the code to search for a different string involves an extra
bit of brainpower.
> In the original case, that was what it was. You can't tell
> me that you feel that the solution suggested for this case was even close to
> being unreadable (if you are even a stone's throw from understanding Regular
> Expressions).
It was *less* readable though - and would have been *significantly*
less readable if the string being searched for had included dots,
brackets etc.
> I personally feel that both solutions are equally usable and readable (in
> this situation).
I suspect not all programmers would though. Don't forget that the
person who writes the code is very often not the one to maintain it.
Can you guarantee that *everyone* who touches the code will find
regexes as readable as String.IndexOf?
> I have also seen times when I just couldn't find an easy solution in C# or
> VB and it was fairly easy in Regex.
Which is why I've said repeatedly that I'm not trying to suggest that
regexes are bad, or should never be used. I'm just saying that in this
case it's using a sledgehammer to crack a nut.
> I myself would usually opt for the C# or VB solutions first, but would have
> no problem using Regex. As a matter of fact, I use Regex to strip commas
> and $ from my textbox fields before writing it to SQL as it was the best
> solution I could find. Such as:
>
> SalaryMax.Text =
> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
>
> At the time, I couldn't seem to find as simple a solution as this in VB.Net
> so I use this (not saying there isn't one).
And of course there is:
SalaryMax.Text =
String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
.Replace(",", "")));
I know which version I'd rather read...
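For anyone who wants to compare the two side by side, here is a small self-contained sketch. The class and method names are invented for the example, and the surrounding CalculateYearly and String.Format calls from the posts are left out since their details aren't shown in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class StripCurrencyDemo
{
    // Regex version: the same pattern as in the quoted post,
    // written as a verbatim string literal for C#.
    public static string WithRegex(string input)
    {
        return Regex.Replace(input, @"\$|\,", "");
    }

    // Plain string version: two literal replacements, nothing to escape.
    public static string WithStringReplace(string input)
    {
        return input.Replace("$", "").Replace(",", "");
    }
}
```

Both methods strip the same characters, so for an input such as "$1,234,567.89" they return the same string; the choice is purely about what the next reader has to know.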
> > And that's the point - I don't think this problem *does* warrant it.
>
> I agree that it isn't necessary here, but I don't think it is warranted or
> unwarranted here. I think it's just as readable either way.
But I suspect you're more used to regular expressions than many other
programmers - and making the code less readable for other programmers
for no benefit is what makes it unwarranted here, even in the simple
case where there's nothing to escape.
> > So do you add a database when you just need to do a hashtable lookup,
> > just in case you forget SQL? Do you use reflection to get at the value
> > of a property, just in case you forget how to use that? I hope not.
>
> Of course not. But as was mentioned there are times where Regex may be a
> good solution and if you can do it either way, why not.
Because it's more complicated! You can't deny that there's more to
consider due to the escaping. There's more to know, more to consider,
and it doesn't get the job done any more cleanly.
> > It's very important to use appropriate technology, rather than using it
> > for the sake of it. (It's one thing to experiment with technology for
> > the sake of it as a learning tool, but I wouldn't do it in production
> > code.)
>
> Right. But Regex is not inappropriate technology. As you said, trying to
> loop through each character when there is an easier way is a bit much.
As is using the power of regular expressions when there is an easier
way - using IndexOf, which is *precisely* there to find one string
within another.
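For the original question of checking several substrings at once, the helper method mentioned earlier in the thread might look something like this. It is only a sketch: ContainsAny is a hypothetical name, not a framework method, and it assumes an ordinal, case-sensitive comparison is what's wanted.

```csharp
using System;

static class StringSearch
{
    // Hypothetical helper: true if text contains at least one candidate.
    // Stops at the first hit, just like a chain of || tests would.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site then reads as `if (StringSearch.ContainsAny(someString, "something1", "something2", "something3"))`, and none of the search strings ever needs escaping.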
> But Regex is valid and is an appropriate method for handling strings and if
> you are as comfortable with one as the other then it isn't inappropriate.
> It's all in how you use it. And I was not saying experiment with it. I was
> saying use it for the sake of staying familiar with it. I don't want to
> need to use it and have to figure it out when I need to use it.
Do you really think it would take you that long to refamiliarise
yourself with it? I don't see why it's a good idea to make some poor
maintenance engineer who hasn't used regular expressions before try to
figure out that *actually* you were just trying to find strings within
each other just so you can keep your skill set current.
> As you said, use the appropriate tool. If the appropriate tool is Regex,
> it is going to be d... inconvenient to need it and not know how to use it.
I've never had a problem with reading the documentation when I've
needed to use regular expressions, without putting it in projects in
places where I *don't* need it.
> Now I am not saying go out and learn every tool out there. But if it is a
> valid tool in your particular environment, and it is available - why would
> you not avail yourself of it?
Because it makes things more complicated for no benefit. The reflection
example was a good one - that allows you to get a property value, so do
you think it's a good idea to write:
string x = (string) something.GetType()
.GetProperty("Name")
.GetValue(something, null);
or
string x = something.Name;
?
Maybe I should use the former. After all, I wouldn't want to forget how
to use reflection, would I?
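A runnable version of that comparison, as a sketch only: the Person class is a made-up stand-in for whatever type "something" really is, and the method names are invented for the example.

```csharp
using System;

// Hypothetical type standing in for "something" in the snippet above.
class Person
{
    public string Name { get; set; }
}

static class ReflectionDemo
{
    // Looks the property up by name at run time; a typo in "Name"
    // only shows up when the code executes.
    public static string NameViaReflection(object something)
    {
        return (string) something.GetType()
                                 .GetProperty("Name")
                                 .GetValue(something, null);
    }

    // Direct access: checked by the compiler, and obvious to any reader.
    public static string NameDirect(Person something)
    {
        return something.Name;
    }
}
```

Both return the same value for the same object; the reflection version just asks the reader to know more in order to see that.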
I think calling it a language is a stretch, although I know it is called a
language in places (it's all in what you define as a language). It really is
a text/string processor, as are IndexOf, Substring, Right, Replace etc. used
by various languages.
You don't build pages with it. It isn't procedural. It is a tool used by
the other languages. You don't use VB.Net in C# or vice versa but both use
Regular expressions (as they both use Substring, Replace etc.).
>
>> But the discussion was valid in that you use the best tool for the situation.
>
> Indeed.
>
>> >> As far as readability, it has nothing to do with Regular Expressions
&g