
Regular Expression Question. Please help!


alk...@hotmail.com

Aug 29, 2005, 8:45:10 PM
I am trying to clean up a city name. Between the letters, at most one of 3
characters (dot, space or hyphen) is allowed. For example,
Los-Angeles, Los Angeles and N.Westminster are OK.
Outside the letters, nothing is allowed.

So I need to do a replace and get this:
los angeles

from this:
___..&* los - .an#$geles. ^&...____ .


or this:
los-angeles

from this:
___..&* los- .an#$geles. ^&...____ .


Please help.

shiv_k...@yahoo.com

Aug 29, 2005, 11:22:21 PM

Chris Priede

Aug 29, 2005, 11:58:02 PM
Hi,

alk...@hotmail.com wrote:
> I am trying to clean up a city name. Between the letters, at most one of 3
> characters (dot, space or hyphen) is allowed. For example,
> Los-Angeles, Los Angeles and N.Westminster are OK.
> Outside the letters, nothing is allowed.

Not what you asked and potentially n/a for what you are doing -- I know. :P

If you are trying to clean up postal addresses, consider purchasing an
inexpensive zipcode database for lookup. Between towns with odd names and
people mangling them at the time of entry, you will never get it right with
inflexible rules.

--
Chris Priede (pri...@panix.com)


Ludovic SOEUR

Aug 30, 2005, 4:09:02 AM
This is what you are looking for:
^[^a-zA-Z]*|[^a-zA-Z]*$|[^a-zA-Z-. ]*([-. ])?[^a-zA-Z]*

myStringCleaned=Regex.Replace(myStringToClean,@"^[^a-zA-Z]*|[^a-zA-Z]*$|[^a-zA-Z-. ]*([-. ])?[^a-zA-Z]*","$1");

Here are some explanations:
^[^a-zA-Z]* matches any run of non-letters at the start of the string. It is
replaced by $1, which is empty here (the capturing group took no part in the
match).
| means "or"
[^a-zA-Z]*$ matches any run of non-letters at the end of the string. It is
replaced by $1, which is empty too.
| means "or"
[^a-zA-Z-. ]* matches any sequence of characters that are not letters, dots,
hyphens or spaces
([-. ])? captures one of these characters (dot, hyphen or space) the first
time it appears; the whole matched sequence is then replaced by $1, which is
the character just captured
[^a-zA-Z]* matches everything else up to the next letter

$1 means each matched sequence is replaced by the content of the first
capturing group. (The pattern has only one group, so $+ would give the same
result.) For the first and second alternatives it is empty. For the third
alternative it is empty if there was no dot, hyphen or space, and otherwise
it contains the captured character.

So,

___..&* los - .an#$geles. ^&...____ .

will be replaced
by los angeles
and


___..&* los- .an#$geles. ^&...____ .

will be replaced
by los-angeles

To understand, let's take the example ___..&* los- .an#$geles. ^&...____ .
"___..&* " is matched by the first alternative of the regex and is replaced
by nothing ($1 is empty).
"- ." is matched by the third alternative and is replaced by "-"
($1 is the hyphen).
"#$" is matched by the third alternative and is replaced by
nothing ($1 is empty).
". ^&...____ ." is matched by the second alternative of the regex and is
replaced by nothing ($1 is empty).
So you have 'empty' and 'los' and 'hyphen' and 'an' and 'geles' and 'empty',
that is 'los-angeles'.
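
For anyone who wants to try this, here is a minimal, self-contained sketch that runs the pattern from this post on the two sample strings from the question. The class name CityNameCleaner and the method name Clean are made up for illustration; only the pattern and the "$1" replacement come from the post.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper; only the pattern and the "$1" replacement are from the post.
public static class CityNameCleaner
{
    // Alternative 1: non-letters at the start. Alternative 2: non-letters at the end.
    // Alternative 3: a run of non-letters between letters, capturing the first
    // dot, hyphen or space it contains (if any).
    private const string Pattern =
        @"^[^a-zA-Z]*|[^a-zA-Z]*$|[^a-zA-Z-. ]*([-. ])?[^a-zA-Z]*";

    public static string Clean(string raw)
    {
        // "$1" is empty when the group took no part in the match,
        // so unwanted runs are simply deleted.
        return Regex.Replace(raw, Pattern, "$1");
    }

    static void Main()
    {
        Console.WriteLine(Clean("___..&* los - .an#$geles. ^&...____ .")); // los angeles
        Console.WriteLine(Clean("___..&* los- .an#$geles. ^&...____ ."));  // los-angeles
    }
}
```

Note that the pattern only removes characters; it does not change the case of the letters that remain.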

Hope it helps,

Ludovic SOEUR


<alk...@hotmail.com> wrote in message
news:1125362710.7...@o13g2000cwo.googlegroups.com...

Oliver Sturm

Aug 30, 2005, 5:47:34 AM
alk...@hotmail.com wrote:

>I am trying to clean up a city name. Between the letters, at most one of 3
>characters (dot, space or hyphen) is allowed. For example,
>Los-Angeles, Los Angeles and N.Westminster are OK.
>Outside the letters, nothing is allowed.

Others have already replied to your question. I was just curious about
this: what am I gonna do if I happen to live in
Krung-thep-maha-nakorn-boworn-ratana-kosin-mahintar-ayudhya-amaha-dilok-pop-nopa-ratana-rajthani-burirom-udom-rajniwes-mahasat-arn-amorn-pimarn-avatar-satit-sakattiya-visanukam
in Thailand, or even just in St John's
Chapel, Newcastle upon Tyne or Stoke-on-Trent?
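
To make the concern concrete, here is a small sketch that feeds a few of these names through the pattern posted earlier in the thread. The class and method names are hypothetical. As far as I can tell, the pattern keeps one separator per gap between letters, so names with several hyphens or spaces come through unchanged, while the apostrophe is silently dropped.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the pattern is the one posted in this thread.
public static class CityNameEdgeCases
{
    private const string Pattern =
        @"^[^a-zA-Z]*|[^a-zA-Z]*$|[^a-zA-Z-. ]*([-. ])?[^a-zA-Z]*";

    public static string Clean(string raw)
    {
        return Regex.Replace(raw, Pattern, "$1");
    }

    static void Main()
    {
        // One separator per gap is kept, so these are unchanged.
        Console.WriteLine(Clean("Stoke-on-Trent"));      // Stoke-on-Trent
        Console.WriteLine(Clean("Newcastle upon Tyne")); // Newcastle upon Tyne

        // The apostrophe is not a letter, dot, hyphen or space, so it is removed.
        Console.WriteLine(Clean("St John's Chapel"));    // St Johns Chapel
    }
}
```

If apostrophes should be kept, the character classes would need to be widened; that is a change to the posted pattern, not something it does today.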

Oliver Sturm
--
omnibus ex nihilo ducendis sufficit unum
Spaces inserted to prevent google email destruction:
MSN oliver @ sturmnet.org Jabber sturm @ amessage.de
ICQ 27142619 http://www.sturmnet.org/blog

alex

Aug 30, 2005, 6:27:36 AM
Thank you everyone for the brilliant ideas :-) especially to Ludovic!

to Oliver: I will try to go easy on "-" and "'" :-)

Thanks again!

--
Sent via .NET Newsgroups
http://www.dotnetnewsgroups.com
