
Regex Question


AMP

Apr 21, 2008, 12:24:35 PM
Hello,
I am coming back to a project and I don't remember what the following
regex does.
I do know it removes all \r\n from the string, but I don't see how.
Can someone explain this one?

Regex re = new Regex(@"([\x00-\x1F\x7E-\xFF]+)",
RegexOptions.Compiled);
string op = re.Replace(FileToParse, "");

Thanks
Mike

Gilles Kohl [MVP]

Apr 21, 2008, 1:36:17 PM
On Mon, 21 Apr 2008 09:24:35 -0700 (PDT), AMP <ampe...@gmail.com>
wrote:

How does it work? The outer parentheses are redundant, IMHO. The regex
boils down to a positive character group with two ranges, each with its
start and end expressed as hexadecimal escapes: \x00-\x1F (0 to 31 in
decimal) and \x7E-\xFF (126 to 255 in decimal). With the appended "+",
it means "one or more consecutive characters with a code in 0-31 or
126-255". Since \r is 13 and \n is 10, both fall into the first range.
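To make the range boundaries concrete, here is a minimal console sketch. The class name RangeCheck and the chosen sample codes are my own, picked only for illustration; it tests single characters on either side of each boundary.

```csharp
using System;
using System.Text.RegularExpressions;

class RangeCheck
{
    static void Main()
    {
        // Same character group as in the question, without the "+" and the
        // redundant parentheses, so each test looks at exactly one character.
        Regex re = new Regex(@"[\x00-\x1F\x7E-\xFF]");

        // Sample codes around the edges of the two ranges (illustrative choice).
        int[] codes = { 31, 32, 125, 126, 255, 256 };

        foreach (int code in codes)
        {
            string s = ((char)code).ToString();
            // Matches for 31, 126 and 255; no match for 32, 125 and 256.
            Console.WriteLine("{0}: {1}", code, re.IsMatch(s));
        }
    }
}
```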

Replacing all these occurrences with nothing (empty string) does far
more than just remove \r and \n: it removes all characters in the
ranges 0-31 and 126-255. The intention is probably to kill anything
that is not in the printable "ASCII" range. Unfortunately, it also
kills the tilde "~" (126).
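As a quick illustration of the side effects, the following sketch runs the character group over a made-up input containing a tilde, a tab and a CR LF pair. The class name and the input string are assumptions for the demo, not taken from your project.

```csharp
using System;
using System.Text.RegularExpressions;

class SideEffectDemo
{
    static void Main()
    {
        // Pattern from the question, minus the redundant parentheses.
        Regex re = new Regex(@"[\x00-\x1F\x7E-\xFF]+");

        // Hypothetical input: '~' (126), tab (9), CR (13) and LF (10)
        // mixed in with ordinary letters.
        string input = "a~b\tc\r\nd";

        string output = re.Replace(input, "");

        // Only the four letters survive.
        Console.WriteLine(output == "abcd"); // True
    }
}
```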

It will also remove e.g. accented and umlaut characters in the range
128-255. What it will NOT remove are Unicode characters from 256
upwards.

Try e.g.

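// \u00e7 is c with cedilla (231); \u0107 is c with acute accent (263).
string originalString = "Testing <\u00e7> <\u0107> ";

Regex re = new Regex(@"([\x00-\x1F\x7E-\xFF]+)",
    RegexOptions.Compiled);

string replacedString = re.Replace(originalString, "");

// MessageBox requires a reference to System.Windows.Forms.
MessageBox.Show(originalString);
MessageBox.Show(replacedString);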

The first "special" character, a lowercase c with cedilla (U+00E7, 231
decimal), will be removed. The second one, a lowercase c with acute
accent (U+0107, 263 decimal), will not be affected.

(My suggestion, if your intention is to remove anything not in the
range 32-126, would be to use this:

Regex re = new Regex(@"[^\x20-\x7E]+", RegexOptions.Compiled);

instead.)
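To compare the two patterns side by side, here is a small console sketch. The class name, variable names and input string are assumptions for the demo. It compares the results against expected strings rather than printing them, so the console encoding does not get in the way.

```csharp
using System;
using System.Text.RegularExpressions;

class PatternComparison
{
    static void Main()
    {
        // Pattern from the question (parentheses dropped) and the negated group.
        Regex original = new Regex(@"[\x00-\x1F\x7E-\xFF]+");
        Regex suggested = new Regex(@"[^\x20-\x7E]+");

        // Tilde (126), c with cedilla (231), c with acute (263), then CR LF.
        string input = "~ <\u00e7> <\u0107>\r\n";

        // Original: drops '~', U+00E7 and CR LF, but keeps U+0107.
        Console.WriteLine(original.Replace(input, "") == " <> <\u0107>"); // True

        // Negated group: keeps '~', drops everything outside 32-126.
        Console.WriteLine(suggested.Replace(input, "") == "~ <> <>");     // True
    }
}
```

Note that the negated group is stricter about non-ASCII text: it also removes characters from 256 upwards, which the original pattern lets through.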

Regards,
Gilles.

