Character 'i' not in IsBasicLatin block?

Jürgen Exner

Dec 17, 2014, 1:37:24 PM
Can anyone explain why I am getting True when my string contains the
letter 'I'?
The upper-case \P negates the character class, so the RE here actually
checks whether the string contains any character that is _not_ in
IsBasicLatin, or in other words any character that is not ASCII.

PS: "Awesome - I know" -match "(\P{IsBasicLatin})"
True
PS: "Awesome - x know" -match "(\P{IsBasicLatin})"
False

To the best of my knowledge 'I' is part of BasicLatin.

jue

Marcel Müller

Dec 20, 2014, 4:18:37 AM
I'm not an expert in character sets, but I just tested your string on my
machine with the same results. It seems to me that the letter "I" is
treated inconsistently, depending on which way you check.

To see the result for each character, I split your string up and ran
the regexp against every single char:

# Print each character followed by its individual match result
$string = "Awesome - I know"
$string.ToCharArray() | ForEach-Object { $_; $_ -match "(\P{IsBasicLatin})" }

This displays each character and the corresponding result on the
console; "I" is the only one returning "True".

If run with:

$string.ToCharArray() | ForEach-Object { $_; [int][char]$_ }

it correctly displays the ASCII code 73 for the letter I, which
definitely IS BasicLatin.

On the other hand, if I run it with a \p (as opposed to \P) to see
whether each letter is indeed part of BasicLatin, every single character
returns "True", even the "I".
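
The positive check described above could be written roughly as follows.
This is only a sketch: the variable name $results and the table output
are illustrative and were not part of the original test.

```powershell
$string = "Awesome - I know"

# Lowercase \p is the positive form: it matches a character that IS in the block.
# $results is an illustrative name, not taken from the original test.
$results = foreach ($ch in $string.ToCharArray()) {
    [pscustomobject]@{
        Char         = $ch
        IsBasicLatin = ([string]$ch -match '\p{IsBasicLatin}')
    }
}

# Show one row per character with its result
$results | Format-Table -AutoSize
```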

Maybe this could serve as a workaround until someone else comes up with
a proper explanation of why exactly this happens and how to avoid it:
check every char for being BasicLatin and test whether one or more of
them fail the check.
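
A minimal sketch of that workaround might look like this. The function
name Test-ContainsNonBasicLatin is made up for illustration, and the
approach relies on the observation above that the positive \p test
behaves as expected for every character, which has only been checked
against the sample string.

```powershell
# Hypothetical helper (name chosen for illustration only).
# Returns $true as soon as one character fails the positive \p{IsBasicLatin} test.
function Test-ContainsNonBasicLatin {
    param([string]$Text)

    foreach ($ch in $Text.ToCharArray()) {
        # Test each character on its own with the positive form \p
        if ([string]$ch -notmatch '\p{IsBasicLatin}') {
            return $true
        }
    }
    # No character failed the test
    return $false
}

Test-ContainsNonBasicLatin "Awesome - I know"
Test-ContainsNonBasicLatin "Awesome - x know"
```

Checking character by character is slower than a single -match on the
whole string, but it avoids the negated \P form that produced the odd
result.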

Marcel