
character classes & regular expressions


Ivan Shmakov

May 4, 2012, 10:35:41 PM
>>>>> Kaz Kylheku <k...@kylheku.com> writes:
>>>>> On 2012-05-04, Ivan Shmakov <onei...@gmail.com> wrote:
>>>>> Kaz Kylheku <k...@kylheku.com> writes:

[Cross-posting to news:comp.text, for the subject being
discussed is hardly specific to Unix shells; really, this time.]

[...]

>>> In my defense, I've never used this committee-designed dog of a
>>> syntax until today, which was only because I was groping for a
>>> quick workaround, and likely never will again.

>>> (I also refuse to implement it in my regex engine, though I have
>>> caved in to Perl's \w, \d, \s, \W, \D and \S, which is probably as
>>> far as I will go.)

>> How do you specify a "single character, either an upper-case letter
>> or a digit" within such a regular expression, then?

> [A-Z0-9]
> [A-Z\d]
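A minimal sketch, using Python 3's `re` module as a stand-in (it is not Kaz's engine), of one quirk of mixing a literal range with a Perl-style shorthand in `[A-Z\d]`: in a Unicode-aware engine the `A-Z` part stays a fixed code-point range, while `\d` matches any Unicode decimal digit unless the `re.ASCII` flag is given. The sample characters are illustrative choices, not from the thread.

```python
import re

# [A-Z\d]: a fixed code-point range (A-Z) plus the \d shorthand.
UNICODE_CLASS = re.compile(r"[A-Z\d]")          # str pattern: \d is any Unicode decimal digit
ASCII_CLASS = re.compile(r"[A-Z\d]", re.ASCII)  # re.ASCII: \d is restricted to [0-9]

# 'Q', '7', ARABIC-INDIC DIGIT THREE, LATIN CAPITAL LETTER E WITH ACUTE
samples = ["Q", "7", "\u0663", "\u00c9"]

for ch in samples:
    # Print code points rather than the characters, to keep output ASCII-only.
    print(f"U+{ord(ch):04X}",
          bool(UNICODE_CLASS.fullmatch(ch)),
          bool(ASCII_CLASS.fullmatch(ch)))
```

The Arabic-Indic digit is the only sample on which the two patterns disagree; U+00C9 matches neither, since it lies outside the A-Z range.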

It happens that the native languages of most people in the
world either use extensions of the Latin script beyond those
already in ASCII (such as J or W), or use a script not derived
from Latin at all. (Greek-based scripts are not uncommon, for
instance; FWIW, the Latin script is itself based on the Greek one.)

Good luck selling your product to anyone speaking French, Greek,
Polish or Russian.
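A small Python 3 sketch of the point above: `[A-Z0-9]` rejects capital letters from French, Polish, Greek, and Russian. The helper `is_upper_or_digit` is hypothetical; it uses Python's Unicode-based `str.isupper()` as an approximation of what a locale-aware "upper-case letter" class would accept, and is not any particular regex library's behavior.

```python
import re

ASCII_UPPER_OR_DIGIT = re.compile(r"[A-Z0-9]")

def is_upper_or_digit(ch: str) -> bool:
    """Hypothetical Unicode-aware check: an uppercase letter in any script,
    or one of the ASCII digits 0-9."""
    return len(ch) == 1 and (ch.isupper() or ch in "0123456789")

# Capital letters from French (É), Polish (Ł), Greek (Σ) and Russian (Ж),
# written as escapes, plus two plain ASCII samples.
samples = ["A", "5", "\u00c9", "\u0141", "\u03a3", "\u0416"]

for ch in samples:
    print(f"U+{ord(ch):04X}",
          bool(ASCII_UPPER_OR_DIGIT.fullmatch(ch)),
          is_upper_or_digit(ch))
```

Both checks accept `A` and `5`; only the Unicode-aware helper accepts the four non-ASCII capitals.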

--
FSF associate member #7257

Kaz Kylheku

May 4, 2012, 10:42:30 PM
["Followup-To:" header set to comp.unix.shell.]
Sorry, you're mistaken. In Russia, France, Poland, Japan, you name it,
coders still want [A-Z] to actually denote A-Z.

In GNU flex,

[A-Z] { action(); }

matches A, B, C ... Z. This is nicely baked at the time flex is run,
and not perturbed by any environment variables in the run time of the
generated scanner.
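The flex behavior described above (a range fixed when the scanner is built) can be illustrated, though not reproduced, with Python 3's `re`: for `str` patterns a bracket range is always a code-point range, and `re.LOCALE` is only accepted for bytes patterns, so the environment never changes what `[A-Z]` means. This sketch scans the Basic Multilingual Plane and checks that `[A-Z]` matches exactly the 26 ASCII capitals; it does not run flex itself.

```python
import re
import string

UPPER_RANGE = re.compile(r"[A-Z]")

# Try every code point in the Basic Multilingual Plane (U+0000..U+FFFF)
# and keep the ones that [A-Z] matches.
matched = "".join(
    chr(cp) for cp in range(0x10000) if UPPER_RANGE.fullmatch(chr(cp))
)

print(matched)
print(matched == string.ascii_uppercase)
```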

A Russian parser-writing hacker expects this behavior.

[A-Z] meaning anything else is a fuckup by morons, regardless of what is
in the environment or whether or not the application invoked setlocust.
Errr, setlocale, damn it. Why do I keep doing that?

Ivan Shmakov

May 4, 2012, 11:00:57 PM
>>>>> Kaz Kylheku <k...@kylheku.com> writes:
>>>>> On 2012-05-05, Ivan Shmakov <onei...@gmail.com> wrote:
>>>>> Kaz Kylheku <k...@kylheku.com> writes:
>>>>> On 2012-05-04, Ivan Shmakov <onei...@gmail.com> wrote:

> ["Followup-To:" header set to comp.unix.shell.]

For what reason, I wonder? The issue being discussed has little
relation to Unix shells per se. (Indeed, you provide an example
in GNU flex in your own post.)

[...]

>>>> How do you specify a "single character, either an upper-case
>>>> letter or a digit" within such a regular expression, then?

>>> [A-Z0-9]
>>> [A-Z\d]

>> It happens that the native languages of most people in the
>> world either use extensions of the Latin script beyond those
>> already in ASCII (such as J or W), or use a script not derived
>> from Latin at all. (Greek-based scripts are not uncommon, for
>> instance; FWIW, the Latin script is itself based on the Greek one.)

>> Good luck selling your product to anyone speaking French, Greek,
>> Polish or Russian.

> Sorry, you're mistaken. In Russia, France, Poland, Japan, you name
> it, coders still want [A-Z] to actually denote A-Z.

Yes.

Still, they want a way to denote a "single character, either
an upper-case letter or a digit," which is precisely what
[[:upper:][:digit:]] is for (and which is the notation that,
IIUC, you were opposed to).
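Python's `re` module does not support POSIX bracket classes such as `[:upper:]`, so the following is only a hypothetical emulation of what `[[:upper:][:digit:]]` expresses: a bracket expression as a union of named character-class tests. The mapping `POSIX_CLASSES` and the helper `bracket_match` are illustrative names; real POSIX classes are defined by the current locale (`LC_CTYPE`), and Python's Unicode-based `str` methods merely approximate a UTF-8 locale. POSIX does restrict `[:digit:]` to 0-9, which the sketch follows.

```python
from typing import Callable, Dict, Sequence

# Hypothetical mapping from POSIX class names to per-character tests.
# In a real regex library these classes come from the current locale
# (LC_CTYPE); Python's Unicode-based str methods only approximate a UTF-8 locale.
POSIX_CLASSES: Dict[str, Callable[[str], bool]] = {
    "upper": str.isupper,
    "lower": str.islower,
    "alpha": str.isalpha,
    "digit": lambda ch: ch in "0123456789",  # POSIX [:digit:] is always 0-9 only
    "space": str.isspace,
}

def bracket_match(ch: str, class_names: Sequence[str]) -> bool:
    """Emulate a bracket expression built only from POSIX classes, e.g.
    [[:upper:][:digit:]] ~ bracket_match(ch, ["upper", "digit"])."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    return any(POSIX_CLASSES[name](ch) for name in class_names)

# CYRILLIC CAPITAL ZHE, 'Q', '7', CYRILLIC SMALL ZHE, '-'
for ch in ["\u0416", "Q", "7", "\u0436", "-"]:
    print(f"U+{ord(ch):04X}", bracket_match(ch, ["upper", "digit"]))
```

Unlike `[A-Z0-9]`, the class-based form accepts the Cyrillic capital while still rejecting the lowercase letter and the hyphen.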

[...]

Cydrome Leader

May 5, 2012, 10:31:05 PM
like greeks have money to buy software or a russian has ever made a legit
software purchase.

Ivan Shmakov

May 6, 2012, 1:15:44 AM
>>>>> Cydrome Leader <pres...@MUNGEpanix.com> writes:
>>>>> In comp.unix.shell Ivan Shmakov <onei...@gmail.com> wrote:
>>>>> Kaz Kylheku <k...@kylheku.com> writes:
>>>>> On 2012-05-04, Ivan Shmakov <onei...@gmail.com> wrote:

[Cross-posting into news:alt.conspiracy.microsoft, as a possible
aid for kill-filing.]

[...]

>>>> How do you specify a "single character, either an upper-case
>>>> letter or a digit" within such a regular expression, then?

>>> [A-Z0-9]
>>> [A-Z\d]

>> It happens that the native languages of most people in the world
>> either use extensions of the Latin script beyond those already in
>> ASCII (such as J or W), or use a script not derived from Latin at
>> all. (Greek-based scripts are not uncommon, for instance; FWIW, the
>> Latin script is itself based on the Greek one.)

>> Good luck selling your product to anyone speaking French, Greek,
>> Polish or Russian.

> like greeks have money to buy software or a russian has ever made a
> legit software purchase.

There were rumors that Sberbank is the largest partner of
Microsoft in Europe. (Perhaps [1] may shed some light on this.)

Not to mention all those gamers on Steam.

One may sell services based on software just as well, BTW.

[1] http://download.microsoft.com/documents/customerevidence/6062_Sberbank.doc

Cydrome Leader

May 6, 2012, 3:47:49 AM
So 10 years ago, one bank in russia may have had some legit microsoft
licenses. This alone is actually impressive.

Everything else in russia is still pirated.

Ivan Shmakov

May 10, 2012, 8:28:26 AM
>>>>> Cydrome Leader <pres...@MUNGEpanix.com> writes:
>>>>> In comp.unix.shell Ivan Shmakov <onei...@gmail.com> wrote:
>>>>> Cydrome Leader <pres...@MUNGEpanix.com> writes:
>>>>> In comp.unix.shell Ivan Shmakov <onei...@gmail.com> wrote:

[Cross-posting to news:comp.software.licensing and
news:misc.int-property, and setting Followup-To: there, for the
discussion doesn't belong to the Newsgroups: currently in
effect.]

[...]

>>>> Good luck selling your product to anyone speaking French, Greek,
>>>> Polish or Russian.

>>> like greeks have money to buy software or a russian has ever made a
>>> legit software purchase.

>> There were rumors that Sberbank is the largest partner of
>> Microsoft in Europe. (Perhaps [1] may shed some light on this.)

>> Not to mention all those gamers on Steam.

>> One may sell services based on software just as well, BTW.

>> [1] http://download.microsoft.com/documents/customerevidence/6062_Sberbank.doc

> So 10 years ago, one bank in russia may have had some legit microsoft
> licenses. This alone is actually impressive.

Actually, free software (as in freedom) is quite popular in
Russia, as is freeware (as in beer), although violations of
license terms occur with these two as well.

Also note that Russian copyright law was extended to cover
software in 1994, IIRC, and it took a decade for ordinary
people, as well as the judicial system itself, to grow
accustomed to the concept.

In recent years, the laws have shifted toward more severe
punishments, and there have been some widely publicized court
cases related to copyright law. As a result, illegal copies of
software are now rarely seen, at least in state-owned
enterprises (whereas they were commonplace there in the
mid-1990s). The proliferation of mobile computers (which
typically come with an OEM-licensed OS preinstalled) has also
made such copies somewhat harder (though not impossible) to
find at home.

> Everything else in russia is still pirated.

When it comes to terminology, I doubt that the victims of real
pirates (e.g., [1, 2]) would readily accept the very notion of
corporations being "piracy victims, too."

That being said, I share the opinion that copyright law, in
its current form, /impedes/ progress instead of facilitating
it. I've briefly read through [3], and I'd recommend it to
anyone interested in this view.

[1] http://seattletimes.nwsource.com/html/nationworld/2014376628_apuspiracyvictimsmemorial.html
[2] http://en.wikipedia.org/wiki/Piracy_in_Somalia
[3] http://mitpress.mit.edu/books/full_pdfs/Access_to_Knowledge_in_the_Age_of_Intellectual_Property.pdf