Because they aren't part of the ANSI C standard. :-)
Seriously, ANSI wasn't out to invent the C language. That was already
done. All they were doing was standardizing and codifying existing
practice. I guess they felt that there weren't enough existing
implementations to include them.
--
D'Arcy J.M. Cain (da...@druid.com) |
Planix, Inc. | Democracy is three wolves and a
Toronto, Ontario, Canada | sheep voting on what's for dinner.
+1 416 424 2871 (DoD#0082) (eNTP) |
>Can anyone explain to me why strupr() and strlwr() are not part of the
>ANSI C library?
They aren't in the _current_ standard because they _weren't_ standard.
By and large X3J11 invented very little.
It is actually astonishingly difficult to define case conversion in a way
that makes sense outside US English. To start with,
- do you want something that affects _only_ the 26 unaccented letters?
- do you want something that affects _all_ the letters in your current
code page? If you do, which of the several code pages that may be
active at the same time do you want?
- several writing systems (a more precise term than 'languages') may
share a code page; which writing system do you want to use the
case rules of? (I am told that there are two different conventions
for upper-casing French; one wipes off the accents and the other
keeps them.)
- what do you want done with a string containing words from different
languages, and how do you propose to tell which is which?
- in the English writing system, words differing only in alphabetic
case are quite often *different* words. For example,
- ACRE (American Council for Religious Education)
- Acre the city
- acre the unit of land area.
Case conversion destroys information.
- what are you going to do about lower case letters that have no
upper case equivalent (y-diaeresis in ISO Latin 1), or an upper
case equivalent that is more than one letter (sharp s in ISO
Latin 1, looks like lower case beta)? What are you going to do
about upper case letters that have more than one lower case
equivalent, depending on where the letter is in a word (Greek
Sigma, for example)
- have you considered the possibility that you may have to deal with
a string of multibyte characters, some of the bytes of which may
_accidentally_ coincide with the normal codes of letters?
In the mean time, just how much work does it take to write
    #include <ctype.h>

    char *strlwr(char *s) {
        unsigned c;
        unsigned char *p = (unsigned char *)s;

        while (c = *p) *p++ = tolower(c);
    }

    char *strupr(char *s) {
        unsigned c;
        unsigned char *p = (unsigned char *)s;

        while (c = *p) *p++ = tolower(c);
    }
They don't actually _work_ in any very interesting sense, but they _are_
what strupr/strlwr typically _do_.
None of the questions I listed above was idle. The systems I normally use
(UNIX, Mac) make them serious problems right now. Both MacOS and Solaris
do provide tools to tackle the problems, and the Solaris ones are close
relatives of a proposal before the C committee. Just consider the last
example: if I write a program that receives a file name in its argument
vector, it really truly is quite possible right now that it might be a
multibyte encoding of a string of Chinese characters. If I just smash
the bytes, under the assumption that I am dealing with ISO Latin 1, the
characters will be garbled beyond any hope of recognition.
Note that converting DOS file names to one case in order to determine whether
they must refer to the same file is neither necessary nor sufficient.
--
"The complex-type shall be a simple-type." ISO 10206:1991 (Extended Pascal)
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.
This is very true; however, the fact remains that tolower() and
toupper() are defined, and that sometimes it is USEFUL to convert
case. You make it sound (see your lines below about DOS) as if it is
unnecessary.
>In the mean time, just how much work does it take to write
>
> #include <ctype.h>
>
> char *strlwr(char *s) {
> unsigned c;
> unsigned char *p = (unsigned char *)s;
>
> while (c = *p) *p++ = tolower(c);
> }
>
> char *strupr(char *s) {
> unsigned c;
> unsigned char *p = (unsigned char *)s;
>
> while (c = *p) *p++ = tolower(c);
> }
>
>They don't actually _work_ in any very interesting sense, but they _are_
>what strupr/strlwr typically _do_.
>
With a few exceptions, most notably that your strupr() converts to
lowercase, such functions are useful to some people. For example...
one must adopt some standard when providing migration tools similar to
the ones I develop and maintain every day. My clients demand a
sensible remapping of uppercase and lowercase VMS-style names (which
are case-insensitive and primarily dictated by preferred style) to UNIX
names (which are case sensitive).
These clients really don't care about Chinese multibyte characters.
I'm not saying that such concerns are not valid for the standards
committee, but you make it sound like strupr() and strlwr() are
completely useless.
>Note that converting DOS file names to one case in order to determine whether
>they must refer to the same file is neither necessary nor sufficient.
See above. Migrating applications from one OS to another frequently
mandates some kind of filename conversion.
Jerry A. Bolton, Jr.
--
NOTE: The opinions expressed herein are mine. You can like them, or you
can hate them, but you can't hold anyone but me responsible for them.
--
Jerry A. Bolton, Jr. | #define DISCLAIMER printf("%s",std_disclaimer)
Western Michigan University +----------------------------------+------------
Major: Computer Science | Ask me about Linux 1.0, | Atari 8-bit
E-Mail: sqbo...@cs.wmich.edu | the *FREE* i386/i486 Unix Clone | Forever!
>Can anyone explain to me why strupr() and strlwr() are not part of the
>ANSI C library?
No prior art? Not needed often enough to warrant a bigger language?
Trivial to implement when you do need them? I don't know - your
guess is as good as mine!
--
-----------------------------------------
Lawrence Kirby | fr...@genesis.demon.co.uk
Wilts, England | 7073...@compuserve.com
-----------------------------------------
> In the mean time, just how much work does it take to write
>
> #include <ctype.h>
>
> char *strlwr(char *s) {
> unsigned c;
> unsigned char *p = (unsigned char *)s;
>
> while (c = *p) *p++ = tolower(c);
> }
>
> char *strupr(char *s) {
> unsigned c;
> unsigned char *p = (unsigned char *)s;
>
> while (c = *p) *p++ = tolower(c);
> }
>
> They don't actually _work_ in any very interesting sense, but they _are_
> what strupr/strlwr typically _do_.
which is invalid ANSI C because the function names strupr() and strlwr()
are in the implementation's part of the namespace[1]. If you're rolling
your own functions in this manner, the names str_upr() and str_lwr() would
be better, since these are not reserved.
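
For illustration, here is a minimal sketch of the two functions under the unreserved names suggested above. It also repairs the two slips in the quoted code: the upper-casing function calls toupper(), and both functions return their argument. It is only a sketch and keeps every limitation discussed in this thread: it handles single-byte characters only and relies on whatever tolower() and toupper() do in the current locale.

```c
#include <ctype.h>

/* Lower-case a string in place and return it.
   Single-byte characters only; multibyte strings are not handled. */
char *str_lwr(char *s)
{
    unsigned char *p = (unsigned char *)s;

    for (; *p != '\0'; p++)
        *p = (unsigned char)tolower(*p);
    return s;
}

/* Upper-case a string in place and return it (same caveats). */
char *str_upr(char *s)
{
    unsigned char *p = (unsigned char *)s;

    for (; *p != '\0'; p++)
        *p = (unsigned char)toupper(*p);
    return s;
}
```

Because each function returns its argument, a call can be nested inside another expression, for example as an argument to strcmp().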
Are there any implementations that diagnose the use of identifiers reserved
by the standard but not actually defined by the implementation? I see that
gcc -ansi -pedantic does not, at least in the gcc version we have here.
[1] All names beginning with str and a lower-case letter are reserved for
use with external linkage by section 7.1.3/4.1.2.1 in conjunction with
section 7.13.8/4.13.8. Since section 6.1.2/3.1.2 allows case-insensitive
treatment of identifiers with external linkage, names like StrUpr are
also effectively reserved by this.
--
Mark Brader | "I noted with some interest that Fahrenheit was
m...@sq.com | also used in the weather forecast, but there the
SoftQuad Inc., Toronto | gas marks were missing." -- Ivan A. Derzhanski
This article is in the public domain.
>
> jo...@freenet3.scri.fsu.edu (Joe Ottinger) writes:
>
> >Can anyone explain to me why strupr() and strlwr() are not part of the
> >ANSI C library?
>
> They aren't in the _current_ standard because they _weren't_ standard.
> By and large X3J11 invented very little.
>
> It is actually astonishingly difficult to define case conversion in a way
> that makes sense outside US English. To start with,
[snip]
I agree wholeheartedly with your comments...but not your qualifier
(i.e. US English) - what about UK English? (And, in your case, Australian
English?)
Regards,
Andy
P.S.
:-)
--
=============================================================================
| Andy Sawyer Internet (Personal) : an...@thone.demon.co.uk |
| Compu$erve (Business) : 100432,1713 |
=============================================================================
|The opinions expressed above are my own, but you are granted the right to |
|use and freely distribute them. I accept no responsibility for any injury, |
|harm or damage arising from their use. -- The Management. |
=============================================================================
>It is actually astonishingly difficult to define case conversion in a way
>that makes sense outside US English.
Sorry, but I strongly disagree. Nearly all languages (AFAIK) have clear
rules defining the correspondence between upper and lower case characters.
If it is 'astonishingly difficult' to define case conversion outside
English, how do you think the Standard Library will cope with toupper()
and tolower() in non-English locales? After all, strupper relies on
EXACTLY the same rules as toupper.
To start with,
> - do you want something that affects _only_ the 26 unaccented letters?
No, you want something which translates as many characters as necessary
according to the rules of the current locale.
> - several writing systems (a more precise term than 'languages') may
> share a code page; which writing system do you want to use the
> case rules of? (I am told that there are two different conventions
> for upper-casing French; one wipes off the accents and the other
> keeps them.)
In which case, you need two locales. The programmer sets the locale that
he requires and lets the library worry about the details.
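
A rough sketch of "set the locale and let the library worry", assuming a hypothetical helper named upper_in_locale() (not a library function). It switches LC_CTYPE, converts with toupper(), and restores the previous setting. Only the locale names "C" and "" are portable; any other name is implementation-defined, so whether a given French or German locale exists depends on the system.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: upper-case s in place using the LC_CTYPE rules of
   locale_name, then restore the previous locale.
   Returns 0 on success, -1 if the locale cannot be set or memory runs out. */
int upper_in_locale(char *s, const char *locale_name)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved;
    unsigned char *p;

    if (current == NULL)
        return -1;

    /* Copy the name: a later setlocale() call may overwrite the string. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        free(saved);
        return -1;
    }

    for (p = (unsigned char *)s; *p != '\0'; p++)
        *p = (unsigned char)toupper(*p);

    setlocale(LC_CTYPE, saved);   /* put things back as we found them */
    free(saved);
    return 0;
}
```

Characters with no upper-case form in the chosen locale come back unchanged, which is all toupper() promises.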
> - what do you want done with a string containing words from different
> languages, and how do you propose to tell which is which?
Assuming you mean languages with different character sets, the answer is
how do you propose to do this anyway? If the character set includes all
the characters you require (e.g. Unicode) then there is no problem -
though you might need a special locale to do the dual translation - it's
really a bit out of my depth.
If you use typical 8-bit character sets (i.e. variants for each language),
how do you propose to output the string anyway? Change code pages in
mid-string?
> - in the English writing system, words differing only in alphabetic
> case are quite often *different* words. For example,
> - ACRE (American Council for Religious Education)
> - Acre the city
> - acre the unit of land area.
> Case conversion destroys information.
err... So? If you don't want to destroy case sensitive meaning, why on
earth would you run it through a function to change its case? This is a
design issue, it has nothing to do with rules for HOW to convert case.
> - what are you going to do about lower case letters that have no
> upper case equivalent (y-diaeresis in ISO Latin 1), or an upper
> case equivalent that is more than one letter (sharp s in ISO
> Latin 1, looks like lower case beta)? What are you going to do
> about upper case letters that have more than one lower case
> equivalent, depending on where the letter is in a word (Greek
> Sigma, for example)
What are you going to do about numerals that have no upper case equivalent?
(OK, I'm pushing it a bit.) Easy, if there is no equivalent, you don't
change it! If there are multiple possibilities, you have to build a lot
more intelligence into the algorithm. (If there is a deterministic rule
for people, it must be deterministic for computers, too.) You will say
that this can be very complex: true, but no one said I18N was always easy.
Also, even if it were utterly impossible to write a locale for some
languages, there is no reason why everyone else couldn't use the
locale-dependent functions.
> - have you considered the possibility that you may have to deal with
> a string of multibyte characters, some of the bytes of which may
> _accidentally_ coincide with the normal codes of letters?
>
But my understanding is that you DON'T apply classification or
translation functions to multi-byte characters, you always do it
on the wide-character equivalents. Then you use the special
wide-character functions in <wctype.h> and <wchar.h> that have just been
approved as Amendment 1.
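
As a small sketch of that approach, assuming the string has already been converted to wide characters (for example with mbstowcs()), the conversion loop looks much like the byte version but uses towupper() from <wctype.h>. The name wcs_upper() is made up for this example.

```c
#include <wchar.h>
#include <wctype.h>

/* Upper-case a wide string in place and return it.
   Each element is one whole character, so no byte of a multibyte
   sequence can be mistaken for a letter. */
wchar_t *wcs_upper(wchar_t *ws)
{
    wchar_t *p;

    for (p = ws; *p != L'\0'; p++)
        *p = (wchar_t)towupper((wint_t)*p);
    return ws;
}
```

This still maps one character to one character, so it does nothing for cases such as sharp s, where the upper-case form is more than one letter.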
>In the mean time, just how much work does it take to write
>
>They don't actually _work_ in any very interesting sense, but they _are_
>what strupr/strlwr typically _do_.
So if you can write them, there is no reason why they couldn't have
been put in the standard library apart from the obvious one - that they
weren't (AFAIK) in the /usr/group libraries that were the base document
for the standard.
>
>None of the questions I listed above was idle. The systems I normally use
>(UNIX, Mac) make them serious problems right now. Both MacOS and Solaris
>do provide tools to tackle the problems, and the Solaris ones are close
>relatives of a proposal before the C committee. Just consider the last
>example: if I write a program that receives a file name in its argument
>vector, it really truly is quite possible right now that it might be a
>multibyte encoding of a string of Chinese characters. If I just smash
>the bytes, under the assumption that I am dealing with ISO Latin 1, the
>characters will be garbled beyond any hope of recognition.
But this is just a design problem. If ANY piece of data has some specific
encoding (be it wide-char or encrypted), you have to treat it
appropriately. If it is possible to receive more than one type of data,
you have to do something to allow the program to detect it and do the
correct processing. If I process Swedish in a German locale and get
garbage, it isn't the fault of the translation or classification functions,
it is mine.
--
===================================================================
Ian Cargill CEng MIEE Soliton Software Ltd.
email: i...@soliton.demon.co.uk 54 Windfield, Leatherhead,
tel: +44 (0)1372 37 5529 Surrey, UK KT22 8UQ
For those of you who were unable to make it there, I'd just like to
mention an interesting tidbit from the X3J11/WG14 meeting which was
held recently in Plano, Texas.
At that meeting a straw vote was taken, and the results indicated quite
clearly that from now on, the term `existing practice' (as it relates to
the future revision of the ANSI/ISO C standard) will NOT imply `existing
practice among C compilers'. Rather, it will imply existing practice
among compilers for essentially any language.
If my recollection is correct, there was only one dissenting vote, i.e.
mine.
I leave it to the C community as a whole to infer (from this rather radical
change in definition) the implications for the next revision of the C
standard.
--
-- Ron Guilmette, Sunnyvale, CA ---------- RG Consulting -------------------
---- E-mail: r...@segfault.us.com ----------- Purveyors of Compiler Test ----
-------------------------------------------- Suites and Bullet-Proof Shoes -
Actually, he makes it sound as if it is very, very difficult to get it
right for everyone. The implicit assumption being, I assume, that it's
not a good idea to put things into the standard that are useful only in
some parts of the world.
The C standards committee (X3J11) put in a lot of work on supporting
non-English users. There's been some doubt whether this was a good idea,
and whether the end result is usable (I don't know, I don't have
experience with this part of the standard; as far as I can see, it is
not at all bad), but at least they cared, which is a fairly novel
thing.
(In case you're interested why I don't know about the locale stuff and
other parts of the standard that are relevant to internationalization:
I hate writing interactive programs. "Core dumped" is my idea of a
perfect error message, if I have to write the program. :-)
--
Lars.Wi...@helsinki.fi (finger wirz...@klaava.helsinki.fi)
Publib version 0.4: ftp://ftp.cs.helsinki.fi/pub/Software/Local/Publib/
: >Can anyone explain to me why strupr() and strlwr() are not part of the
: >ANSI C library?
: They aren't in the _current_ standard because they _weren't_ standard.
: By and large X3J11 invented very little.
Good enough.
Now for the rest of this, which was largely unnecessary and in general
got rid of all the points you earned with your first sentence... I'm
going to refer to strlwr() and tolower() here, but assume I'm referring
to the upper-case conversions too.
: It is actually astonishingly difficult to define case conversion in a way
: that makes sense outside US English. To start with,
: - do you want something that affects _only_ the 26 unaccented letters?
Then why standardize tolower()? That works ONLY on the 26 unaccented
characters.
: - do you want something that affects _all_ the letters in your current
: code page? If you do, which of the several code pages that may be
: active at the same time do you want?
The only environment I'm familiar with in which you have to be aware of
code pages is DOS. I didn't mention DOS in my question, nor did I mean
to.
: - several writing systems (a more precise term than 'languages') may
: share a code page; which writing system do you want to use the
: case rules of? (I am told that there are two different conventions
: for upper-casing French; one wipes off the accents and the other
: keeps them.)
This is a good question, and one that begs the question: why does
tolower() even exist, then?
: - what do you want done with a string containing words from different
: languages, and how do you propose to tell which is which?
Uhhh, see above.
: - in the English writing system, words differing only in alphabetic
: case are quite often *different* words. For example,
: - ACRE (American Council for Religious Education)
: - Acre the city
: - acre the unit of land area.
: Case conversion destroys information.
I would think that a programmer who knew what the heck he was doing
would know enough about his own data to make an intelligent choice
on case conversion.
: - what are you going to do about lower case letters that have no
: upper case equivalent (y-diaeresis in ISO Latin 1), or an upper
: case equivalent that is more than one letter (sharp s in ISO
: Latin 1, looks like lower case beta)? What are you going to do
: about upper case letters that have more than one lower case
: equivalent, depending on where the letter is in a word (Greek
: Sigma, for example)
Once again, in this perfect world of yours there is no place for such a
generic function as tolower(), since it modifies only the English letters
"a-z", AFAIK.
: - have you considered the possibility that you may have to deal with
: a string of multibyte characters, some of the bytes of which may
: _accidentally_ coincide with the normal codes of letters?
Oooh! Ahhh! The ace in the hole! Actually, yes, I did, but once again,
it's irrelevant to my question in the first place. In the case of wide
characters, there are no string-based functions that operate properly in
every case. strlen(), for instance, expects single-byte characters to
populate the char* it is passed. So does strcpy() and the other string.h
functions. Wide character strings normally have their OWN functions that
are prepared to handle the possibility of a NULL as part of a wide char.
I'd expect to find something like "wc_toupper(wide_char c)" as an
upper-case function, or I'd know to write one. For example, my Watcom
compiler has various functions like mblen() and others to handle wide
chars... but strlen() makes no mention of it. (Incidentally, my Watcom
compiler has the ANSI functions for multi-byte characters but no more...
the UNIX compiler may, but I'm not sure.)
: In the mean time, just how much work does it take to write
: #include <ctype.h>
: char *strlwr(char *s) {
: unsigned c;
: unsigned char *p = (unsigned char *)s;
: while (c = *p) *p++ = tolower(c);
: }
More than it does to write
    #include <ctype.h>

    char *strlwr(char *s) {
        char *p=s;
        while(*p++) *p=tolower(*p);
        return s;
    }
and much more than it would take simply to call a library function.
: They don't actually _work_ in any very interesting sense, but they _are_
: what strupr/strlwr typically _do_.
Although strlwr() would normally have at least a return statement, since
you prototyped it that way.
: None of the questions I listed above was idle. The systems I normally use
: (UNIX, Mac) make them serious problems right now. Both MacOS and Solaris
: do provide tools to tackle the problems, and the Solaris ones are close
: relatives of a proposal before the C committee. Just consider the last
: example: if I write a program that receives a file name in its argument
: vector, it really truly is quite possible right now that it might be a
: multibyte encoding of a string of Chinese characters. If I just smash
: the bytes, under the assumption that I am dealing with ISO Latin 1, the
: characters will be garbled beyond any hope of recognition.
But then again, if a function returns char *, but doesn't specify that
it's returning wide characters, you wouldn't be expected to handle
multi-byte chars. You'd be expected to work in your base language,
according to how C normally defines strings.
: Note that converting DOS file names to one case in order to determine whether
: they must refer to the same file is neither necessary nor sufficient.
Umm.... thank you, I guess, but I knew that. I'm not sure why this got in
there, since I wasn't dealing with DOS or filenames.
Richard A. O'Keefe (o...@goanna.cs.rmit.edu.au) wrote:
: jo...@freenet3.scri.fsu.edu (Joe Ottinger) writes:
: >Can anyone explain to me why strupr() and strlwr() are not part of the
: >ANSI C library?
: It is actually astonishingly difficult to define case conversion in a way
: that makes sense outside US English. To start with,
[various reasons deleted]
I did a little more investigation, based on the reasoning you posted.
strupr() and strlwr() both modify the string passed to them; no other
function declared in string.h does so, AFAIK.
Also, concerning multi-byte characters: strlen(), etc. can be easily
modified to work properly on strings containing multi-byte characters, by
changing how they internally access the "next character." strupr(),
however, could feasibly expand a character from one byte to two, possibly
corrupting the storage space passed to it. Without changing the "normal"
way of implementing strupr(), there's no way to make it conform to the
"we can handle multi-byte chars if you write us the correct way" attitude
of the other string.h functions. tolower() and toupper() are inconsistent
in this, however, and many of the string.h functions have an inconsistent
return value if this is done.
I understand (belatedly) your reasoning; I continue to think that ISO
SHOULD have included them in the standard library, however, because their
functioning is not inconsistent with the other string.h and ctype.h
functions. I'm not arguing that they are to be included in future
versions of the library; I asked the question because I could see no
reason for their exclusion.
Once again, I apologize for the inflammatory tone of the previous
post. :(
strtok() modifies its first argument.
>Also, concerning multi-byte characters: strlen(), etc. can be easily
>modified to work properly on strings containing multi-byte characters, by
>changing how they internally access the "next character."
And by giving the new function a new name, of course.
>strupr(), however, could feasibly expand a character from one byte to two,
>possibly corrupting the storage space passed to it.
Or it could refuse to convert such characters, as toupper() refuses.
A newer function (which converts all characters that are suitable for
conversion, including possibly multibyte characters) could take an additional
argument for buffer length, and still almost remain in the spirit of C.
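
One way such a function might look, purely as a sketch: the name mb_upper_n() and its return convention are assumptions made for this example, not any existing or proposed interface. It converts the multibyte string to wide characters, upper-cases them with towupper(), converts back, and refuses to touch the buffer if the result would not fit.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical: upper-case the multibyte string held in buf, whose total
   capacity is bufsize bytes including the terminator.
   Returns 0 on success; returns -1 and leaves buf unchanged if the string
   is invalid in the current locale, memory runs out, or the converted
   string would not fit in bufsize bytes. */
int mb_upper_n(char *buf, size_t bufsize)
{
    size_t len = strlen(buf);
    /* A string of len bytes holds at most len characters. */
    wchar_t *wide = malloc((len + 1) * sizeof *wide);
    char *out = malloc(bufsize);
    size_t i, nwide, nbytes;
    int status = -1;

    if (wide != NULL && out != NULL) {
        nwide = mbstowcs(wide, buf, len + 1);
        if (nwide != (size_t)-1) {
            for (i = 0; i < nwide; i++)
                wide[i] = (wchar_t)towupper((wint_t)wide[i]);

            /* Convert into scratch space first so that a failure
               cannot leave buf half-written. */
            nbytes = wcstombs(out, wide, bufsize);
            if (nbytes != (size_t)-1 && nbytes < bufsize) {
                memcpy(buf, out, nbytes + 1);
                status = 0;
            }
        }
    }
    free(wide);
    free(out);
    return status;
}
```

The length argument matters only when an upper-case character needs more bytes than its lower-case form; in the "C" locale the string never grows.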
>I continue to think that ISO SHOULD have included them in the standard
>library, however, because their functioning is not inconsistent with the
>other string.h and ctype.h functions.
I have no opinion on the matter. They are more deserving than half the
stuff the committee added, but that doesn't say whether they're deserving.
>I'm not arguing that they are to be included in future
>versions of the library;
If the committee adds anything at all, you should argue for these.
--
<< If this were the company's opinion, I would not be allowed to post it. >>
No wonder Intel's Pentium[tm] chip has a 100.083% market share for its class of
microprocessors. It's 80585.9927 compatible! <Not speaking for Intel either.>
How is a "straw vote" orchestrated at such a meeting? Were there presentations
and the person at the podium just suddenly asked for a show of hands? Or was
there advance notice or discussion defining and lobbying for a particular
result? Was any of this recorded?
I'm still trying to get a grip on the mechanics of this Committee.
Is it ruled by the compendium of straw votes as interpreted by the
Standard writer? Is the Standard voted on clause-by-clause ?
Is it like the UN's security council where certain members have
veto rights that are not reserved to the others?
Where are the checks-and-balances that regulate the process?
--
Larry Weiss, l...@oc.com
214/888-0471
[Lots of argument deleted]
>: In the mean time, just how much work does it take to write
>: #include <ctype.h>
>: char *strlwr(char *s) {
>: unsigned c;
>: unsigned char *p = (unsigned char *)s;
>: while (c = *p) *p++ = tolower(c);
>: }
>More than it does to write
>#include <ctype.h>
>char *strlwr(char *s) {
> char *p=s;
> while(*p++) *p=tolower(*p);
> return s;
>}
Ahh, but this one (while shorter) doesn't work. And if you remove the
requirement that it work I can write a shorter one. To be more specific,
you should always pass tolower an unsigned char, because it is only defined
for values which are representable as unsigned char (and hence
if you have chars which are signed and pass a negative value it won't
work).
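
A sketch of the shorter version with that cast added, under the made-up name lower_in_place(). It also tests each character before advancing: as far as I can trace the quoted loop, it increments p before storing, so it never converts the first character and "ABC" would come out as "Abc".

```c
#include <ctype.h>

/* Lower-case a string in place and return it.
   The cast keeps negative char values away from tolower(), which is only
   defined for EOF and values representable as unsigned char. */
char *lower_in_place(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
    return s;
}
```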
--
Alan Stokes (al...@rcp.co.uk)
Richards Computer Products Ltd
Didcot, UK
Firstly, there are two committees to consider. The first is X3J11, which
is an ANSI-sponsored committee, and is local to the USA. I know little
about it, and will mostly ignore it in what follows.
Secondly, there is (to use the full title) ISO/IEC JTC1/SC22/WG14. This
is:
    Working Group 14 [C] of
    SubCommittee 22 [Programming Languages] of
    Joint Technical Committee 1 of
    The International Organization for Standardization and
    The International Electrotechnical Commission.
This body created the C Standard that we all know and love.
Each nation has (in theory) a National Standards Body accredited to ISO.
If it pays the relevant fee, the NSB can take part in a given Working
Group; participating members' NSBs usually delegate action to an internal
group. For example, both the UK and the USA take part in WG14. For the
UK, the NSB is the British Standards Institution, who delegate formal
action to Panel IST/5, and practical action to Panel IST/5/-/14. In the
USA, the NSB is ANSI, who delegate to X3J11.
WG14 holds regular meetings. Each participating member is allowed to
send a delegation; delegations tend to vary in size, depending on where
the meeting is. For the most part, the meetings consist of presentations
and discussions on working papers, which may be drafts of the Standard,
Defect Report Responses, Technical Reports, or other things. If
discussion is becoming bogged down over a particular item, the chair or
the presenter may call a straw vote, which is simply a show of hands of
those people present. These votes have no formal effect, but can help in
resolving an issue (when only three or four people are arguing a point,
it can help to know how many supporters each has). Straw votes tend to
happen without much warning; they are recorded in the minutes.
The purpose of WG14 meetings is to generate certain documents. For
example, last year it generated Record of Response 1 - a list of answers
to Defect Reports 001 to 059. This document was prepared by an editor
(P.J.Plauger) and a subcommittee to check his work, and then was presented
to a WG14 meeting. The meeting then held a formal vote to pass the
document to SC22 for final approval. The formal vote at the meeting is
by attending delegation, not by individual, so each country has one vote.
SC22 then held a written ballot on the document, which passed (without
dissent, as I recall).
Some documents are, or are modifications to, International Standards.
WG14 has produced three so far:
The ISO/IEC C Standard (ISO/IEC 9899:1990)
Technical Corrigendum Number 1
Normative Addendum Number 1
and the current C Standard consists of all three taken together. The
process for each of these is roughly as follows:
- A draft is produced by the document's editor. It may be circulated
informally, and changes may take place.
- The draft is discussed at a meeting. The meeting may suggest changes.
When the meeting is happy with the draft, a formal vote is held to
send the draft to SC22 for registration. For the Standard, the next
revision to the Standard, and NA1, there are two or three stages to
registration, and only the last one makes it an actual Standard. For
TC1 - which only corrects errors in the Standard - there is only a
single stage. [Since it isn't possible to do all the editing work
at the meeting, the vote may instead empower the editor to make the
changes, have them checked by a subcommittee, and then send the result
to SC22. This happened with both TC1 and NA1.]
- SC22 holds a written ballot of participating members. Each National
Body is sent the document, and has three months to return a formal
vote. This vote can be to approve or disapprove; in the latter case
the vote is usually accompanied by comments that indicate the changes
that would get the vote changed to "approve". At least two-thirds of
the votes must approve, and no more than one-quarter of the members may
disapprove. At the earlier registration stages, three negative votes
will usually result in the document being sent back for reconsideration.
- When the document is registered, it becomes, or becomes part of, the
ISO Standard.
My notes (which may be out of date) state that the following nations
participate in WG14:
      Austria
      Belgium         ABSTAIN
      Brazil
      Bulgaria
      Canada
      China
    * Denmark
      Finland
      France
      Germany
      Greece
      Ireland         ABSTAIN
      Italy
    * Japan
    * Netherlands     DISAPPROVE
      New Zealand
      Slovenia        ABSTAIN
      Sweden
      Switzerland
    * UK              DISAPPROVE
    * USA
      [5 others I don't have a note of]
An asterisk indicates those usually attending WG14 meetings; the notes
on the right show the non-approving votes on the final registration of
Normative Addendum 1.
> I'm still trying to get a grip on the mechanics of this Committee.
> Is it ruled by the compendium of straw votes as interpreted by the
> Standard writer? Is the Standard voted on clause-by-clause ?
> Is it like the UN's security council where certain members have
> veto rights that are not reserved to the others?
Well, I hope that's helped. The Standard is greatly affected by the
compendium of straw votes, but the written registration votes can
override that. There is not a clause-by-clause vote (such a vote would
be nonsense). There is no distinction between the members; all have an
equal say.
--
Clive D.W. Feather | Santa Cruz Operation | If you lie to the compiler,
cl...@sco.com | Croxley Centre | it will get its revenge.
Phone: +44 1923 813541 | Hatters Lane, Watford | - Henry Spencer
Fax: +44 1923 813811 | WD1 8YN, United Kingdom |
Thanks very much for the detailed account of the process!
I would suppose the comments returned with a disapproving vote would
be the closest to a "clause-by-clause" vote that I had in mind.
Because almost everyone already had it. So it got thrown in (like gets)
rather than being redesigned.
>: - do you want something that affects _all_ the letters in your current
>: code page? If you do, which of the several code pages that may be
>: active at the same time do you want?
> The only environment I'm familiar with in which you have to be aware of
> code pages is DOS.
If you call them "locales", there are many systems where you have to be
prepared for them.
>: - have you considered the possibility that you may have to deal with
>: a string of multibyte characters, some of the bytes of which may
>: _accidentally_ coincide with the normal codes of letters?
>
> Oooh! Ahhh! The ace in the hole! Actually, yes, I did, but once again,
> it's irrelevant to my question in the first place. In the case of wide
> characters, there are no string-based functions that operate properly in
> every case. strlen(), for instance, expects single-byte characters to
> populate the char* it is passed.
strlen returns the number of bytes in the multibyte string it is passed.
It works even in the presence of multibyte characters. You can't pass it
a string of wide characters (aka a wide string, aka an array of wchar_t
elements).
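To make the byte-versus-character point concrete, here is a minimal sketch. It assumes the string is encoded in UTF-8, which is purely an illustrative choice (the thread names no encoding), and `cafe_byte_length` is a made-up name.

```c
#include <string.h>

/* "caf" followed by e-acute, written as the two UTF-8 bytes 0xC3 0xA9.
   The encoding is an assumption made for this sketch. */
static const char cafe_utf8[] = "caf\xC3\xA9";

/* strlen counts bytes up to the terminating '\0', not characters,
   so the four-character word above is reported as five bytes. */
size_t cafe_byte_length(void)
{
    return strlen(cafe_utf8);
}
```

The call is well defined and returns the storage length, which is what you need for allocation and copying, but not a character count.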
> So does strcpy() and the other string.h
> functions.
strcpy() and other non-interpretive functions work correctly on multibyte
strings. Functions like strchr may generate false hits.
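A sketch of such a false hit, assuming Shift-JIS as the multibyte encoding and an ASCII-based execution character set (both assumptions are mine, chosen for illustration). In Shift-JIS the katakana character SO is the byte pair 0x83 0x5C, and 0x5C is also the code for a backslash. The name `strchr_false_hit` is hypothetical.

```c
#include <string.h>

/* One Shift-JIS character (katakana SO): lead byte 0x83, trail byte 0x5C.
   The string contains no backslash character. */
static const char sjis_so[] = "\x83\x5C";

/* Returns nonzero if strchr nevertheless reports a backslash, because it
   compares bytes and knows nothing about character boundaries. */
int strchr_false_hit(void)
{
    return strchr(sjis_so, '\\') != NULL;
}
```

Code that scans for path separators or escape characters byte by byte is where this tends to bite.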
> Wide character strings normally have their OWN functions that
> are prepared to handle the possibility of a NULL as part of a wide char.
(1) NULL is a macro defined as a null pointer constant, and is irrelevant
to this.
(2) If by NULL you mean NUL (the ASCII control code 0), then a wide
character, being a single value, doesn't contain ASCII characters. A
wide string is terminated by the wide character L'\0' (which equals
((wchar_t) 0)).
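A small sketch of point (2), using `wcslen` from `<wchar.h>` (the wide counterpart of strlen). The array and function names are made up for the example.

```c
#include <stddef.h>
#include <wchar.h>

/* A wide string: an array of wchar_t ending in the wide character L'\0'. */
static const wchar_t wide_abc[] = L"abc";

/* wcslen counts wchar_t elements up to, but not including, the
   terminating L'\0'. */
size_t wide_abc_length(void)
{
    return wcslen(wide_abc);
}

/* Nonzero when the element after the last letter is the terminator,
   which compares equal to (wchar_t)0. */
int wide_abc_is_terminated(void)
{
    return wide_abc[3] == L'\0' && wide_abc[3] == (wchar_t)0;
}
```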
> I'd expect to find something like "wc_toupper(wide_char c)" as an
> upper-case function, or I'd know to write one.
You want the C Standard function:
wint_t towupper (wint_t wc);
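As a rough illustration of how towupper might be applied to a whole wide string, here is a sketch. `wcs_upper` is a hypothetical helper, not a Standard function, and everything said earlier in the thread about case conversion being locale-dependent and lossy still applies to it.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: upper-cases a wide string in place, one wide
   character at a time, and returns its argument. Wide characters with
   no upper-case mapping in the current locale are left unchanged. */
wchar_t *wcs_upper(wchar_t *ws)
{
    wchar_t *p;

    for (p = ws; *p != L'\0'; p++)
        *p = (wchar_t)towupper((wint_t)*p);
    return ws;
}
```

Because each element of a wide string is a whole character, there is no risk of hitting the middle of a multibyte sequence here.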
> For example, my Watcom
> compiler has various functions like mblen() and others to handle wide
> chars... but strlen() makes no mention of it.
mblen() gives the number of bytes in a multibyte character. If you want
to know the number of multibyte characters in a string, you do something
like:
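#include <stdlib.h>  /* mblen, MB_CUR_MAX */
int mbstrlen (const char *s)  /* returns -1 on an invalid sequence */
{
    int n = 0;
    while (*s != '\0') {
        int len = mblen (s, MB_CUR_MAX);
        if (len <= 0)
            return -1;
        s += len, n++;
    }
    return n;
}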
>: In the mean time, just how much work does it take to write
[...]
> much more than it would take simply to call a library function.
The same is true of any piece of code. Adding features to a library has
tradeoffs; it was decided that strupr/strlwr were on the wrong side of
the line.
> But then again, if a function returns char *, but doesn't specify that
> it's returning wide characters, you wouldn't be expected to handle
> multi-byte chars.
A function that returns char * cannot, in general, return wide characters.
A function that returns char * can always return multi-byte characters.
That's the way that C works.
No change is required for many functions, including strlen, strcpy, and
strcat. "multibyte character" is defined so that they work unmodified.
> tolower() and toupper() are inconsistent
> in this, however,
These functions return the corresponding *single-byte character if there
is one*. That might not be as useful as you think, but it has its uses
nevertheless.
> and many of the string.h functions have an inconsistent
> return value if this is done.
Huh ? Which ones ?
Even changing the types to unsigned char won't make this version work.
Why? For one thing, the first character of the string will never be changed.
--
Wayne Berke
be...@panix.com
An excellent question. Generally, the committee works fairly
informally. When discussing an issue, anyone in attendance is
permitted to speak and anyone may call for a "straw vote" at any time
to get a sense of the committee. The person calling for the vote is
given great latitude in the specifics: they may propose a number of
alternatives to choose among (as opposed to a simple yea/nay), whether one may
vote for multiple alternatives or just one, whether discussion is
allowed prior to the vote (votes without discussion can avoid spending
a great deal of time discussing something that everyone already agrees
on), and whether the vote is to be of all present or some specific
subset (such as X3J11 voting members). The discussions and votes are
recorded in the minutes, but they have no formal standing. Formal votes
are taken when required by ANSI or ISO procedures.
> I'm still trying to get a grip on the mechanics of this Committee.
> Is it ruled by the compendium of straw votes as interpreted by the
> Standard writer? Is the Standard voted on clause-by-clause ?
> Is it like the UN's security council where certain members have
> veto rights that are not reserved to the others?
>
> Where are the checks-and-balances that regulate the process?
This is more difficult for me to answer because the formal procedures
for ISO are quite different from the formal procedures for ANSI. The
current standard was developed as an ANSI standard. The committee
started out using fairly informal procedures to develop the base
document: preliminary decisions were made based on straw votes then, at
the end of the meeting, the straw votes were reviewed, any voting member
was entitled to ask for a formal vote on any particular matter, finally
a formal vote was taken to ratify all of the unchallenged straw votes.
Once the base document was sent out for formal public review, however,
the committee adopted more formal procedures and any change to the
document required a 2/3 majority from a formal vote.
The ISO rules require "consensus" to be achieved among the various
national representatives. The way we're currently working is that X3J11
and WG14 meet in the same room at the same time with a single person
acting as chair for the meeting (typically the Chair of X3J11). X3J11
has no formal standing as far as ISO is concerned, it is just the
Technical Advisory Group for the US delegation (which is composed of a
few X3J11 members). It does, however, possess the vast majority of the
technical expertise, so it is quite useful to the other national
delegations to be able to observe and participate in its discussions.
Thus, we are using fairly informal procedures for the bulk of the
meeting, allowing all present to speak and using straw votes to
determine the sense of the committee and, hopefully, allow us to reach
consensus both within X3J11 and WG14. Formally, X3J11 must meet
briefly without any non-US attendees in order to formulate a US position
on any matter which requires a formal ISO vote, and WG14 must meet
separately to take such votes.
It seems to me that the ISO process has a much different philosophy of
what are important checks and balances than ANSI does. It seems to me
like ISO really doesn't care how the standard gets written as long as
all the member nations approve the final result, or at least don't
disapprove it. If a nation does disapprove, it is required to be
specific about what it finds objectionable and the committee is obliged
to consider addressing those objections, although it is not required to.
As long as the final ballot is positive, the standard is approved. ISO
does not dictate how the member nations are to form their positions --
in particular, there is no requirement for formal *public* review like
ANSI requires. X3J11, however, is firmly committed to public review.
It served us very well for the first standard and we have no intent of
not taking advantage of it again for the revision.
(Note that I've directed followups to comp.std.c which is the appropriate
place to discuss such procedural issues.)
----
Larry Jones, SDRC, 2000 Eastman Dr., Milford, OH 45150-2789 513-576-2070
larry...@sdrc.com
Aw Mom, you act like I'm not even wearing a bungee cord! -- Calvin
Here is corrected code.
#include <ctype.h>
char *strlwr(char *s) {
    unsigned c;
    unsigned char *p = (unsigned char *)s;
    while ('\0' != (c = *p)) *p++ = tolower(c);
    return s;
}
char *strupr(char *s) {
    unsigned c;
    unsigned char *p = (unsigned char *)s;
    while ('\0' != (c = *p)) *p++ = toupper(c);
    return s;
}
#ifdef TEST
#include <stdio.h>
int main(void) {
    char buffer[256];
    /* fgets bounds the read and keeps the newline; gets cannot be bounded */
    while (fgets(buffer, sizeof buffer, stdin)) {
        (void)fputs(strlwr(buffer), stdout);
        (void)fputs(strupr(buffer), stdout);
    }
    return 0;
}
#endif
sqbo...@cs.wmich.edu ( Jerry Bolton) writes:
>This is very true, however, the fact remains that tolower() and
>toupper() are defined, and that sometimes it is USEFUL to convert
>case. You make it sound (see your lines below about DOS) as if it is
>unnecessary.
tolower() and toupper() are in the official standard because they were
in the de facto standard, just like other, um, imperfect functions like gets().
>With a few exceptions, most notably that your strupr() converts to
>lowercase, such functions are useful to some people. For example...
>one must adopt some standard when providing migration tools similar to
>the ones I develop and maintain every day. My clients demand a
>sensible remapping of uppercase and lowercase VMS-style names (which
>are case-insensitive and primarily dictated by preferred style) to UNIX
>names (which are case sensitive).
You have played into my hands. strlwr() and strupr() would not be especially
useful to you, because there is _far_ more involved in converting VMS names
to something that makes sense in UNIX than just case mapping. Let's take
a realistic example:
[.foo.bar]StrLwr.c;3
I don't know what you would do about the version number. There are various
"backup" conventions used by UNIX editors one might adapt, but they would
require processing an entire batch of files at once. For now, suppose we
drop the ;3 (not something that can be done by case conversion). We get
foo/bar/strlwr.c
This is actually an _easy_ example because I left out any device or hostname.
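To show how little of even this easy case is case mapping, here is a minimal sketch of the transformation just described. It is not anyone's actual migration code: `vms_rel_to_unix` is a hypothetical helper that assumes a relative directory spec of the form `[.dir.dir]name.ext;ver`, rejects anything else, and simply discards the version number.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper. Handles only "[.dir.dir]name.ext;ver" (relative
   directory, no device, no node) or a bare "name.ext;ver".
   Returns 0 on success, -1 on unsupported input or a too-small buffer. */
int vms_rel_to_unix(const char *vms, char *out, size_t outsz)
{
    size_t n = 0;
    const char *p = vms;

    if (outsz == 0)
        return -1;
    if (p[0] == '[') {
        if (p[1] != '.')
            return -1;                  /* absolute directories not handled */
        p += 2;
        for (; *p != '\0' && *p != ']'; p++) {
            if (n + 1 >= outsz)
                return -1;
            /* '.' separates directories in VMS; UNIX uses '/' */
            out[n++] = (*p == '.') ? '/' : (char)tolower((unsigned char)*p);
        }
        if (*p != ']')
            return -1;                  /* unterminated directory spec */
        p++;
        if (n + 1 >= outsz)
            return -1;
        out[n++] = '/';
    }
    for (; *p != '\0' && *p != ';'; p++) {  /* stop at the version number */
        if (n + 1 >= outsz)
            return -1;
        out[n++] = (char)tolower((unsigned char)*p);
    }
    out[n] = '\0';
    return 0;
}
```

Of the work done here, only the two tolower calls are case conversion; the bracket, dot, and semicolon handling is everything a strlwr-based approach would miss.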
>These clients really don't care about chinese multibyte characters.
That's a surprise, because the last time I had access to VMS (1989), our
program _did_ have to worry about Kanji in all sorts of places.
The point remains, however, that your clients need a conversion algorithm
which does FAR more than change case. You have to deal with
DECnet host names
device names, including logical devices
version numbers
and you have to worry about dollar signs, which are technically legal in
UNIX file names but are likely to give shells fits.
>I'm not saying that such concerns are not valid for the standards
>committee, but you make it sound like strupr() and strlwr() are
>completely useless.
If someone expects to be able to convert VMS file names to UNIX form by
using strlwr(), they are not merely useless, they are positively dangerous.
I wrote:
Note that converting DOS file names to one case in order to determine whether
they must refer to the same file is neither necessary nor sufficient.
sqbo...@cs.wmich.edu ( Jerry Bolton) writes:
>See above. Migrating applications from one OS to another frequently
>mandates some kind of filename conversion.
Both statements are true. If you are working strictly within DOS (or OS/2),
you don't have any need to convert file names to one case; the file system
code will do whatever conversion is necessary.
It is also true that migrating applications between OSses frequently requires
some kind of filename conversion (which is why I wrote the pathname package
for Quintus Prolog, closely modelled on the Common Lisp pathname stuff).
What is relevant to this discussion is that such conversion involves so much
more than case conversion that it isn't funny. I have done this kind of
thing for DOS, UNIX, VMS, VM/CMS, Macintosh and a couple of others, and for
_none_ of these conversions would strlwr() or strupr() have been of any use
whatsoever.
Not least of the concerns is that DOS has code pages, the Macintosh similarly
has a number of coded character sets, and UNIX systems typically have some
version of ISO 8859. Never mind converting case: from which of these ASCII
extensions to which of these ASCII extensions should I be converting? What
should I do when moving a file from DOS to Mac when a character has NO
translation? Case is the _least_ of our worries.
ok> jo...@freenet3.scri.fsu.edu (Joe Ottinger) writes:
ok> In the mean time, just how much work does it take to write
ok> #include <ctype.h>
ok> char *strlwr(char *s) {
ok> unsigned c;
ok> unsigned char *p = (unsigned char *)s;
ok> while (c = *p) *p++ = tolower(c);
ok> }
ok> char *strupr(char *s) {
ok> unsigned c;
ok> unsigned char *p = (unsigned char *)s;
ok> while (c = *p) *p++ = tolower(c);
ok> }
Hmm. Just how much work does it take? Apparently too much, because you made
at least two mistakes (one of them twice).
Michael.
True. That'll teach me to post without checking.
>> This body created the C Standard that we all know and love.
> Er, well, "mostly ignoring" X3J11 is all well and good, but assigning their
> credit to someone else is *a bit much*, don't you think? It was X3J11 that
> created the C standard, albeit not the current version of it.
Sorry - I didn't mean to steal X3J11's credit. I simply meant that the
document "The C Standard" is that standardized by ISO. Of course, this
was largely based on the old ANSI C Standard, created by X3J11.
Equally, the current ANSI C Standard is a previous version of the ISO C
Standard (unless ANSI are a lot more efficient at picking up the changes
than I've been led to believe they are).
My comments apply to ISO/IEC JTC 1, the committee
that "oversees" the standards of interest to most of us.
Committees do not produce standards, people do.
Where do those people come from? Well, "officially", you can participate
on an ISO committee only if you are part of a "delegation" from an NB
(National Body) or approved Liaison organisation (note that
"organization" is not the preferred spelling, tho ISO is itself
inconsistent in this area).
OK, so where do the people on a delegation come from? That varies from NB
to NB and Liaison organisation to Liaison organisation. However, in
nearly all cases, these individuals are funded by their employer or some
sponsoring, say, user or industry group. And then there are fools such as
I, but that has now stopped.
Are all delegations equal on an ISO committee? Officially, yes, in
practice, no way. On many ISO committees, the work of each project is
usually based on the input from, say, 1 or 2 delegations. I won't mention
names in particular subject areas, other than to indicate that ANSI and JIS
carry an awful lot of weight in most areas, BSI, DIN, etc., in other areas.
It is also true that ANSI no longer has the implicit veto it had, say, 10 years
ago.
In the area of programming languages, the ANSI delegation does influence
the direction of SC22/WG14, but that only works as long as there is
apparent cooperation.
In the area of volume and file structure, i.e., SC15, all standards
produced for the past 10+ years have been led by ECMA (formerly known as
the European Computer Manufacturer's Association). ISO 1001 (tape), ISO
9293 (floppy), ISO 9660 (CD-ROM), ISO/IEC 13346 (non-sequential media),
ISO/IEC 13490 (write-once CDs) and ISO/IEC 13800 (registration procedures
for ...) all were heavily influenced, or done by ECMA and then
fast-tracked into ISO. However, the ECMA committee relies very heavily on
input from non-European committees, in particular, ANSI, JIS and industry
groups. For example, I wrote and was a member of the committee that
produced the High Sierra format for CDs. I then submitted the paper to
ECMA for processing as an ECMA standard and then guided it thru the ISO
fast-track process.
Given the processes involved, it is often difficult to get the "right"
people involved, but that's an entirely different subject.
In article
<1994Dec21.2...@sq.sq.com>, Mark Brader <m...@sq.sq.com> wrote:
>> Firstly, there are two committees to consider. The first is X3J11, which
>> is an ANSI-sponsored committee, and is local to the USA. I know little
>> about it, and will mostly ignore it in what follows.
>>
>> Secondly, there is (to use the full title) ISO/IEC JTC1/SC22/WG14. ...
>> [where ISO =]
>> The International Organisation for Standards
>
>That's spelled "International Organization for Standardization", actually.
>
>> This body created the C Standard that we all know and love.
>
>Er, well, "mostly ignoring" X3J11 is all well and good, but assigning their
>credit to someone else is *a bit much*, don't you think? It was X3J11 that
>created the C standard, albeit not the current version of it. (This is why
>some of us still prefer to say "ANSI C" -- to credit the right people.)
>--
>Mark Brader Summary of issue: Fix FORTRAN-8x.
>m...@sq.com Committee Response: This proposal contains
>SoftQuad Inc. insurmountable technical errors.
>Toronto -- X3J11 responses to 2nd public review
Where can these ancillary documents be obtained? Any hope of
electronic availability?
Glenn Herteg
IA Corporation
gl...@lia.com
>
>: It is actually astonishingly difficult to define case conversion in a way
>: that makes sense outside US English. To start with,
>: - do you want something that affects _only_ the 26 unaccented letters?
>
>Then why standardize tolower()? That works ONLY on the 26 unaccented
>characters.
>
I profess no special expertise with the Internationalization
bits of C, but as I understand it...
The standard does not say that tolower(), etc. only work on the 26
unaccented characters. Indeed, all that the standard says is that
tolower converts "...an upper case letter to a corresponding lower
case letter". What constitutes an upper and lower case letter is,
as I understand it, entirely locale dependent.
(Yes, I know section 5.2.1 defines the 26 uppercase and 26 lowercase
letters, but as it is implicitly referring to the "C" locale, I believe
the previous statement still holds true.)
You say that tolower doesn't work on accented characters. True, but
then in the standard "C" locale, (i.e. using the standard C execution
character set) there AREN'T any accented characters. So to try to
process a character set with accented characters while still in the
"C" locale will produce undefined (or is that unspecified?) behaviour
anyway.
You cannot meaningfully process an accented character set unless you
can set an appropriate locale. Once you have done that, tolower should
quite happily convert uppercase accented characters to a lowercase
equivalent, provided, of course, that the language has a deterministic
rule for doing so. (If it doesn't have such a rule, you are in trouble
anyway! :-)
The changes are available via the URL
http://www.lysator.liu.se/c/index.html
The Standard text itself has not been published electronically.
-
Right, except that it is well defined in the C locale, where exactly 26
characters are affected by each of toupper and tolower.
> You say that tolower doesn't work on accented characters. True, but
> then in the standard "C" locale, (i.e. using the standard C execution
> character set) there AREN'T any accented characters. So to try to
> process a character set with accented characters while still in the
> "C" locale will produce undefined (or is that unspecified?) behaviour
> anyway.
Not quite true. The C locale is permitted to contain any number of
characters, including accented ones. What is specified is that isupper
and islower return false for all the extra characters, and so toupper
and tolower don't alter them.
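A sketch of that behaviour, for what it's worth. `tolower_in_c_locale` is a made-up helper, and the value 0xC9 used in the note below is only an example of a code outside the 52 letters (it happens to be E-acute in ISO Latin 1, which is an assumption about the character set, not something the Standard says).

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: evaluates tolower(c) with LC_CTYPE set to "C".
   c must be representable as unsigned char (or be EOF).
   Side effect: the program's LC_CTYPE category is left set to "C". */
int tolower_in_c_locale(int c)
{
    setlocale(LC_CTYPE, "C");
    return tolower(c);
}
```

In the C locale, `tolower_in_c_locale('A')` yields `'a'`, while `tolower_in_c_locale(0xC9)` hands back 0xC9 unchanged, because isupper is false for it there.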
> You cannot meaningfully process an accented character set unless you
> can set an appropriate locale. Once you have done that
[...]
Right, with emphasis on the word "meaningful".
In article <3f2el7$rv9$1...@usenet.pa.dec.com>,
Norman Diamond <dia...@jrd.dec.com> wrote:
>> Sorry, but you're wrong. The Standard, in 5.2.1, considers both the
>> *basic* execution character set and the *extended* execution character
>> set. The former is explicitly enumerated; the latter is a superset of
>> the former. It is permitted, in the C locale, for the extended execution
>> character set to include the character "latin small letter y with grave
>> accent", which I will represent by a y followed by a grave accent. If
>> this character so appears, then 'y`' is a valid character constant.
>>> Yup. And what's specified about isupper and islower is that they return
>>> true for all upper-case and lower-case characters respectively.
>> So, in the C locale, the value of "islower ((unsigned char) 'y`')" is
>> zero, and the y-grave character is *not* a lowercase letter in that locale.
> No. islower doesn't say that it tests for a locale-dependent set of
> characters. It says that it tests for a lower-case letter and other
> implementation-defined characters (with certain restrictions). In the
> C locale, the implementation must conform to an additional sentence in
> the description; the first sentence is not repealed. Since 'y`' is a
> lower-case letter, islower((unsigned char) 'y`') is required to return
> both true and false.
Here's your error. 7.1.1 says that "letter" refers *only* to the 52
characters 'a' to 'z' and 'A' to 'Z'. So 'y`' is not a lowercase letter.
Therefore your argument is bogus.
>>> The only way that such a function can return both true and false
>> Huh ? Who said that the function returned both true and false ?
> The standard's section on islower (and, similarly, isupper).
> (Only if 'y`' existed in the C locale though.)
No. 'y`' is not a lowercase letter *ever*. "letter" is defined by 7.1.1,
and 7.3.1.6 makes it clear that "lowercase letter" only applies to 26 of
the 52 letters.
So there is no requirement that islower return true, and a requirement
that it does return false, so it must return false. What's the problem ?
Or did you think that "lowercase letter" means "whatever I think a
lowercase letter is" ?
Note that it *is* required that, in the C locale:
isalnum ((unsigned char) 'y`') == 0
isalpha ((unsigned char) 'y`') == 0
!iscntrl ((unsigned char) 'y`') || !isprint ((unsigned char) 'y`')
isdigit ((unsigned char) 'y`') == 0
isgraph ((unsigned char) 'y`') == isprint ((unsigned char) 'y`')
isgraph ((unsigned char) 'y`') == ispunct ((unsigned char) 'y`')
islower ((unsigned char) 'y`') == 0
isspace ((unsigned char) 'y`') == 0
isupper ((unsigned char) 'y`') == 0
isxdigit((unsigned char) 'y`') == 0
isalnum ((unsigned char) 'Y`') == 0
isalpha ((unsigned char) 'Y`') == 0
!iscntrl ((unsigned char) 'Y`') || !isprint ((unsigned char) 'Y`')
isdigit ((unsigned char) 'Y`') == 0
isgraph ((unsigned char) 'Y`') == isprint ((unsigned char) 'Y`')
isgraph ((unsigned char) 'Y`') == ispunct ((unsigned char) 'Y`')
islower ((unsigned char) 'Y`') == 0
isspace ((unsigned char) 'Y`') == 0
isupper ((unsigned char) 'Y`') == 0
isxdigit((unsigned char) 'Y`') == 0
tolower ((unsigned char) 'y`') == 'y`'
toupper ((unsigned char) 'y`') == 'y`'
tolower ((unsigned char) 'Y`') == 'Y`'
toupper ((unsigned char) 'Y`') == 'Y`'
all have the value 1.
Right. And thus we see that "isalpha" doesn't test for a "letter", as
defined in 7.1.1, but only for a vague notion which is partially
defined, and partially locale specific.
>> Or did you think that "lowercase letter" means "whatever I think a
>> lowercase letter is" ?
> I thought it meant whatever is a lowercase letter in real life.
> You have correctly pointed out that I was wrong, but incorrectly and
> insultingly suggested which way I was wrong. Was there a reason?
My apologies. There was an element of sarcasm (I thought you were too
experienced to fall into the "that's the obvious meaning" trap), but no
intention of being insulting.