
Unix C/C++ uppercase string function


leib

Dec 1, 1999

Does anyone know if there is a C/C++ function under Unix to convert a string
to uppercase or lowercase,
like the C++ strupr or strlwr functions? There are no man entries for these
functions, so I can only presume that they are not supported.
Is there anything else that would convert a whole string?

Thanks

Leib

Nate Eldredge

Dec 2, 1999

"leib" <ll...@faber.org.uk> writes:

You can easily write one using the `toupper' and `tolower' functions.

--

Nate Eldredge
neld...@hmc.edu

Jordan Katz

Dec 2, 1999

leib wrote:
>
> Does anyone know if there is a C/C++ function under Unix to convert a string
> to uppercase or lowercase,
> like the C++ strupr or strlwr functions? There are no man entries for these
> functions, so I can only presume that they are not supported.
> Is there anything else that would convert a whole string?
>
> Thanks
>
> Leib

#include <ctype.h>
void stoupper(char *s)
{
    char *p = s;

    while (*p) {
        *p = toupper(*p);
        ++p;
    }
}

--
__________________________________________________
Jordan Katz <webm...@underlevel.net>

C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html

Michael Mauch

Dec 2, 1999

leib <ll...@faber.org.uk> wrote:
> Does anyone know if there is a C/C++ function under Unix to convert a string
> to uppercase or lowercase,
> like the C++ strupr or strlwr functions?

strupr() and strlwr() are C functions that come with the libraries of
some compilers.

> There are no man entries for these
> functions, so I can only presume that they are not supported.
> Is there anything else that would convert a whole string?

It's easy to roll your own with toupper() and tolower(), isn't it?

#include <ctype.h>

char* strupr(char* s)
{
    char* p = s;
    while (*p) {
        *p = toupper((int)*p);
        ++p;   /* advance, or the loop never terminates */
    }
    return s;
}

But note the following caveat from the glibc.info:

>>>
*Compatibility Note:* In pre-ISO C dialects, instead of returning
the argument unchanged, these functions may fail when the argument is
not suitable for the conversion. Thus for portability, you may need to
write `islower(c) ? toupper(c) : c' rather than just `toupper(c)'.
<<<
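
The portable form from that note can be wrapped in a small helper. This is only a sketch: portable_toupper is a made-up name, not a library function, and the conversion to unsigned char is an addition that keeps the argument in the range the <ctype.h> functions accept. It is meant for characters taken from a string, not for EOF.

```c
#include <ctype.h>

/* Hypothetical helper, not a library function: uppercases c only when
 * islower() reports a lowercase letter, so toupper() is never asked to
 * convert an unsuitable argument. */
static int portable_toupper(int c)
{
    unsigned char uc = (unsigned char)c;   /* keep the value in 0..UCHAR_MAX */
    return islower(uc) ? toupper(uc) : uc;
}
```

On an ISO C library the islower() test is redundant, so the helper costs one extra call per character there.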

Regards...
Michael

Andy Jeffries

Dec 3, 1999

>Does anyone know if there is a C/C++ function under Unix to convert a string
>to uppercase or lowercase,
>like the C++ strupr or strlwr functions? There are no man entries for these
>functions, so I can only presume that they are not supported.
>Is there anything else that would convert a whole string?


int tolower(int c) -- Convert c to lowercase.
int toupper(int c) -- Convert c to uppercase.

for characters. Don't know if there is a plain C string function to do it.
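
For the lowercase half of the question, tolower() can be applied to each character in turn. A minimal sketch, assuming the made-up name lower_string (standard C has no such function); the cast to unsigned char keeps each argument inside the range the <ctype.h> functions accept.

```c
#include <ctype.h>

/* Hypothetical helper: lowercases a NUL-terminated string in place and
 * returns its argument, in the style of the strlwr() some compilers ship. */
char *lower_string(char *s)
{
    char *p;

    for (p = s; *p != '\0'; ++p)
        *p = (char)tolower((unsigned char)*p);
    return s;
}
```

The string must be writable, so pass an array or allocated buffer rather than a string literal.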


AJ

Eric Sosman

Dec 3, 1999

Jordan Katz wrote:
> [...]

> #include <ctype.h>
> void stoupper(char *s)
> {
>     char *p = s;
>
>     while (*p) {
>         *p = toupper(*p);
>         ++p;
>     }
> }

This can fail on a system where `char' is signed.
Call toupper() -- and isdigit() and islower() and so on --
like this:

*p = toupper((unsigned char)*p);

Michael Mauch exhibited a similar function, but with
this interesting decoration:

*p = toupper((int)*p);

... which looks like it might be an attempt to handle the
signed `char' problem. Unfortunately, this does *not* work;
a `char' with the value -42 will be converted to an `int' with
the value -42, and you're no better off than when you began.

Neither Jordan nor Michael should feel bad, because this
is an embarrassingly easy mistake to make -- I committed it
blithely for many years before realizing the error of my ways.
When my programs started misbehaving when presented with
characters like 'ü' and 'ç' I at first believed isupper() and
friends were buggy; it took a good deal of convincing before
I realized that I was using them incorrectly.

Go thou and do likewise.

----
Eric....@east.sun.com

Chris Torek

Dec 3, 1999

In article <k4o628...@elmicha.333200002251-0001.dialin.t-online.de>
Michael Mauch <michae...@gmx.de> writes:
> *p = toupper((int)*p);

You (and anyone else, but you .de types especially, if you use ISO
Latin 1 to write words like über :-) ) should use "unsigned char",
not int, in the cast -- or have "p" be "unsigned char *":

From: to...@elf.bsdi.com (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: toupper in c
Message-ID: <811pu8$ik2$1...@elf.bsdi.com>
Date: 18 Nov 1999 13:09:28 -0800

... [This] applies to all the <ctype.h> functions. The
problem is that their domain -- the set of values they take -- is
{EOF, [0..UCHAR_MAX]}. That is, toupper(EOF) is defined; toupper(0)
is defined; toupper(1), toupper(2), toupper(3), etc., are defined;
toupper(128) through toupper(255) are all defined; and if UCHAR_MAX
exceeds 255, additional toupper()s are defined.
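
That domain can be written down as a predicate. A minimal sketch; in_ctype_domain is a hypothetical name used only for illustration and is not part of <ctype.h>.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical predicate: nonzero if c may legally be passed to the
 * <ctype.h> functions, i.e. c is EOF or lies in 0..UCHAR_MAX. */
int in_ctype_domain(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}
```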

On the other hand, unless EOF is -40, toupper(-40) is *not* defined.
If plain "char" is signed, and if "char *p" happens to point to a
plain "char" that has a value of -40, then:

toupper(*p)

is *not* defined. (It gives rise to the dreaded "undefined behavior".)
You can write either of these:

toupper((unsigned char)*p)
toupper(*(unsigned char *)p)

which have different meanings on one's complement and sign-and-magnitude
systems. So, which one you should write depends on what is in the
memory to which "p" points.
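
To make the two spellings concrete, here is a small sketch; the function names are made up for illustration. On a two's complement machine both return the same value, and they can differ only on the one's complement and sign-and-magnitude systems mentioned above.

```c
/* Converts the *value* of the char to unsigned char. */
int by_value(const char *p)
{
    return (unsigned char)*p;
}

/* Reinterprets the stored *object* as an unsigned char. */
int by_object(const char *p)
{
    return *(const unsigned char *)p;
}
```

Either result is safe to hand to toupper(), since both lie in 0..UCHAR_MAX.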

All "normal" characters are nonnegative, so toupper('a') is definitely
'A', whether 'a' is 0x61 (ASCII) or 0x81 (EBCDIC) or something else
entirely. (This implies that, on 8-bit EBCDIC systems such as IBM
mainframes, plain "char" must in fact be unsigned.) So if "p"
points to normal text, the undefined-behavior aspect of toupper(*p)
will not rear its ugly head. Unfortunately, all *that* really means
is that bugs tend to get past testing, and produce undefined behavior
when someone in Europe runs ISO-Latin-1 text through the program.
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc
El Cerrito, CA Domain: to...@bsdi.com +1 510 234 3167
http://claw.bsdi.com/torek/ (not always up) I report spam to abuse@.

Michael Mauch

Dec 4, 1999

Eric Sosman <eric....@east.sun.com> wrote:

> Michael Mauch exhibited a similar function, but with
> this interesting decoration:
>
> *p = toupper((int)*p);
>
> ... which looks like it might be an attempt to handle the
> signed `char' problem. Unfortunately, this does *not* work;
> a `char' with the value -42 will be converted to an `int' with
> the value -42, and you're no better off than when you began.

Yes, you are right: casting to (unsigned char) or even (unsigned int) is
really the right way. My cast to (int) was triggered by a compiler /
C library at work, where the compiler emits a warning about each and
every toupper(*p) or isdigit(*p), complaining that it doesn't like array
indices of type char very much.
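
A warning about array indices makes sense if the library implements isdigit() and friends as macros that index a lookup table, which was a common technique. The sketch below illustrates the idea only; the names are hypothetical, it is not the code of any particular library, and it assumes EOF == -1.

```c
#include <limits.h>

/* Hypothetical classification table: slot 0 is for EOF, followed by one
 * slot per unsigned char value. Only the digits are flagged here. */
static unsigned char digit_table[1 + UCHAR_MAX + 1];

static void init_digit_table(void)
{
    int c;

    for (c = '0'; c <= '9'; ++c)
        digit_table[1 + c] = 1;
}

/* Valid only for c == EOF or 0 <= c <= UCHAR_MAX. Assumes EOF == -1 so
 * that EOF lands on slot 0; any other negative c indexes out of bounds. */
static int my_isdigit(int c)
{
    return digit_table[1 + c];
}
```

With such a table, a plain char holding -28 would read memory in front of the array, which is the problem the warning hints at.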

Here at home on my Linux box with gcc and glibc2, everything works fine
without any cast. My plain char type here is signed (I can even give
`-fsigned-char' to gcc), but the library makers obviously solved that
common programming mistake.

> Neither Jordan nor Michael should feel bad, because this
> is an embarrassingly easy mistake to make -- I committed it
> blithely for many years before realizing the error of my ways.
> When my programs started misbehaving when presented with
> characters like 'ü' and 'ç' I at first believed isupper() and
> friends were buggy; it took a good deal of convincing before
> I realized that I was using them incorrectly.

Yes, I had these problems some years ago. I guess I "solved" it by using
`-funsigned-char'. But a cast to (unsigned int) is the right thing to
do.

Regards...
Michael

Andrew Gierth

Dec 4, 1999

>>>>> "Michael" == Michael Mauch <michae...@gmx.de> writes:

>> *p = toupper((int)*p);
>>
>> ... which looks like it might be an attempt to handle the
>> signed `char' problem. Unfortunately, this does *not* work;
>> a `char' with the value -42 will be converted to an `int' with
>> the value -42, and you're no better off than when you began.

Michael> Yes, you are right: casting to (unsigned char) or even
Michael> (unsigned int) is really the right way. My cast to (int) was
Michael> triggered by a compiler / C library at work, where the
Michael> compiler emits a warning about each and every toupper(*p) or
Michael> isdigit(*p), complaining that it doesn't like array indices
Michael> of type char very much.

The compiler is telling you something important here; listen to it.

Unfortunately the cast to (int) shuts the compiler up without actually
fixing the problem; same with any cast other than to (unsigned char).

Michael> Here at home on my Linux box with gcc and glibc2, everything
Michael> works fine without any cast. My plain char type here is
Michael> signed (I can even give `-fsigned-char' to gcc), but the
Michael> library makers obviously solved that common programming
Michael> mistake.

Highly unlikely. You're misunderstanding the problem; on a system
where char is signed, it is __ALWAYS__ wrong to do isdigit(c) where
c is of type char; the library writer can't "solve the mistake"
because it is inherent in the definition of what isdigit() etc. are
supposed to do.

>> Neither Jordan nor Michael should feel bad, because this
>> is an embarrassingly easy mistake to make -- I committed it
>> blithely for many years before realizing the error of my ways.
>> When my programs started misbehaving when presented with
>> characters like 'ü' and 'ç' I at first believed isupper() and
>> friends were buggy; it took a good deal of convincing before
>> I realized that I was using them incorrectly.

Michael> Yes, I had these problems some years ago. I guess I "solved"
Michael> it by using `-funsigned-char'. But a cast to (unsigned int)
Michael> is the right thing to do.

Absolutely not. The cast _must_ be to (unsigned char), the effect of
casting to unsigned int is quite different (except on those rare
platforms where sizeof(char)==sizeof(int)).
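
The difference between the two casts can be shown with a short sketch. The function names are made up for illustration, and the parameter is declared signed char to model a platform where plain char is signed.

```c
#include <limits.h>

/* (unsigned int) cast: a negative value wraps modulo UINT_MAX + 1. */
unsigned int cast_to_unsigned_int(signed char c)
{
    return (unsigned int)c;
}

/* (unsigned char) cast: a negative value wraps modulo UCHAR_MAX + 1,
 * landing inside the range the <ctype.h> functions accept. */
unsigned int cast_to_unsigned_char(signed char c)
{
    return (unsigned char)c;
}
```

With 8-bit char and 32-bit int, -28 becomes 4294967268 through the first function and 228 through the second; only the second is a valid argument for isdigit().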

--
Andrew.

comp.unix.programmer FAQ: see <URL: http://www.erlenstar.demon.co.uk/unix/>
or <URL: http://www.whitefang.com/unix/>

Michael Mauch

Dec 5, 1999

Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:

> Unfortunately the cast to (int) shuts the compiler up without actually
> fixing the problem; same with any cast other than to (unsigned char).
>
> Michael> Here at home on my Linux box with gcc and glibc2, everything
> Michael> works fine without any cast. My plain char type here is
> Michael> signed (I can even give `-fsigned-char' to gcc), but the
> Michael> library makers obviously solved that common programming
> Michael> mistake.
>
> Highly unlikely. You're misunderstanding the problem; on a system
> where char is signed, it is __ALWAYS__ wrong to do isdigit(c) where
> c is of type char; the library writer can't "solve the mistake"
> because it is inherent in the definition of what isdigit() etc. are
> supposed to do.

But the writers of the GNU libc 2.0 and the writers of the Borland C
4.52 library did it. Programmers don't have to use that cast there,
because the library functions (isupper() et al.) take care whether they
get a negative value.

> Michael> Yes, I had these problems some years ago. I guess I "solved"
> Michael> it by using `-funsigned-char'. But a cast to (unsigned int)
> Michael> is the right thing to do.
>
> Absolutely not. The cast _must_ be to (unsigned char), the effect of
> casting to unsigned int is quite different (except on those rare
> platforms where sizeof(char)==sizeof(int)).

Ok, maybe it's (unsigned char), I can't check it out here at home,
because the libraries here are smart enough to help lazy programmers.

I append a little program that shows how smart these libraries are:

#ifdef __TURBOC__
#define __USELOCALES__
#endif

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <locale.h>

char* strupr(char* s)
{
    char* p = s;
    while (*p) {
        /* purposely the wrong cast! */
        *p = toupper((signed char)*p);
        ++p;   /* separate statement: *p++ = f(*p) is undefined */
    }
    return s;
}

int main(void)
{
#ifdef __MSDOS__
    char* s = strdup("\x81""ber \x84\x94\x81");
    char ae = '\x84';
#else
    char* s = strdup("\xFC""ber \xE4\xF6\xFC");
    char ae = '\xE4';
#endif
    setlocale(LC_ALL,"");
    puts(strupr(s));
    printf("%d %d\n",ae,islower(ae));

    /* let's see if we have signed or unsigned plain chars: */
    printf("%d\n",(char)200==(unsigned char)200);
    return 0;
}

The output of the program:

ÜBER ÄÖÜ
-28 512
0

Regards...
Michael


Andrew Gierth

Dec 5, 1999

>>>>> "Michael" == Michael Mauch <michae...@gmx.de> writes:

>> Highly unlikely. You're misunderstanding the problem; on a system
>> where char is signed, it is __ALWAYS__ wrong to do isdigit(c) where
>> c is of type char; the library writer can't "solve the mistake"
>> because it is inherent in the definition of what isdigit() etc. are
>> supposed to do.

Michael> But the writers of the GNU libc 2.0 and the writers of the
Michael> Borland C 4.52 library did it. Programmers don't have to use
Michael> that cast there, because the library functions (isupper() et
Michael> al.) take care whether they get a negative value.

Think about the character ÿ in iso-8859-1. (code = 255)

Suppose I have a variable p of type signed char* pointing to such a
character, and I call islower(*p). Suppose also that I've called
setlocale() appropriately.

What is the call supposed to return? *p will sign-extend to the value -1,
which also happens to be EOF. islower(*p) is supposed to return true,
islower(EOF) must return false. How is the library supposed to resolve
this situation?

Answer: it _cannot_. Programs that fail to use the correct forms _will_
give incorrect results.

>> Absolutely not. The cast _must_ be to (unsigned char), the effect of
>> casting to unsigned int is quite different (except on those rare
>> platforms where sizeof(char)==sizeof(int)).

Michael> Ok, maybe it's (unsigned char), I can't check it out here at
Michael> home, because the libraries here are smart enough to help
Michael> lazy programmers.

The library is giving you a false sense of security. IMO that could be
even more dangerous than actually failing.

Michael> I append a little program that shows how smart these
Michael> libraries are:
[snip]

Try it again with ÿ (\xff) somewhere in the input.

Michael Mauch

Dec 5, 1999

Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:

> Think about the character ÿ in iso-8859-1. (code = 255)
>
> Suppose I have a variable p of type signed char* pointing to such a
> character, and I call islower(*p). Suppose also that I've called
> setlocale() appropriately.
>
> What is the call supposed to return? *p will sign-extend to the value -1,
> which also happens to be EOF. islower(*p) is supposed to return true,
> islower(EOF) must return false. How is the library supposed to resolve
> this situation?

You're right. islower('ÿ') and islower((unsigned int)'ÿ') both
erroneously return 0, whereas islower((unsigned char)'ÿ') returns
true, like it is supposed to do.

If I give -funsigned-char to gcc, islower('ÿ') and islower((unsigned
int)'ÿ') also work.
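
Whether plain char is signed in a given build can be checked from <limits.h>. A minimal sketch; plain_char_is_signed is a hypothetical name, and the expectation that a switch such as -funsigned-char changes the answer is an assumption about the compiler in use.

```c
#include <limits.h>

/* Nonzero if plain char is a signed type in this compilation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Code that depends on the answer is fragile; the cast to unsigned char works either way.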

> The library is giving you a false sense of security. IMO that could be
> even more dangerous than actually failing.

I don't think so. The library has no chance to actually fail, it only
might give obviously false results more often - but that's no help for
programmers who don't use non-ASCII characters. The library can repair
the problem in most cases, with the \xFF character being the only
exception. Personally, I learnt about these things years ago, but still
I keep forgetting them from time to time. So I do like that the library
repairs my broken programs.

Thank you for reminding me and clarifying these things again.

Regards...
Michael

Kenneth C Stahl

Dec 6, 1999

Jordan Katz wrote:

>
> leib wrote:
> >
> > Does anyone know if there is a C/C++ function under Unix to convert a string
> > to uppercase or lowercase,
> > like the C++ strupr or strlwr functions? There are no man entries for these
> > functions, so I can only presume that they are not supported.
> > Is there anything else that would convert a whole string?
> >
> > Thanks
> >
> > Leib

>
> #include <ctype.h>
> void stoupper(char *s)
> {
>     char *p = s;
>
>     while (*p) {
>         *p = toupper(*p);
>         ++p;
>     }
> }
>

The above code relies on implicit conversions. An improved version makes
the end-of-string test and the conversion to unsigned char explicit:

#include <ctype.h>
void stoupper(char *s)
{
    char *p = s;

    while (*p != '\0') {
        *p = toupper((unsigned char)*p);
        ++p;
    }
}
