string -> upper case using function to

David Frank

unread,

Dec 19, 2001, 11:38:00 AM12/19/01

to

Over 2 years back, Clive Page had a topic asking whether anyone could write
a one-line statement to convert strings to upper/lower case.. All the usual
suspects here (except yours truly) had a crack at it in the ensuing 22
message debate,
but IMO no-one came up with anything useful that readers are likely using
today..

Since its been pretty quiet around here this month, I thought posting one of
my upper/lower functions would serve as
my belated response to his topic and demo the varying string output
capability of F9x in a useful function.

! -----------------------------
module my_functions
contains

function to_upper(s1) result (s2)
implicit none
integer,parameter :: duc = ichar('A') - ichar('a')
character(*) :: s1
character(len(s1)) :: s2
character :: ch
integer :: n
do n = 1,len(s1)
ch = s1(n:n)
if (ch >= 'a'.AND.ch <= 'z') ch = char(ichar(ch)+duc)
s2(n:n) = ch
end do
end function to_upper
end module my_functions
! -----------------------------

program test
use my_functions
implicit none
character(10) :: string = 'abcdef 123'

write (*,'(a)') to_upper('abc123') ! outputs 6chars = ABC123
string = to_upper(string) ! demo 1-statement conversion of arbitrary
length string
write (*,'(a)') string ! outputs 10chars = ABCDEF 123
end program

Paul van Delst

unread,

Dec 19, 2001, 3:08:20 PM12/19/01

to

David Frank wrote:
>
> ! -----------------------------
> module my_functions
> contains
>
> function to_upper(s1) result (s2)
> implicit none
> integer,parameter :: duc = ichar('A') - ichar('a')
> character(*) :: s1
> character(len(s1)) :: s2
> character :: ch
> integer :: n
> do n = 1,len(s1)
> ch = s1(n:n)
> if (ch >= 'a'.AND.ch <= 'z') ch = char(ichar(ch)+duc)
> s2(n:n) = ch
> end do
> end function to_upper
> end module my_functions

I thought use of CHAR() was discouraged since it's processor dependent? Should it be ACHAR
maybe?

FWIW, I use what Cooper Redwine detailed in his book "Upgrading to Fortran 90". I find it more
intuitive (and probably slower, but, <shrug>) than doing arithmetic "on" characters or using <
or > for character comparisons if all you want to do is change case. Just my personal opinion
of course since case insensitivity is pretty low on my list of what I worry about.

cheers,

paulv

p.s. The functions are named after their IDL equivalent (less for me to remember :o)

MODULE string_stuff
IMPLICIT NONE
PRIVATE
PUBLIC :: strupcase, strlowcase
CHARACTER( LEN = 26 ), PARAMETER :: lower_case = 'abcdefghijklmnopqrstuvwxyz', &
upper_case = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
INTRINSIC INDEX, &
LEN

CONTAINS

FUNCTION strupcase ( input_string ) RESULT ( output_string )
CHARACTER( * ), INTENT( IN ) :: input_string
CHARACTER( LEN( input_string ) ) :: output_string
INTEGER :: i, n
output_string = input_string
DO i = 1, LEN( output_string )
n = INDEX( lower_case, output_string( i:i ) )
IF ( n /= 0 ) &
output_string( i:i ) = upper_case( n:n )
END DO
END FUNCTION strupcase

FUNCTION strlowcase ( input_string ) RESULT ( output_string )
CHARACTER( * ), INTENT( IN ) :: input_string
CHARACTER( LEN( input_string ) ) :: output_string
INTEGER :: i, n
output_string = input_string
DO i = 1, LEN( output_string )
n = INDEX( upper_case, output_string( i:i ) )
IF ( n /= 0 ) &
output_string( i:i ) = lower_case( n:n )
END DO
END FUNCTION strlowcase

END MODULE string_stuff

The test program:

PROGRAM string_stuff_test

USE string_stuff
IMPLICIT NONE

CHARACTER( 49 ) :: string = 'The Quick Brown 123 Fox Jumped Over The Lazy Dogs', &
new_string

WRITE( *, '( /, "Original string:" )' )
WRITE( *, '( 1x, a )' ) string

new_string = strupcase( string )
WRITE( *, '( /, "Uppercase string:" )' )
WRITE( *, '( 1x, a )' ) new_string

new_string = strlowcase( string )
WRITE( *, '( /, "Lowercase string:" )' )
WRITE( *, '( 1x, a )' ) new_string

END PROGRAM string_stuff_test

The output:

Original string:
The Quick Brown 123 Fox Jumped Over The Lazy Dogs

Uppercase string:
THE QUICK BROWN 123 FOX JUMPED OVER THE LAZY DOGS

Lowercase string:
the quick brown 123 fox jumped over the lazy dogs

--
Paul van Delst Religious and cultural
CIMSS @ NOAA/NCEP purity is a fundamentalist
Ph: (301)763-8000 x7274 fantasy
Fax:(301)763-8545 V.S.Naipaul

Paul Curtis

unread,

Dec 19, 2001, 3:28:19 PM12/19/01

to

The code would look (and compile and run) better with a CASE construct
instead of the IF....

SELECT CASE (s1(n:n))
CASE (a:z)
s2(n:n) = CHAR( ICHAR( s1(n:n) ) + upper_case_offset )
END SELECT

Gib Bogle

unread,

Dec 19, 2001, 3:40:44 PM12/19/01

to

You had better define a and z.

Paul van Delst

unread,

Dec 19, 2001, 4:06:01 PM12/19/01

to

Maybe he meant:

SELECT CASE (s1(n:n))
CASE ('a':'z')

s2(n:n) = CHAR( ICHAR( s1(n:n) ) + upper_case_offset )
END SELECT

--

Herman D. Knoble

unread,

Dec 19, 2001, 4:03:41 PM12/19/01

to

David: Clive Page suggested this code several months ago:
http://ftp.cac.psu.edu/pub/ger/fortran/hdk/ucase.f90

Fortran 77 code that's been around for a long time (but
uses the Char function):
CHARACTER*8 TEST, UPIT
TEST='abcd12yz'
CALL UPPER(TEST,UPIT)
WRITE(*,*) 'Upper=',UPIT
STOP
END
SUBROUTINE UPPER(LOW,UP)
CHARACTER*(*) LOW, UP
INTEGER diff
diff=ichar('A')-ichar('a')
UP=char(ichar(LOW)+diff)
RETURN
END

Skip Knoble, Penn State

On Wed, 19 Dec 2001 16:38:00 GMT, "David Frank" <dave_...@hotmail.com> wrote:

Herman D. (Skip) Knoble, Research Associate
Mailto:h...@psu.edu
Web: http://www.personal.psu.edu/hdk
Center for Academic Computing
Penn State University
214C Computer Building
University Park, PA 16802-2101
Phone:+1 814 865-0818 Fax:+1 814 863-7049

Richard Maine

unread,

Dec 19, 2001, 4:14:51 PM12/19/01

to

Paul Curtis <pcu...@kiltel.com> writes:

My opinion (to label it clearly :-)) is that an IF statement looks
better (meaning that I more immediately comprehend its intent - I'm
not talking about visual appearance) than a SELECT with only one CASE.

I guess I'd have expected the IF to run faster than the SELECT also
(or be the same), but I don't have any data to back up that expectation,
so its not even quite up to the status of either an opinion or a fact.
I am talking only about the question of IF vs SELECT - not the other
differences in these code samples such as the use of a temporary
intermediate variable.

--
Richard Maine | Good judgment comes from experience;
email: my last name at host.domain | experience comes from bad judgment.
host: altair, domain: dfrc.nasa.gov | -- Mark Twain

robin

unread,

Dec 19, 2001, 4:18:00 PM12/19/01

to

"David Frank" <dave_...@hotmail.com> writes:
> Over 2 years back, Clive Page had a topic asking whether anyone could write

> a one-line statement to convert strings to upper/lower case.. All the usual

> suspects here (except yours truly) had a crack at it in the ensuing 22

> message debate,

> but IMO no-one came up with anything useful that readers are likely using

> today..

>
> Since its been pretty quiet around here this month, I thought posting one of

> my upper/lower functions would serve as

> my belated response to his topic and demo the varying string output

> capability of F9x in a useful function.
>

> ! -----------------------------
> module my_functions
> contains
>
> function to_upper(s1) result (s2)
> implicit none

> integer,parameter :: duc = ichar('A') - ichar('a')

> character(*) :: s1
> character(len(s1)) :: s2
> character :: ch
> integer :: n
> do n = 1,len(s1)
> ch = s1(n:n)

> if (ch >= 'a'.AND.ch <= 'z') ch = char(ichar(ch)+duc)

Isn't this code going to change other non-alphabetic
characters in non-ASCII systems?

> s2(n:n) = ch
> end do
> end function to_upper
> end module my_functions

> ! -----------------------------
>
> program test
> use my_functions
> implicit none

> character(10) :: string = 'abcdef 123'
>

> write (*,'(a)') to_upper('abc123') ! outputs 6chars = ABC123

> string = to_upper(string) ! demo 1-statement conversion of arbitrary

> length string

> write (*,'(a)') string ! outputs 10chars = ABCDEF 123

> end program
>
>
>

Richard Maine

unread,

Dec 19, 2001, 4:42:10 PM12/19/01

to

robin <rob...@bigpond.nospam.com> writes:

> Isn't this code going to change other non-alphabetic
> characters in non-ASCII systems?

Yep. And even with ASCII, it doesn't handle the national characters
(umlats et. al.). Most things don't handle those, but it is worth
noting. When I once inquired about why we didn't have something like
case-conversion intrinsics in the standard (being pretty basic and
widely used things), the answer I got was that the
internationalization issues were what made it complicated.

For what (very little) it is worth, here's the version I use,
extracted from a module of string utility functions. Yes my
up_map_ascii table is a little ugly to set up, but simple (and
probably efficient) to use.

No claim is implied that this code is better than any other - its
just what I use.

!-- string.f90
!-- 29 Apr 92, Richard Maine: Version 1.0.
...
module fdas_string
...
!-- All character case stuff applies only to U.S. characters.
!-- National ASCII characters are treated as non-alphabetic.
...

integer :: i_do
...
integer, parameter :: up_map_ascii(0:127) = &
(/ (i_do, i_do=0,96), (i_do-32, i_do=97,122), (i_do, i_do=123,127) /)
...
contains
...
function upper_case (string) result(result)

!-- Change all lower case characters in a string to upper case.
!-- generic version.
!-- System-dependent versions might be more efficient.
!-- 29 Apr 92, Richard Maine.

!-------------------- interface.
character*(*), intent(in) :: string !-- An arbitrary string.
character*(len(string)) :: result !-- The converted string.

!-------------------- local.
integer :: i

!-------------------- executable code.

do i = 1 , len(string)
result(i:i) = achar(up_map_ascii(iachar(string(i:i))))
end do
return
end function upper_case
...
end module fdas_string

James Giles

unread,

Dec 19, 2001, 4:47:16 PM12/19/01

to

"David Frank" <dave_...@hotmail.com> wrote in message
news:In3U7.124081$oj3.21...@typhoon.tampabay.rr.com...

A corrected version that actually conforms to the standard
is:

> module my_functions
> contains
>
> function to_upper(s1) result (s2)
> implicit none

integer,parameter :: duc = iachar('A') - iachar('a')
> character(*) :: s1
> character(len(s1)) :: s2
integer, parameter:: lca = iachar('a'), lcz = iachar('z')
integer :: n, ch
> do n = 1,len(s1)
ch = iachar(s1(n:n))
if (ch >= lca .AND. ch <= lcz) ch = ch+duc
s2(n:n) = achar(ch)

> end do
> end function to_upper
> end module my_functions
> ! -----------------------------
>
> program test
> use my_functions
> implicit none
> character(10) :: string = 'abcdef 123'
>
> write (*,'(a)') to_upper('abc123') ! outputs 6chars = ABC123
> string = to_upper(string) ! demo 1-statement conversion of arbitrary
> length string
> write (*,'(a)') string ! outputs 10chars = ABCDEF 123
> end program

Note that neither version converts the case of letters outside of
the English alphabet (26 letters, no diacriticals). Both are
probably equally fast since on most systems ACHAR, CHAR,
IACHAR, and ICHAR are all implemented with the same
code (which is minimal). The revised version is correct
because the standard does not require the alphabetic characters
to be consecutive in the collating sequence of the default
character set, but ASCII does.

Neither works as fast as a table lookup which, one would hope,
would be implemented as a loop around an XLAT instruction on
an Intel - or clone - CPU. It's hard to code the XLAT version of
the code in a high-level language though. You can't do it in
C either. An approximation is:

function to_upper(s1) result (s2)
implicit none

character(*) :: s1
character(len(s1)) :: s2
character(1), parameter :: table(0:255) = (/ &
(achar(i), i=0,iachar('a')-1), &
(achar(i), i=iachar('A'), iachar('Z')), &
(achar(i), i=iachar('z')+1, 127) &
/)
integer :: n

do n = 1,len(s1)
s2(n:n) = table(iachar(s1(n:n)))
end do

end function to_upper

As I say, this is only approximately what you want. It requires
that the compiler be smart enough to recognize that the assignment
in the loop is actually an instance of a single XLAT instruction.
I am also assuming that characters are 8-bit, which increasingly
won't be true into the future. I am also assuming that all the
characters in the actual argument to S1 have ASCII counterparts.
(Hence, I don't claim the above to be standard conforming - it's
based on too many assumptions.)

You could have several TABLEs, one for lower to upper
case, one for upper to lower, etc. These could correctly translate
*all* letters in your character set, not just the 26 in the English
version of the latin alphabet. (I could have done this for Latin-1
in the above example, but I was too lazy to write out the definition
of such an array.) You could have a version of the TABLE that
translates ASCII to EBCDIC or vice-versa. Indeed, I've often
wondered why TRANSLATE wasn't an intrinsic in Fortran:
many machines have it as a single instruction in hardware.

--
J. Giles

Dan Tex1

unread,

Dec 19, 2001, 5:10:16 PM12/19/01

to

>From: Herman D. Knoble h...@psu.edu
>Date: 12/19/01 1:03 PM Pacific Standard Time
>Message-id: <0sv12ucpjlrebfugn...@4ax.com>

>
>David: Clive Page suggested this code several months ago:
>http://ftp.cac.psu.edu/pub/ger/fortran/hdk/ucase.f90
>
>Fortran 77 code that's been around for a long time (but
>uses the Char function):
> CHARACTER*8 TEST, UPIT
> TEST='abcd12yz'
> CALL UPPER(TEST,UPIT)
> WRITE(*,*) 'Upper=',UPIT
> STOP
> END
> SUBROUTINE UPPER(LOW,UP)
> CHARACTER*(*) LOW, UP
> INTEGER diff
> diff=ichar('A')-ichar('a')
> UP=char(ichar(LOW)+diff)
> RETURN
> END

The above code doesn't work.

Dan :-(

Paul Curtis

unread,

Dec 19, 2001, 5:41:43 PM12/19/01

to

Well, personally I think CASE statements are much easier to read (and
also to write!) than IFs, particularly when there are multiple
conditions within the same CASE, and also when one can take advantage of
scoping syntax to compress the logic, as in the upshift example that
started this thread. In my code, I seldom use IFs anymore, except for
the simplest single-comparison test. A CASE construct makes clear that
there could potentially be other CASES as well as the one presented, and
it then becomes a snap to insert additional CASEs within the SELECT as
new circumstances may warrant; this is much easier than fighting one's
way through a long and complex set of IF statements.

I had also formed the impression, based at least in part (if I recall
correctly) on some earlier comments from you (and others), to the effect
that F90/95 compilers deal with CASEs much more efficiently than IFs.

I further owe a profound debt of gratitude to whoever noted that I
should have used quoted constants, 'a' and 'z', as the scoping limits
in my sample, instead of the literal characters themselves. You can
never be too precise around here.

====================================================
Process Control Systems for the Ceramics Industry
KIL-TEL Systems Inc. Paul M. Curtis, PhD
9013 NE 37th Place tel 425-451-7689
Bellevue, WA 98004 fax 425-450-1722
sa...@kiltel.com pcu...@kiltel.com
====================================================

Richard Maine

unread,

Dec 19, 2001, 6:29:08 PM12/19/01

to

Paul Curtis <pcu...@kiltel.com> writes:

> Well, personally I think CASE statements are much easier to read (and
> also to write!) than IFs, particularly when there are multiple

> conditions within the same CASE...

I was talking about the particular code sample presented - not about all
possible comparisons of CASE and IF. This particular sample did
not have multiple conditions. That's exactly why I thought the IF
form more clear. If it had multiple conditions, I might have thought
differently.

Of course, you have every right to an opinion different from mine.
Heck, I might not have posted at all if it weren't that I had read two
consecutive posts - one post in which you berated someone for failing
to explicitly label an opinion as such, followed by another post in
which you offered an opinion ("would look better") without explicitly
labeling it. This struck me as amusing. I probably should have resisted
the resulting temptation.

> I had also formed the impression, based at least in part (if I recall
> correctly) on some earlier comments from you (and others), to the effect
> that F90/95 compilers deal with CASEs much more efficiently than IFs.

Again, I suspect that those comments were probably based on the assumption
that there were multiple conditions. It does make a difference. I haven't
gobe back to research the particular posts, but that would be my guess.

James Van Buskirk

unread,

Dec 19, 2001, 6:53:21 PM12/19/01

to

"Herman D. Knoble" <h...@psu.edu> wrote in message
news:0sv12ucpjlrebfugn...@4ax.com...

> David: Clive Page suggested this code several months ago:
> http://ftp.cac.psu.edu/pub/ger/fortran/hdk/ucase.f90

Although I like the above method because it uses array
syntax and no explicit loops, I present some alternatives,
tested with LF95 5.60g (I'll send you a bug report, Steve):

program to_upper_test
implicit none
type node
character c
type(node), pointer :: next
end type node
type(node), pointer :: head => NULL(), cursor
integer :: char_count = 0
character c

write(*,'(a)',advance='no') ' Enter a string:> '
do
read(*,'(a1)',advance = 'no', EOR = 10, ERR = 10, END = 10) c
allocate(cursor)
cursor%c = c
cursor%next => head
head => cursor
char_count = char_count+1
end do
10 continue
call process_string(head, char_count)
contains
subroutine process_string(head, char_count)
type(node), pointer :: head
integer, intent(in) :: char_count
character(char_count) z
integer k
type(node), pointer :: cursor
character(len(z)) s

do k = char_count, 1, -1
z(k:k) = head%c
cursor => head
head => cursor%next
nullify(cursor%next)
deallocate(cursor)
end do
! All the code up to this point was to get the input string.
! Now processing can proceed

! First method: uses magic numbers instead of iachar('a'), &c, but OK
! because the ASCII collating sequence isn't going to change.
s = z
forall(k = 1:len(s), 97 <= iachar(s(k:k)) .AND. &
iachar(s(k:k)) <= 122) s(k:k) = achar(iachar(s(k:K))-32)
write(*,*) s

! Second method: uses array constructors
s = ""
s = transfer((/(merge(achar(iachar(z(k:k))-32),z(k:k),97 <= &
iachar(z(k:k)).AND.iachar(z(k:k)) <= 122),k=1,len(z))/),z)
write(*,*) s

! Third method: uses elemental functions
s = ""
s = to_upper(z)
write(*,*) s
end subroutine process_string

elemental function to_upper(z)
character(*), intent(in) :: z
character(len(z)) to_upper

to_upper = transfer(to_upper_char(transfer(z,(/'z'/))),z)
end function to_upper

elemental function to_upper_char(z)
character to_upper_char
character, intent(in) :: z

to_upper_char = merge(achar(iachar(z)-32), z, 97 <= iachar(z) &
.AND. iachar(z) <= 122)
end function to_upper_char
end program to_upper_test

James Giles

unread,

Dec 19, 2001, 6:58:08 PM12/19/01

to

"Paul Curtis" <pcu...@kiltel.com> wrote in message
news:3C2117A6...@kiltel.com...

> Well, personally I think CASE statements are much easier to read (and
> also to write!) than IFs, particularly when there are multiple
> conditions within the same CASE, and also when one can take advantage of
> scoping syntax to compress the logic, as in the upshift example that
> started this thread. In my code, I seldom use IFs anymore, except for
> the simplest single-comparison test. A CASE construct makes clear that
> there could potentially be other CASES as well as the one presented, and
> it then becomes a snap to insert additional CASEs within the SELECT as
> new circumstances may warrant; this is much easier than fighting one's
> way through a long and complex set of IF statements.

I believe this to be entirely a matter of taste in this instance.
When the condition being tested can easily be reduced to a
small set of discrete alternatives that can be indexed, the
SELECT/CASE is appropriate. When the condition being
tested cannot be reduced to such a set of indexable possibilities,
IF is more appropriate. There is an overlap between the two
in which either is good, and this is one instance of that.

Select (iachar(ch))
Case (iachar('a') :iachar('z'))
...
End select

vs.

If (iachar(ch) >= iachar('a') .and. iachar(ch) <= iachar('z')) then
...
End if

These are semantically the same, and I actually see them as
such. Adding another selection is similar in both: either another
CASE block or an ELSE IF block. Again, I see little difference.

> I had also formed the impression, based at least in part (if I recall
> correctly) on some earlier comments from you (and others), to the effect
> that F90/95 compilers deal with CASEs much more efficiently than IFs.

It depends on the condition being tested. If the condition is not
easily handled as a jump-table (what the original computed GOTO
actually implemented), then CASE is no better, and possibly worse
than IFs. A naive jump-table for the above would use 26 entries
(at least). That's not too bad. And it means the time required for
the construct was the same regardless of which condition was selected
(except that a distant jump might be out of cache). But suppose you had:

Select (I)
Case (0:2000000)
...
End select

The compiler (you hope) would implement this the same as
the corresponding IF. Otherwise you'd have a rather enormous
jump-table.

Finally, if the conditions you are testing for are not easily reduced
to discrete index values, it's hard to even apply SELECT/CASE.
Consider:

If (abs(x) < 0.05) then
...
Else if (x > 0.00 .and. x < 100.0) then
...
Else if (x < 0.00 .and. x > -100.0) then
...
Else
...
End if

Note that the conditions overlap and are REALs. Converting this
to a SELECT/CASE construct would require the programmer to
rework conditions and would require the compiler to do as much
work as with the IF version. Further, the IF version allows the
programmer to prioritize: by putting the most common case first,
for example. In the SELECT/CASE, the order of checking the
possibilities is up to the compiler.

It depends on what you're doing, but it is my experience that most
conditions appearing in Fortran are, at least, tests of REALs rather
than CHARACTERs, INTEGERS, or LOGICALs. That is, the last
example above is more representative of usual code. I think most
people intuitively select an appropriate construct for what they're
doing.

--
J. Giles

James Giles

unread,

Dec 19, 2001, 7:15:01 PM12/19/01

to

"James Van Buskirk" <not_...@attbi.com> wrote in message
news:RL9U7.10034$Ah.512618@rwcrnsc52...
...

> Although I like the above method because it uses array
> syntax and no explicit loops, I present some alternatives,
> tested with LF95 5.60g (I'll send you a bug report, Steve):

...

??? If you don't like explicit loops, why the following?

> ! First method: uses magic numbers instead of iachar('a'), &c, but OK
> ! because the ASCII collating sequence isn't going to change.
> s = z
> forall(k = 1:len(s), 97 <= iachar(s(k:k)) .AND. &
> iachar(s(k:k)) <= 122) s(k:k) = achar(iachar(s(k:K))-32)
> write(*,*) s

A FORALL is an explicit looping construct. Semantically, it
is explicitly order independent, so it can be vectorized, or otherwise
parallelized without any extensive compiler analysis. But it's
a loop all the same.

> ! Second method: uses array constructors
> s = ""
> s = transfer((/(merge(achar(iachar(z(k:k))-32),z(k:k),97 <= &
> iachar(z(k:k)).AND.iachar(z(k:k)) <= 122),k=1,len(z))/),z)
> write(*,*) s

An implied loop is an explicit one. You just wrote it differently (and
certainly no more legibly).

> ! Third method: uses elemental functions
> s = ""
> s = to_upper(z)
> write(*,*) s
> end subroutine process_string
>
> elemental function to_upper(z)
> character(*), intent(in) :: z
> character(len(z)) to_upper
>
> to_upper = transfer(to_upper_char(transfer(z,(/'z'/))),z)
> end function to_upper
>
> elemental function to_upper_char(z)
> character to_upper_char
> character, intent(in) :: z
>
> to_upper_char = merge(achar(iachar(z)-32), z, 97 <= iachar(z) &
> .AND. iachar(z) <= 122)
> end function to_upper_char
> end program to_upper_test

This double-function is too bad. I still believe it was a bad
idea that character strings are not the same as arrays of character*1.
All this work with TRANSFER and having to have a second nested
function could have been elided. :-(

--
J. Giles

David Frank

unread,

Dec 20, 2001, 1:04:25 AM12/20/01

to

I posted my source in comp.lang.pl1 as an example of modern fortran and
asked for a translation of ALL the source.
and Robin came up with:

> PL/I does it in one line:
> s = TRANSLATE (s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'abcdefghijklmnopqrstuvwxyz');

but he hasnt provided a complete translation as yet..

bendel boy

unread,

Dec 20, 2001, 5:10:19 AM12/20/01

to

"David Frank" <dave_...@hotmail.com> wrote in message news:<JbfU7.130401$oj3.22...@typhoon.tampabay.rr.com>...

But he has provided a sufficient part, surely? The question was about
upper to lower/lower to upper, and this code fragment appears to be
sufficient.

I like the use of an explicit character string. Then it is easy for
anybody to expand to their own additional characters. I used to use

character UPPER*(*)
parameter UPPER = 'ABC ...'
integer LenUp
parameter LenUp = 26

(Yes, Fortran 77)

as changes to the character set were easy. No expectation of the ASCII
character set. Easy to see what is going on. Easy to work out what
went wrong, if necessary.

(Now I use the Visual Basic Upper$ and Lower$ functions - which also
appear to handle the international character sets.)

David Frank

unread,

Dec 20, 2001, 7:32:53 AM12/20/01

to

"bendel boy" <sbdp...@aol.com> wrote in message
news:4db10a80.01122...@posting.google.com...

> "David Frank" <dave_...@hotmail.com> wrote in message
news:<JbfU7.130401$oj3.22...@typhoon.tampabay.rr.com>...
> > I posted my source in comp.lang.pl1 as an example of modern fortran and
> > asked for a translation of ALL the source.
> > and Robin came up with:
> >
> > > PL/I does it in one line:
> > > s = TRANSLATE (s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
> > 'abcdefghijklmnopqrstuvwxyz');
> >
> > but he hasnt provided a complete translation as yet..
>
> But he has provided a sufficient part, surely? The question was about
> upper to lower/lower to upper, and this code fragment appears to be
> sufficient.
>

Yes, but there is more involved than above, ie: pli's translation of
module,
indexing a string within a to_upper function, specifying varying length
function output, etc.
Would you want to see his above TRANSLATE with its 2 full-length
translation strings at each point where a conversion is required? I think
most would prefer to see the details hidden in a to_upper function, thats
what I mean about he hasnt completely translated my program..

BTW, his argument is PLI language (which is dying) cant have its programs
translated to Fortran
because a PLI program uses a superior syntax to Fortran that inhibits this
migration, I am trying to refute this by
translating his examples, and also posting some Fortran source for the
newsgroup's edification..

Herman D. Knoble

unread,

Dec 20, 2001, 8:46:42 AM12/20/01

to

Dan, Thanks. Sorry about that.
This was a boo boo. That code (CHAR) will do only
one character at a time; e.g.,

CHARACTER*1 TEST,UPIT
TEST='q'

On 19 Dec 2001 22:10:16 GMT, dan...@aol.com (Dan Tex1) wrote:

-|>-|Since its been pretty quiet around here this month, I thought posting one

-|>of
-|>-|my upper/lower functions would serve as
-|>-|my belated response to his topic and demo the varying string output
-|>-|capability of F9x in a useful function.

-|>-|
-|>-|! -----------------------------

-|>-|module my_functions
-|>-|contains
-|>-|

-|>-|function to_upper(s1) result (s2)

-|>-|implicit none
-|>-|integer,parameter :: duc = ichar('A') - ichar('a')
-|>-|character(*) :: s1
-|>-|character(len(s1)) :: s2
-|>-|character :: ch
-|>-|integer :: n
-|>-|do n = 1,len(s1)
-|>-| ch = s1(n:n)
-|>-| if (ch >= 'a'.AND.ch <= 'z') ch = char(ichar(ch)+duc)
-|>-| s2(n:n) = ch
-|>-|end do
-|>-|end function to_upper
-|>-|end module my_functions
-|>-|! -----------------------------

-|>-|
-|>-|program test

-|>-|use my_functions
-|>-|implicit none
-|>-|character(10) :: string = 'abcdef 123'

-|>-|
-|>-|write (*,'(a)') to_upper('abc123') ! outputs 6chars = ABC123

-|>-|string = to_upper(string) ! demo 1-statement conversion of
-|>arbitrary
-|>-|length string
-|>-|write (*,'(a)') string ! outputs 10chars = ABCDEF 123
-|>-|end program
-|>-|
-|>-|
-|>

-|>
-|> Herman D. (Skip) Knoble, Research Associate
-|

Herman D. (Skip) Knoble, Research Associate

Bill

unread,

Dec 20, 2001, 11:14:22 AM12/20/01

to

James Giles wrote:<snip>

>
> You could have several TABLEs, one for lower to upper
> case, one for upper to lower, etc. These could correctly translate
> *all* letters in your character set, not just the 26 in the English
> version of the latin alphabet. (I could have done this for Latin-1
> in the above example, but I was too lazy to write out the definition

> of such an array.) <snip>

The translation to a table is straightforward for a character set only if upper
and lower case have a one to one mapping. Unfortunately because of one character,
the german small sharp s which is a "ligature" for ss or sz, the mapping is not
one to one for Latin-1. I believe that this is the only character which is always
a problem in Latin-1 for conversion to upper case, but some accented characters
may in some contexts change accents in case conversions if you want to make the
rules reflect specific language rules.

This problem existsts in various forms in many other character sets. Dealing with
this problem and similar ones, greatly complicates the robust treatment of
character sets. Strings almost have to be dynamic one dimensional arrays of
single characters, (or one dimensional arrays of multibyte integers if you want
to deal "simply" with sets with more than 256 characters.).

James Giles

unread,

Dec 20, 2001, 3:58:18 PM12/20/01

to

"Bill" <wclo...@lanl.gov> wrote in message news:3C220E5...@lanl.gov...
...

> The translation to a table is straightforward for a character set only if
upper
> and lower case have a one to one mapping. Unfortunately because of one
character,
> the german small sharp s which is a "ligature" for ss or sz, the mapping is
not
> one to one for Latin-1. I believe that this is the only character which is
always
> a problem in Latin-1 for conversion to upper case, but some accented
characters
> may in some contexts change accents in case conversions if you want to make
the
> rules reflect specific language rules.

Latin-1 has a lower case y-umlaut (ÿ), but not an upper case one.
Also, are the feminine and masculine ordinals (ª and º respectively),
that some languages treat as letters, upper case or lower case?
How do they transform if you want to change case?

So, you're right that there are additional problems. I was merely
addressing the relatively simple problem of translating the more
old fashioned character sets.

One the other hand, some of the problems you mention above
can still be handled with a simple translate. For example, if an
accented character changes accent in the case conversion, that
can be done, though you may need a different translation table
for each language that does this. The characters which can't
change case without changing the number of characters is
considerably harder (ß is the only one I know of, as you mentioned).

By the way, I don't know why you were so reticent about using the
characters themselves in your article. Latin-1 is the default character
set for usenet news (by international standard). Your news reader
*should* be able to handle them just fine.

> This problem existsts in various forms in many other character sets. Dealing
with
> this problem and similar ones, greatly complicates the robust treatment of
> character sets. Strings almost have to be dynamic one dimensional arrays of
> single characters, (or one dimensional arrays of multibyte integers if you
want
> to deal "simply" with sets with more than 256 characters.).

I agree.

--
J. Giles

Dan Tex1

unread,

Dec 20, 2001, 4:05:07 PM12/20/01

to

>> ! Third method: uses elemental functions
>> s = ""
>> s = to_upper(z)
>> write(*,*) s
>> end subroutine process_string
>>
>> elemental function to_upper(z)
>> character(*), intent(in) :: z
>> character(len(z)) to_upper
>>
>> to_upper = transfer(to_upper_char(transfer(z,(/'z'/))),z)
>> end function to_upper
>>
>> elemental function to_upper_char(z)
>> character to_upper_char
>> character, intent(in) :: z
>>
>> to_upper_char = merge(achar(iachar(z)-32), z, 97 <= iachar(z) &
>> .AND. iachar(z) <= 122)
>> end function to_upper_char
>> end program to_upper_test
>
>This double-function is too bad. I still believe it was a bad
>idea that character strings are not the same as arrays of character*1.
>All this work with TRANSFER and having to have a second nested
>function could have been elided. :-(
>
>--
>J. Giles

I still like EQUIVALENCE for similar reasons. For example

SUBROUTINE UpperCase ( string )
Character(*), intent(in) :: string
Character(132) :: s
Character(1) :: t(132)
Equivalence ( s, t )
s = string

! I then operate on t(1) to t(whatever). The syntax is easy for me to read
this way.
! Of course, the above example limits my input string to 132 characters.
! Maybe there is a simple, elegant way around this limit while still using
! this basic technique?

End Subroutine UpperCase

Dan :-)

Bill

unread,

Dec 20, 2001, 6:46:09 PM12/20/01

to

James Giles wrote:<snip>

>
> Latin-1 has a lower case y-umlaut (ÿ), but not an upper case one.
> Also, are the feminine and masculine ordinals (ª and º respectively),
> that some languages treat as letters, upper case or lower case?
> How do they transform if you want to change case?

I believe that lower case y-umlaut (ÿ) never appears at the beginnings of words
(but my knowledge of French is bad and nonexistent for Welsh and Dutch). If that
is the case the natural useage of the charcter does not result in upper case
conversion. For special cases where it has been converted, I suspect you need a
dictionary to recognize, as part of the translation process, whether the word
containing the character contains the accented or unaccented character. In the
worse case you may need to heuristics and examine nearby words for additional
context to distinguish similar words

What languages use the ordinals as letters? While their typical forms are
inspired by letters, a for the terminal a of adjectives modifying feminine nouns,
o for the terminal o of adjectives modifying masculine nouns, they are
distinguished from the letters that inspired them, and, from the limited
descriptions I have seen. I would expect them to be caseless, similar to digits.

>
>
> So, you're right that there are additional problems. I was merely
> addressing the relatively simple problem of translating the more
> old fashioned character sets.
>
> One the other hand, some of the problems you mention above
> can still be handled with a simple translate. For example, if an
> accented character changes accent in the case conversion, that
> can be done, though you may need a different translation table
> for each language that does this. The characters which can't
> change case without changing the number of characters is
> considerably harder (ß is the only one I know of, as you mentioned).

There are also languages with three cases (a special one for titles of
documents?) and arabic appears to have four cases.
<snip>

Donald Arseneau

unread,

Dec 20, 2001, 8:16:59 PM12/20/01

to

Bill <wclo...@lanl.gov> writes:

> > Latin-1 has a lower case y-umlaut (ÿ), but not an upper case one.
>

> I believe that lower case y-umlaut (ÿ) never appears at the beginnings of
> words

It is not a y-umlaut but an i-j digraph. Its uppercase is IJ. Just like...

> > for each language that does this. The characters which can't
> > change case without changing the number of characters is
> > considerably harder (ß is the only one I know of, as you mentioned).

Donald Arseneau as...@triumf.ca

James Giles

unread,

Dec 20, 2001, 8:16:22 PM12/20/01

to

"Bill" <wclo...@lanl.gov> wrote in message news:3C227821...@lanl.gov...

> James Giles wrote:<snip>
...

> > Latin-1 has a lower case y-umlaut (ÿ), but not an upper case one.
> > Also, are the feminine and masculine ordinals (ª and º respectively),
> > that some languages treat as letters, upper case or lower case?
> > How do they transform if you want to change case?
>
> I believe that lower case y-umlaut (ÿ) never appears at the beginnings of
words
> (but my knowledge of French is bad and nonexistent for Welsh and Dutch). If
that
> is the case the natural useage of the charcter does not result in upper case

> conversion. [...]

Well, I was thinking not just of normal typographic conventions
but of all legitimate uses. It's not unusual for signs, advertising copy,
and so forth to use all caps for words and even sentences that aren't
usually capitalized. The first word in a chapter is sometimes given
in small-caps. Etc.. For those you might still need a capitalized y-umlaut
even though it's not normally used in the language. There is actually
a capital y-umlaut in the present full definition of Unicode (ISO 10646):
u0178.

> What languages use the ordinals as letters? While their typical forms are
> inspired by letters, a for the terminal a of adjectives modifying feminine
nouns,
> o for the terminal o of adjectives modifying masculine nouns, they are
> distinguished from the letters that inspired them, and, from the limited
> descriptions I have seen. I would expect them to be caseless, similar to
digits.

I only know of the ordinals indirectly. The UNICODE definition
classifies them as letters. Perhaps more to the point in this group,
the C standard classifies them as letters for the purpose of allowing
them in identifiers. If Fortran ever extends its syntax to allow
non-ASCII letters in identifiers, it's something that will have to
be considered (and, for the purposes of interoperability, maybe
we should already be considering it). This includes such (apparently
trivial) questions such as how do you upcase your list of identifiers
if any of them contain the ordinals?

And, we haven't even addressed the problem of how to alphabetize
all these characters. Different languages have different rules. From
what I understand, U and V are treated as the same for the purposes
of alphabetization in some languages. And, does Ä preceed or follow
A in the alphabet - or is it the same as if you wrote AE? Etc.. I don't
have any personal knowledge of this stuff at all and have only been
told indirectly that such issues arise. Probably someone will have
to hash it all out (possibly someone has already done so).

To paraphrase someone's law: it's all messier than we think it is,
even taking into account that it will be messier than we think.

--
J. Giles

Clive Page

unread,

Dec 21, 2001, 6:04:35 AM12/21/01

to

In article <3C2117A6...@kiltel.com>,

Paul Curtis <pcu...@kiltel.com> wrote:
>Well, personally I think CASE statements are much easier to read (and
>also to write!)

Except that you've already shown us how easy it is to get one wrong. :-)

But without checking the small print of the Standard, I wouldn't be certain
that case('a':'z') would work on a non-ASCII machine (say one using EBCDIC,
where the letters are not contiguous), whereas conversions using ACHAR and
IACHAR are guaranteed to use ASCII collating sequence.

I agree that's a pedantic comment as I think EBCDIC has gone the way of
the dinosaurs, but what about Unicode?

--
Clive Page c...@le.ac.uk

Bill

unread,

Dec 21, 2001, 12:07:34 PM12/21/01

to

James Giles wrote:

> "Bill" <wclo...@lanl.gov> wrote in message news:3C227821...@lanl.gov...
>

> > James <snip>

>
> Well, I was thinking not just of normal typographic conventions
> but of all legitimate uses. It's not unusual for signs, advertising copy,
> and so forth to use all caps for words and even sentences that aren't
> usually capitalized. The first word in a chapter is sometimes given
> in small-caps. Etc.. For those you might still need a capitalized y-umlaut
> even though it's not normally used in the language. There is actually
> a capital y-umlaut in the present full definition of Unicode (ISO 10646):
> u0178.

But then you are no longer dealing with Latin-1. In French I believe ÿ can only be
upper cased to Y using the Latin-1 character set, in Dutch where ÿ is sometimes
used in Latin-1 for the ij ligature, ÿ should be upper cased to IJ, I don't know
Welsh.
<snip>

> I only know of the ordinals indirectly. The UNICODE definition
> classifies them as letters. Perhaps more to the point in this group,
> the C standard classifies them as letters for the purpose of allowing
> them in identifiers.

But C and Fortran also allow numerals and _ in identifiers. Those characters are
monocased and so I believe are the ordinals. Upper case has no effect on them.

> If Fortran ever extends its syntax to allow
> non-ASCII letters in identifiers, it's something that will have to
> be considered (and, for the purposes of interoperability, maybe
> we should already be considering it). This includes such (apparently
> trivial) questions such as how do you upcase your list of identifiers
> if any of them contain the ordinals?
>
> And, we haven't even addressed the problem of how to alphabetize
> all these characters.

The rest of this suggests you mean not alphebetize per se, but rather lexically
order words (identifiers) containing the characters.

> Different languages have different rules. From
> what I understand, U and V are treated as the same for the purposes
> of alphabetization in some languages. And, does Ä preceed or follow
> A in the alphabet - or is it the same as if you wrote AE? Etc.. I don't
> have any personal knowledge of this stuff at all and have only been
> told indirectly that such issues arise. Probably someone will have
> to hash it all out (possibly someone has already done so).

It is a mess to define something in general. Lexical orderings are almost always
application specific. Even in English the ordering used in a dictionary is
different from that used in a phone book. Other languages often have very
different rules: European spanish,in most contexts, treats ch as a single
character between c and d, includes ll as a special ligature (not available in
Latin-1) that is a character between l and m. Languages differ greatly in their
treatment of accents and case on lexical orderings, accents can be ignored,
considered only in special contexts, treated as separate characters, less or more
important than case, etc. Any rule of lexical ordering is so application specific
that it should not be incorproated in Fortran.

What might be of interest to Fortran, is the incorporation of non-ascii characters
in Fortran identifiers. This does not require an ordering per se, but a simple
rule for distinguising which identifiers are unique. This is relatively simple to
generalize for languages which are case sensitive (e.g., C, C++, and Java), in
these languages of the subset of characters that are allowed in identifiers all
characters are unique.

Unfortunately Fortran is a caseless language. The 26 "latin" letters are not
distinguished by case in Fortran identifiers. If Fortran wants to include
additional characters in identifiers it has the following options:

1. Treat each character outside of ASCII as unique, so that Fortran is caseless
only for the 26 latin letters. This is inconsistent in many ways but easy to
implement.

2. Treat the lowest seven bits of a well defined character set (e.g. Unicode) as
defining a mapping to ASCII, and in this mapping use Fortran's rules for ASCII on
this mapping. I suspect this result would be horrible, but it is consistent and
easy to implement.

3. Define Fortran specific rules for each non-ASCII character allowed in
identifiers. But the complexity of the task increases with the number of
characters (I cannot see doing this for Unicode), and will be very politically
sensitive for some characters where the rules differ between languages.

Some Latin-1 examples: Does ÿ map to Y, IJ, or some special Welsh rule. Are ª and
º unique or do they map to A and O. I supsect that the users might want Æ, æ, ?,
?, and ß to map to AE, AE, FI, FL, and SS, but implementors might object to the
change in identifier size in translation. Do Ñ and ñ map to N or NY (I beleive
different spanish speaking countries have different conventions here). For other
characters, do you simply drop the accents and convert to upper case, or retain
the accents and convert to upper case. Do you allow the supperscript numerals, and
do they map to regular numerals.

James Giles

unread,

Dec 21, 2001, 5:51:31 PM12/21/01

to

"Bill" <wclo...@lanl.gov> wrote in message news:3C236C51...@lanl.gov...
...

> > For those you might still need a capitalized y-umlaut
> > even though it's not normally used in the language. There is actually
> > a capital y-umlaut in the present full definition of Unicode (ISO 10646):
> > u0178.
>
> But then you are no longer dealing with Latin-1. In French I believe ÿ can
only be
> upper cased to Y using the Latin-1 character set, in Dutch where ÿ is
sometimes
> used in Latin-1 for the ij ligature, ÿ should be upper cased to IJ, I don't
know
> Welsh.

Ah, but I was answering your previous suggestion that
uppercase Y with diaresis wasn't ever used. Evidently
someone disagreed with you to a sufficient extent to have
it added to the extended Latin codepage of Unicode.

...

> > I only know of the ordinals indirectly. The UNICODE definition
> > classifies them as letters. Perhaps more to the point in this group,
> > the C standard classifies them as letters for the purpose of allowing
> > them in identifiers.
>
> But C and Fortran also allow numerals and _ in identifiers. Those characters
are
> monocased and so I believe are the ordinals. Upper case has no effect on them.

So, you're answering my implied question (what should upper
case conversion do to the ordinals?) buy saying it should do
nothing to them. That's certainly a supportable opinion. I was
merely pointing out that it was a decision that would have to be
made. I don't necessarily support or oppose your choice. It is
something that will probably be decided outside the control of
either of us. (As a practical matter, in code that I'm presently
writing, "do nothing" will probably be exactly what I do with
the ordinals when changing case. The same with ÿ and ß.
Whether that's "right" or not will be decided by others.)

> > And, we haven't even addressed the problem of how to alphabetize
> > all these characters.
>
> The rest of this suggests you mean not alphebetize per se, but rather
lexically
> order words (identifiers) containing the characters.

That's the primary definition of the word "alphabetize" in
all three of my dictionaries (it's the *only* definition of the
word in one pocket dictionary; my other pocket dictionary
doesn't list the word). So, whatever subtle semantic point
you intended to make missed.

The remainder of your article shows that you do indeed
recognize that such issues are both important and difficult.
That was my only point. I don't have, nor do I recommend, any
particular solution to these difficulties. I haven't the expertise,
for one thing. I'm interested in discovering what others decide
is appropriate though.

> Unfortunately Fortran is a caseless language. [...]

For most purposes (at least for English speaking programmers)
this is far from unfortunate. It is the best design decision. The
productivity of programmers suffers in languages which are case
sensitive. (Seems like there were some experiments on the subject
in hte '70s. Unfortunately, I don't have copies of the reports from
those studies. :-( So, I'll just leave it as just my *opinion* that case
sensitivity is bad.)

--
J. Giles

string -> upper case using function to_upper

David Frank

Paul van Delst

Paul Curtis

Gib Bogle

Paul van Delst

Herman D. Knoble

Richard Maine

robin

Richard Maine

James Giles

Dan Tex1

Paul Curtis

Richard Maine

James Van Buskirk

James Giles

James Giles

David Frank

bendel boy

David Frank

Herman D. Knoble

Bill

James Giles

Dan Tex1

Bill

Donald Arseneau

James Giles

Clive Page

Bill

James Giles