
Built-in "string" type for dynamically sized strings

spectrum

Oct 29, 2017, 12:06:06 PM
Although there is already a dynamically sized character string in current Fortran
(i.e., character(:), allocatable or pointer), I think it would be very useful if we had
a built-in, native "string" type in the Standards that is analogous
to such a type in other languages. For example,

program main
implicit none
string :: s

s = "hello"
print *, s
end

This is partly motivated by the possible future introduction of generics,
because otherwise various generic types or function calls might become
pretty lengthy, for example,

List<character(:), allocatable> :: foo

as compared to

List<string> :: foo

(and the situation becomes worse for more complicated cases
like Dict<string,string> etc...)

In addition to simplicity, I think such a native string type would be convenient
to make an array of dynamically sized strings, e.g.,

string, allocatable :: colors(:)
allocate( names( 3 ) )

names( 1 ) = "blue"
names( 2 : 3 ) = [ string :: "orange", "red" ] !! each item can differ in length

This could probably be achieved by making a wrapper type such as

type string_t
character(:), allocatable :: str
contains
!! methods here
endtype

type( string_t ), allocatable :: names(:)
...

and we have various nice libraries for this (e.g., StringiFor by Stefano
https://github.com/szaghi/StringiFor )
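
For illustration, here is a minimal sketch of that wrapper approach in standard Fortran 2003. The program name is made up here; string_t and names follow the snippet above. Each element carries its own deferred-length component, so the lengths can differ:

```fortran
program wrapped_strings
   implicit none

   ! Wrapper type from the post: one deferred-length string per element.
   type :: string_t
      character(len=:), allocatable :: str
   end type string_t

   type(string_t), allocatable :: names(:)
   integer :: i

   allocate(names(3))

   ! Assignment allocates each component to the length of its right-hand side.
   names(1)%str = "blue"
   names(2)%str = "orange"
   names(3)%str = "red"

   do i = 1, size(names)
      print "(A,1X,I0)", names(i)%str, len(names(i)%str)
   end do

   ! Sanity check: the elements really do have different lengths.
   if (len(names(1)%str) /= 4 .or. len(names(2)%str) /= 6) error stop "unexpected length"
end program wrapped_strings
```

The cost is the extra %str at every point of use, which is the verbosity the proposal is trying to avoid.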

However, my opinion is that if the use cases are so very clear
and a lot of other languages have such "string" types built-in,
it would be much easier for general users to be able to
make use of such a type out-of-the-box (without further installation
or search for an external library).

My additional hopes for such string type are something like:

* split() method that returns an array of strings

string, allocatable :: words( : )
string :: line

line = "foo baa bazbaz"

words = line % split() !! returns [ string :: "foo", "baa", "bazbaz" ]

real, allocatable :: vals( : )
line = "1.1, 2.2, 3.3"

vals = line % split( delim=",", type=real ) !! returns [ real :: 1.1, 2.2, 3.3 ]

* capability to read data directly into strings (w/o pre-allocation)

* pattern matching with regular expressions

string :: name
if ( name ~= "^[0-9]" ) then ...

---
I think such string enhancement in Fortran would make text data I/O
and analysis pretty convenient (in a way similar to other languages).

spectrum

Oct 29, 2017, 12:12:41 PM
> string, allocatable :: colors(:)
> allocate( names( 3 ) )
>
> names( 1 ) = "blue"
> names( 2 : 3 ) = [ string :: "orange", "red" ] !! each item can differ in length

Hmm, the "names" above is a typo for "colors"... xD

Ian Harvey

Oct 29, 2017, 4:38:15 PM
Have a look at ISO/IEC 1539 part 2.

robin....@gmail.com

Oct 29, 2017, 8:19:44 PM
On Monday, October 30, 2017 at 3:06:06 AM UTC+11, spectrum wrote:
> Although there is already a dynamically sized character string in current Fortran
> (i.e., character(:), allocatable or pointer), I think it would be very useful if we had
> a built-in, native "string" type in the Standards that is analogous
> to such a type in other languages. For example,
>
> program main
> implicit none
> string :: s
>
> s = "hello"
> print *, s
> end
>
> This is partly motivated by the possible future introduction of generics,
> because otherwise various generic types or function calls might become
> pretty lengthy, for example,
>
> List<character(:), allocatable> :: foo
>
> as compared to
>
> List<string> :: foo

One word extra?

FortranFan

Oct 29, 2017, 9:14:22 PM
On Sunday, October 29, 2017 at 12:06:06 PM UTC-4, spectrum wrote:

> Although there is already a dynamically sized character string in current Fortran
> (i.e., character(:), allocatable or pointer), I think it would be very useful if we had
> a built-in, native "string" type in the Standards ..


@spectrum,

Yes, I totally agree.

As you know, there have been umpteen discussions on this forum over a number of years:
https://groups.google.com/d/msg/comp.lang.fortran/qONziG36nFs/EIqVvdnEBQAJ

But there has been little action by the standards committee and unless a majorly disrupting and influential group of folks join the committee and shake things up, nothing is going to happen either.

Basically, when you suggest a "built-in, native string type", it will either be viewed as calling for a new *intrinsic type* and rejected immediately on account of having a major impact on the type system in the language which the yet another minor update planned for Fortran 2020 cannot absorb; or be seen as a trivial variant to the existing (but practically useless) ISO_VARYING_STRING in the *other* part i.e., 2 of the standard (that gets paid no attention by any of the implementations).

It's the coders who keep getting shafted in the process. In my opinion, there are many coding aspects that belong in libraries, an ODE solver for example. But there are a few basic things such as 'string' class or generic containers that a language as long in the tooth as Fortran must provide as *intrinsic* facilities. Hopefully the Fortran standards committee will eventually start to see things that way.

robin....@gmail.com

Oct 29, 2017, 10:15:07 PM
To suggest that it's necessary to have a major addition to the language
to save writing one word is ridiculous.

spectrum

Oct 30, 2017, 12:55:24 AM
On Monday, October 30, 2017 at 11:15:07 AM UTC+9, robin....@gmail.com wrote:
> To suggest that it's necessary to have a major addition to the language
> to save writing one word is ridiculous.

Dear Robin,

Please read my posts again; they try to explain the implications and use cases
beyond simplicity. Besides, because of the weakness
of string handling, I have already been doing essentially everything related to
text data handling (e.g., preparing input and analysis) in other languages
(simply because Fortran code for that purpose tends to become much
longer for parsing and string handling).

Another concern is about generics. Without a primitive string type,
it becomes inevitable to define a user-defined wrapper type to use it
for generics (unless "T" allows "character(:), allocatable").

A different approach might be to introduce type alias things,

typealias( character(:), allocatable ) :: string
or
typealias string = ...
or similar,

# I remember seeing some posts about typedef and typealias
(in which people also seemed not very interested), so this feature is not
present in current Fortran.

But I agree that introducing too many redundant features and making
the languages "fat" is not necessarily a good idea (and increases the cost
and time of implementation and debugging). I will check also other languages
about the situation of strings.

Dear Ian and FortranFan,

Thanks very much for your information. I will first check the suggested ISO
docs and the previous post.

Best regards,

Clive Page

Oct 30, 2017, 6:37:19 AM
On 29/10/2017 16:06, spectrum wrote:
> Although there is already a dynamically sized character string in current Fortran
> (i.e., character(:), allocatable or pointer), I think it would be very useful if we had
> a built-in, native "string" type in the Standards that is analogous
> to such a type in other languages. For example,
>
> program main
> implicit none
> string :: s
>
> s = "hello"
> print *, s
> end
[snip]

I agree with you that the current facilities in Fortran are inadequate and messy, and indeed I said so a couple of years ago at a BCS Fortran Group meeting:
http://www.fortran.bcs.org/2015/suggestion_string_handling.pdf

What you have suggested has two parts, essentially:
(1) A simple string type which is essentially a syntactic shorthand for
character(:), allocatable.

This would save typing effort and make programs more readable. Since it is essentially just syntactic sugar it would surely be very easy to implement in all modern compilers.

It would be highly desirable, in my view, also to remove current restrictions on "character(:), allocatable" also known as "string" such as not having the length allocated automatically when used in contexts other than an assignment statement, e.g. in the I/O list of a READ statement. This is one of the quite highly-ranked items being put forward for Fortran2020 so it may even happen if we are patient.

(2) Extend this string type to allow a string array in which the elements could have different lengths. The little-used ISO_VARYING_STRING already supports this, indeed it is now one of the few advantages it has over allocatable strings, but I understand that this addendum to the Standard may well be removed from the standard language before long. So having a string array with elements of varying length, is in my opinion, also a rather useful addition to Fortran. My guess is that it would take quite a bit more effort on the part of compiler-writers to implement it. So we probably need a good bit more support to get this considered for Fortran2020.



--
Clive Page

Thomas Koenig

Oct 30, 2017, 1:10:25 PM
Clive Page <use...@page2.eu> schrieb:

> (2) Extend this string type to allow a string array in which
> the elements could have different lengths.

This can already be done using a derived type. Directly, it is
currently not possible because of the "no array of arrays" rule,
which would need to be modified. I am not sure why this was added,
but I suspect horrible ambiguities if this were to be allowed.

spectrum

Dec 10, 2017, 2:59:00 PM
Thanks very much for pointing to the slides, which is a 1e300 % more complete
version of what I wished to write :-)

Indeed, I feel "character(:), allocatable" is somewhere between primitive char
types in C and std::string in C++. Although "character(:)" may have been introduced
to minimize the effort for new implementation, I guess jumping to a neat (builtin)
"string" type would have been much cleaner in various syntax.

But the real need is not whether "string" is provided as a builtin type, but that
the user can use it (with convenient methods) out-of-the-box (= directly, easily)
for string handling. So, if the standard (or de facto or semi-standard) library
provides an analog for this (e.g. type(string_t)), I believe it would also be extremely
useful.

For example, I have just looked at this Q/A:
https://stackoverflow.com/questions/47742535/split-fortran-character-string-of-unknown-length-but-known-substring-format-by-c

and I think it would be very nice if we could write this as, e.g.:

program main
use stdlib !! or gnulib or gfortlib if provided as an add-on for gfortran
implicit none
type( string_t ) :: inpstr
type( string_t ), allocatable :: words( : )
integer :: i

inpstr = "04-DEC-2015,10-DEC-2015,23-DEC-2015,25-DEC-2015"
words = inpstr % split( "," )

do i = 1, size( words )
print *, "-", words( i )
enddo
end

Ian Harvey

Dec 11, 2017, 6:25:39 AM
On 11/12/2017 5:28 AM, spectrum wrote:
> On Monday, October 30, 2017 at 7:37:19 PM UTC+9, Clive Page wrote:
...
>> (2) Extend this string type to allow a string array in which the elements could have different lengths. The little-used ISO_VARYING_STRING already supports this, indeed it is now one of the few advantages it has over allocatable strings, but I understand that this addendum to the Standard may well be removed from the standard language before long. So having a string array with elements of varying length, is in my opinion, also a rather useful addition to Fortran. My guess is that it would take quite a bit more effort on the part of compiler-writers to implement it. So we probably need a good bit more support to get this considered for Fortran2020.

ISO_VARYING_STRING can be implemented using Fortran 2003 source. Its
status under the standard does not affect the ability for anyone to use
such a source implementation.

A quite fundamental principle in the Fortran language is that elements
within an array may only differ in "value" (as the language defines the
concept of value), not in other characteristics such as type, type
parameters, allocatableness, pointerness, targetness or any of the other
myriad of attributes that the language supports. This principle is
necessary to enable many of the array features of the language that we
all take for granted, it is not there just for fun.

Trying to shoehorn some specific feature into the language against that
principle would be quite difficult.

This is the same principle that prevents arrays of arrays, arrays of
pointers, or other similar concepts that also come up for discussion
here from time to time.

Many of the characteristics/attributes mentioned above, not considered
part of the value of something, can be transformed to be part of the
value of some other thing, through the use of a derived type that holds
appropriate components. VARYING_STRING from ISO_VARYING_STRING is an
example of this, there are many, many, many, many others.
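
To make the principle concrete, here is a small sketch (the program and variable names are only illustrative). Length is a type parameter of the whole array, so even a deferred-length character array has one length shared by every element, and shorter values are blank padded:

```fortran
program same_length_elements
   implicit none
   character(len=:), allocatable :: colors(:)

   ! The type-spec in the constructor fixes one length (6) for all elements.
   colors = [character(len=6) :: "blue", "orange", "red"]

   print "(A,I0)", "len(colors)         = ", len(colors)
   print "(A,I0)", "len_trim(colors(1)) = ", len_trim(colors(1))

   ! Every element has length 6; "blue" is stored with two trailing blanks.
   if (len(colors) /= 6) error stop "unexpected element length"
   if (len_trim(colors(1)) /= 4) error stop "unexpected trimmed length"
end program same_length_elements
```

Per-element lengths therefore have to live in the value of each element, which is what a wrapper type with an allocatable component provides.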

> Thanks very much for pointing to the slides, which is a 1e300 % more complete
> version of what I wished to write :-)
>
> Indeed, I feel "character(:), allocatable" is somewhere between primitive char
> types in C and std::string in C++. Although "character(:)" may have been introduced
> to minimize the effort for new implementation, I guess jumping to a neat (builtin)
> "string" type would have been much cleaner in various syntax.
>
> But the real need is not whether "string" is provided as a builtin type, but that
> the user can use it (with convenient methods) out-of-the-box (= directly, easily)
> for string handling. So, if the standard (or de facto or semi-standard) library
> provides an analog for this (e.g. type(string_t)), I believe it would also be extremely
> useful.

Did you look up part two of the standard? What you are asking for above
pretty much already exists.

ISO_VARYING_STRING is far from perfect, it dates from a time when the
base language was more limited (~Fortran 95), interfaces of its
procedures don't take advantage of more recent language features like
type parameters, allocatable dummy arguments, allocatable function
results, user defined io, etc., but it is a reasonable starting point.

Wrap an allocatable character scalar in a VARYING_STRING like derived
type, and most of the discussion in this thread is moot. Language wise,
you can do that today (compiler wise, it is a different story).

> For example, I have just looked at this Q/A:
> https://stackoverflow.com/questions/47742535/split-fortran-character-string-of-unknown-length-but-known-substring-format-by-c
>
> and I think it would be very nice if we could write this as, e.g.:
>
> program main
> use stdlib !! or gnulib or gfortlib if provided as an add-on for gfortran
> implicit none
> type( string_t ) :: inpstr
> type( string_t ), allocatable :: words( : )
>
> inpstr = "04-DEC-2015,10-DEC-2015,23-DEC-2015,25-DEC-2015"
> words = inpstr % split( "," )

You don't want that. The type bound procedure reference strongly
implies that the type is extensible, and it makes very little sense for
a string type to be extensible. Implementation wise, a type bound
procedure call may also incur the overhead of construction of a
descriptor for the argument.

words = split(inpstr, ",")
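
A minimal sketch of what such a non-type-bound function might look like. This is an assumption for illustration, not an existing library: the module name split_m, the public component s, and taking the input as plain character(len=*) rather than a string type are all choices made here.

```fortran
module split_m
   implicit none
   private
   public :: string_t, split

   ! Hypothetical wrapper type; the public component keeps the sketch short.
   type :: string_t
      character(len=:), allocatable :: s
   end type string_t

contains

   ! Split str at every occurrence of sep; empty fields become zero-length strings.
   function split(str, sep) result(words)
      character(len=*), intent(in) :: str
      character(len=1), intent(in) :: sep
      type(string_t), allocatable :: words(:)
      integer :: i, n, first, pos

      ! The number of fields is the number of separators plus one.
      n = 1
      do i = 1, len(str)
         if (str(i:i) == sep) n = n + 1
      end do
      allocate(words(n))

      first = 1
      do i = 1, n
         pos = index(str(first:), sep)          ! position relative to first
         if (pos == 0) then
            words(i)%s = str(first:)            ! last field
         else
            words(i)%s = str(first:first+pos-2) ! text before the separator
            first = first + pos
         end if
      end do
   end function split

end module split_m

program main
   use split_m, only : string_t, split
   implicit none
   type(string_t), allocatable :: words(:)
   integer :: i

   words = split("04-DEC-2015,10-DEC-2015,23-DEC-2015,25-DEC-2015", ",")

   if (size(words) /= 4) error stop "expected 4 fields"
   if (words(3)%s /= "23-DEC-2015") error stop "unexpected third field"

   do i = 1, size(words)
      print "(A)", "- " // words(i)%s
   end do
end program main
```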

Clive Page

Dec 11, 2017, 11:32:49 AM
On 11/12/2017 11:25, Ian Harvey wrote:
> Wrap an allocatable character scalar in a VARYING_STRING like derived type, and most of the discussion in this thread is moot.

Most of perhaps, but not all. What one still can't do (or at least I haven't worked out how) is something like this:

character(:), allocatable :: mystring
! ...
read(unit,format) mystring

And then have the length of mystring set by the number of characters ingested by the read statement. I think that would be a rather useful facility.
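
For comparison, the usual workaround today is a helper that reads the record in chunks with non-advancing input and grows the string as it goes. A sketch under assumptions: the names readline_m and read_line and the 64-character buffer size are invented here, and this is a workaround, not the automatic allocation in a READ statement wished for above.

```fortran
module readline_m
   use, intrinsic :: iso_fortran_env, only : iostat_eor
   implicit none
   private
   public :: read_line

contains

   ! Read one whole record from unit into line, whatever its length.
   subroutine read_line(unit, line, stat)
      integer, intent(in) :: unit
      character(len=:), allocatable, intent(out) :: line
      integer, intent(out) :: stat
      character(len=64) :: buffer
      integer :: nread

      line = ""
      do
         ! Non-advancing read: size= reports how many characters were transferred.
         read(unit, "(A)", advance="no", iostat=stat, size=nread) buffer
         if (stat == iostat_eor) then
            line = line // buffer(:nread)   ! final chunk of the record
            stat = 0
            return
         else if (stat /= 0) then
            return                          ! end of file or an error
         end if
         line = line // buffer(:nread)      ! full buffer, keep reading
      end do
   end subroutine read_line

end module readline_m

program demo
   use readline_m, only : read_line
   implicit none
   character(len=*), parameter :: text = repeat("abcdefghij", 15)   ! 150 characters
   character(len=:), allocatable :: mystring
   integer :: unit, stat

   ! Round trip through a scratch file so the example is self-contained.
   open(newunit=unit, status="scratch", action="readwrite")
   write(unit, "(A)") text
   rewind(unit)

   call read_line(unit, mystring, stat)
   close(unit)

   if (stat /= 0) error stop "read failed"
   if (mystring /= text .or. len(mystring) /= len(text)) error stop "line was not read back intact"
   print "(A,I0)", "characters read: ", len(mystring)
end program demo
```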

--
Clive Page

Ian Harvey

Dec 11, 2017, 2:43:12 PM
That is a bare allocatable character variable. Wrap that allocatable character as a component in a derived type (like the f95+allocatable tr implementation of varying_string wraps an allocatable array), then you can use uddtio and the like to do all sorts of fancy stuff automatically in io statements. I have strong recollections of posting examples of such here in the last year or so.

Writing that code is easy, finding a compiler that runs such code correctly is a different matter.

Jos Bergervoet

Dec 11, 2017, 2:50:36 PM
Or it could be:
read(*,*,terminator=",") mystring

(where terminator could perhaps also be a set of characters
to be treated as such, and the default would then perhaps be
a blank..)

In addition, I think that having to wrap anything into a
derived type to get something as basic as a string type, is
a bit like having to wrap two real numbers into a type to
get a complex.

The fact that everyone understands what Ian means with "a
VARYING_STRING like derived type", means that this is in
fact an archetype that should have been in the language!

--
Jos

Jos Bergervoet

Dec 11, 2017, 2:54:34 PM
On 12/11/2017 8:43 PM, Ian Harvey wrote:
> That is a bare allocatable character variable. Wrap that allocatable character as a component in a derived type (like the f95+allocatable tr implementation of varying_string wraps an allocatable array), then you can use uddtio and the like to do all sorts of fancy stuff automatically in io statements. I have strong recollections of posting examples of such here in the last year or so.
>
> Writing that code is easy, finding a compiler that runs such code correctly is a different matter.

Yes, that was a problem when I last tried it. But wasn't it
also *still* not possible to have:
write(*,*) "The answer is ", mystring, ". Does that make sense?"

because uddtio did not allow any default behavior for "*" format?

--
Jos


Ian Harvey

Dec 11, 2017, 3:15:55 PM
UDDTIO can be used for list directed output and input. What do you think is the problem?

FortranFan

Dec 11, 2017, 4:44:37 PM
On Monday, December 11, 2017 at 6:25:39 AM UTC-5, Ian Harvey wrote:

> ..
>
> You don't want that. The type bound procedure reference strongly
> implies that the type is extensible, and it makes very little sense for
> a string type to be extensible. Implementation wise, a type bound
> procedure call may also incur the overhead of construction of a
> descriptor for the argument.
> ..


The above comments by @Ian Harvey in this thread are rather didactic, presumptuous, and misleading enough that they hurt any healthy engagement and discussion of any future revisions to the language related to strings.

For Fortran, it really comes down to: WHAT FOR and FOR WHOM?

Is the advancement of Fortran only intended for the likes of *certain users* and them only and that too only for the applications they think the language makes sense to use?

And is the performance and/or the sense of appropriateness by these users the only items of consideration for any revisions to the language?

Or is it possible to make it a *wider tent* and consider the needs of a broader set of users who may also seek convenience and ease-of-use, consistency, improved readability and packaging in terms of software design, etc.?

@spectrum,

On Sunday, December 10, 2017 at 2:59:00 PM UTC-5, spectrum wrote:
> ..
> and I think it would be very nice if we could write this ..

Please note it's trivially possible now to write in Fortran what you indicate will be "very nice" - please see below and try it out - you can make it a lot more efficient, what's below is just a quick first-pass attempt:

--- begin code ---
module string_m

implicit none

private

type, public :: string_t
private
character(len=:), allocatable :: m_s
contains
private
procedure, pass(this) :: assign_s
procedure, pass(this) :: write_s
procedure, pass(this), public :: split
procedure, pass(this), public :: s => get_s
generic, public :: assignment(=) => assign_s
generic, public :: write(formatted) => write_s
end type string_t

contains

elemental subroutine assign_s( this, rhs )

class(string_t), intent(inout) :: this
character(len=*), intent(in) :: rhs

this%m_s = rhs

return

end subroutine assign_s

subroutine write_s(this, lun, iotype, vlist, istat, imsg)

! argument definitions
class(string_t), intent(in) :: this
integer, intent(in) :: lun
character(len=*), intent(in) :: iotype
integer, intent(in) :: vlist(:)
integer, intent(out) :: istat
character(len=*), intent(inout) :: imsg

! local variable
character(len=9) :: sfmt

sfmt = "(A)"
if ( (iotype == "DT").and.(size(vlist) >= 1) ) then

! vlist(1) to be used as the field width of the character component.
write(sfmt,"(A,I2,A)", iostat=istat, iomsg=imsg ) "(A", vlist(1), ")"
if (istat /= 0) return

end if

write(lun, fmt=sfmt, iostat=istat, iomsg=imsg) this%m_s

return

end subroutine write_s

elemental function get_s( this ) result( s )

class(string_t), intent(in) :: this
! Function result
character(len=len(this%m_s)) :: s

s = this%m_s

end function get_s

subroutine split( this, token, strings )

! Argument list
class(string_t), intent(in) :: this
character(len=1), intent(in) :: token
type(string_t), allocatable, intent(out) :: strings(:)

! Local variables
integer :: numstrings
integer :: idx_token

if ( allocated(this%m_s) ) then
if ( len(this%m_s) <= 1) then
return
end if
else
return
end if

numstrings = numtoken( this%m_s, token ) + 1

if ( numstrings > 0 ) then
allocate( strings(numstrings) )
idx_token = len( this%m_s )
do while (numstrings > 1 )
call crop_right( this%m_s(1:idx_token), token, idx_token, strings(numstrings)%m_s )
numstrings = numstrings - 1
end do
! Fill left-most string
strings(numstrings)%m_s = this%m_s(1:idx_token)
end if

return

end subroutine

function numtoken( string, token ) result( num )
character(len=*), intent(in) :: string
character(len=1), intent(in) :: token
integer :: num
num = count( transfer(source=string, mold="a", size=len(string)) == token)
return
end function

subroutine crop_right( string, token, idx_token, crop )

! Argument list
character(len=*), intent(in) :: string
character(len=1), intent(in) :: token
integer, intent(inout) :: idx_token
character(len=:), allocatable, intent(out) :: crop

idx_token = scan(string, token, back=.true. )
if ( idx_token == 0 ) return
! Always allocate crop: it is a zero-length string when the token is the last character.
crop = string( idx_token+1: )

idx_token = idx_token - 1

return

end subroutine crop_right

end module string_m

program main

use, intrinsic :: iso_fortran_env, only : compiler_version

use string_m, only : string_t

implicit none

type( string_t ) :: inpstr
type( string_t ), allocatable :: words( : )
integer :: i

print *, "Compiler Version: ", compiler_version()

inpstr = "04-DEC-2015,10-DEC-2015,23-DEC-2015,25-DEC-2015"
call inpstr%split( ",", words )

do i = 1, size( words )
print *, "-", words( i )
end do

end
--- end code ---

Upon execution, you should get something along the following lines assuming the processor you employ supports the standard language features:

--- begin output ---
Compiler Version: GCC version 8.0.0 20171112 (experimental)
- 04-DEC-2015
- 10-DEC-2015
- 23-DEC-2015
- 25-DEC-2015
--- end output ---

You have hit the nail on the head with your point about "the user can use it (with convenient methods) out-of-the-box (= directly, easily) for string handling. So, if the standard (or de facto or semi-standard) library provides an analog for this (e.g. type(string_t)), I believe it would also be extremely useful."

I agree with you wholeheartedly that, for many containers and algorithms, there is a need for some form of *standard* solution for users, the precise nature of which is something that can be deferred to the Fortran standards committee, but god damnit, it's high time something got done about this. I hope this will get through the thick heads of the committee members.

For improved string handling, an infinite number of libraries can be put together - the above snippet being just a small illustration - using the existing facilities in the language. But that is beside the point. For what is a long-solved computer science problem, what is needed are standardized interfaces. Consider your SPLIT example: you can call it that, I can call it TOKENIZE, someone else PARSE, others DECODE, and so forth; some can make it a FUNCTION subprogram, others a SUBROUTINE subprogram. For a variety of situations, it will help if the users have the option to employ *a standard way* of doing things that they can be sure will work with all the conforming compilers.

Just as the Fortran standard extends its foot into the math business with a standardized interface for intrinsic procedures of DOT_PRODUCT and MATMUL, etc., it will be very useful - as you indicate - for the standard to have a set of *intrinsic derived types with bound procedures* for certain commonly used aspects such as 'strings'.

Separately, I find comments such as "You don't want that. The type bound procedure reference strongly implies that the type is extensible, and it makes very little sense for a string type to be extensible. Implementation wise, a type bound procedure call may also incur the overhead of construction of a
descriptor for the argument. " rather ill-informed and narrow-minded too:

Note many programming paradigms and language developments have found the concept of SEALED CLASSES quite practical and useful and have utilized the concepts in their 'string class' design:
https://msdn.microsoft.com/en-us/library/system.string(v=vs.110).aspx
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/sealed

Fortran can consider bringing a SEALED or NON_OVERRIDABLE attribute into play for its derived types and perhaps stipulate that its intrinsic 'string_t' derived type be sealed too. These are all topics that should be actively discussed on forums such as comp.lang.fortran and not be simply misdirected by a few here with these "you don't want that" remarks or "there be ghosts and gremlins lurking around" type of fear-mongering whenever anyone has a suggestion.

The point about the overhead is mostly about the performance aspect. As I indicate above, users will have situations where other factors such as convenience and ease-of-coding, packaging, readability, etc. matter more and under the circumstances they may prefer the approach you indicate with 'inpstr%split( .. '. Why should their points-of-view not be considered in the evolution of Fortran? Other languages such as C++, Python, Microsoft .NET, etc. constantly strive to serve their practitioners, why not Fortran?

I find users are quite discerning in terms of what they seek in Fortran and strings are at the very core of the needs.

Something is "deeply rotten in the state of" Fortran that the regular pleas of many Fortran users are being ignored when it comes to strings.

Cheers,

spectrum

Dec 11, 2017, 6:55:36 PM
Dear Ian and FortranFan,

Thanks very much for various comments, and I'm afraid that my sentence
was again not clear enough and so misleading... (I tried to be as clear as
possible, but well, a natural language is always difficult (maybe more so than Fortran :)

Specifically, in this part,

> For example, I have just looked at this Q/A:
> https://stackoverflow.com/questions/47742535/split-fortran-character-string-of-unknown-length-but-known-substring-format-by-c
>
> and I think it would be very nice if we could write this as, e.g.:
>
> program main
> use stdlib !! or gnulib or gfortlib if provided as an add-on for gfortran
> implicit none
> type( string_t ) :: inpstr
> type( string_t ), allocatable :: words( : )
>
> inpstr = "04-DEC-2015,10-DEC-2015,23-DEC-2015,25-DEC-2015"
> words = inpstr % split( "," )

I know that we can write this way by using the current facility of Fortran
(indeed, there are several string libraries like Stefano's StringiFor).
What I wanted to write is something like:

"I think it would be very nice if any user could use a convenient string functionality
(without writing it by themselves, just via 'use stdlib' etc)" as in the following...

In other words, my emphasis is more on this line (in a sense):

use stdlib !! or gnulib or gfortlib if provided as an add-on for gfortran

If such a string type is provided as builtin, it would also be nice (even nicer),
but my concern is that probably the work amount (efforts) for the Standard
committee grows more and more this way (if the user requests many library-ish
functionality). That is, my wish (for general Fortran users (*)) is like:

* There is an easy access to using a convenient functionality or library
(without writing them by him/herself);
* The library is ideally in some semi- or de facto standard position so that
many contributors can contribute/join and participate in its maintenance;
* The new functionality etc can be introduced as "experimental"; and
* Something that turns out to be very useful will be considered as candidates
for inclusion to the next standard,
* and so on...

# In short, it's a Fortran counterpart of Boost :)

(*) In my sentence, I suppose "general Fortran users" to be a standard-level
student without computer science knowledge, nor interest in CS, but
just want to use Fortran as a tool to calculate something (for their applications).

And as for interface, yes, I totally agree that it does not have to be object methods.
It could be usual functions or subroutines if they are more performant.

call split( inpstr, words, sep="," ) !! where sep is optional

also seems nice. (I remember Julia takes this form, because it does not
use the usual OO model.)
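
A rough sketch of that subroutine form, with sep optional and defaulting to a single blank. The module name, the component name s, and the choice that consecutive separators produce empty strings are assumptions made for this illustration.

```fortran
module split_sub_m
   implicit none
   private
   public :: string_t, split

   ! Hypothetical wrapper type holding one variable-length string.
   type :: string_t
      character(len=:), allocatable :: s
   end type string_t

contains

   subroutine split(str, words, sep)
      character(len=*), intent(in) :: str
      type(string_t), allocatable, intent(out) :: words(:)
      character(len=1), intent(in), optional :: sep
      character(len=1) :: delim
      integer :: i, n, first

      ! Fall back to a blank when the caller does not pass sep.
      delim = " "
      if (present(sep)) delim = sep

      ! One field per separator, plus one.
      n = count([(str(i:i) == delim, i = 1, len(str))]) + 1
      allocate(words(n))

      n = 1
      first = 1
      do i = 1, len(str)
         if (str(i:i) == delim) then
            words(n)%s = str(first:i-1)   ! field ending just before the separator
            n = n + 1
            first = i + 1
         end if
      end do
      words(n)%s = str(first:)            ! remainder after the last separator
   end subroutine split

end module split_sub_m

program main
   use split_sub_m, only : string_t, split
   implicit none
   type(string_t), allocatable :: words(:)

   call split("1.1,2.2,3.3", words, sep=",")
   if (size(words) /= 3) error stop "expected 3 fields"
   if (words(2)%s /= "2.2") error stop "unexpected second field"

   call split("foo baa bazbaz", words)    ! sep omitted, so blank is used
   if (size(words) /= 3) error stop "expected 3 words"
   if (words(3)%s /= "bazbaz") error stop "unexpected third word"

   print "(A)", "split checks passed"
end program main
```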

---
@FortranFan

Thanks very much for an example of the implementation! I will try it later :)

Wolfgang Kilian

Dec 12, 2017, 4:30:17 AM
Yes.

I have to admit that I consider this not a fundamental principle, but a
fundamental limitation of the language (syntax and semantics), that
really hurts a lot when dealing with data management. I know that there
is a straightforward solution which comes down to introducing a wrapper
type.

If there could be an alteration to the fundamentals, that eliminates the
need for *explicit* wrapper types (such as iso_varying_string) on the
user side, Fortran would go a long way to becoming a general-purpose
language again. As a matter of convenience.

I would prefer a solution that is as generic as you point out above, not
just a solution for strings, for instance.

(If generic containers get support in F202x, that would help, but still
the container most likely would have to appear explicitly in syntax. In
that case, iso_varying_string would stay, in some disguise.)

-- Wolfgang


Ian Harvey

Dec 12, 2017, 5:29:39 AM
On 12/12/2017 7:00 PM, Wolfgang Kilian wrote:
> On 11.12.2017 12:25, Ian Harvey wrote:
...
>> A quite fundamental principle in the Fortran language is that elements
>> within an array may only differ in "value" (as the language defines
>> the concept of value), not in other characteristics such as type, type
>> parameters, allocatableness, pointerness, targetness or any of the
>> other myriad of attributes that the language supports.  This principle
>> is necessary to enable many of the array features of the language that
>> we all take for granted, it is not there just for fun.
>>
>> Trying to shoehorn some specific feature into the language against
>> that principle would be quite difficult.
>>
>> This is the same principle that prevents arrays of arrays, arrays of
>> pointers, or other similar concepts that also come up for discussion
>> here from time to time.
>
> Yes.
>
> I have to admit that I consider this not a fundamental principle, but a
> fundamental limitation of the language (syntax and semantics), that
> really hurts a lot when dealing with data management.  I know that there
> is a straightforward solution which comes down to introducing a wrapper
> type.
>
> If there could be an alteration to the fundamentals, that eliminates the
> need for *explicit* wrapper types (such as iso_varying_string) on the
> user side, Fortran would go a long way to becoming a general-purpose
> language again.  As a matter of convenience.

I don't think you can eliminate the need for some sort of container
type, but I think there is quite a bit that could be done to reduce the
source cost of having to have such a wrapper, for both its definition
and at points of use.

But, we've been here before.
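
For readers following along, the wrapper approach discussed above can be sketched roughly as follows. This is a minimal illustration and not code from the thread: the type name string_t and its component str are assumptions chosen to match names used earlier in the discussion. Every array element has the same type and attributes; only the component value (including its allocated length) differs between elements.

```fortran
program wrapper_demo
   implicit none

   ! Hypothetical wrapper type (assumed name and component).
   type :: string_t
      character(len=:), allocatable :: str
   end type string_t

   type(string_t), allocatable :: names(:)
   integer :: i

   allocate( names(3) )
   names(1)%str = "blue"
   names(2)%str = "orange"
   names(3)%str = "red"

   ! All elements share one type; the lengths live inside the component.
   do i = 1, size(names)
      print '(a,1x,i0)', names(i)%str, len(names(i)%str)
   end do
end program wrapper_demo
```

The source cost mentioned above is visible here: the type must be declared, and each use needs the extra %str.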

Wolfgang Kilian

Dec 12, 2017, 9:53:57 AM
Precisely. (Both comments)

-- Wolfgang

Bálint Aradi

Dec 12, 2017, 10:57:17 AM
Dear spectrum,

> And as for interface, yes, I totally agree that it does not have to be object
> methods.
> It could be usual functions or subroutines if they are more performant.
>
> call split( inpstr, words, sep="," ) !! where sep is optional

As a general note: if you make it a generic procedure with appropriate interface definitions, as suggested by Ian Harvey, you get a very uncomfortable side effect: if you call the routine with the wrong arguments (e.g., passing a single string for words instead of an allocatable array), the compiler won't be very helpful in telling you what you did wrong. It will just say something like: 'Could not find a matching subroutine for the generic split()'. On the other hand, if it is defined as a type-bound procedure, the error message will tell you exactly which argument was wrong.

Also, in terms of name spacing, I find type bound procedures more appealing. You just import one name (the name of the type) into your scope and not hundreds of generic function names.
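
The name-spacing point can be illustrated with a small sketch. The module counter_m, the type counter_t and the procedure bump are hypothetical names made up for this illustration; the same procedure is exposed both as a type-bound procedure and through a generic interface so that the two call styles can be compared.

```fortran
module counter_m
   implicit none
   private
   public :: counter_t, bump   ! the generic name has to be exported separately

   type :: counter_t
      integer :: n = 0
   contains
      procedure :: bump => counter_bump   ! reachable through the type alone
   end type counter_t

   interface bump
      module procedure counter_bump
   end interface bump

contains

   subroutine counter_bump(this, step)
      class(counter_t), intent(inout) :: this
      integer, intent(in) :: step
      this%n = this%n + step
   end subroutine counter_bump

end module counter_m

program namespace_demo
   use counter_m, only : counter_t   ! a single name is imported
   implicit none
   type(counter_t) :: c

   call c%bump(2)      ! type-bound call works with only counter_t in scope
   ! call bump(c, 2)   ! would also need "bump" in the only-list above
   print '(i0)', c%n
end program namespace_demo
```

With the generic style, every procedure name has to appear in the only-list (or the whole module has to be imported); with the type-bound style, the type name is enough.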

And as for timing, I was not able to find any significant difference between generic functions and type-bound procedures, as long as you instantiate a type and not a class of your type. The following program gives very comparable times for both with all 4 compilers I have tried.

module testmod
implicit none

type :: TStatic
private
integer :: val = 1
end type TStatic

type :: TPoly
private
integer :: val = 1
contains
procedure :: incValue => TPoly_incValue
procedure :: getValue => TPoly_getValue
end type TPoly

interface incValue
module procedure TStatic_incValue
end interface incValue

interface getValue
module procedure TStatic_getValue
end interface getValue

contains

subroutine TStatic_incValue(this, increment)
type(TStatic), intent(inout) :: this
integer, intent(in) :: increment

this%val = this%val + increment

end subroutine TStatic_incValue


function TStatic_getValue(this) result(val)
type(TStatic), intent(in) :: this
integer :: val

val = this%val

end function TStatic_getValue


subroutine TPoly_incValue(this, increment)
class(TPoly), intent(inout) :: this
integer, intent(in) :: increment

this%val = this%val + increment

end subroutine TPoly_incValue


function TPoly_getValue(this) result(val)
class(TPoly), intent(in) :: this
integer :: val

val = this%val

end function TPoly_getValue

end module testmod


program test
use testmod
implicit none

type(TStatic) :: staticInst
type(TPoly) :: polyInst, polyInst2
class(TPoly), allocatable :: classInst

integer :: nCycles
integer :: ii
real :: t1, t2

nCycles = 1000000000 ! 1e9
print '(A,I0)', 'Nr. of iterations:', nCycles
call cpu_time(t1)
do ii = 1, nCycles
call incValue(staticInst, ii)
end do
call cpu_time(t2)
print '(A,T30,I0,F6.2)', 'Static:', getValue(staticInst), t2 - t1

call cpu_time(t1)
do ii = 1, nCycles
call polyInst%incValue(ii)
end do
call cpu_time(t2)
print '(A,T30,I0,F6.2)', 'Polymorphic via type:', polyInst%getValue(), t2 - t1

allocate(classInst, source=polyInst2)
call cpu_time(t1)
do ii = 1, nCycles
call classInst%incValue(ii)
end do
call cpu_time(t2)
print '(A,T30,I0,F6.2)', 'Polymorphic via class:', classInst%getValue(), t2 - t1

end program test

Jos Bergervoet

Dec 12, 2017, 4:46:36 PM
On 12/11/2017 9:15 PM, Ian Harvey wrote:
> UDDTIO can be used for list directed output and input. What do you think is the problem?

I don't remember the details of the discussion where it came up.
Maybe it was just about incomplete compiler support. Or perhaps
the standard gives no way to specify the user-defined behavior
for the list-directed case? How do you specify what
read(*,*) mystring
should do?

--
Jos

FortranFan

Dec 12, 2017, 5:00:16 PM
On Monday, December 11, 2017 at 6:55:36 PM UTC-5, spectrum wrote:

> .. my concern is that probably the work amount (efforts) for the Standard
> committee grows more and more this way


It should only be expected of the Fortran Standards Committee to do MORE and MORE.

..
> It could be usual functions or subroutines if they are more performant.
>

@spectrum,

Please don't get into "drinking the Kool Aid" that the "usual functions or subroutines .. are more performant." Anyone who tells you such nonsense, ask them first to show you clearly measurable, reproducible, consistent and MEANINGFUL difference in performance for a conceivable task a Fortranner anywhere is likely to attempt: I tell you they are going to fail.
https://en.wikipedia.org/wiki/Drinking_the_Kool-Aid

Secondly, as I stated earlier, there are other aspects besides performance that are relevant to Fortranner users in a variety of instances. Also as Bálint Aradi wrote: "in terms of name spacing .. type bound procedures are more appealing. You just import one name (the name of the type) into your scope"
https://groups.google.com/d/msg/comp.lang.fortran/RAukSLOlmIY/x-KY4mfKBgAJ

FortranFan

Dec 13, 2017, 12:58:49 AM
On Tuesday, December 12, 2017 at 4:46:36 PM UTC-5, Jos Bergervoet wrote:

> .. How do you specify what
> read(*,*) mystring
> should do? ..


@Jos Bergervoet,

Re: "How do you specify what read(*,*) mystring should do?", you would need to grok the Fortran standard for list-directed input and write the code with the required logic for defined input. You need to be mindful of sections in the standard that have paragraphs such as:
---------------
the delimiting apostrophes or quotation marks are not required. If the
delimiters are omitted, the character sequence is terminated by the first
blank, comma (if the decimal edit mode is POINT), semicolon (if the decimal
edit mode is COMMA), slash, or end of record; in this case apostrophes and
quotation marks within the datum are not to be doubled.
---------------

Otherwise the Fortran standard provides all the necessary facilities, and thankfully a couple of implementations now support it (though with some bugs).

Once you have developed such a string wrapper type, you can write a program such as

--- begin code ---
use, intrinsic :: iso_fortran_env, only : compiler_version
use string_m, only : string_t

type(string_t) :: s

print *, "Compiler Version: ", compiler_version()
read (*,*) s
print *, s

stop

end program
--- end code ---

You can then run the program, built with a supporting version of gfortran, as follows:

--- begin console output ---
Compiler Version: GCC version 8.0.0 20171112 (experimental)
Hello\
Hello
--- end console output ---

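Note the backslash in the input line above is used as a sequence terminator; that is the character the code below checks for, whereas the standard text quoted above names the slash.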

Or with Intel Fortran compiler,
--- begin console output ---
Compiler Version:
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(

R) 64, Version 18.0.1.156 Build 20171018

Hello
Hello
--- end console output ---

Or an apostrophe-delimited case with either compiler, shown below with Intel Fortran,
--- begin console output ---
Compiler Version:
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(

R) 64, Version 18.0.1.156 Build 20171018

'Hello World!'
Hello World!
--- end console output ---

The above shown code builds on the wrapper type string_t I posted earlier with defined input included as follows (note UDDTIO is NOT a recognized acronym in the standard; besides UDDTIO sounds superfluous):

--- begin library code ---
module string_m

use, intrinsic :: iso_fortran_env, only : iostat_end, iostat_eor, output_unit, &
iostat_inquire_internal_unit

implicit none

private

type, public :: string_t
private
character(len=:), allocatable :: m_s
contains
private
procedure, pass(this) :: assign_s
procedure, pass(this) :: read_s
procedure, pass(this) :: write_s
procedure, pass(this), public :: split
procedure, pass(this), public :: s => get_s
generic, public :: assignment(=) => assign_s
generic, public :: read(formatted) => read_s
generic, public :: write(formatted) => write_s
end type string_t

contains

elemental subroutine assign_s( this, rhs )

class(string_t), intent(inout) :: this
character(len=*), intent(in) :: rhs

this%m_s = rhs

return

end subroutine assign_s

subroutine read_s(this, lun, iotype, vlist, istat, imsg)

! argument definitions
class(string_t), intent(inout) :: this
integer, intent(in) :: lun
character(len=*), intent(in) :: iotype
integer, intent(in) :: vlist(:)
integer, intent(out) :: istat
character(len=*), intent(inout) :: imsg

!.. Report success unless a read below fails (istat is intent(out))
istat = 0

select case ( iotype )

case ( "DT" )

!.. vlist(1), if present, implies field width
if ( size(vlist) >= 1 ) then
if ( vlist(1) > 0 ) then
this%m_s = repeat( " ", ncopies=vlist(1) )
read( unit=lun, fmt="(a)", iostat=istat, iomsg=imsg ) this%m_s
return
end if
else
call charreader( lun, this%m_s, istat, imsg )
end if

case ( "LISTDIRECTED" )

call charreader( lun, this%m_s, istat, imsg )

case default

end select

return

end subroutine read_s

subroutine charreader( lun, s, istat, imsg )

! Argument list
integer, intent(in) :: lun
character(len=:), allocatable, intent(out) :: s
integer, intent(inout) :: istat
character(len=*), intent(inout) :: imsg

!.. Named constants
character(len=*), parameter :: sfmt = "(*(g0))"
integer, parameter :: IBLANK = iachar( c=" " )
integer, parameter :: IQUOTATION = iachar( c='"' )
integer, parameter :: IAPOSTROPHE = iachar( c="'" )
integer, parameter :: ICOMMA = iachar( c="," )
integer, parameter :: ISEMICOLON = iachar( c=";" )
integer, parameter :: ISLASH = iachar( c="\" )
integer, parameter :: LENCHUNK = 100

!.. Local variables
character(len=LENCHUNK) :: chunk
character(len=256) :: rmsg
character(len=1) :: c
integer :: counter
integer :: rstat
logical :: IsInternalFile
logical :: loadstring
logical :: delimited
logical :: begin_quote
logical :: begin_apos
logical :: end_read

!..
IsInternalFile = .false.
inquire( unit=lun, size=counter, iostat=rstat, iomsg=rmsg )
if ( rstat == iostat_inquire_internal_unit ) then
IsInternalFile = .true.
end if

!.. Read unit character-by-character and load in chunks
counter = 0
loadstring = .false.
delimited = .false.
begin_quote = .false.
begin_apos = .false.
end_read = .false.
loop_read: do

if ( counter == 0 ) chunk = ""
counter = counter + 1

read( unit=lun, fmt="(a1)", iostat=rstat, iomsg=rmsg ) c
slc_1: select case ( rstat )
case ( 0 )
slc_2: select case ( iachar(c) )
case ( IQUOTATION )
if ( counter == 1 ) then
! Beginning delimiter located
delimited = .true.
begin_quote = .true.
! Reset the counter and cycle the read loop
counter = 0
cycle loop_read
else if ( begin_quote ) then
! Matched delimiting quotations, exit read
loadstring = .true.
end_read = .true.
counter = counter - 1
else
chunk(counter:counter) = c
end if
case ( IAPOSTROPHE )
if ( counter == 1 ) then
! Beginning delimiter located
delimited = .true.
begin_apos = .true.
! Reset the counter and cycle the read loop
counter = 0
cycle loop_read
else if ( begin_apos ) then
! Matched delimiting apostrophes, exit read
loadstring = .true.
end_read = .true.
counter = counter - 1
else
chunk(counter:counter) = c
end if
case ( IBLANK, ISLASH )
if ( .not. delimited ) then
loadstring = .true.
end_read = .true.
counter = counter - 1
else
chunk(counter:counter) = c
end if
case ( ICOMMA )
! to do
chunk(counter:counter) = c
case ( ISEMICOLON )
! to do
chunk(counter:counter) = c
case default
chunk(counter:counter) = c
end select slc_2

if ( counter == LENCHUNK ) loadstring = .true.

case ( iostat_eor, iostat_end, 67 ) ! allowance for Intel Fortran

loadstring = .true.
end_read = .true.
exit slc_1

case default

istat = rstat
imsg = rmsg
end_read = .true.
exit slc_1

end select slc_1

!.. Load string when LENCHUNK have been processed
if ( loadstring ) then
if ( allocated(s) ) then
if (counter > 0) s = s // chunk(1:counter)
else
if (counter > 0) s = chunk(1:counter)
end if
counter = 0
loadstring = .false.
end if

if ( end_read ) exit loop_read

end do loop_read

return

end subroutine charreader
--- end library code ---
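
The library code above binds write(formatted) to write_s but does not show that procedure. Purely as an illustration of what a defined-output procedure can look like, here is a minimal self-contained sketch. It is not the author's write_s: the module name demo_string_m and the type name demo_string_t are hypothetical, the component is left public for brevity, and iotype and vlist are ignored, which a fuller implementation would presumably not do.

```fortran
module demo_string_m
   implicit none
   private

   ! Hypothetical stand-in for string_t; the component is public for brevity.
   type, public :: demo_string_t
      character(len=:), allocatable :: m_s
   contains
      procedure :: write_s
      generic :: write(formatted) => write_s
   end type demo_string_t

contains

   subroutine write_s(this, lun, iotype, vlist, istat, imsg)
      class(demo_string_t), intent(in)    :: this
      integer,              intent(in)    :: lun
      character(len=*),     intent(in)    :: iotype   ! ignored in this sketch
      integer,              intent(in)    :: vlist(:) ! ignored in this sketch
      integer,              intent(out)   :: istat
      character(len=*),     intent(inout) :: imsg

      istat = 0
      ! An unallocated string is written as nothing at all.
      if ( allocated(this%m_s) ) then
         write( unit=lun, fmt="(a)", iostat=istat, iomsg=imsg ) this%m_s
      end if
   end subroutine write_s

end module demo_string_m

program write_demo
   use demo_string_m, only : demo_string_t
   implicit none
   type(demo_string_t) :: s

   s%m_s = "Hello World!"
   print *, s        ! list-directed: iotype is "LISTDIRECTED"
   print '(dt)', s   ! DT edit descriptor: iotype is "DT"
end program write_demo
```

Both print statements go through write_s; only the iotype argument differs between them.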

Bottom line: any user can conceivably put together a Fortran library for a string 'class' that supports all of the functionality desired by Fortranners for string handling, bar @Clive Page's oft-touted substring processing with the exact same ':' based selection as available with the intrinsic CHARACTER type.

But as I mentioned earlier, being able to put together such libraries is beside the point. For something as basic as strings, coders just don't want to get tied down with dependencies on library A or B or C, etc. However, with intrinsic types such as CHARACTER and given the rest of the language semantics, it's unclear how far Fortran can go toward string handling capabilities with the CHARACTER type itself. So an option instead can be to consider *intrinsic derived types*: the ISO/IEC standard can include definitions for them that compilers implement, à la the C_PTR intrinsic derived type. Then Fortranners can have confidence that their code using such derived types will work the same with all conforming compilers. And with SEALED classes, the language can provide an avenue for compilers to include optimizations that deliver high performance with bound procedures of these intrinsic derived types.

But if the Fortran standards committee does not understand the importance of this and fails yet again to deliver a satisfactory solution for something as basic as a string facility, i.e., by the time Fortran 202X is finalized, it will be the death knell for Fortran; people might as well give up on it.

Clive Page

Dec 13, 2017, 4:58:57 AM
On 13/12/2017 05:58, FortranFan wrote:

[snip hundreds of lines of complicated but ingenious code to prove the point]


> But as I mentioned earlier, being able to put together such libraries is beside the point. For something as basic as strings, coders just don't want to get tied down with dependencies on library A or B or C, etc. However, with intrinsic types such as CHARACTER and given the rest of the language semantics, it's unclear how far Fortran can go toward string handling capabilities with the CHARACTER type itself.

I quite agree - the existing CHARACTER type (probably) can't be enhanced to support all the desirable facilities. I think a new intrinsic STRING type is needed. It was the same in 1977 when the old Hollerith facilities were deemed too awful to be extended, and were instead essentially abandoned.

> But if the Fortran standards committee does not understand the importance of this and fails yet again to deliver a satisfactory solution for something as basic as a string facility, i.e., by the time Fortran 202X is finalized, it will be the death knell for Fortran; people might as well give up on it.

I'm not sure I'd go as far as that - I think Fortran might survive in specialist areas, but in my experience rather a lot of programs do involve string-processing, and it would be highly desirable to have modern facilities there. But it would be a lot of work for the Standards bodies and they depend mostly on volunteers. The compiler vendors seem not yet to appreciate the needs in this area.


--
Clive Page

spectrum

Dec 17, 2017, 6:51:49 PM
Hi Bálint Aradi,

> As a general note: If you make it as a generic function with appropriate interface definitions, as suggested by Ian Harvey, you will have a very uncomfortable side effect: ...(snip)
>
> Also, in terms of name spacing, I find type bound procedures more appealing. You just import one name (the name of the type) into your scope and not hundreds of generic function names.

Yeah, I agree that the OO approach is very comfortable
because once an object is imported ("use"d), all the associated methods
become automatically accessible without contaminating the current scope,
which is very convenient. On the other hand, if the subroutine
is not conceptually bound to a particular type (e.g., sub( a, b )
that receives 'a' and 'b' of different types on an equal footing),
an "outer" function might be more convenient. But in my experience,
heavily overloaded routines are sometimes more difficult to understand
(e.g., to see where a routine is defined). As for strings, it may be more
natural to associate convenient methods with the object itself (if possible).

# But one possible scenario is that the Standards provide only a "bare"
built-in string type (without providing convenient methods, to reduce
the workload). Then, I guess other libraries will need to provide
split( str ) etc. as outer routines.
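
As a rough sketch of what such an outer routine might look like, here is a split subroutine with the call shape mentioned earlier in the thread (inpstr, words, optional sep). The module name split_m, the wrapper string_t with its component str, and the two-pass algorithm are all assumptions made for this illustration; a real library would likely handle more cases (e.g., repeated blanks).

```fortran
module split_m
   implicit none
   private
   public :: string_t, split

   ! Hypothetical wrapper type, reduced to the bare component.
   type :: string_t
      character(len=:), allocatable :: str
   end type string_t

contains

   ! Split inpstr at each occurrence of sep (default: a single blank).
   subroutine split( inpstr, words, sep )
      character(len=*), intent(in) :: inpstr
      type(string_t), allocatable, intent(out) :: words(:)
      character(len=*), intent(in), optional :: sep

      character(len=:), allocatable :: delim
      integer :: i, n, pos, start

      delim = " "
      if ( present(sep) ) then
         if ( len(sep) > 0 ) delim = sep
      end if

      ! First pass: count the separators to size the result.
      n = 1
      start = 1
      do
         pos = index( inpstr(start:), delim )
         if ( pos == 0 ) exit
         n = n + 1
         start = start + pos - 1 + len(delim)
      end do
      allocate( words(n) )

      ! Second pass: copy each field; the fields may differ in length.
      start = 1
      do i = 1, n - 1
         pos = index( inpstr(start:), delim )
         words(i)%str = inpstr(start:start+pos-2)
         start = start + pos - 1 + len(delim)
      end do
      words(n)%str = inpstr(start:)
   end subroutine split

end module split_m

program split_demo
   use split_m, only : string_t, split
   implicit none
   type(string_t), allocatable :: words(:)
   integer :: i

   call split( "blue,orange,red", words, sep="," )
   do i = 1, size(words)
      print '(a)', words(i)%str
   end do
end program split_demo
```

Note that both the type and the routine have to be imported here, which is the scope cost discussed above.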

> And as for timing, I was not able to find any significant difference between generic functions and type-bound procedures, as long as you instantiate a type and not a class of your type. The following program gives very comparable times for both with all 4 compilers I have tried.

I think this is good news, meaning that we have various possibilities
for implementation. Thanks to the compiler developers for making the compilers
so very efficient...

---
Dear FortranFan,

> Please don't get into "drinking the Kool Aid" ... (snip)
https://en.wikipedia.org/wiki/Drinking_the_Kool-Aid

Yes, thanks very much for your advice. But no worries here, because I usually
test code performance etc. myself when I really want to use something.
For general discussion, though, things are sometimes difficult because benchmark
results often change depending on compilers, options, environment, etc.
(I recently hit such a case), and it is sometimes pretty difficult to make a
very fair/rigorous comparison. Nevertheless, any comparison is interesting (as long as
the details are provided).
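
One way to attach such details to a timing result is to let the program report its own compiler and options. The sketch below is only an illustration: the program name, the loop body and the iteration count are made up, and an optimizing compiler may simplify such a trivial loop, so the number it prints says little by itself. It uses compiler_version() and compiler_options() from iso_fortran_env, a 64-bit accumulator to avoid integer overflow, and system_clock for wall-clock time.

```fortran
program bench_report
   use, intrinsic :: iso_fortran_env, only : int64, compiler_version, &
                                             compiler_options
   implicit none

   integer, parameter :: n_cycles = 100000000   ! arbitrary for this sketch
   integer(int64) :: t_start, t_end, rate, total
   integer :: ii

   ! Record the build details next to the measurement.
   print '(2a)', 'Compiler: ', compiler_version()
   print '(2a)', 'Options:  ', compiler_options()

   total = 0_int64
   call system_clock( t_start, rate )
   do ii = 1, n_cycles
      total = total + ii
   end do
   call system_clock( t_end )

   ! Printing the checksum keeps the loop result in use.
   print '(a,i0)', 'Checksum: ', total
   print '(a,f8.3,a)', 'Elapsed:  ', real(t_end - t_start) / real(rate), ' s'
end program bench_report
```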