String partitioning: split a string into several substrings based on a delimiter


Mohammad

Feb 10, 2018, 11:12:11 AM
Dr Clive Page posted a very interesting question several years ago at
http://computer-programming-forum.com/49-fortran/8ca2fb68be175f35.htm

"""
I have a character string character (len=100) :: s which contains a comma separated list of integers (e.g. "1,10,123,15,654,12"). I am trying to write a function that returns an integer array containing the values.
"""
His solution was:

"""
program decodestr
  implicit none
  character :: string*30 = "1,10,123,15,654,12"
  integer :: n, iarray(100)
  n = count(transfer(string, 'a', len(string)) == ",")
  read(string, *) iarray(1:n+1) ! N+1 because one more int than comma
  print *, 'nvalues=', n+1
  print '(i10)', iarray(1:n+1)
end program

"""
It works amazingly!!!!!


==============================================
Now, I took Clive's idea and wrote modern Fortran code to split a string into an array of substrings delimited by ";"
"""

program splitstr
  implicit none
  character(len=100) :: string = 'This;is;a test;hello;world!'
  integer :: n
  character(80), allocatable :: strarray(:)

  ! count the delimiters by viewing the string as an array of single characters
  n = count(transfer(string, 'a', len(string)) == ";")
  allocate(strarray(n+1))
  read(string, *) strarray(1:n+1) ! N+1 because there is one more part than semicolons
  print *, 'nvalues=', n+1
  print '(a)', strarray(1:n+1)
end program splitstr

"""


It works. My question: is this correct, and what is the minimum change needed to split strings that start with the delimiter ";"?

Nasser M. Abbasi

Feb 10, 2018, 8:47:44 PM
Are you saying Fortran does not have a String Split built-in
intrinsic function as many languages these days do?

In Mathematica

str = "1,10,123,15,654,12";
ToExpression@StringSplit[str,","]

{1, 10, 123, 15, 654, 12}

Or in Matlab, it is strsplit(), or Python split(), etc...

These types of operations are so basic that they should be
part of any computer language's standard library.

--Nasser






Mohammad

Feb 10, 2018, 11:24:35 PM
I don't think there is an intrinsic one!

Mohammad

Feb 11, 2018, 2:49:01 AM
***********
This is a solution, although with its two loops it is not an elegant one


subroutine split2array()
  ! split2array splits a string into an array of
  ! substrings based on a selected delimiter
  ! note: any leading blanks in the substrings are removed

  implicit none
  character(len=80), allocatable :: strarray(:)
  character(len=80) :: string = ";This;is; rose;;great;author;m; t 'my plot'"
  integer :: n, m
  integer :: i, idx
  character(len=80) :: strtmp
  character :: delimiter = ';'

  ! 1. remove initial blanks if any
  strtmp = trim(adjustl(string))

  ! 2. count the delimiters; there are n+1 substrings
  n = count([(strtmp(i:i), i=1, len_trim(strtmp))] == delimiter)

  ! 3. allocate the output string array
  allocate(strarray(n+1))

  ! 4. copy each substring; idx is the offset of the next delimiter in strtmp(m:)
  m = 1
  do i = 1, n
    idx = index(strtmp(m:), delimiter)
    strarray(i) = adjustl(strtmp(m:m+idx-2))
    m = m + idx
  end do
  strarray(n+1) = adjustl(strtmp(m:))

  ! 5. print the results
  print *, '"'//string//'"'
  print *, 'nvalues=', n+1
  print '(a)', strarray(1:n+1)
  print *, '----------------------'

end subroutine split2array

Stefano Zaghi

Feb 11, 2018, 5:18:49 AM
Dear Mohamed,

I am very sorry, but I do not have the time to review your split method. For comparison you can read some of mine, though I did not take particular care over efficiency, only over getting correct results

https://github.com/szaghi/StringiFor

My best regards

Stefano Zaghi

Feb 11, 2018, 5:20:21 AM
I am sorry for my typos, Mohammad I had misspelled your name, my bad.

Cheers

Mohammad

Feb 11, 2018, 6:45:38 AM
Hello Stefano,
Many thanks, I got the great library StringiFor and it is a big help!

Cheers

Mohammad

Feb 11, 2018, 6:45:54 AM
On Sunday, February 11, 2018 at 1:50:21 PM UTC+3:30, Stefano Zaghi wrote:
> I am sorry for my typos, Mohammad I had misspelled your name, my bad.
>
> Cheers

No problem!

Thomas Koenig

Feb 11, 2018, 10:17:12 AM
Nasser M. Abbasi <n...@12000.org> wrote:

> Are you saying Fortran does not have a String Split built-in
> intrinsic function as many languages these days do?
> In Mathematica
>
> str = "1,10,123,15,654,12";
> ToExpression@StringSplit[str,","]

For comma separation, quite easy:

program main
  integer, dimension(6) :: a
  character(len=:), allocatable :: str
  str = "1,10,123,15,654,12"
  read (unit=str,fmt=*) a
  print *,a
end program main

FortranFan

Feb 11, 2018, 10:27:37 AM
On Saturday, February 10, 2018 at 11:12:11 AM UTC-5, Mohammad wrote:
> .. wrote a modern fortran code to split a string to array of substrings delimited by ";"
..

On Sunday, February 11, 2018 at 2:49:01 AM UTC-5, Mohammad wrote:

> .. because of two loops is not an elegant solution
>
..> character(len=80), allocatable :: strarray(:)
> character(len=80) :: string = ";This;is; rose;;great;author;m; t 'my plot'"
..

@Mohammad,

You may want to review this very recent thread on this forum with very similar discussion topics:
https://groups.google.com/d/msg/comp.lang.fortran/RAukSLOlmIY/3-XiiW9lAAAJ

In your original post, you mentioned 'modern Fortran' and subsequently you referred to an "elegant solution". Please note, then, that working with fixed-length objects of CHARACTER intrinsic type is going to prove limiting, and unfortunately it won't jibe with any sense of modernity or elegance.

Given wonderful efforts such as StringiFor by Stefano Zaghi, users can generally benefit from arriving at a state as shown below which will be somewhat equivalent to what they will see in other languages and compute environments: note there is no need for a coder to deal with string lengths.

type( string_t ) :: string
type( string_t ), allocatable :: substrings( : )

string = ";This;is; rose;;great;author;m; t 'my plot'"
call string%split( ";", substrings )

and should they then output the parsed result line by line, they will see:

This
is
rose

great
author
m
t 'my plot'

The above is possible given existing facilities in the language; what Fortranners can benefit from is a standardized 'string' type, since it is such a basic aspect of computing.

dpb

Feb 11, 2018, 12:40:04 PM
For all-numeric, list-directed internal i/o is indeed handy.

And you can often do a delimiter substitution to turn the one given into
comma and use same trick...

Doesn't help with mixed types or strings, though...
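
A minimal sketch of the delimiter-substitution trick applied to character data, assuming a single-character delimiter and fields that contain no blanks, commas, or slashes (list-directed input treats all of those specially). The names delim_subst, split_by_substitution, and maxlen are made up for illustration. Pre-blanking the result is what handles a string that starts with the delimiter: the leading separator becomes a null value, which leaves the corresponding element unchanged.

```fortran
module delim_subst
  implicit none
  integer, parameter :: maxlen = 80   ! assumed maximum length of one field
contains
  ! Split text on delim by rewriting every delim to a comma and then
  ! doing a list-directed read from the rewritten copy.
  subroutine split_by_substitution(text, delim, parts, ios)
    character(len=*), intent(in) :: text
    character(len=1), intent(in) :: delim
    character(len=maxlen), allocatable, intent(out) :: parts(:)
    integer, intent(out) :: ios
    character(len=len(text)) :: buffer
    integer :: i, n

    buffer = text
    n = 1                              ! one more field than delimiters
    do i = 1, len(buffer)
      if (buffer(i:i) == delim) then
        buffer(i:i) = ','
        n = n + 1
      end if
    end do

    allocate (parts(n))
    parts = ''                         ! null values leave elements blank
    read (unit=buffer, fmt=*, iostat=ios) parts
  end subroutine split_by_substitution
end module delim_subst
```

Called with text ";This;is;hello" and delim ";", this should return four elements with the first one blank. A field with an embedded blank, such as "a test", would still be split in two, so this is no general answer for strings.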

--



Mohammad

Feb 11, 2018, 3:18:11 PM
@FortranFan
Many thanks for your detailed explanation and example!
I got the point.

Cheers

Mohammad

Feb 11, 2018, 3:22:26 PM
@Thomas
I tried to use this for strings and for different delimiters, and it failed.
For numeric data and common delimiters, yes, it works!
By the way, I used StringiFor and it does the job simply!

Gary Scott

Feb 11, 2018, 5:36:02 PM
Comma-delimited reads are kind of trivial, but then so are keyword=
delimited reads. Still, I wish there were a
generic/templated built-in function for parsing such markup, one that
included error detection and reporting on an item-by-item basis. You
can mark up a namelist internal read and get a really handy parser
function, but the error reporting functionality is really limited.
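
A minimal sketch of the namelist idea, assuming keyword=value text with two known keys: the text is wrapped in a namelist group and read from an internal file. The module, group, and variable names (nml_parse_demo, settings, n, tol) are invented for illustration. The error reporting is exactly as coarse as described: one status code and one processor-dependent message for the whole read, not one per item.

```fortran
module nml_parse_demo
  implicit none
contains
  ! Parse text such as "n=3, tol=0.5" by wrapping it in a namelist group.
  ! Keys that do not appear in text keep the values passed in.
  subroutine parse_settings(text, n, tol, ios, msg)
    character(len=*), intent(in)    :: text
    integer,          intent(inout) :: n
    real,             intent(inout) :: tol
    integer,          intent(out)   :: ios
    character(len=*), intent(inout) :: msg   ! set only when ios /= 0
    character(len=:), allocatable   :: buffer
    namelist /settings/ n, tol

    ! Add the group header and terminator that namelist input expects.
    buffer = '&settings ' // text // ' /'
    read (unit=buffer, nml=settings, iostat=ios, iomsg=msg)
  end subroutine parse_settings
end module nml_parse_demo
```

A call with "n=3, tol=0.5" should set both values and return ios = 0, while a key outside the group, such as "m=7", should return a nonzero ios with a message whose wording depends on the compiler.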

Beliavsky

Feb 11, 2018, 8:17:53 PM
Usually you don't know into how many strings the initial string will be split. Your example assumes you do know.

Stefano Zaghi

Feb 11, 2018, 11:49:22 PM
Dear Gary,

I have made a small library to parse INI-like "markup" that has some error-handling facilities (it could easily be improved)

https://github.com/szaghi/FiNeR

My best regards.

Clive Page

Feb 12, 2018, 5:07:06 AM
On 12/02/2018 01:17, Beliavsky wrote:
> Usually you don't know into how many strings the initial string will be split. Your example assumes you do know.

Indeed. The real problem, it seems to me, is that you really want to have a split_string function which splits on comma (or some other separator) into an array of sub-strings, when you don't know in advance how many there will be.

Many of us have now got used to using variable length strings (i.e. allocatable length ones) but these really only work with scalars. If you have an array as the output of the split_string function then all elements have to have the same length. This is, at present, a fundamental limitation of Fortran. When I have discussed it with experienced people involved in drafting new Standards they have told me that it would be extremely difficult to change it.

My conclusion is that Fortran needs a new string type, where not only do scalars have variable length, but so does each element of a string array. Of course you can invent your own but there are some things that are pretty hard to do with a derived type. And I don't think that we should expect each Fortran programmer to have to hand-craft their own string type in order to implement something as basic as a split_string function. But that's just my opinion.
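
To make the "invent your own" option concrete, here is a minimal sketch of a hand-rolled string type, assuming a single-character separator. The names vstring_split, vstring, chars, and split_string are hypothetical and not taken from any library. Each array element wraps its own deferred-length component, so the elements can differ in length and the number of parts need not be known in advance.

```fortran
module vstring_split
  implicit none
  private
  public :: vstring, split_string

  ! Hypothetical wrapper type: every element carries its own length.
  type :: vstring
    character(len=:), allocatable :: chars
  end type vstring

contains

  ! Split str on a single-character separator. Empty fields are kept
  ! and blanks inside fields are preserved.
  function split_string(str, sep) result(parts)
    character(len=*), intent(in) :: str
    character(len=1), intent(in) :: sep
    type(vstring), allocatable :: parts(:)
    integer :: i, k, first, nparts

    ! Pass 1: the number of fields is the number of separators plus one.
    nparts = 1
    do i = 1, len(str)
      if (str(i:i) == sep) nparts = nparts + 1
    end do
    allocate (parts(nparts))

    ! Pass 2: copy each field; a zero-length slice gives an empty string.
    first = 1
    k = 1
    do i = 1, len(str)
      if (str(i:i) == sep) then
        parts(k)%chars = str(first:i-1)
        k = k + 1
        first = i + 1
      end if
    end do
    parts(k)%chars = str(first:)
  end function split_string
end module vstring_split
```

With this, split_string(";This;is; rose;;great", ";") should return six elements, the first and fifth of length zero. It also shows the cost mentioned above: every access has to go through the %chars component, and intrinsics such as trim or index do not accept the wrapper type directly.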


--
Clive Page

Thomas Koenig

Feb 12, 2018, 5:35:56 AM
Beliavsky <beli...@aol.com> wrote:

> Usually you don't know into how many strings the initial string will be split. Your example assumes you do know.

The following code took me less than five minutes to write:

module split
  implicit none
  character, parameter :: sep = ','
contains
  subroutine split_read_int(str, a)
    character(len=*), intent(in) :: str
    integer, dimension(:), allocatable, intent(out) :: a
    integer :: i,n
    ! there is one more value than there are separators
    n = 1
    do i=1, len(str)
      if (str(i:i) == sep) n = n + 1
    end do
    allocate (a(n))
    read (unit=str,fmt=*) a
  end subroutine split_read_int
end module split

program main
  use split
  implicit none
  integer, allocatable, dimension(:) :: a
  character(len=50) :: str = "1,24,5,6"
  call split_read_int(str, a)
  ! the post is cut off here; the two lines below are an assumed completion
  print *, a
end program main

Mohammad

Feb 12, 2018, 6:34:11 AM
@Clive
Agreed! Yes, a new data type, say string, is required so as not to break old libraries and existing code, and to give Fortran real string-processing capability.

By the way, the developed libraries like StringiFor are of great help.

Mohammad

Feb 12, 2018, 7:09:22 AM
@Thomas
Thank you! The problem is blanks (white space), which a list-directed read from an internal file treats as delimiters!
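
A small sketch of that behaviour, with made-up names (quoted_read_demo, read_items): in list-directed input a blank ends an undelimited character value, whereas a value enclosed in apostrophes or quotation marks is read as one item. Quoting only helps, of course, when you control how the input string is written.

```fortran
module quoted_read_demo
  implicit none
contains
  ! Read size(parts) list-directed character values from text.
  subroutine read_items(text, parts, ios)
    character(len=*), intent(in)  :: text
    character(len=*), intent(out) :: parts(:)
    integer,          intent(out) :: ios
    parts = ''
    read (unit=text, fmt=*, iostat=ios) parts
  end subroutine read_items
end module quoted_read_demo
```

Reading four items from "This,is,a test,hello" should give This, is, a, test, because the blank splits the third field. Reading from "This,is,'a test',hello" should give This, is, a test, hello.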

Stefano Zaghi

Feb 12, 2018, 7:35:08 AM
Dear Thomas,

I agree with you that this was a simple task achieved in a few minutes, but the result is a somewhat inflexible and non-robust method, i.e. it does not work with "=" as separator or with real values. I agree with Clive and others: the Fortran standard is lacking where string support is concerned.

My best regards.

Thomas Koenig

Feb 12, 2018, 9:09:00 AM
Stefano Zaghi <stefan...@gmail.com> wrote:
> On Monday, February 12, 2018 at 11:35:56 AM UTC+1, Thomas Koenig wrote:
>> Beliavsky <beli...@aol.com> wrote:
>> > On Sunday, February 11, 2018 at 10:17:12 AM UTC-5, Thomas Koenig wrote:
>> >> Nasser M. Abbasi <n...@12000.org> wrote:
>> >>
>> >> > Are you saying Fortran does not have a String Split built-in
>> >> > intrinsic function as many languages these days do?
>> >> > In Mathematica
>> >> >
>> >> > str = "1,10,123,15,654,12";
>> >> > ToExpression@StringSplit[str,","]
>> >>
>> >> For comma separation, quite easy:
>> >>
>> >> program main
>> >> integer, dimension(6) :: a
>> >> character(len=:), allocatable :: str
>> >> str = "1,10,123,15,654,12"
>> >> read (unit=str,fmt=*) a
>> >> print *,a
>> >> end program main
>> >
>> > Usually you don't know into how many strings the initial string will be split. Your example assumes you do know.
>>
>> The following code took me less than five minutes to write:

[...]

> Dear Thomas,
>
> I agree with you that this was a simple task achieved in few
> minutes, but it results also into a somehow not flexible/robust
> method, i.e. it does not work with "=" as separator

See below.

>or with real
> values.

Now, that is _really_ easy to change :-)

>I agree with Clive and others, Fortran standard is lacking
>when string support is concerned.

Maybe.

The code below took a little longer to write, but that was mostly
because I wanted to have the lookup table as a parameter for reasons
of efficiency, and had to make sure of the MERGE syntax.

Now, as an exercise for the reader, modify this so that
it takes an optional argument to indicate if several adjacent
separators should be counted as a single one.

module split
  implicit none
  character(len=2), parameter :: sep = ",="
  integer, private :: ii
  ! lookup table: fld(k) is true if achar(k) is one of the separators
  logical, parameter, private, dimension(0:255) :: &
    fld = merge(.true.,.false., [(index(sep,achar(ii))>0,ii=0,255)])
  private :: is_sep
contains
  subroutine split_read_int(str, a)
    character(len=*), intent(in) :: str
    integer, dimension(:), allocatable, intent(out) :: a
    integer :: n,i
    integer :: from, to, mylen
    ! first pass: count the fields
    n = 1
    mylen = len_trim(str)
    do i=1, mylen
      if (is_sep(str(i:i))) n = n + 1
    end do
    allocate (a(n))
    ! second pass: read the field that ends at each separator
    n = 1
    from = 0
    to = 1
    do while (to <= mylen)
      if (is_sep(str(to:to))) then
        read (unit=str(from+1:to-1),fmt=*) a(n)
        n = n + 1
        from = to
      end if
      to = to + 1
    end do
    read (unit=str(from+1:mylen),fmt=*) a(n)
  end subroutine split_read_int

  logical function is_sep(c)
    character(len=1), intent(in) :: c
    is_sep = fld(ichar(c(1:1)))
  end function is_sep
end module split

program main
  use split
  implicit none
  integer, allocatable, dimension(:) :: a
  character(len=50) :: str = "1,24,5=6"
  integer :: i
  ! the post is cut off here; the lines below are an assumed completion
  call split_read_int(str, a)
  do i = 1, size(a)
    print *, a(i)
  end do
end program main
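
One possible take on the exercise, offered as a sketch, not as the intended solution: an optional collapse argument decides whether runs of adjacent separators count as one. The module name split_collapse, the argument name, and the choice to store 0 for an empty field when not collapsing are all assumptions made here. It uses the scan intrinsic in place of the lookup table, purely to keep the example short.

```fortran
module split_collapse
  implicit none
  character(len=*), parameter :: seps = ",="
contains
  subroutine split_read_int(str, a, collapse)
    character(len=*), intent(in) :: str
    integer, allocatable, intent(out) :: a(:)
    logical, intent(in), optional :: collapse
    logical :: merge_adjacent, has_text
    integer :: i, first, n, mylen, pass

    merge_adjacent = .false.
    if (present(collapse)) merge_adjacent = collapse
    mylen = len_trim(str)

    ! Pass 1 counts the fields, pass 2 reads them.
    do pass = 1, 2
      n = 0
      first = 1
      ! Position mylen+1 acts as a virtual closing separator.
      do i = 1, mylen + 1
        if (i <= mylen) then
          if (scan(str(i:i), seps) == 0) cycle
        end if
        has_text = len_trim(str(first:i-1)) > 0
        if (has_text .or. .not. merge_adjacent) then
          n = n + 1
          if (pass == 2) then
            a(n) = 0          ! assumed value for an empty field
            if (has_text) read (unit=str(first:i-1), fmt=*) a(n)
          end if
        end if
        first = i + 1
      end do
      if (pass == 1) allocate (a(n))
    end do
  end subroutine split_read_int
end module split_collapse
```

For "1,,24=5" the default call should return 1, 0, 24, 5, and the call with collapse=.true. should return 1, 24, 5. A field that is not a valid integer still stops the program, since the read has no iostat.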

Stefano Zaghi

Feb 12, 2018, 9:26:19 AM
Dear Thomas,

This is not new to me; it is very similar to what I did in my string implementation (https://github.com/szaghi/StringiFor), but it is simply a "thing" that other modern languages have built in, and in a much more powerful form, e.g. I have implemented many methods (https://github.com/szaghi/StringiFor#auxiliary-methods) that Python offers out of the box. Writing 10 sloc is always 10 times more than writing 1. To me, the Fortran standard committee has underestimated the value that good string support could add to the language for all uses other than pure number crunching.

My best regards.