Inspired by some of the recent threads about allocatable deferred length
character, I have updated the implementation of ISO_VARYING_STRING that
I occasionally use to take advantage of more of Fortran 2003.
The source is at
http://www.megms.com.au/download/aniso_varying_string.f90
Some documentation is at
http://www.megms.com.au/aniso_varying_string.htm
This implementation was originally the one by Rich Townsend, using a
deferred shape allocatable character array to store the character data of
the string, but I have changed this to an allocatable deferred length
character scalar (so all bugs are hence my fault).
This change simplifies many of the operations associated with the type.
The allocatable deferred length scalar component is also publicly
accessible, which permits things like substring operations directly on
that component and use of the structure constructor for the type.
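As a rough illustration of what a public deferred length component allows, here is a minimal sketch. The type name `vs_sketch` and the component name `chars` are assumptions made for this example only; they are not taken from the actual source file.

```fortran
module vs_sketch_mod
  implicit none
  private
  public :: vs_sketch

  ! Hypothetical stand-in for varying_string; the component name is assumed.
  type :: vs_sketch
    character(:), allocatable :: chars
  end type vs_sketch
end module vs_sketch_mod

program demo_component
  use vs_sketch_mod
  implicit none
  type(vs_sketch) :: s

  ! Structure constructor: usable because the component is public.
  s = vs_sketch('Hello, world')

  ! Substring reference directly on the component.
  print '(A)', s%chars(1:5)               ! Hello

  ! Assigning to the component reallocates it to the new length.
  s%chars = s%chars(8:) // '!'
  print '(A,1X,I0)', s%chars, len(s%chars) ! world! 6
end program demo_component
```

With a character array component, the same operations would need conversion between array and scalar forms, which is presumably where much of the simplification comes from.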
But here is where the real fun begins... I have also provided defined
input and output for the type.
Unformatted input/output simply reads/writes the string length as a
default integer followed by the character data of the string.
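The unformatted layout described above could be sketched roughly as follows. This is not the code from the linked source; the type `vs_unf`, the component `chars` and the procedure names are all made up for the example, and an unallocated string is (by assumption) written as length zero.

```fortran
module vs_unf_mod
  implicit none
  private
  public :: vs_unf

  ! Hypothetical stand-in for varying_string.
  type :: vs_unf
    character(:), allocatable :: chars
  contains
    procedure, private :: write_unf
    procedure, private :: read_unf
    generic :: write(unformatted) => write_unf
    generic :: read(unformatted) => read_unf
  end type vs_unf

contains

  ! Write the length as a default integer, then the character data.
  subroutine write_unf(dtv, unit, iostat, iomsg)
    class(vs_unf), intent(in) :: dtv
    integer, intent(in) :: unit
    integer, intent(out) :: iostat
    character(*), intent(inout) :: iomsg
    if (allocated(dtv%chars)) then
      write (unit, iostat=iostat, iomsg=iomsg) len(dtv%chars), dtv%chars
    else
      write (unit, iostat=iostat, iomsg=iomsg) 0
    end if
  end subroutine write_unf

  ! Read the length first, then exactly that many characters.
  subroutine read_unf(dtv, unit, iostat, iomsg)
    class(vs_unf), intent(inout) :: dtv
    integer, intent(in) :: unit
    integer, intent(out) :: iostat
    character(*), intent(inout) :: iomsg
    integer :: n
    character(:), allocatable :: buffer
    read (unit, iostat=iostat, iomsg=iomsg) n
    if (iostat /= 0) return
    if (n < 0) then
      iostat = 1
      iomsg = 'negative string length in file'
      return
    end if
    allocate (character(n) :: buffer)
    read (unit, iostat=iostat, iomsg=iomsg) buffer
    if (iostat == 0) call move_alloc(buffer, dtv%chars)
  end subroutine read_unf

end module vs_unf_mod

program demo_unformatted
  use vs_unf_mod
  implicit none
  type(vs_unf) :: a, b
  integer :: u

  a%chars = 'varying'
  open (newunit=u, status='scratch', form='unformatted', action='readwrite')
  write (u) a
  rewind (u)
  read (u) b
  close (u)

  ! Self-check of the round trip.
  if (len(b%chars) /= len(a%chars)) error stop 'length mismatch'
  if (b%chars /= a%chars) error stop 'content mismatch'
  print '(A,1X,I0)', b%chars, len(b%chars)
end program demo_unformatted
```

The two child READ statements both transfer data from the same parent record, since child data transfer statements do not position the file before or after the transfer.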
Formatted defined input and output for list directed and namelist
formatting are implemented to mirror that of list directed and namelist
formatting of the intrinsic character type, including the requirement
that namelist input be delimited. Explicitly formatted input, without
the optional character literal in the format specifier, also behaves
similarly to list directed input, while explicitly formatted output (the
character literal and v_list array must not be present in the specifier)
behaves just like the A edit descriptor.
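For the explicitly formatted output case only, a defined output procedure that behaves like the A edit descriptor might look something like the sketch below. The names `vs_out`, `chars` and `write_fmt` are hypothetical, the component is assumed to be allocated, and the list directed and namelist cases are simply rejected here rather than implemented.

```fortran
module vs_out_mod
  implicit none
  private
  public :: vs_out

  ! Hypothetical stand-in for varying_string.
  type :: vs_out
    character(:), allocatable :: chars
  contains
    procedure, private :: write_fmt
    generic :: write(formatted) => write_fmt
  end type vs_out

contains

  subroutine write_fmt(dtv, unit, iotype, v_list, iostat, iomsg)
    class(vs_out), intent(in) :: dtv
    integer, intent(in) :: unit
    character(*), intent(in) :: iotype
    integer, intent(in) :: v_list(:)
    integer, intent(out) :: iostat
    character(*), intent(inout) :: iomsg

    ! A plain DT edit descriptor arrives as iotype 'DT' with an empty v_list.
    if (iotype == 'DT' .and. size(v_list) == 0) then
      ! Same effect as an A edit descriptor applied to the character data.
      write (unit, '(A)', iostat=iostat, iomsg=iomsg) dtv%chars
    else
      ! Sketch only: list directed, namelist and modifiers are not handled.
      iostat = 1
      iomsg = 'unsupported iotype or v_list in this sketch'
    end if
  end subroutine write_fmt

end module vs_out_mod

program demo_formatted_output
  use vs_out_mod
  implicit none
  type(vs_out) :: s

  s%chars = 'abc'
  ! The child WRITE does not start a new record, so the
  ! brackets and the string share one output line: [abc]
  write (*, "(A,DT,A)") '[', s, ']'
end program demo_formatted_output
```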
For explicitly formatted input I have also allowed the behaviour to be
modified by the contents of the character literal following the DT edit
descriptor (the v_list array is not used and must not be present). In
the face of the practically infinite range of possibilities for how
people might want to read character data from a file into a string, I
found the "design" of these modifiers to be a bit arbitrary. Perhaps
readers have some better suggestions.
Some examples of the latter, assuming the variables starting with `vs`
are declared of type `varying_string`, and with the example record
delimited with backticks:
- A CSV-like record with a known number of fields, such as:
      `noblanks, leading blanks,trailing blanks ,"delimited,value"`
  could be read, perhaps with use of Fortran 2008's unlimited
  repeat count:
      READ (unit, "(*(DT'comma,noskipblank',:,1X))") &
          vs1, vs2, vs3, vs4
  and would result in:
      vs1 = 'noblanks'
      vs2 = ' leading blanks'
      vs3 = 'trailing blanks '
      vs4 = 'delimited,value'
  where `comma` specifies that, in the absence of delimited
  input, input for a value is terminated by a comma and
  `noskipblank` specifies that leading blanks of a value
  are significant.
- An entire line, such as:
      ` "An entire "" line" `
  could be read:
      READ (unit, "(DT'eor,nodelimited')") vs
  and would result in:
      vs = '"An entire "" line" '
  where `eor` specifies that, in the absence of delimited
  input, input for a value is terminated by the end of
  record, and `nodelimited` suppresses consideration of
  delimited input. Without specification to the contrary,
  leading blanks in the value are not considered
  significant.
  If `nodelimited` was dropped from the character literal in
  the format specification, the result would be:
      vs = 'An entire " line'
  i.e. the input is taken to be the delimited value only,
  with the usual character literal conventions for how
  doubled delimiters are treated as a single delimiter
  inside the value.
- Classic fixed width input is supported using the `fixed`
  modifier. The letter characters out of the following:
      `123abcdef456`
  could be read:
      READ (unit, "(3X,DT'fixed(6)')") vs
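To make the mechanics of a modifier a little more concrete, here is a minimal sketch of how a defined input procedure might pick `fixed(n)` out of the `iotype` argument, which arrives as `DT` followed by the character literal. It is not the actual implementation; the type `vs_fixed`, the component `chars` and the parsing approach are assumptions for illustration, and only the `fixed(n)` form is recognised.

```fortran
module vs_fixed_mod
  implicit none
  private
  public :: vs_fixed

  ! Hypothetical stand-in for varying_string.
  type :: vs_fixed
    character(:), allocatable :: chars
  contains
    procedure, private :: read_fmt
    generic :: read(formatted) => read_fmt
  end type vs_fixed

contains

  subroutine read_fmt(dtv, unit, iotype, v_list, iostat, iomsg)
    class(vs_fixed), intent(inout) :: dtv
    integer, intent(in) :: unit
    character(*), intent(in) :: iotype
    integer, intent(in) :: v_list(:)
    integer, intent(out) :: iostat
    character(*), intent(inout) :: iomsg
    integer :: i, j, n
    character(:), allocatable :: buffer

    ! Locate 'fixed(' and the closing parenthesis, e.g. in 'DTfixed(6)'.
    i = index(iotype, 'fixed(')
    j = index(iotype, ')')
    if (i == 0 .or. j <= i + 6 .or. size(v_list) /= 0) then
      iostat = 1
      iomsg = 'sketch only understands DT''fixed(n)'' without a v_list'
      return
    end if

    ! Internal read to convert the width; internal files are permitted here.
    read (iotype(i+6:j-1), *, iostat=iostat, iomsg=iomsg) n
    if (iostat /= 0) return
    if (n < 0) then
      iostat = 1
      iomsg = 'negative width in fixed(n)'
      return
    end if

    ! Read exactly n characters from the current position in the record.
    allocate (character(n) :: buffer)
    read (unit, '(A)', iostat=iostat, iomsg=iomsg) buffer
    if (iostat == 0) call move_alloc(buffer, dtv%chars)
  end subroutine read_fmt

end module vs_fixed_mod

program demo_fixed
  use vs_fixed_mod
  implicit none
  type(vs_fixed) :: vs
  integer :: u

  open (newunit=u, status='scratch', form='formatted', action='readwrite')
  write (u, '(A)') '123abcdef456'
  rewind (u)
  ! 3X skips the leading digits; the DT item then takes six characters.
  read (u, "(3X,DT'fixed(6)')") vs
  close (u)
  print '(A)', vs%chars
end program demo_fixed
```

Any non-zero `iostat` from the child READ, including an end-of-record condition when the record is too short, is simply passed back to the parent statement in this sketch.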
~~~
I am not sure whether my implementation of formatted defined
input/output is correct with respect to questions like "when
should/shall (answers might differ between those two) the defined input
procedure signal an end-of-record condition". For fixed width input
this might be obvious (the combination of format spec and item
*required* more characters, but there were no more in the record, so
defined input complains), but for varying length data, where end of
record might well be a valid indication of the end of the value, it is
not so clear. I've tended to go with the option of suppressing
end-of-record conditions if I got any data I considered useful, but I am
not sure if that is the right thing to do. I was also a bit uncertain
about how defined input was supposed to function in the context of list
directed and namelist input: where does responsibility sit for things
like skipping whitespace, separator characters, repeat specifications, etc.?
Unfortunately I ran into some compiler issues here (using ifort 16.0.2)
that mean I cannot fully test/use this in anger yet (including for the
examples above), but I'm interested in any comments in the meantime.