I'd like to know if there is any way to read an entire ASCII file,
store its contents in a character buffer, and write that data back to disk.
With the following subroutines, end-of-record (EOR) markers are not preserved:
subroutine read_file_to_buffer(filename, buffer)
character(len=*), intent(in) :: filename
character, dimension(:), allocatable, intent(inout) :: buffer
integer :: err, numfich, file_size, i
character :: c
numfich = 20
file_size = 0
open(unit=numfich, file=filename, &
action="read", form="formatted", status="old")
do
read(numfich, '(a1)', advance="no", iostat=err) c
if (err == iostat_end) exit
file_size = file_size + 1
enddo
allocate(buffer(file_size))
rewind(numfich)
i = 1
do
read(numfich, '(a1)', advance="no", iostat=err) c
if (err == iostat_end) exit
if (err == iostat_eor) then
write(buffer(i), *)
else
write(buffer(i), '(a1)') c
endif
i = i + 1
enddo
close(numfich)
end subroutine read_file_to_buffer
subroutine write_buffer_to_file(filename, buffer)
character(len=*), intent(in) :: filename
character(len=*), intent(in) :: buffer
integer :: err, numfich, file_size, i
character :: c
file_size = 0
numfich = 20
print *, "File name :: ", filename
open(unit=numfich, file=filename, &
action="write", form="formatted", status="new")
write(numfich, *) buffer
close(numfich)
end subroutine write_buffer_to_file
Any ideas?
Thanks a lot,
Cyril.
You might like to change
write(buffer(i), *)
to
write(buffer(i), '(/)')
Relying on list-directed I/O for specific requirements is always to be
avoided.
Regards,
Mike Metcalf
You should store the end-of-records as well in the array buffer.
Whatever characters indicate the end-of-record (or end-of-line) are
consumed by the READ statement and used to flag the condition "end-of-
record".
So, these characters never make it into the buffer.
You could use unformatted (!) stream I/O instead to absorb all
characters (essentially as individual bytes). Pretty much what you have
already, but with access='stream' and no formats in the reads.
Regards,
Arjen
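
A minimal sketch of the unformatted-stream idea above, for illustration
only. It assumes the file can be opened as unformatted at all and that
INQUIRE(SIZE=) reports the size in bytes, which is common but not
guaranteed. The module and procedure names are hypothetical, and NEWUNIT=
is an f2008 feature used here only to avoid picking a unit number by hand.

```fortran
module slurp_bytes
  implicit none
contains

  ! Read the whole file as raw bytes. Whatever the system uses to end a
  ! line arrives in the buffer as ordinary characters.
  subroutine read_bytes(filename, buffer)
    character(len=*), intent(in) :: filename
    character(len=1), allocatable, intent(out) :: buffer(:)
    integer :: u, nbytes

    open(newunit=u, file=filename, access="stream", form="unformatted", &
         action="read", status="old")
    ! SIZE= is in file storage units (assumed here to be bytes).
    inquire(unit=u, size=nbytes)
    allocate(buffer(max(nbytes, 0)))
    if (nbytes > 0) read(u) buffer
    close(u)
  end subroutine read_bytes

  ! Write the bytes back unchanged.
  subroutine write_bytes(filename, buffer)
    character(len=*), intent(in) :: filename
    character(len=1), intent(in) :: buffer(:)
    integer :: u

    open(newunit=u, file=filename, access="stream", form="unformatted", &
         action="write", status="replace")
    write(u) buffer
    close(u)
  end subroutine write_bytes

end module slurp_bytes
```

Asking for the size once and reading the array in a single statement
avoids counting the file one character at a time.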
> I'd like to know if there is any way to read an entire ASCII file,
> store its contents in a character buffer, and write that data back to disk.
>
> With the following subroutines, end-of-record (EOR) markers are not preserved:
[code elided]
Your code mostly does what I would recommend, at least for f90/f95. See
a bit below for an f2003 approach. You are using non-advancing input and
paying attention to the eor status to detect the end of record. That's
the "right" (as in specified by the f95 standard) way to read in such a
file and detect exact record sizes.
Your problem is not in the input, but in storing the information in your
buffer. You appear to be assuming that
> write(buffer(i), *)
will put something related to an end of record into the buffer. That
isn't so. You are probably stuck in a mode of thinking that formatted
records are always represented using an end-of-record marker character.
That isn't necessarily so. (See my comments below about Arjen's answer,
which has similar issues). Even on systems that use markers for end of
record, the markers are not necessarily single characters; Dos/Windows,
for example, uses a 2-character sequence for the marker.
But more directly to the point for your code, a record marker character
is never used for internal files, even if the system does use them for
external files. (An internal file is a "file" stored in a character
variable; your buffer(i) is an internal file here.) The records of an
internal file that is an array are the scalar elements; a scalar
internal file has only one record. There are no record markers stored in
the file. Thus your attempt to write an empty record does not write one
of those markers that don't exist. Instead, it will just fill the record
(a single character in this case) with blanks. When you later reference
the buffer, that blank isn't going to be any different from any other
blank.
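
The point about internal files can be checked with a small sketch like
the following. The module and function names are made up for
illustration; the function repeats the empty list-directed write from
the original code on a one-character variable and returns what is left
in it.

```fortran
module internal_file_demo
  implicit none
contains

  ! A one-character variable used as an internal file has exactly one
  ! record. Writing an empty record to it stores no marker; the record
  ! is simply filled with blanks.
  function after_empty_write() result(c)
    character(len=1) :: c
    c = "x"        ! start with something that is clearly not a blank
    write(c, *)    ! same form as write(buffer(i), *) in the question
  end function after_empty_write

end module internal_file_demo
```

Comparing the result with " " shows that the "x" has been replaced by
an ordinary blank, which is why the record boundary is lost.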
Unfortunately, this problem is more fundamental than the use of
list-directed formatting. Mike was probably skimming when he suggested
that you change to an explicit format; that won't solve the fundamental
problem that internal files don't explicitly store record boundaries at
all. Mike's comment does apply to the later code where you write the
data out. Using list-directed formatting there will cause you multiple
problems. You appear to be trying to hack around one of them by using a
character string for the write subroutine even though you used an array
for the read subroutine. I'm not sure how you get from one to the other.
Maybe sequence association? Hmm, I forget whether that can work in
conjunction with assumed length, but it isn't worth checking, as it does
not solve all the related problems anyway. You really just don't want to
use list-directed formatting there at all. But back to the problem of
storing the information internally.
If you want to store information about record boundaries, and you are
using f90/f95, you'll have to set up some convention of your own to
store them. The language won't do it for you. One obvious and simple
approach is to do what you were probably thinking of anyway - select a
character that is known not to be in the data and use that as a record
marker. Something like the ASCII newline character achar(10) comes to
mind. You just have to put the character there explicitly (as in
buffer(i)=achar(10)) instead of relying on the compiler to use this
convention for you. Then when you write the data, you'll have to do your
write one character at a time and check for the special character.
That should work in f90/f95. Slightly laborious, but not horrible. This
post is long enough anyway (and I'm lazy enough) that I'll not try to
put it all together in code, but those are the ideas. Maybe someone else
will feel suitably industrious.
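
One possible way to put those ideas together is sketched below. It is
not code from the thread: the module and procedure names are
hypothetical, unit 20 is an arbitrary choice, and the data is assumed
never to contain achar(10) itself. Like the original code, it relies on
iostat_end and iostat_eor from iso_fortran_env and on an allocatable
dummy argument, both of which go slightly beyond strict f95.

```fortran
module f95_style_copy
  use iso_fortran_env, only : iostat_end, iostat_eor
  implicit none
  character(len=1), parameter :: nl = achar(10)  ! our own boundary convention
contains

  subroutine read_text(filename, buffer)
    character(len=*), intent(in) :: filename
    character(len=1), allocatable, intent(out) :: buffer(:)
    integer, parameter :: u = 20
    integer :: ios, n, pass
    character(len=1) :: c

    open(unit=u, file=filename, action="read", form="formatted", status="old")
    do pass = 1, 2                      ! pass 1 counts, pass 2 stores
      n = 0
      do
        read(u, '(a1)', advance="no", iostat=ios) c
        if (ios == iostat_end .or. ios > 0) exit
        if (ios == iostat_eor) c = nl   ! store the marker explicitly
        n = n + 1
        if (pass == 2) buffer(n) = c
      end do
      if (pass == 1) then
        allocate(buffer(n))
        rewind(u)
      end if
    end do
    close(u)
  end subroutine read_text

  subroutine write_text(filename, buffer)
    character(len=*), intent(in) :: filename
    character(len=1), intent(in) :: buffer(:)
    integer, parameter :: u = 20
    integer :: i

    open(unit=u, file=filename, action="write", form="formatted", &
         status="replace")
    do i = 1, size(buffer)
      if (buffer(i) == nl) then
        write(u, '(a)')                          ! end the current record
      else
        write(u, '(a1)', advance="no") buffer(i) ! stay in the same record
      end if
    end do
    close(u)
  end subroutine write_text

end module f95_style_copy
```

A caller would declare character(len=1), allocatable :: buffer(:), call
read_text on the input file, and pass the same array to write_text.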
Let me note that, while the approach mentioned by Arjen (using
unformatted stream) will probably work in practice, it makes assumptions
that are not guaranteed by the standard. There have existed systems
where you cannot open a formatted file as unformatted at all; the Fortran
standard explicitly and intentionally allows for that kind of thing.
Arjen's method assumes both that you can open the file as unformatted at
all and that formatted records are represented using some kind of end of
record marker character. Neither assumption is necessarily so. I have
used systems where they were not.
But....
As long as Arjen brought up stream I/O (an f2003 feature), I'll mention
the way that the standard *DOES* specify for doing this with stream. It
involves formatted stream. I suspect most f95 compilers today will do
this, even though it is an f2003 feature. I'd be more confident of
compilers supporting unformatted stream, which is a simpler feature, but
unformatted stream isn't actually guaranteed to apply, as noted above.
If you read a formatted file as formatted stream, the compiler (well, its
runtime support library) will translate each record boundary into an
achar(10) character. Note that this has nothing to do with however the
records might be represented in the external file. The external file
might also happen to use achar(10) for the purpose (as on Unix systems),
it might use other characters (including the multi-character cr/lf
sequence), or it might not use record marker characters at all. None of
that matters. The compiler runtimes will make sure that you will see an
achar(10) character when you read the end of record. The same thing will
happen in reverse when you write the data; an achar(10) in your buffer
will be translated when written into whatever the system uses for an end
of record. Again, I'll beg off showing actual code, half guessing that
others here might fill that blank.
You might note that this scheme is essentially the same as the f95 one I
recommended in representing the record boundaries as achar(10)
characters. The difference is just that f2003 formatted stream specifies
that the compiler has to handle this stuff for you, whereas if you are
sticking to standard f90/f95 features, you have to do it yourself.
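
The output direction of that translation can be tried with a sketch
like this one. The names are hypothetical and NEWUNIT= is an f2008
convenience. It writes a single character value with an embedded
newline to a formatted stream file, then counts records by reading the
file back as ordinary formatted sequential.

```fortran
module stream_newline_demo
  implicit none
contains

  ! Write one data item that contains an embedded newline character.
  subroutine write_with_embedded_newline(filename)
    character(len=*), intent(in) :: filename
    integer :: u

    open(newunit=u, file=filename, access="stream", form="formatted", &
         action="write", status="replace")
    write(u, '(a)') "alpha" // new_line("a") // "beta"
    close(u)
  end subroutine write_with_embedded_newline

  ! Count the records seen by a plain formatted sequential read.
  function count_records(filename) result(n)
    character(len=*), intent(in) :: filename
    integer :: n, u, ios
    character(len=1) :: dummy

    n = 0
    open(newunit=u, file=filename, form="formatted", action="read", &
         status="old")
    do
      read(u, '(a)', iostat=ios) dummy
      if (ios /= 0) exit
      n = n + 1
    end do
    close(u)
  end function count_records

end module stream_newline_demo
```

If the newline is treated as a record boundary on output, the file
written by one WRITE statement reads back as two records.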
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
> Your code mostly does what I would recommend, at least for f90/f95. See
> a bit below for an f2003 approach. You are using non-advancing input and
> paying attention to the eor status to detect the end of record. That's
> the "right" (as in specified by the f95 standard) way to read in such a
> file and detect exact record sizes.
There have been over the years many Fortran systems that used
fixed length records padded with blanks. On those systems, the
usual way was to remove any trailing blanks from the record as
it was read.
> Your problem is not in the input, but in storing the information in your
> buffer. You appear to be assuming that
>> write(buffer(i), *)
> will put something related to an end of record into the buffer. That
> isn't so. You are probably stuck in a mode of thinking that formatted
> records are always represented using an end-of-record marker character.
> That isn't necessarily so. (See my comments below about Arjen's answer,
> which has similar issues). Even on systems that use markers for end of
> record, the markers are not necessarily single characters; Dos/Windows,
> for example, uses a 2-character sequence for the marker.
More specifically, there are systems that use fixed length records,
padding with blanks, and systems that use a length field to know where
the end is. In those cases, all possible bit patterns might be used
in the data.
(snip on internal files)
In the cases I am thinking of, internal write will pad the record
with blanks to the length of the character variable.
(snip on list-directed I/O and not keeping record boundaries)
> If you want to store information about record boundaries, and you are
> using f90/f95, you'll have to set up some convention of your own to
> store them. The language won't do it for you. One obvious and simple
> approach is to do what you were probably thinking of anyway - select a
> character that is known not to be in the data and use that as a record
> marker. Something like the ASCII newline character achar(10) comes to
> mind. You just have to put the character there explicitly (as in
> buffer(i)=achar(10)) instead of relying on the compiler to use this
> convention for you. Then when you write the data, you'll have to do your
> write one character at a time and check for the special character.
My choice would be to keep the length somewhere. Then you can write
the record out with a single WRITE statement using the known length.
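
A rough sketch of the keep-the-length approach follows. It assumes no
record is longer than max_len characters (a made-up limit; longer
records would be truncated), and the names and unit number are
hypothetical. Each record is stored blank-padded in a fixed-length
element, with its true length kept in a separate integer array.

```fortran
module length_based_copy
  implicit none
  integer, parameter :: max_len = 1024  ! assumed upper bound on record length
contains

  subroutine read_records(filename, lines, lens)
    character(len=*), intent(in) :: filename
    character(len=max_len), allocatable, intent(out) :: lines(:)
    integer, allocatable, intent(out) :: lens(:)
    integer, parameter :: u = 21
    integer :: ios, n, i
    character(len=1) :: dummy

    open(unit=u, file=filename, action="read", form="formatted", status="old")
    n = 0
    do                                   ! first pass: count the records
      read(u, '(a)', iostat=ios) dummy
      if (ios /= 0) exit
      n = n + 1
    end do
    allocate(lines(n), lens(n))
    rewind(u)
    do i = 1, n
      ! SIZE= reports how many characters this read took from the record.
      read(u, '(a)', advance="no", size=lens(i), iostat=ios) lines(i)
      ! ios == 0 means the variable was filled; skip the rest of the record.
      if (ios == 0) read(u, '(a)', iostat=ios) dummy
    end do
    close(u)
  end subroutine read_records

  subroutine write_records(filename, lines, lens)
    character(len=*), intent(in) :: filename
    character(len=*), intent(in) :: lines(:)
    integer, intent(in) :: lens(:)
    integer, parameter :: u = 21
    integer :: i

    open(unit=u, file=filename, action="write", form="formatted", &
         status="replace")
    do i = 1, size(lens)
      write(u, '(a)') lines(i)(1:lens(i))  ! one WRITE per record, exact length
    end do
    close(u)
  end subroutine write_records

end module length_based_copy
```

This keeps trailing blanks that were really in the record and needs no
reserved marker character, at the cost of padding every stored line to
max_len.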
> That should work in f90/f95. Slightly laborious, but not horrible. This
> post is long enough anyway (and I'm lazy enough) that I'll not try to
> put it all together in code, but those are the ideas. Maybe someone else
> will feel suitably industrious.
> Let me note that, while the approach mentioned by Arjen (using
> unformatted stream) will probably work in practice, it makes assumptions
> that are not guaranteed by the standard. There have existed systems
> where you cannot open a formatted file as unformatted at all; the Fortran
> standard explicitly and intentionally allows for that kind of thing.
I believe the systems still exist, and not just inside museums.
(At least those that use RECFM=VBS for unformatted, and don't allow
any other record format (RECFM) for unformatted.)
(snip)
> As long as Arjen brought up stream I/O (an f2003 feature), I'll mention
> the way that the standard *DOES* specify for doing this with stream. It
> involves formatted stream. I suspect most f95 compilers today will do
> this, even though it is an f2003 feature. I'd be more confident of
> compilers supporting unformatted stream, which is a simpler feature, but
> unformatted stream isn't actually guaranteed to apply, as noted above.
> If you read a formatted file as formatted stream, the compiler (well, its
> runtime support library) will translate each record boundary into an
> achar(10) character. Note that this has nothing to do with however the
> records might be represented in the external file. The external file
> might also happen to use achar(10) for the purpose (as on Unix systems),
> it might use other characters (including the multi-character cr/lf
> sequence), or it might not use record marker characters at all. None of
> that matters. The compiler runtimes will make sure that you will see an
> achar(10) character when you read the end of record.
You might also see an achar(10) if that is allowed inside the record,
on systems that don't use it for a record marker. (Or on systems
that use CRLF as a record marker.)
> The same thing will
> happen in reverse when you write the data; an achar(10) in your buffer
> will be translated when written into whatever the system uses for an end
> of record. Again, I'll beg off showing actual code, half guessing that
> others here might fill that blank.
> You might note that this scheme is essentially the same as the f95 one I
> recommended in representing the record boundaries as achar(10)
> characters. The difference is just that f2003 formatted stream specifies
> that the compiler has to handle this stuff for you, whereas if you are
> sticking to standard f90/f95 features, you have to do it yourself.
-- glen
> Richard Maine <nos...@see.signature> wrote:
> (snip)
>
> > Your code mostly does what I would recommend, at least for f90/f95. See
> > a bit below for an f2003 approach. You are using non-advancing input and
> > paying attention to the eor status to detect the end of record. That's
> > the "right" (as in specified by the f95 standard) way to read in such a
> > file and detect exact record sizes.
>
> There have been over the years many Fortran systems that used
> fixed length records padded with blanks. On those systems, the
> usual way was to remove any trailing blanks from the record as
> it was read.
The usual way of what? That's not the usual way or any way at all of
doing what was specified - to detect the exact record size. It may be
the usual way of addressing some other question, even a closely related
one - but not the question posed.
Blanks are perfectly valid characters and can be significant. It is not
always the case that deleting trailing blanks gives anything like what
is desired. It can sometimes be the case that it does, but one has to
actually verify that to be what is wanted. One is likely to have unhappy
customers if you get in the habit of doing something different from what
is actually asked for without asking whether it would be an acceptable
substitute. Proposing substitutes can sometimes be much better customer
service than blindly doing what was asked for, but it does require
giving feedback to make sure it does meet the need. In the case of a
newsgroup posting, that means adding appropriate qualifiers such as "if
trailing blanks in records are not significant."
If you do have a file with fixed-length records, then the literal
correct answer to the question of what the record lengths are is that
they are that fixed length. If you want to copy the file preserving
record lengths, then you would write that full length, regardless of any
blanks. One could imagine different specifications. One that is
particularly problematic is to determine the length of the data before
any blank padding that was added at the end. Unfortunately, that
particular problem would be impossible to solve without additional data.
(see below)
> Blanks are perfectly valid characters and can be significant. It is not
> always the case that deleting trailing blanks gives anything like what
> is desired.
Yes. I used to use an editor that would remove trailing blanks
from lines when editing a file. The editor was designed to be
similar to another editor that was commonly used on fixed record
length files, so maybe that made some sense. Mostly it didn't
bother me, but diff would detect the difference and that made it
harder sometimes to find the actual differences.
> It can sometimes be the case that it does, but one has to
> actually verify that to be what is wanted. One is likely to have unhappy
> customers if you get in the habit of doing something different from what
> is actually asked for without asking whether it would be an acceptable
> substitute. Proposing substitutes can sometimes be much better customer
> service than blindly doing what was asked for, but it does require
> giving feedback to make sure it does meet the need. In the case of a
> newsgroup posting, that means adding appropriate qualifiers such as "if
> trailing blanks in records are not significant."
Now, consider the results if you treat such trailing blanks as
significant when they actually are not. In the case of text formatting,
for example, you get extra blanks where they aren't supposed to be,
and line justification is completely wrong.
> If you do have a file with fixed-length records, then the literal
> correct answer to the question of what the record lengths are is that
> they are that fixed length. If you want to copy the file preserving
> record lengths, then you would write that full length, regardless of any
> blanks. One could imagine different specifications. One that is
> particularly problematic is to determine the length of the data before
> any blank padding that was added at the end. Unfortunately, that
> particular problem would be impossible to solve without additional data.
Yes, that is exactly the problem. If you read a file that was written
on a system that pads to a fixed record length, what do you do with
the result? If you believe that you have such a file, then one way
of processing it is to remove trailing blanks. In years past, I had
many programs that did that. You read in 80A1 format, then find
the last non-blank character in the line.
Support for such programs has complicated the Fortran I/O system
for many years. If a program reading a whole line up to 80 characters
(written for fixed length records) was run on a system with variable
length records, then it could get an I/O error, reading past the
end of the record. The fix was that the Fortran I/O system would
pad out as needed. I believe that is still in the standard.
(Adobe just decided to update my reader, so I can't read
the standard files right now.)
In any case, programs using ACHAR(10) as a line terminator will
not work right for lines containing ACHAR(10) on systems that
allow it.
-- glen
I wrote an example program for using stream I/O that I was going to
polish up and place on the Fortran wiki site, but apparently never did.
It uses f2003 features (that work with recent g95, gfortran, and Intel
Fortran compilers at a minimum); if that is OK, I believe it does what
you want:
http://home.comcast.net/~urbanjost/CLONE/EXAMPLES/STREAMIO/juslurp.f90
That being said, the previously posted discussion contains many valid
points and caveats, and I personally generally avoid reading entire
files into memory. Doing so is often a waste of memory compared to
record-based processing, among other things. The example reads a file
into memory and then writes it back out in reverse order, by the way.
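Thanks a lot for your comments.
Finally I succeeded in writing a working module on Linux Ubuntu and
ifort 11.1 which uses stream I/O.
However, I understand this method does not work on all systems.

Here is the code, in case it helps: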
##############################################
module readwrite
use iso_fortran_env, only : iostat_end, iostat_eor
implicit none
integer, parameter :: MIN_NUMFICH = 10
integer, parameter :: MAX_NUMFICH = 100
contains
function free_numfile(minid) result(numfile)
integer, intent(in), optional :: minid
integer :: numfile, i, local_minid
local_minid = MIN_NUMFICH
if (present(minid)) local_minid = minid
do i=local_minid,MAX_NUMFICH
inquire(unit=i, number=numfile)
if (numfile == -1) then
numfile = i
exit
endif
enddo
if (numfile == -1) then
print *, "Can't find a free file descriptor, stop"
stop
endif
end function
subroutine read_file_to_buffer(filename, buffer)
character(len=*), intent(in) :: filename
character, dimension(:), allocatable, intent(inout) :: buffer
integer :: err, numfile, file_size, offset
character :: c
numfile = free_numfile()
file_size = 1
open(unit=numfile, file=filename, &
action="read", access="stream", form="unformatted", &
status="old")
do
read(numfile, pos=file_size, iostat=err) c
if (err == iostat_end) exit
file_size = file_size + 1
enddo
allocate(buffer(file_size))
buffer = ""
offset = 1
do
read(numfile, pos=offset, iostat=err) c
if (err == iostat_end) exit
buffer(offset) = c
offset = offset + 1
enddo
close(numfile)
end subroutine read_file_to_buffer
subroutine write_buffer_to_file(filename, buffer)
character(len=*), intent(in) :: filename
character(len=*), intent(in) :: buffer
integer :: numfile
numfile = free_numfile()
open(unit=numfile, file=filename, &
action="write", access="stream", form="unformatted", &
status="replace")
write(numfile, pos=1) buffer
close(numfile)
end subroutine write_buffer_to_file
end module readwrite
################################
Best regards,
Cyril.
> Thanks a lot for your comments.
> Finally I succeeded in writing a working module on Linux Ubuntu and
> ifort 11.1 which uses STREAMIO.
> However, I understand this method does not work on all systems.
>
> Here is the code if It could help :
Yes, it could. A few comments.
1. You appear to have an off-by-one error in the file size. You increment
the file size by 1 *AFTER* successfully reading a character at that size,
so your file_size value is one greater than the actual file size. I
noticed this by compiling and running your code on this Mac (with both
NAG Fortran and g95) and noticing that the output file was 1 byte
larger.
2. You don't show the calling code, so I still haven't figured out how
you have a character array returned from the read routine, but then use
it as a scalar character string in the write routine. The sequence
association rules are messy, and I didn't restudy them in detail to be
sure, but I don't think you can do that, at least if the calling code
looks like I would assume (it not being shown means I have to guess).
Anyway, both compilers that I tried rejected it. There is probably a way
to trick things into working, perhaps with some messiness with transfer
and/or specification expressions, but I took the simpler (to me) route
of changing buffer in the output routine to be an array, just like the
one in the read routine, except that it doesn't need to be allocatable.
3. I'm not sure why you are using pos= in all the read/write statements.
While that is valid, I see no point. You are basically reading and
writing sequentially anyway. I wonder whether perhaps you thought the
pos= was required with stream I/O. It isn't. Using pos also slightly
complicates things if you change to formatted I/O (because record
markers can be multi-byte).
4. Since you did most of the work, I went ahead and rewrote it to show
how to use formatted stream, as I previously recommended. Although the
unformatted will probably work on systems you are likely to use, I
figure why not do it correctly when it isn't any harder.
My converted code is below, along with the small main program I used to
test. I used a copy of the module source code for the test file to copy,
it being a text file that was handy at the time. I basically did minimal
changes to make it work with formatted stream.
A few things that I noted.
1. I was wrong before (it happens) when I said that record ends were
converted to achar(10) in formatted stream input. I forgot that
conversion is only on output. As a result, my first attempt at testing
this put the output all on a single line. The statement
if (err == iostat_eor) c = achar(10)
in the code below is my fix for that. Nothing needed on the output side,
as it is automatically converted there.
2. The format 99999a1 in the output is a bit hacky, but valid (as long
as the file isn't longer than that). I could have changed the code to
write one character at a time in a DO loop to avoid that hack. Or I
think f2008 has a cute feature with something like * as a repeat factor,
but I wouldn't count on compilers supporting that yet.
3. Note the business about buffer(1:size(buffer)-1) in the write. That's
because a formatted write is always going to eventually add a record end
in addition to any data written. Advance='no' can procrastinate that
record end, but eventually it will happen, even if it waits until the
close of the file (or end of the program, which implicitly closes the
file). If I write the last newline as data, then there will be an extra
one added. This all does assume that the file ends in a newline; note
that I never actually write the last character of the buffer; I just
assume it must have been a newline. Valid text files are supposed to end
with newlines, but that doesn't mean all text files actually do. If I
were writing this code "for real", I'd be a little less sloppy here.
Maybe I would check for the last newline. I certainly would check for
the case of a zero-sized file, which this code as shown will fail on.
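module readwrite
use iso_fortran_env, only : iostat_end, iostat_eor
implicit none
integer, parameter :: MIN_NUMFICH = 10
integer, parameter :: MAX_NUMFICH = 100
contains
! Module header and free_numfile as in Cyril's module above.
function free_numfile(minid) result(numfile)
integer, intent(in), optional :: minid
integer :: numfile, i, local_minid
local_minid = MIN_NUMFICH
if (present(minid)) local_minid = minid
do i=local_minid,MAX_NUMFICH
inquire(unit=i, number=numfile)
if (numfile == -1) then
numfile = i
exit
endif
enddo
if (numfile == -1) then
print *, "Can't find a free file descriptor, stop"
stop
endif
end function
subroutine read_file_to_buffer(filename, buffer)
character(len=*), intent(in) :: filename
character, dimension(:), allocatable, intent(inout) :: buffer
integer :: err, numfile, file_size, offset
character :: c
numfile = free_numfile()
file_size = 1
open(unit=numfile, file=filename, &
action="read", access="stream", form="formatted", status="old")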
do
read(numfile, '(a1)', advance='no', iostat=err) c
if (err == iostat_end) exit
file_size = file_size + 1
enddo
file_size = file_size - 1
allocate(buffer(file_size))
buffer = ""
offset = 1
rewind(numfile)
do
read(numfile, '(a1)', advance='no',iostat=err) c
if (err == iostat_end) exit
if (err == iostat_eor) c = achar(10)
buffer(offset) = c
offset = offset + 1
enddo
close(numfile)
end subroutine read_file_to_buffer
subroutine write_buffer_to_file(filename, buffer)
character(len=*), intent(in) :: filename
character(len=1), intent(in) :: buffer(:)
integer :: numfile
numfile = free_numfile()
open(unit=numfile, file=filename, &
action="write", access="stream", form="formatted", &
status="replace")
write(numfile, '(99999a1)',advance='no') buffer(1:size(buffer)-1)
close(numfile)
end subroutine write_buffer_to_file
end module readwrite
program testrw
use readwrite
implicit none
character*1, allocatable :: buffer(:)
call read_file_to_buffer('rw.f90',buffer)
call write_buffer_to_file('out.f90',buffer)
end program
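
Point 3 of the notes above mentions checking for the last newline and
for a zero-sized file. A less sloppy write routine might look like the
sketch below. It is an illustration, not the code that was tested in
the post: the names are hypothetical, NEWUNIT= and the unlimited repeat
count in '(*(a1))' are f2008 features (the latter is the "* as a repeat
factor" idea), and so it needs a compiler that supports them.

```fortran
module stream_write_checked
  implicit none
contains

  subroutine write_buffer_checked(filename, buffer)
    character(len=*), intent(in) :: filename
    character(len=1), intent(in) :: buffer(:)
    integer :: u, n

    n = size(buffer)
    open(newunit=u, file=filename, action="write", access="stream", &
         form="formatted", status="replace")
    if (n > 0) then
      ! Only drop the last character if it really is a newline; the
      ! final record end is left to the runtime, as described above.
      if (buffer(n) == achar(10)) n = n - 1
      ! Unlimited repeat count instead of the 99999a1 hack.
      write(u, '(*(a1))', advance="no") buffer(1:n)
    end if
    ! A zero-sized buffer writes nothing and leaves an empty file.
    close(u)
  end subroutine write_buffer_checked

end module stream_write_checked
```

It takes the same character(len=1) array that read_file_to_buffer
fills, so it could stand in for write_buffer_to_file in the test
program.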