How to read an input file of unknown size using dynamic allocation?

John Chauvin

Apr 3, 2003, 8:52:45 PM

I have a basic question on the best procedure for reading an input file
of unknown size. The input file has an unknown number of columns
separated by one or more spaces. Each column has an unknown number of
rows:

1 5 9 ...
2 6 10 ...
3 7 11 ...
4 8 12 ...
5 9 13 ...
6 10 14 ...
. . . ...
. . . ...
. . . ...
. . . ...

I would like to develop a routine which reads this data without the
use of any fixed-size arrays, thereby providing maximum flexibility
to handle input files of any size.

My current routine first determines the number of columns using a
fixed size character string (read the first line and count the number
of distinct numeric values). The data is then read into a fixed size
2-D array. So the real questions are:

How do I determine the number of columns in the data without some
assumption regarding the size or number of columns? Is it possible to
count the number of rows and read in the data at the same time? The
only way I see to get the number of rows needed to dynamically allocate
the input array is to open the file, count the lines, rewind the file,
and then read the values, which is not very efficient.

Thanks for the help,

John C.

Walt Brainerd

Apr 3, 2003, 10:57:31 PM

Here is an F program that might get you started.
It makes two assumptions: a) there is an upper
bound on the quantity of numbers (100 in this
program) and b) no input value will be equal to
-huge(0).

You can get around (a) with dynamic arrays.

The "obvious" thing to try--list-directed formatting
with advance="no"--unfortunately is not legal.

program read_em
   integer, dimension(100) :: a
   integer, parameter :: signal = -huge(a)
   integer :: ios
   a = signal
   read (unit=*, fmt=*, iostat=ios) a
   print *, a(:count(a/=signal))
end program read_em

task

Apr 4, 2003, 12:57:50 AM

Try using:

   do while (.not.eof(unit_of_opened_file))
      ........... instructions
   enddo

"John Chauvin" <jcha...@panix.com> wrote in message
news:53cc3f57.0304...@posting.google.com...

Heiko Neus

Apr 4, 2003, 2:08:07 AM

Hi there,

What about first reading the values into a linked list of pointer
elements? You just allocate a new element and increment a counter each
time there is a new value.

Perhaps, if you don't want to keep handling a linked list (you might think
of one linked in both directions), you can allocate an array with the size
of the counter and just copy the values from the list into the array.

hth

Heiko

John Chauvin

Apr 4, 2003, 9:25:53 AM

Thanks Walt. I am not familiar with the F language. It is not clear
(to me at least) exactly what your example is trying to do. I have
compiled and tested your example using f90. Using sample column data,
the printed result was zero.

It is not clear there is an elegant solution to my question. To restate
another way:

Given a data file of unknown columns and rows, what is the best way to
read in and store this data? This breaks down into:

1. How best to determine the number of columns and
2. How best to determine the number of rows.

The second part is how to do the above using dynamic arrays. It is the
typical Catch-22 problem. You cannot read the data into a 2-D array
unless the array has the proper size. But you do not know the proper
size until you read the data. How do people solve this common problem?

Thanks,

John Chauvin

Walt Brainerd <wa...@fortran.com> wrote in message news:<3E8D02A...@fortran.com>...

John Chauvin

Apr 4, 2003, 9:38:27 AM

Thanks for the reply. Your example shows how to cycle through each row
of data but the "devil is in the details". My problem really has two
parts:

1. How best to determine the number of columns and rows contained in the
input file.
2. How to read this data into a 2-D array which has been dynamically
sized to fit the data.

The "brute force" method is to count the number of rows using a fixed
size array. Count each row via a read statement...allocate the 2-D
input array...rewind the input file and read the data into the array.

Is there a better approach to this solution?

Thanks,

John Chauvin

"task" <lord...@wp.pl> wrote in message news:<b6j6vp$mro$1...@korweta.task.gda.pl>...

Jan C. Vorbrüggen

Apr 4, 2003, 10:03:44 AM

You count the number of columns from the first line. You start with a
guess for the number of rows, and allocate an appropriate 2D array. When
you read the first line over and above the pre-allocated amount, double
the number of rows, allocate a new array, copy the old data into the new
array, deallocate the old array, and continue. Repeat as necessary. When
you reach EOF, optionally trim (via re-allocation as above) the final
array size to the actual number of rows; this should only be done if you
will later on be tight on memory.
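
A minimal sketch of the doubling strategy described above, assuming whitespace-separated reals and a column count that has already been taken from the first line. The names grow_rows_mod, read_growing, and table are made up for illustration, and move_alloc is a Fortran 2003 intrinsic, so this needs a newer compiler than the F90/F95 ones discussed in this thread.

```fortran
module grow_rows_mod
  implicit none
contains
  ! Read rows of n_cols reals from an already-open formatted unit.
  ! The capacity of table doubles whenever it fills up; on return,
  ! table has shape (n_cols, n_rows). ios is 0 on a normal end of file.
  subroutine read_growing(lun, n_cols, table, n_rows, ios)
    integer, intent(in)            :: lun, n_cols
    real, allocatable, intent(out) :: table(:,:)
    integer, intent(out)           :: n_rows, ios
    real, allocatable :: bigger(:,:)
    real    :: row(n_cols)

    allocate(table(n_cols, 16))          ! initial guess for the row count
    n_rows = 0
    do
      ! Note: list-directed input continues onto the next line if a line
      ! holds fewer than n_cols values, so short rows are not detected here.
      read(lun, *, iostat=ios) row
      if (ios < 0) then                  ! end of file: normal termination
        ios = 0
        exit
      else if (ios > 0) then             ! read error: give up, keep ios
        exit
      end if
      if (n_rows == size(table, 2)) then
        allocate(bigger(n_cols, 2*size(table, 2)))
        bigger(:, 1:n_rows) = table      ! copy the old data across
        call move_alloc(bigger, table)   ! table now owns the larger block
      end if
      n_rows = n_rows + 1
      table(:, n_rows) = row
    end do

    ! Optional trim to the actual number of rows.
    if (n_rows < size(table, 2)) then
      allocate(bigger(n_cols, n_rows))
      bigger = table(:, 1:n_rows)
      call move_alloc(bigger, table)
    end if
  end subroutine read_growing
end module grow_rows_mod
```

Because the capacity doubles each time, the total amount of copying stays proportional to the final number of rows. The array is stored as (n_cols, n_rows) so that each file row is contiguous in memory.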

Jan

Paul van Delst

Apr 4, 2003, 10:05:25 AM

John Chauvin wrote:
>
> Thanks Walt. I am not familiar with the F language. It is not clear
> (to me at least) exactly what your example is trying to do. I have
> compiled and tested your example using f90. Using sample column data,
> the printed result was zero.
>
> It is not clear there is a elegant solution to my question. To restate
> another way:
>
> Given a data file of unknown columns and rows, what is the best way to
> read in and store this data? This breaks down into:
>
> 1. How best to determine the number of columns and
> 2. How best to determine the number of rows.
>
> The second part is how to do the above using dynamic arrays. It is the
> typical Catch-22 problem. You cannot read the data into a 2-D array
> unless the array has the proper size. But you do not known the proper
> size until you read the data. How do people solve this common problem?

I would count the number of rows first reading in as little as possible, e.g.:

  CHARACTER(1)     :: dummyrow
  CHARACTER(10000) :: dummycol
  OPEN( lun, formatted, sequential, iostat etc etc)

  ! -- Count the rows
  n_rows = 0
  row_count_loop: DO
    READ( lun, '(a)', IOSTAT = IO_Status ) dummyrow
    IF ( IO_Status < 0 ) THEN
      EXIT row_count_loop
    ELSE IF ( IO_Status > 0 ) THEN
      ...Process error....
    END IF
    n_rows = n_rows + 1
  END DO row_count_loop
  REWIND(lun)

  ! -- Count the columns
  READ( lun, '(a)', IOSTAT = IO_Status ) dummycol
  IF ( IO_Status /= 0 ) THEN
    ...Process error...
  END IF
  ...Parse string "dummycol" for distinct numerical values....
  ...store value in n_cols...
  REWIND( lun )

  ALLOCATE( Data_Array( n_cols, n_rows ), STAT = Allocate_Status )
  IF ( Allocate_Status /= 0 ) THEN
    ...Process error...
  END IF

  ...Now read all the data for real...

(You said you already know how to determine the number of columns)

The above might not be syntactically correct, but you get the gist of it. I don't
understand why counting the rows, rewinding, counting the columns, rewinding, and reading
the data is bad. How else are you going to know how much data you have? (Can you tell I'm
not a sophisticated programmer? :o) A well-used tool in IDL is written in the same
way.

The thing I dislike the most is the declaration of dummycol as length 10000. What happens
if your line is longer? (Is there a maximum allowed value for the length of a character
variable in f90/95?)
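
The "parse string for distinct numerical values" step could be sketched as below. This is only an illustration under the assumption that fields are separated by blanks, tabs, or commas; count_fields_mod and count_fields are hypothetical names. It counts fields by position only and does not check that each field is a valid number.

```fortran
module count_fields_mod
  implicit none
contains
  ! Count the fields in a line by counting separator -> non-separator
  ! transitions. Runs of separators are treated as a single separator.
  pure function count_fields(line) result(n)
    character(len=*), intent(in) :: line
    integer :: n
    character(len=*), parameter :: seps = ' ,' // achar(9)  ! blank, comma, tab
    logical :: in_field
    integer :: i

    n = 0
    in_field = .false.
    do i = 1, len(line)
      if (index(seps, line(i:i)) > 0) then
        in_field = .false.          ! on a separator
      else if (.not. in_field) then
        in_field = .true.           ! first character of a new field
        n = n + 1
      end if
    end do
  end function count_fields
end module count_fields_mod
```

Calling count_fields(dummycol) after the first READ would give n_cols. Note that list-directed input treats two consecutive commas as a null value, which this simple counter does not reproduce.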

cheers,

paulv

--
Paul van Delst
CIMSS @ NOAA/NCEP/EMC
Ph: (301)763-8000 x7748
Fax:(301)763-8545

Tom McGlynn

Apr 4, 2003, 10:43:59 AM

to John Chauvin

John Chauvin wrote:
> Thanks for the reply. Your example shows how to cycle through each row
> of data but the "devil is in the details". My problem really has two
> parts:
>
> 1. How best to determine the number of columns and rows contain in the
> input file.
> 2. How to read this data into a 2-D array which has been dynamically
> size to fit the data.
>

...

As you point out, the devil is in the details, and that's true of
the problem specification as well...

If each line has exactly the same length, then you can try to get
the number of rows by dividing the total size of the file
by the length of the first row. I can't remember if Fortran has
any standard way to get file lengths, or if you'd need to resort
to extensions or system calls.

Does each row contain the same number of columns? Can there
be incomplete rows, or rows on multiple lines? We're all
assuming, yes and no, but our code is going to be a little
fragile unless we check this.

What are you trying to optimize here: Simplicity of the code?
Use of memory? Speed? Can you always easily re-read the data (e.g.,
if you're reading from a pipe that may not be possible)?

If you're reading from a file and you want to have nice simple
code and minimize memory, then the two-pass approach (reading
the file once to get the dimensions and then reading the data
into a correctly sized array) may be desirable. This is
Paul van Delst's approach. I think you can simplify his code a
little. The first read in the first pass needs to read into
a big enough buffer to ensure it reads an entire row, which is
parsed to get the number of columns.
Subsequent first-pass reads can be dummies -- they don't need to read
any data at all since we're just trying to count to the end.
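
A rough sketch of such a dummy-read counting pass, assuming the unit is connected to a rewindable formatted sequential file. The names two_pass_mod and count_rows are invented for this example. A READ with an empty input list still advances one record, so no buffer is needed.

```fortran
module two_pass_mod
  implicit none
contains
  ! Count the records in a formatted sequential file without storing
  ! any data, then leave the file positioned at its start.
  subroutine count_rows(lun, n_rows)
    integer, intent(in)  :: lun
    integer, intent(out) :: n_rows
    integer :: ios

    rewind(lun)
    n_rows = 0
    do
      read(lun, '(a)', iostat=ios)   ! empty input list: just skip a record
      if (ios /= 0) exit             ! end of file (or an error) ends the count
      n_rows = n_rows + 1
    end do
    rewind(lun)
  end subroutine count_rows
end module two_pass_mod
```

Blank lines are counted as records too, so a file with trailing empty lines would report more rows than it has data rows.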

If you may sometimes be reading from non-rewindable inputs, or
if the overhead of re-reading the file is too painful, then
you might look at the list approach, or using initial guesses
for the number of rows and reallocating the size as needed,
e.g., the suggestions of Jan Vorbruggen or Heiko Neus.

Good luck,
Tom McGlynn

Duane Bozarth

Apr 4, 2003, 12:01:53 PM

Paul van Delst wrote:
>
> ....The thing I dislike the most is the declaration of dummycol as length 10000. What happens

> if you're line is longer? (Is there a maximum allowed value for the length of a character
> variable in f90/95?)

I'm not sure what the actual standard says (I expect one of the regulars
will chime in) but I checked the CVF documentation and it says (for
V6.5) that CHARACTER has length 1 to 65535. It is not marked as an
extension or noted as a limitation, but that isn't a guarantee that it
is actually a limit placed on the length by the standard. I'd think
that for implementations with a longer integer, the limit might be
longer, but that's purely conjecture.

Walt Brainerd

Apr 4, 2003, 12:14:39 PM

F is a subset of Fortran, so the program should
work as it does for me (see below).

If each number occupies a fixed number of columns
in the input, it will be much easier as you can
use formatted nonadvancing input. You didn't say
so I assumed each row could have a different number
of numbers and could occupy any number of columns.
Perhaps I should have guessed otherwise, since you
said you want to count the number of columns.

[walt@localhost TESTING]$ cat f.f95

program read_em
   integer, dimension(100) :: a
   integer, parameter :: signal = -huge(a)
   integer :: ios
   a = signal
   read (unit=*, fmt=*, iostat=ios) a
   print *, a(:count(a/=signal))
end program read_em

[walt@localhost TESTING]$ F f
[walt@localhost TESTING]$ ./a.out
1 2 3 4
5 6
7 8 9
10
1 2 3 4 5 6 7 8 9 10
[walt@localhost TESTING]$

I typed ^D after the first 10.

John Chauvin

Apr 4, 2003, 1:47:54 PM

Tom McGlynn <t...@lheapop.gsfc.nasa.gov> wrote in message news:<3E8DA83F...@lheapop.gsfc.nasa.gov>...

> If each line has exactly the same length...

Each line will contain the same number of columns but may not be the
same length (i.e. same number of characters) due to differences in
magnitude and precision. For example:

2.0 12.3456 56.3
2.1245 6.789 123456.1267

>
> Does each row contain the same number of columns? Can there
> be incomplete rows, or rows on multiple lines? We're all
> assuming, yes and no, but our code is going to be a little
> fragile unless we check this.

Each row should contain the same number of columns. The input routine
would have to check each row to detect any missing values.

>
> What are you trying to optimize here: Simplicity of the code?
> Use of memory? Speed? Can you always easily re-read the data (e.g.,
> if you're reading from a pipe that may not be possible)?

Memory and speed. The data will be read from a data file and not a
pipe. I had not considered the option of reading from a pipe.

>
> If you're reading from a file and you want to have nice simple
> code and minimize memory, then the two-pass approach: reading
> the file once to get the dimensions and then reading the data
> into a correctly sized array may be desireable. This is
> Paul van Delst's approach. I think you can simplify his code a
> little. The first read in the first pass needs to read in
> a big enough buffer to ensure it reads an entire row which is
> parsed to get the number of columns.
> Subsequent first pass reads can be dummies -- they don't need to read
> and data at all since we're just trying to count to the end.

>
> If you may sometimes be reading from non-rewindable inputs, or
> if the overhead of re-reading the file is too painful, then
> you might look at the list approach, or using initial guesses
> for the number of rows and reallocating the size as needed,
> e.g., the suggestions of Jan Vorbruggen or Heiko Neus.

Thanks for all the suggestions. This newsgroup is such a valuable
resource for the Fortran programmer. My company used to have
numerous Fortran experts...all are gone now. The new hires are doing
most of their work with Excel these days and absolutely none have any
Fortran training....sad

John Chauvin

James Van Buskirk

Apr 4, 2003, 2:22:57 PM

"John Chauvin" <jcha...@panix.com> wrote in message
news:53cc3f57.03040...@posting.google.com...

> Thanks for the reply. Your example shows how to cycle through each row
> of data but the "devil is in the details". My problem really has two
> parts:

> 1. How best to determine the number of columns and rows contain in the
> input file.
> 2. How to read this data into a 2-D array which has been dynamically
> size to fit the data.

> The "brute force" method is to count the number of rows using a fixed
> size array. Count each row via a read statement...allocate the 2-D
> input array...rewind the input file and read the data into the array.

> Is there a better approach to this solution?

Your original example file had INTEGER entries, for which the following
works:

module linked_list_mod
   implicit none
   type node
      integer, pointer :: array(:) => NULL()
      type(node), pointer :: next => NULL()
   end type node
end module linked_list_mod

module get_record_mod
   use linked_list_mod
   implicit none
   type char_node
      character c
      type(char_node), pointer :: next => NULL()
   end type char_node
   contains
      subroutine get_record(iunit,temp)
         integer, intent(in) :: iunit
         type(node) :: temp
         integer numc
         character c
         integer iostat
         type(char_node), pointer :: head
         type(char_node), pointer :: cursor

         numc = 0
         nullify(head)
         do
            read(iunit,'(a)',advance='no',iostat=iostat) c
            if(iostat /= 0) exit
            if(numc == 0) then
               allocate(head)
               cursor => head
            else
               allocate(cursor%next)
               cursor => cursor%next
            end if
            numc = numc+1
            cursor%c = c
         end do
         call gr_1(head,numc,temp%array)
      end subroutine get_record

      subroutine gr_1(head,numc,array)
         type(char_node), pointer :: head
         integer, intent(in) :: numc
         integer, pointer :: array(:)
         character(numc) line
         integer i
         type(char_node), pointer :: cursor
         character(3), parameter :: set = ', '//achar(9)
         integer numi
         integer deltai

         do i = 1, numc
            cursor => head
            head => head%next
            line(i:i) = cursor%c
            deallocate(cursor)
         end do
         numi = 0
         i = 1
         do
            deltai = verify(line(i:),set)
            if(deltai == 0) exit
            i = i+deltai-1
            numi = numi+1
            deltai = scan(line(i:),set)
            if(deltai == 0) exit
            i = i+deltai-1
         end do
         allocate(array(numi))
         read(line,*) array
      end subroutine gr_1
end module get_record_mod

program test
   use get_record_mod
   implicit none
   integer iunit
   type(node), pointer :: head
   type(node), pointer :: cursor
   integer, allocatable :: array(:,:)
   integer rows
   integer, allocatable :: temp(:)
   integer i
   character(80) fmt
   integer iostat

   allocate(head)
   iunit = 10
   open(iunit,file='test.dat',status='old')
   call get_record(iunit,head)
   rows = 1
   allocate(temp(size(head%array)))
   cursor => head
   do
      read(iunit,*,iostat=iostat) temp
      if(iostat /= 0) exit
      rows = rows+1
      allocate(cursor%next)
      cursor => cursor%next
      allocate(cursor%array(size(temp)))
      cursor%array = temp
   end do
   allocate(array(rows,size(temp)))
   deallocate(temp)
   do i = 1, rows
      cursor => head
      head => head%next
      array(i,:) = cursor%array
      deallocate(cursor)
   end do
! At this point array contains the values from the input file.
   write(fmt,'(a,i0,a)') '(',size(array,2),'(1x,i0))'
   write(*,fmt) transpose(array)
end program test

Now that I see you actually want to input REAL or DOUBLE PRECISION
values, some changes will be required, left as an exercise for the
reader.

--
write(*,*) transfer((/17.392111325966148d0,3.6351694777236872d228, &
6.0134700169991705d-154/),(/'x'/)); end

Duane Bozarth

Apr 4, 2003, 2:52:49 PM

John Chauvin wrote:
>
> Tom McGlynn <t...@lheapop.gsfc.nasa.gov> wrote in message news:<3E8DA83F...@lheapop.gsfc.nasa.gov>...
>
> > If each line has exactly the same length...
>
> Each line will contain the same number of columns but may not be the
> same length (i.e. same number of characters) due to differences in
> magnitude and precision. For example:
>
> 2.0 12.3456 56.3
> 2.1245 6.789 123456.1267
>
> >
> > Does each row contain the same number of columns? Can there
> > be incomplete rows, or rows on multiple lines? We're all
> > assuming, yes and no, but our code is going to be a little
> > fragile unless we check this.
>
> Each row should contain the same number of columns. The input routine
> would have to check each row to detect any missing values.
>
> >
> > What are you trying to optimize here: Simplicity of the code?
> > Use of memory? Speed? Can you always easily re-read the data (e.g.,
> > if you're reading from a pipe that may not be possible)?
>
> Memory and speed. The data will be read from a data file and not a
> pipe. I had not consider the option of reading from a pipe.
>

Unless the file is really large and the time spent doing something on
the data after reading them in is really small, I'd expect the time
required for the "count the lines" solution to be small enough as to not
be a problem. And, except for the need to specify a buffer that will
hold the longest possible line to be read, it will be about as
memory-efficient (in terms of overall storage, at least) as any
(although with today's typical multi-multi-MB memory machines, that is
rarely much of a <real> constraint any more).

I find it hard sometimes to bring myself to use such a technique as it
just doesn't look "elegant", but oftentimes, it's hard to beat KISS...

Gus Gassmann

Apr 4, 2003, 4:18:16 PM

John Chauvin wrote:

> Tom McGlynn <t...@lheapop.gsfc.nasa.gov> wrote in message news:<3E8DA83F...@lheapop.gsfc.nasa.gov>...
>
> > If each line has exactly the same length...
>
> Each line will contain the same number of columns but may not be the
> same length (i.e. same number of characters) due to differences in
> magnitude and precision. For example:
>
> 2.0 12.3456 56.3
> 2.1245 6.789 123456.1267
>
> >
> > Does each row contain the same number of columns? Can there
> > be incomplete rows, or rows on multiple lines? We're all
> > assuming, yes and no, but our code is going to be a little
> > fragile unless we check this.
>
> Each row should contain the same number of columns. The input routine
> would have to check each row to detect any missing values.

Is there a special code for those, like an asterisk ('*'), which may cause an error
during the read, or simply more white space (in which case you'd have a helluva
time to line things up properly)? The way I read your problem, the first record
does not necessarily tell you how many fields you have. Is this correct?

Paul van Delst

Apr 4, 2003, 5:01:02 PM

John Chauvin wrote:
>
> Tom McGlynn <t...@lheapop.gsfc.nasa.gov> wrote in message news:<3E8DA83F...@lheapop.gsfc.nasa.gov>...
>
> > If each line has exactly the same length...
>
> Each line will contain the same number of columns but may not be the
> same length (i.e. same number of characters) due to differences in
> magnitude and precision. For example:
>
> 2.0 12.3456 56.3
> 2.1245 6.789 123456.1267
>
> >
> > Does each row contain the same number of columns? Can there
> > be incomplete rows, or rows on multiple lines? We're all
> > assuming, yes and no, but our code is going to be a little
> > fragile unless we check this.
>
> Each row should contain the same number of columns. The input routine
> would have to check each row to detect any missing values.
>
> >
> > What are you trying to optimize here: Simplicity of the code?
> > Use of memory? Speed? Can you always easily re-read the data (e.g.,
> > if you're reading from a pipe that may not be possible)?
>
> Memory and speed. The data will be read from a data file and not a
> pipe. I had not consider the option of reading from a pipe.

One other option that I've also used, where possible, is to (shock, gasp!) modify the
input file or code that writes the input file so that the first line contains the two
dimensions: n_cols and n_rows. These are read, the allocation is done, and the read
proceeds. Single pass. Would work with a pipe too, wouldn't it?

This is a really good solution when the read takes forever and you simply don't want to
wait for the two pass method. Of course, it won't work if you have no control over the
data file creation.
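
A minimal sketch of this single-pass variant, assuming the first line holds n_cols followed by n_rows and the remaining lines hold the values row by row. The names header_read_mod and read_with_header are hypothetical, and allocatable dummy arguments need Fortran 2003 (or F95 with the allocatable TR).

```fortran
module header_read_mod
  implicit none
contains
  ! Read "n_cols n_rows" from the first line, allocate to match, and
  ! read all values in one list-directed READ. ios is 0 on success.
  subroutine read_with_header(lun, table, ios)
    integer, intent(in)            :: lun
    real, allocatable, intent(out) :: table(:,:)
    integer, intent(out)           :: ios
    integer :: n_cols, n_rows

    read(lun, *, iostat=ios) n_cols, n_rows
    if (ios /= 0) return
    allocate(table(n_cols, n_rows), stat=ios)
    if (ios /= 0) return
    ! Array elements are filled in column-major order, so each line of
    ! the file ends up in one column table(:, j) of the array.
    read(lun, *, iostat=ios) table
  end subroutine read_with_header
end module header_read_mod
```

Since nothing is rewound, the same routine should also work on input that cannot be re-read, such as a pipe.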

Richard Maine

Apr 4, 2003, 9:32:08 PM

"task" <lord...@wp.pl> writes:

> do while (.not.eof(unit_of_opened_file))

Eof is not a standard intrinsic and is not available in many compilers.

--
Richard Maine
email: my last name at domain
domain: isomedia dot com

Ken Plotkin

Apr 5, 2003, 11:35:51 AM

On Sat, 05 Apr 2003 02:32:08 GMT, Richard Maine <nos...@see.signature>
wrote:

>"task" <lord...@wp.pl> writes:
>
>> do while (.not.eof(unit_of_opened_file))
>
>Eof is not a standard intrinsic and is not available in many compilers.

There are compilers that have eof, but where eof is incredibly slow.

FWIW, I often have to see how much stuff is in a file so I can
allocate to match. For number of lines, I just do a dry read, using
"end= " in the read statement.

I have not studied all of the suggested solutions, so this may be in
one of them. To find out how many columns, I'd read the first line as
a string, then count how many times a space is followed by a digit,
allowing for the possibility of no leading space at the start.

In compilers with stream (AKA threepwood, binary, transparent, etc.)
input, you don't even have to allocate a large string for that first
line: open as stream, then read (and test) one character at a time
until you hit the line delimiter.

Ken Plotkin

Bil Kleb

Apr 5, 2003, 12:03:55 PM

Richard Maine wrote:
> "task" <lord...@wp.pl> writes:
>
> >do while (.not.eof(unit_of_opened_file))
>
> Eof is not a standard intrinsic and is not available in many compilers.

That's too bad. I was awed by the readability of that form ...

--
Bil Kleb
NASA, Hampton, Virginia, USA

Ken Plotkin

Apr 5, 2003, 1:23:26 PM

On Sat, 05 Apr 2003 12:03:55 -0500, Bil Kleb <W.L....@LaRC.NASA.Gov>
wrote:

>That's too bad. I was awed by the readability of that form ...

Try it with MS Powerstation, and you'll have plenty of time to read
it. :-)

Klaus Schmid

Apr 5, 2003, 6:39:53 PM

Actually I think any other language is more appropriate for
this kind of purpose. Fortran has no strength in I/O. It
is not possible to copy a file since we never know how
long the lines really are. Information about trailing
blanks and remaining characters is not accessible in
Fortran.

Although list-directed input helps a lot with parsing,
it can be used in only one way: no options, no controls,
no supplementary information.

Besides I/O there are many reasons to use Fortran. And
there might be reasons to use pure Fortran, without C
functions and non-standard features.

As far as I can see, the problem described could be
solved fairly well with F77 (tested with g77) as follows.

      parameter( nd= 999)
      real x(1:nd)

      iu= 10
      open(iu,file='read1.dat',status='old')
      call getrc(iu,nd,nr,nc,x)
      write(*,*) 'nr,nc=', nr, nc
      end

      subroutine getrc( iu, nd, nr, nc, x)

      real x(1:nd)
      parameter( real_inf= -1.7E38)

c read data
      rewind(iu)
      do i= 1, nd
         x(i)= real_inf
      enddo
      read(iu,*,end=180) (x(i),i=1,nd), xt
      stop '(too many data)'
c (with F77+ you may loop here to increase the array size dynamically)
  180 continue
      xt= real_inf/2.
      i= 0
  190 continue
      i= i +1
      if ( i .le. nd .and. x(i) .gt. xt) goto 190
      n1= i -1

c get number of rows
      rewind(iu)
      nr= 0
  210 continue
      read(iu,*,end=290) xt
      nr= nr +1
      goto 210
  290 continue

c check columns
      nc= int(n1/nr)
      if ( nc *nr .ne. n1) stop '(wrong columns)'

c supp. check (may force error)
      rewind(iu)
      do i= 1, nr
         read(iu,*) (xt,j=1,nc)
      enddo

      end

-- Klaus

me...@skyway.usask.ca

Apr 5, 2003, 6:45:47 PM

In a previous article, klaus....@kdt.de (Klaus Schmid) wrote:
>Actually I think any other language is more apropriate for
>this kind of purpose. Fortran has no strength in i/o. It
>is not possible to copy a file since we never know how
>long the lines are really. Information about trailing
>blanks and remaining characters is not accessible in
>Fortran.

If your fortran will do "true" binary or stream or whatever
the local brand calls it, then you can read as little as you like
and look for CR, line feed etc. and other control chars (if you
think it's ascii).
That's what I would do with an unknown format. Not fun, but it gets the
job done.
Chris

<snip>

bv

Apr 5, 2003, 9:16:47 PM

Bil Kleb wrote:
>
> That's too bad. I was awed by the readability of that form ...

However, it's easy to roll your own -- here's how:

      logical function eof(lun)

      eof = .false.
      read(lun,*,iostat=ieof) dmy
      if(ieof .ne. 0) then
         eof = .true.
         return
      endif
      backspace (lun)
      end

Then,
do while (.not.eof(lun))

might awe you again. Check details, there's a smoke signal for me to
split...

--
Dr.B.Voh
------------------------------------------------------
Applied Algorithms http://sdynamix.com

Dave Weatherall

Apr 6, 2003, 3:11:43 AM

On Sat, 5 Apr 2003 23:39:53 UTC, klaus....@kdt.de (Klaus Schmid)
wrote:

> Actually I think any other language is more apropriate for
> this kind of purpose. Fortran has no strength in i/o. It
> is not possible to copy a file since we never know how
> long the lines are really. Information about trailing
> blanks and remaining characters is not accessible in
> Fortran.

This rather depends on your definition of 'file' and 'line' and, to a
degree, their relationship with the host operating system. Are the
records fixed or variable? Are they formatted or unformatted? Is the
data contained therein Little-endian or Big-Endian? Are the
Floating-Point data formats compatible?

To me, your statement implies a 'one size fits all' solution is
available, somewhere in some language. The fact of the matter is that
a file represents saved information. For producer and consumer to
communicate, each needs to know the definition of the content/layout.
This is particularly important when moving data in files from one
OS/CPU combination to another. This is not specific to programming
language. (Except maybe Java :-) but only then because, in effect, the
VM is the 'same' on all platforms)

--
Cheers - Dave.

John Chauvin

Apr 6, 2003, 2:30:46 PM

Gus Gassmann <hgas...@mgmt.dal.ca> wrote in message news:<3E8DF697...@mgmt.dal.ca>...

> Is there a special code for those, like an asterisk ('*'), which may cause an error
> during the read, or simply more white space (in which case you'd have a helluva
> time to line things up properly)? The way I read your problem, the first record
> does not necessarily tell you how many fields you have. Is this correct?

The data does not contain any information on the number of columns or
rows. All values will be real (not integers). The following errors
could occur:

1. Two columns not separated by a space
ex. 1256.34-12456.34
2. Not a number error for a value
ex. 2345.09 345.0 nan
3. Blank value
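
One way such per-row checks might be sketched is to read each line as text first, compare its field count with the expected number of columns, and only then convert it with an internal READ. The names check_row_mod and parse_row are made up for this example, and fields are assumed to be separated by blanks or tabs only. A fused pair of values or a blank value shows up as a wrong field count; whether a token such as "nan" is rejected by the internal READ depends on the compiler.

```fortran
module check_row_mod
  implicit none
contains
  ! Convert one line of text into size(values) reals.
  ! ok is .false. if the field count is wrong or the conversion fails.
  subroutine parse_row(line, values, ok)
    character(len=*), intent(in) :: line
    real, intent(out)            :: values(:)
    logical, intent(out)         :: ok
    integer :: i, n, ios
    logical :: in_field

    values = 0.0
    ! Count blank- or tab-separated fields.
    n = 0
    in_field = .false.
    do i = 1, len_trim(line)
      if (line(i:i) == ' ' .or. line(i:i) == achar(9)) then
        in_field = .false.
      else if (.not. in_field) then
        in_field = .true.
        n = n + 1
      end if
    end do

    ok = (n == size(values))
    if (.not. ok) return               ! missing, extra, or fused values

    read(line, *, iostat=ios) values   ! internal list-directed read
    ok = (ios == 0)
  end subroutine parse_row
end module check_row_mod
```

With three expected columns, a line such as "1256.34-12456.34 7.0" is rejected because it contains only two fields.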

Thanks,

John Chauvin

Klaus Schmid

Apr 6, 2003, 6:20:16 PM

"Dave Weatherall" <djw...@attglobal.net> wrote in message news:<DTiotGxQ0bj6-pn2-JqUWi4qdgQvn@localhost>...

> This rather depends on your definition of 'file' and 'line' and, to a
> degree, their relationship with the host operating system. Are the
> records fixed or variable? Are they formatted or unformatted? Is the
> data contained therein Little-endian or Big-Endian? Are the
> Floating-Point data formats compatible?
>
> To me, your statement implies a 'one size fits all' solution is
> available, somewhere in some language. The fact of the matter is that
> a file represents saved information. For producer and consumer to
> communicate, each needs to know the definition of the content/layout.
> This is paricularly important when moving data in files from one
> OS/CPU combination to another. This is not specific to programming
> language. (Except maybe Java :-) but only then because, in effect, the
> VM is the 'same' on all platforms)

Please read my comments in the context -- only plain text files or
terminal input, all 'human readable'. The layout is defined basically
by end_of_line and end_of_file -- a very basic and portable concept.

Probably sad to say, but this is not supported by Standard-Fortran.

You may consider my posted code as a workaround. With non-Standard
features more straightforward solutions would be possible.

-- Klaus

James Giles

Apr 6, 2003, 6:40:58 PM

Klaus Schmid wrote:
...

> Please read my comments in the context -- only plain text files
> or terminal input, all 'human readable'. The layout is defined
> basically by end_of_line and end_of_file -- a very basic and
> portable concept.

Well, not all that portable. The ASCII standard specifies that
records be delimited by the character located at 30 (decimal)
in the collating sequence. Almost no one uses that. And the
end of file is similarly implementation dependent. Often end
of file isn't a character at all, but is determined by the length
attribute of the file's description in the system.

> Probably sad to say, but this is not supported by Standard-Fortran.

Well, it isn't in f77, but in f90 and f95 you can use non-advancing
I/O. This feature allows reading formatted (text) files a character
at a time, if that's your taste. The record and file delimiters are
not returned as characters, but *are* detectable with EOR and EOF
arguments to the I/O statements. Of course, these are detected as
the end of record and end of file conditions local to the Fortran
implementation. If you import a file from some other environment,
the file's delimiters may have to be converted.
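
A sketch of how non-advancing input can pick up a whole record of unknown length, read here in fixed-size chunks rather than one character at a time. It goes beyond f90/f95 in two places, which are assumptions of this example: deferred-length allocatable character variables and the iostat_eor constant from iso_fortran_env are both Fortran 2003 features. The names read_line_mod and read_line are hypothetical.

```fortran
module read_line_mod
  use, intrinsic :: iso_fortran_env, only: iostat_eor
  implicit none
contains
  ! Read one complete record, whatever its length, into line.
  ! ios is 0 on success and nonzero at end of file or on an error.
  subroutine read_line(lun, line, ios)
    integer, intent(in)                        :: lun
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out)                       :: ios
    character(len=256) :: chunk
    integer :: nread

    line = ''
    do
      read(lun, '(a)', advance='no', iostat=ios, size=nread) chunk
      if (ios == 0) then
        line = line // chunk(:nread)   ! chunk filled, record continues
      else if (ios == iostat_eor) then
        line = line // chunk(:nread)   ! last piece of this record
        ios = 0
        exit
      else
        exit                           ! end of file or a read error
      end if
    end do
  end subroutine read_line
end module read_line_mod
```

This removes the need for a fixed upper bound such as CHARACTER(10000) on the first line; the returned string can then be parsed to count the columns.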

Of course, as I look back through this thread, it seems to be mostly
in the context of g77 and other f77 compilers. If you insist on using
a 25 year old standard, you will be stuck with 25 year old solutions.
Most f77 implementations provide non-standard ways to read "raw"
input. I don't recall whether g77 does.

F2kV specifies stream I/O. I haven't read through that part of the
proposed standard. If they remain true to form, it will probably
be much more complex than most people expect or need.

--
J. Giles

Gary L. Scott

Apr 6, 2003, 6:55:23 PM

Klaus Schmid wrote:
>
> "Dave Weatherall" <djw...@attglobal.net> wrote in message news:<DTiotGxQ0bj6-pn2-JqUWi4qdgQvn@localhost>...
> > This rather depends on your definition of 'file' and 'line' and, to a
> > degree, their relationship with the host operating system. Are the
> > records fixed or variable? Are they formatted or unformatted? Is the
> > data contained therein Little-endian or Big-Endian? Are the
> > Floating-Point data formats compatible?
> >
> > To me, your statement implies a 'one size fits all' solution is
> > available, somewhere in some language. The fact of the matter is that
> > a file represents saved information. For producer and consumer to
> > communicate, each needs to know the definition of the content/layout.
> > This is particularly important when moving data in files from one
> > OS/CPU combination to another. This is not specific to programming
> > language. (Except maybe Java :-) but only then because, in effect, the
> > VM is the 'same' on all platforms)
>
> Please read my comments in the context -- only plain text files or
> terminal input, all 'human readable'. The layout is defined basically
> by end_of_line and end_of_file -- a very basic and portable concept.

EOL and EOF are hardly portable at the file content level. The method
used to represent both conditions varies widely by platform. On some
platforms, there is no file "data" content included to represent these;
instead it is external to the data content and not directly readable
from Fortran (or C, for that matter). Fortunately, nearly all systems
can produce a simple bit-bucket type of file, but more sophisticated
structures are far from rare.

>
> Probably sad to say, but this is not supported by Standard-Fortran.
>
> You may consider my posted code as a workaround. With non-Standard
> features more straightforward solutions would be possible.
>
> -- Klaus

--

Gary Scott
mailto:gary...@ev1.net

Fortran Library
http://www.fortranlib.com

Support the GNU Fortran G95 Project: http://g95.sourceforge.net

Paul van Delst

Apr 7, 2003, 9:57:03 AM

Well, those types of lines of data put the kibosh on reading all the numbers at once
(after counting lines and columns). That pretty much leaves the slower option of reading a
line at a time into a character buffer and then parsing that to split it up into actual
numbers.

I think the first two would be relatively easy to catch. The last one sounds like a bit of
a bugger, though. How are you supposed to determine at which column the blank value occurs
-- assuming that the format widths are variable? You'd have to read and parse several
lines to be sure you've determined how many columns to expect in a line so that when you
don't parse that number you've encountered a line in which there is a blank value....but
like I said, how would you determine which column contains the blank value (and what
values would you assign it post read? 0.0? -999.99?)

You might have already done this, but can you post, say, 10 lines or so of the data file
with examples of all the above cases? These sorts of file I/O issues really get under my
skin since they prevent people from doing the interesting science stuff. :o)

Richard Maine

Apr 7, 2003, 10:56:34 AM

bv <bv...@Xsdynamix.com> writes:

> However, it's easy to roll your own -- here's how:
>
> logical function eof(lun)
>
>    eof = .false.
>    read(lun,*,iostat=ieof) dmy
>    if(ieof .ne. 0) then
>       eof = .true.
>       return
>    endif
>    backspace (lun)
> end

Easy enough if you aren't too picky about it working as expected.
Let's see, the above

1. Won't work at all on files that can't be backspaced. Such files are
reasonably common. That is often a problem with standard input
if it comes from something like a terminal or a pipe. This is the
real problem - one likely to actually bite you. The other two below
are not so significant (one because it isn't common, the other
because it is easily fixed).

2. May take an awfully long time on some systems that implement
backspace "strangely". Fortunately, those aren't too common, but
they exist.

3. Will quietly skip blank lines. That one is easy enough to fix, though;
just use an explicit format instead of list-directed.
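One way to sidestep all three points is to skip the separate eof() test and let the read that fetches the data report end of file itself. A minimal sketch, assuming lines fit in a 256-character buffer (an arbitrary choice for the example):

```fortran
program read_lines
  ! Sketch: test for end of file as part of the read itself, so no
  ! separate eof() function and no backspace are needed.
  ! The 256-character buffer is an arbitrary assumption.
  implicit none
  character(len=256) :: line
  integer :: ios, nlines, nblank

  nlines = 0
  nblank = 0
  do
    ! The explicit "(a)" format returns blank lines as blank records
    ! instead of skipping them, as a list-directed read would.
    read (unit=*, fmt="(a)", iostat=ios) line
    if (ios /= 0) exit             ! end of file (negative) or error (positive)
    nlines = nlines + 1
    if (len_trim(line) == 0) nblank = nblank + 1
  end do
  print *, "lines read:", nlines, " of which blank:", nblank
end program read_lines
```

Since nothing is ever backspaced, this should behave the same on a terminal or a pipe as on a disk file.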

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain

Gus Gassmann

Apr 7, 2003, 12:50:43 PM

John Chauvin wrote:

I think at this point the task becomes hopeless. You can't even do it by hand,
it seems:

1256.34-12456.34
2345.09-345.0 nan
123.45 124.56

How would you want to parse this, particularly the third row?

If you do not care about the positions, that is, if the second entry in the third row
(which I intended to be column 3) can be placed into column 2, then you might
try dynamic allocations, the way others have suggested before:

Guess an initial number of rows and columns
Allocate the array
nrows = 0
do until EOF
   read one line
   nrows = nrows + 1
   if (nrows .gt. rows_allocated) then
      reallocate with twice the number of rows
   endif
   parse one line
   if (columns_found .gt. columns_allocated) then
      reallocate with more columns (twice as many? columns_found?)
   endif
enddo

The reallocations make this slow, and you give away space (which you
could recover with another reallocation after everything has been read).
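The outline above might look roughly like this in f95, simplified to grow rows only. This is a sketch under several assumptions that are not from the original problem: every line holds as many blank-separated numbers as the first non-blank line, lines fit in 1024 characters, blank lines are skipped, and the starting capacity of 16 rows is arbitrary. The array is stored as a(column, row) so that growth happens along the last dimension.

```fortran
program grow_rows
  ! Sketch of the doubling strategy, for rows only.
  ! Assumptions: blank-separated fields, same count on every line,
  ! lines no longer than 1024 characters.
  implicit none
  character(len=1024) :: line
  real, dimension(:,:), allocatable :: a, tmp
  integer :: ios, ncols, nrows, i

  ncols = 0
  nrows = 0
  do
    read (unit=*, fmt="(a)", iostat=ios) line
    if (ios /= 0) exit                     ! end of file or read error
    if (len_trim(line) == 0) cycle         ! skip blank lines
    if (.not. allocated(a)) then
      ! Count fields on the first data line: a field starts at each
      ! non-blank character that follows a blank or starts the line.
      if (line(1:1) /= " ") ncols = 1
      do i = 2, len_trim(line)
        if (line(i:i) /= " " .and. line(i-1:i-1) == " ") ncols = ncols + 1
      end do
      allocate (a(ncols, 16))              ! initial guess for the row count
    end if
    if (nrows == size(a, 2)) then
      ! Array is full: copy out, reallocate with twice the rows, copy back.
      allocate (tmp(ncols, nrows))
      tmp = a
      deallocate (a)
      allocate (a(ncols, 2*nrows))
      a(:, 1:nrows) = tmp
      deallocate (tmp)
    end if
    nrows = nrows + 1
    ! Parse the buffered line with a list-directed internal read.
    read (line, fmt=*, iostat=ios) a(:, nrows)
    if (ios /= 0) then
      print *, "could not parse data line", nrows
      stop
    end if
  end do
  print *, "rows:", nrows, " columns:", ncols
end program grow_rows
```

A line with fewer fields than expected is caught by the internal read; a line with extra fields is not, since the surplus values are simply never read.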

Or you could parse and clean up one line at a time, writing them out to
a scratch file as you go. You can then keep track of the number of rows
and columns as you go and perform one allocation before you read everything
back in.
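A sketch of the scratch-file variant, again under assumptions of my own: unit 10 is free, lines fit in 1024 characters, "cleaning up" means nothing more than dropping blank lines, and every remaining line must have the same number of blank-separated fields as the first. The helper count_fields is a made-up name for the example.

```fortran
program scratch_pass
  ! Sketch: copy usable lines to a scratch file while counting them,
  ! then allocate once and read everything back.
  implicit none
  integer, parameter :: scratch = 10       ! assumed to be a free unit
  character(len=1024) :: line
  real, dimension(:,:), allocatable :: a
  integer :: ios, ncols, nrows, nfields

  open (unit=scratch, status="scratch", action="readwrite")
  ncols = 0
  nrows = 0
  do
    read (unit=*, fmt="(a)", iostat=ios) line
    if (ios /= 0) exit                     ! end of file or read error
    if (len_trim(line) == 0) cycle         ! "clean up": drop blank lines
    nfields = count_fields(line)
    if (nrows == 0) ncols = nfields        ! first data line sets the width
    if (nfields /= ncols) then
      print *, "inconsistent field count on data line", nrows + 1
      stop
    end if
    nrows = nrows + 1
    write (unit=scratch, fmt="(a)") trim(line)
  end do
  if (nrows == 0) then
    print *, "no data"
    stop
  end if

  rewind (unit=scratch)
  allocate (a(ncols, nrows))               ! one allocation, exact size
  ! A list-directed read continues across records, so the whole
  ! array can be filled in one statement; a(:, j) is data line j.
  read (unit=scratch, fmt=*, iostat=ios) a
  if (ios /= 0) then
    print *, "could not parse the data, iostat =", ios
    stop
  end if
  close (unit=scratch)
  print *, "rows:", nrows, " columns:", ncols

contains

  function count_fields(text) result(n)
    ! Number of blank-separated fields in text.
    character(len=*), intent(in) :: text
    integer :: n, i
    n = 0
    if (len_trim(text) == 0) return
    if (text(1:1) /= " ") n = 1
    do i = 2, len_trim(text)
      if (text(i:i) /= " " .and. text(i-1:i-1) == " ") n = n + 1
    end do
  end function count_fields

end program scratch_pass
```

This trades the repeated reallocations for a second pass over the data, and unlike rewinding the original input it also works when the input arrives on a pipe.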

The best solution is likely situation-dependent.

Richard Maine

Apr 7, 2003, 2:17:00 PM

Gus Gassmann <hgas...@mgmt.dal.ca> writes:

> I think at this point the task becomes hopeless. You can't even do it by hand,

....

One of the most important steps in solving any problem with a computer
(or without, for that matter) is to clearly state the problem. Although
that may sound like just a "motherhood" statement, it is truly amazing
the number of times you will see people spending lots of time trying
to find a solution for a problem that doesn't actually have one or
where the problem isn't well enough defined to say what a solution to
it would be.

I think I'd agree with Gus's evaluation that, at least as stated, this
problem doesn't have a unique solution. That can make it sort of hard
to find. Perhaps I missed some of the details (I was on travel all last
week, and although I skimmed all the clf postings, I didn't read them
carefully).