
Replace one single line in an ASCII data file


relaxmike

Feb 5, 2008, 10:33:03 AM
Hi,

I am currently trying to process one large unformatted ASCII data
file.
This file is line-based and human-readable.
I have not found a way to replace only the 2nd line of the file.
I tried two solutions, but the first does not work for memory reasons,
and the second does not work because I have not mastered Fortran I/O.

In the first solution, I count the number of lines with a sequence
of read statements associated with the "end" option :

this % numberoflines = 0
open ( fileunit , FILE = this % filename , ACTION = 'read' , &
       STATUS = 'UNKNOWN' )
do while ( read_file )
   read ( fileunit , * , end = 10 , err = 20 ) line
   this % numberoflines = this % numberoflines + 1
end do
10 close ( fileunit )

Then I can allocate an array of strings and read all lines, which
are stored in each row of the array :

character ( len= ASCII_MAX_COLUMNS ) , dimension(:), pointer :: &
       file_data => NULL()

allocate ( this % file_data ( 1 : this % numberoflines ) )
open ( fileunit , FILE = this % filename , ACTION = 'read' , &
       STATUS = 'UNKNOWN' )
do iline = 1 , this % numberoflines
   read ( fileunit , this % formatmsg , end = 10 , err = 20 ) line
   this % file_data ( iline ) = line
end do
10 close ( fileunit )

After that, it is easy to set one particular line by direct access
to the array, and then write back all the content of the array into
the file.

write ( this % file_data ( iline ) , * ) newdata

The problem occurs with large data files, because all the data
is in memory. With a data file of size 150 MB, the allocation failed.

So I tried another solution, based on replacing the line directly
in the file, without storing all the data. To modify line #iline, the
algorithm reads the first (iline - 1) lines, then writes the new data.

open ( fileunit , FILE = this % filename , ACTION = 'readwrite' , &
       STATUS = 'UNKNOWN' )
do iline = 1, linenumber - 1
   read ( fileunit , * , end = 10 , err = 20 ) line
end do
write ( fileunit , this % formatmsg ) trim ( newline_content )
10 close ( fileunit )

Obviously (!), it fails, because all lines from (iline + 1) to the end
of the file are deleted...

From my Fortran guide, I understand that a direct access
file should solve the problem :

read ( fileunit , rec= recordnumber ) newline_content

But I cannot change the format of the file, which is not a direct
access file. That is to say, there is no record number which
corresponds to my line number.

The solution of copying the file (160 MB) is so dirty that I stopped
thinking about it immediately after the first "Why not ?".

So my bag of ideas is empty : thank you for your help !

Best regards,

Michaël

dpb

Feb 5, 2008, 11:29:48 AM
relaxmike wrote:
...

> After that, it is easy to set one particular line by direct access
> to the array, and then write back all the content of the array into
> the file.
>
> write ( this % file_data ( iline ) , * ) newdata
>
> The problem occurs with large data files, because all the data
> is in memory. With a data file of size 150 MB, the allocation failed.

...

So, just rearrange the loop--read the line, if that line is to be
changed, change it; write the (perhaps modified) line. Repeat.

Compared with the "all in memory" solution, the only difference is that
the change logic moves into the loop that does the reads, so you never
need to keep more than one line.

--

Richard Maine

Feb 5, 2008, 12:06:24 PM
relaxmike <michael...@gmail.com> wrote:

> I am currently trying to process one large unformatted ascii data
> file.

Just a wording correction. You are *NOT* dealing with an unformatted
file. If it is a "line-based human readable file", it is formatted. In
fact, you can think of that as being what formatted means. In
particular, you are using list-directed formatting; that's what the *
you are using for the format indicates. Unformatted is something
entirely different (and is neither line-based, nor human readable).
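
A minimal sketch of what list-directed formatting does to a text line, compared with an explicit '(A)' format. It reads from an internal file so that no data file is needed; the program and variable names are illustrative only.

```fortran
! List-directed (*) versus explicit '(A)' formatting on the same record.
program list_directed_demo
  implicit none
  character(len=40) :: record
  character(len=40) :: first_item, whole_line

  record = "alpha beta, gamma"

  ! List-directed: the character value ends at the first blank or comma.
  read(record, *) first_item

  ! Explicit A edit descriptor: the whole record is transferred.
  read(record, '(A)') whole_line

  print '(3A)', "[", trim(first_item), "]"   ! prints [alpha]
  print '(3A)', "[", trim(whole_line), "]"   ! prints [alpha beta, gamma]
end program list_directed_demo
```

Both reads are formatted; only the kind of formatting differs. For copying lines verbatim, '(A)' is the safer choice.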

> I am not able to find a way to replace only the 2nd line of the file.

You pretty much can't directly "replace" a line in a sequential file.
That's not even so much a limitation of Fortran, but of the nature of
sequential files in general. It might be possible to hack something
together based on using direct access I/O, but as the file isn't
actually a direct access file, such solutions are nonstandard,
non-portable, won't work at all in some cases (notably where the
replacement line doesn't "fit" in the existing space), and really quite
complicated to get all the fiddly bits working. If you *REALLY* want to
do that, you need to deal with doing your own record management in a
fixed-size buffer. That's way too much of a mess to try to describe to
someone in any detail (if the hint about doing your own record
management in a fixed-size buffer isn't enough help, as I strongly
suspect it isn't, then you really aren't up to doing it that way). But
that's not the recommended approach anyway.

> The solution of copying the file (160 MB) is so dirty that I stopped
> thinking about
> it immediately after the first "Why not ?".

Well, then I'd suggest going back and thinking about it more, because
that's by far the simplest solution - by a *LOT*. And 160 MB isn't even
horribly huge by today's standards. In particular, if you don't have
enough disk space to temporarily have two copies of the file, you
probably have a lot bigger problems to worry about. It is awfully
simple.

Algorithm:

open original file and the new one

read/write n-1 records, one at a time
(no need to keep an array of the things, as you write each one
immediately after reading it).

read the n'th record
write the replacement nth record

read/write subsequent records to the end.
(again, no need for an array).

close both files, deleting the old one.
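
The steps above can be sketched in standard Fortran roughly as follows. The file names, unit numbers, replacement text and the 1024-character line limit are assumptions made for this example; the program first writes a small input file so that it runs on its own.

```fortran
! Sketch of the copy-and-replace algorithm. All names, unit numbers
! and the maximum line length are assumptions for illustration.
program replace_line
  implicit none
  integer, parameter :: maxlen = 1024     ! assumed upper bound on line length
  integer, parameter :: uold = 10, unew = 11
  integer, parameter :: n = 2             ! number of the line to replace
  character(len=*), parameter :: oldname = "demo_data.txt"
  character(len=*), parameter :: newname = "demo_data.tmp"
  character(len=maxlen) :: line
  integer :: ios, iline

  ! Create a small input file so that the sketch runs on its own.
  open(uold, file=oldname, status="replace")
  write(uold, '(A)') "line one"
  write(uold, '(A)') "line two"
  write(uold, '(A)') "line three"
  close(uold)

  open(uold, file=oldname, status="old")
  open(unew, file=newname, status="replace")

  iline = 0
  do
    read(uold, '(A)', iostat=ios) line    ! '(A)' reads the whole record
    if (ios /= 0) exit                    ! end of file or read error
    iline = iline + 1
    if (iline == n) then
      write(unew, '(A)') "replacement for line two"
    else
      write(unew, '(A)') trim(line)       ! trailing blanks are dropped
    end if
  end do

  close(unew)
  close(uold, status="delete")            ! standard way to delete the old file
end program replace_line
```

Only one line is held in memory at a time. Note that this sketch silently truncates lines longer than maxlen and drops trailing blanks, so it would need adjusting if either matters for the data.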

If you think that's "dirty", then forget I even mentioned the hack
needed to rewrite a record in place. That is at least an order of
magnitude "dirtier".

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain

dpb

Feb 5, 2008, 12:13:56 PM
dpb wrote:
> relaxmike wrote:
> ...
>
...

> So, just rearrange the loop--read the line, if that line is to be
> changed, change it; write the (perhaps modified) line. Repeat.

To be clear, of course, you need to be writing to a second file, not
overwriting the first...

--

Les

Feb 5, 2008, 12:36:36 PM

"Richard Maine" <nos...@see.signature> wrote in message
news:1ibtxht.pbgevrt59iniN%nos...@see.signature...
<snip>

> close both files, deleting the old one.

If the "new" file is to be used elsewhere as though it were the "old" one (i.e.
with the same name) then you will also need to name the "new" file with the
name of the "old" file. This is system dependent. Your compiler manual will
explain how.

Les

relaxmike

Feb 6, 2008, 6:47:16 AM
Thank you for all the answers.
Formatted and unformatted files are now clearer to me.
The fact that the "*" is a kind of format is what confused me before
this discussion.
I even laughed at

"That is at least an order of magnitude "dirtier"."

(Fortran leaves little room for fun...)
So I ended up with the solution based on a file copy.
It is really "awfully simple". The cost of keeping one extra copy
is acceptable for this particular file.

Considering the file delete and rename, it is really a pity that
Fortran does not include a standard way of managing files, so
that we are forced to use Fortran extensions (which, all in all,
are included in almost all compilers).

For the deletion of the file, I used the following weird,
but standard fortran (ugly, but it works) :

integer :: status
open ( UNIT = file_unit , FILE = filename , STATUS = 'OLD' , &
       IOSTAT = status )
if ( status == 0 ) then
   close ( UNIT = file_unit , STATUS = 'DELETE' , IOSTAT = status )
end if

The Intel compiler includes the "RENAME" function :

character(len=MAX_FILENAME), intent(in) :: oldfn, newfn
call RENAME ( oldfn , newfn )

It is also included in g95 and gfortran, which provide the
additional "status" argument. This did not work with my current
Intel 8.0 :

integer :: status
call RENAME ( oldfn , newfn , status )

generates the following error :

forrtl: severe (157): Program Exception - access violation

But it does not matter, since if you just remove the "status"
argument, it works fine.
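
A small sketch that puts the two steps together: delete the old file with standard Fortran, then give the new file the old name. It assumes gfortran in its default (GNU extensions enabled) mode, where RENAME can be called as a subroutine with an optional status argument; other compilers may provide RENAME in a different form or not at all. The file names and unit numbers are hypothetical.

```fortran
! Delete the old file (standard Fortran), then rename the new one
! (RENAME is a compiler extension, assumed here in its gfortran form).
program swap_files
  implicit none
  character(len=*), parameter :: oldname = "demo_old.txt"   ! hypothetical
  character(len=*), parameter :: newname = "demo_new.txt"   ! hypothetical
  integer :: ios

  ! Create both files so that the sketch runs on its own.
  open(10, file=oldname, status="replace")
  write(10, '(A)') "old content"
  close(10)
  open(11, file=newname, status="replace")
  write(11, '(A)') "new content"
  close(11)

  ! Standard Fortran: open the existing file, then close it with DELETE.
  open(10, file=oldname, status="old", iostat=ios)
  if (ios == 0) close(10, status="delete", iostat=ios)

  ! Extension: rename newname to oldname; ios is 0 on success.
  call rename(newname, oldname, ios)
  if (ios /= 0) print *, "rename failed, status = ", ios
end program swap_files
```

Checking the status after the rename matters here, because at that point the original file has already been deleted.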

Again, thank you all for your answers.

Best regards,

Michaël

Gary Scott

Feb 6, 2008, 8:26:32 AM
relaxmike wrote:
> Thank you for all the answers.
> Formatted and unformatted files are now clearer to me.
> The fact that the "*" is a kind of format is what confused me before
> this discussion.
> I even laughed at
>
> "That is at least an order of magnitude "dirtier"."
>
> (Fortran leaves little room for fun...)

...I think programming in Fortran IS fun.

<snip>

--

Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford

Thomas Koenig

Feb 6, 2008, 3:10:07 PM
On 2008-02-06, relaxmike <michael...@gmail.com> wrote:

> So I ended up with the solution based on a file copy.

Just wondering... suppose I don't want to change the length of the
line in question. Will formatted stream I/O let me change things?

Should the following program print

123456
ASDFef
qwerty

?

! Replace the first four characters of the second line
! of the file foo.txt with ASDF.
program main
  implicit none
  character(len=6) :: c
  integer :: i
  open(20,file="foo.txt",form="formatted",access="stream")
  write(20,'(A)') '123456'
  write(20,'(A)') 'abcdef'
  write(20,'(A)') 'qwerty'
  rewind 20
  ! Skip over the first line
  read(20,'(A)') c
  ! Save the position
  inquire(20,pos=i)
  ! Read in the complete line...
  read(20,'(A)') c
  ! Write out the first four characters
  write(20,'(A)',pos=i,advance="no") 'ASDF'
  ! Fill up the rest of the line. Here, we know the length. If we
  ! don't, things will be a bit more complicated.
  write(20,'(A)') c(5:6)
  ! Copy the file to standard output
  rewind 20
  do i=1,3
    read(20,'(A)') c
    print '(A)',c
  end do
  close (20)
end program main

glen herrmannsfeldt

Feb 6, 2008, 3:30:01 PM
Thomas Koenig wrote:
(snip)

> Just wondering... suppose I don't want to change the length of the
> line in question. Will formatted stream I/O let me change things?

> Should the following program print

> 123456
> ASDFef
> qwerty
(snip)

> open(20,file="foo.txt",form="formatted",access="stream")
> write(20,'(A)') '123456'
> write(20,'(A)') 'abcdef'
> write(20,'(A)') 'qwerty'
> rewind 20

The tradition is that once you write in the middle of a
sequential file everything after that point is lost.
One reason is that this is the way tape I/O works
on most tape drives(*). Sequential file I/O tends to
look the same for disk or tape I/O. Direct access
I/O was designed to take advantage of the properties
of disk I/O, allowing one to write in the middle of a
file without writing a new EOF.

The first direct access I/O system I knew and used was on
OS/360, which takes advantage of some properties of the
disk I/O system. Direct access files are written such that
each record is a physical disk block of the appropriate
size. (Between one byte and the length of a disk track.)

(*) DECtape, designed by DEC, is a direct access tape
system which allows the hardware to rewrite a block on the
tape without disturbing other blocks. It was supported
as a direct access I/O device by some DEC systems.
Otherwise, most tape systems are designed for sequential I/O.

-- glen

Richard Maine

Feb 6, 2008, 4:55:02 PM
glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> Thomas Koenig wrote:
> (snip)
>
> > Just wondering... suppose I don't want to change the length of the
> > line in question. Will formatted stream I/O let me change things?

> The tradition is that once you write in the middle of a
> sequential file everything after that point is lost...

While that is all a fine explanation, it is an explanation of a
different question than the one Thomas asked. Thomas specifically asked
about stream I/O. Stream I/O is *NOT* the same thing as sequential. Nor
is it the same thing as direct. It is a third alternative. (And no, it
isn't designed around tape access).

As it turns out, the answer is "no"; formatted stream access will not
let you modify the middle of a file. It actually takes a bit of digging
to find that answer. Even though I thought I recalled that as the answer,
it took me a while to find it. F2003 9.2.3.3, "File position after data
transfer", 3rd para

"For a formatted stream output statement, if no error condition
occurred, the terminal point of the file is set to the highest-numbered
position to which data was transferred by the statement."

If I recall correctly, this basically follows the same rules as C. I
suspect it is there to accommodate variations in the underlying file
structure implementation. For example, without this kind of rule, it
could be awfully hard to describe some of the things that could happen
in terms of record structure. Remember that formatted stream files also
have record structure. Suppose you wrote multiple records in the place
where there was formerly only one - or conversely wrote one record over
where there were formerly multiple ones. How would that interact with
underlying implementations that might actually keep track of records? Or
suppose you overwrote half of a 2-byte record terminator. I vaguely
remember discussions attempting to address questions like that. Seems
like it got messy to do in a general manner and that the conclusion was
that it made sense to follow the C rule and just disallow it,
particularly as C interop was one of the justifications for adding
stream I/O.

Note that this is all for formatted stream I/O. Unformatted stream is
completely different (and much simpler). There are no record issues for
unformatted stream, and the answer is "yes", you can modify things in
the middle of the file. See 9.2.2.3(3), which basically says that you
can read or write the data in any order with unformatted stream, subject
only to "obvious" limitations (you can't read stuff that hasn't been
written).

So, yes, opening the file as unformatted stream allows you to muck
around pretty much at will. Pretty much like the direct access option I
mentioned earlier. But you do need to work at a pretty low level and
know what you are doing. (For example, you will have to explicitly
handle the system-dependent formatted record terminators). The standard
doesn't guarantee that you can open any file as unformatted stream, but
your odds are good on at least current systems (i.e. any system you will
ever see an implementation of f2003 stream I/O on).
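
A minimal sketch of the unformatted stream approach, for the case where the replacement has exactly the same length as the text it overwrites. It assumes that every line ends with a line feed, achar(10), possibly preceded by a carriage return, and that the processor allows a formatted file to be reopened as unformatted stream, which the standard does not guarantee. The file name and unit number are hypothetical.

```fortran
! Overwrite the first four characters of line 2 in place, using
! unformatted stream access (Fortran 2003).
! Assumption: lines end with a line feed, achar(10).
program overwrite_in_place
  implicit none
  character(len=*), parameter :: fname = "stream_demo.txt"  ! hypothetical
  integer, parameter :: u = 20
  character(len=1) :: ch
  character(len=6) :: line
  integer :: ios, p, i

  ! Create a three-line test file with ordinary formatted output.
  open(u, file=fname, status="replace", action="write")
  write(u, '(A)') "123456"
  write(u, '(A)') "abcdef"
  write(u, '(A)') "qwerty"
  close(u)

  ! Reopen the same file as a plain sequence of bytes.
  open(u, file=fname, access="stream", form="unformatted", &
       status="old", action="readwrite")

  ! Scan byte by byte until the first line feed has been passed.
  p = 1
  do
    read(u, pos=p, iostat=ios) ch
    if (ios /= 0) stop "line 2 not found"
    p = p + 1
    if (ch == achar(10)) exit
  end do

  ! p is now the position of the first character of line 2.
  ! Writing here does not truncate the rest of the file.
  write(u, pos=p) "ASDF"
  close(u)

  ! Read the file back as text and print it, then remove it.
  open(u, file=fname, status="old")
  do i = 1, 3
    read(u, '(A)') line
    print '(A)', line
  end do
  close(u, status="delete")
end program overwrite_in_place
```

Under these assumptions the program should print 123456, ASDFef and qwerty. If the replacement were shorter or longer than the original text, the record terminators would have to be moved by hand, which is where the copy approach becomes simpler.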

Thomas Koenig

Feb 7, 2008, 5:38:08 PM
On 2008-02-06, Richard Maine <nos...@see.signature> wrote:
> F2003 9.2.3.3, "File position after data
> transfer", 3rd para
>
> "For a formatted stream output statement, if no error condition
> occurred, the terminal point of the file is set to the highest-numbered
> position to which data was transferred by the statement."

Thank you very much indeed, Richard, for clarifying this point.

This is now http://gcc.gnu.org/PR35132 , by the way :-)

Clive Page

Feb 9, 2008, 1:00:35 PM
In message <slrnfqn22g....@meiner.onlinehome.de>, Thomas Koenig
<tko...@netcologne.de> writes

>Thank you very much indeed, Richard, for clarifying this point.
>
>This is now http://gcc.gnu.org/PR35132 , by the way :-)

For those unfamiliar with stream I/O, I wrote some notes a little while
back which can be found at
http://www.star.le.ac.uk/~cgp/streamIO.html
(pending a proper reorganisation of my web area). These notes mention,
in a not very conspicuous place, that if you write in the middle of a
formatted stream file you truncate the file at that point. But I think you should
be able to use unformatted stream I/O to do what you want; the main
penalty is having to handle end-of-record markers yourself.

--
Clive Page
