Fork multiple readers on one file, multiple writers on another?

Robert Dodier

Oct 6, 2001, 2:35:40 PM
Hello all,

I have a pretty basic question which I hope you can help me with.
I've read a number of man pages & searched Google for links, but
I haven't been able to come up with a solution.

I want several processes (created by fork) to read one file and
write another file. Each process is assigned one block in the
input file and one block in the output file -- the blocks don't
overlap.

The problem is that with more than 1 process, the processes step
on each other's toes. I guess this must be because all i/o must
eventually go from/to the same physical file. I believe what's
directly causing trouble is a shared offset -- I want each process
to seek to its block in the input or output, and then read and
write from there; it doesn't seem to work that way, though.

What is the appropriate way to coordinate multiple readers &
writers which don't share memory? If I were using pthreads, I
could use a pthreads mutex to coordinate. However, I need to
have a non-shared heap, so pthreads is not feasible, I think.

The relevant snippet of code is shown below. I've tried moving
the fopen's before the fork loop, and I've tried putting flock
(on both files) around the reading & writing, but that doesn't
help.

Any suggestions? Thanks in advance. I appreciate it.

best,
Robert Dodier

PS. I'm working on Linux, but I need something to run on other Unices.

---------------- begin relevant code -----------------
for ( j = 0; j < nprocesses; j++ )
{
    if ( (child_pid = fork()) == 0 )
    {
        (in = fopen( infile, "r" )) != 0 || die( infile );
        (out = fopen( outfile, "a" )) != 0 || die( outfile );

        fseek( in, j*bs, SEEK_SET );
        fseek( out, j*bs, SEEK_SET );

        for ( i = 0; i < bs; i++ )
            fputc( fgetc(in), out );

        if ( j == nprocesses-1 )
        {
            off_t rem = size - (j+1)*bs;

            for ( i = 0; i < rem; i++ )
                fputc( fgetc(in), out );
        }

        return 0;
    }
}

Andrew Gierth

Oct 6, 2001, 2:59:00 PM
to Robert Dodier
[comp.programming.threads removed as this is not a threads issue]

>>>>> "Robert" == Robert Dodier <rob...@athenesoft.com> writes:

Robert> Hello all,
Robert> I have a pretty basic question which I hope you can help me
Robert> with. I've read a number of man pages & searched Google for
Robert> links, but I haven't been able to come up with a solution.

Robert> I want several processes (created by fork) to read one file
Robert> and write another file. Each process is assigned one block in
Robert> the input file and one block in the output file -- the blocks
Robert> don't overlap.

Robert> The problem is that with more than 1 process, the processes
Robert> step on each other's toes. I guess this must be because all
Robert> i/o must eventually go from/to the same physical file.

No.

If you open the files before the fork, it's because all the processes
are sharing a single open of the file. Each call to open() gives you
an independent file offset (and some other bits), but when a
descriptor is copied via dup() or fork(), the offset is shared. (Note
that this is the underlying system idea of the offset, which is not
the same as the one used by the stdio code)

In the example code where you open the files after the fork, the
problem is that you're using append mode for the output file; all
writes in append mode occur at the _current_ end of file, not at the
offset that you've set with fseek. Use "r+" mode instead.
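
As a rough illustration of the fix described above, here is a minimal sketch in which each child opens both files itself after the fork and uses "r+" for the output. The names parallel_copy and copy_block are hypothetical and not from the original program, and the sketch assumes the parent may create (or truncate) the output file up front, since "r+" requires the file to exist.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in a child: copy `len` bytes starting at byte `start`. */
static int copy_block(const char *infile, const char *outfile,
                      long start, long len)
{
    FILE *in = fopen(infile, "r");
    FILE *out = fopen(outfile, "r+");  /* "r+", not "a": writes honour fseek */
    long i;
    int c;

    if (in == NULL || out == NULL)
        return 1;
    if (fseek(in, start, SEEK_SET) != 0 || fseek(out, start, SEEK_SET) != 0)
        return 1;
    for (i = 0; i < len; i++) {
        if ((c = fgetc(in)) == EOF || fputc(c, out) == EOF)
            return 1;
    }
    fclose(in);
    return fclose(out) == 0 ? 0 : 1;   /* fclose flushes the stdio buffer */
}

/* Hypothetical driver: split infile into nprocs blocks, one child each.
   Returns 0 on success, -1 if anything failed. */
int parallel_copy(const char *infile, const char *outfile, int nprocs)
{
    struct stat st;
    FILE *out;
    long bs, size;
    int j, status, launched = 0, failed = 0;

    assert(nprocs > 0);
    if (stat(infile, &st) != 0)
        return -1;
    size = (long)st.st_size;

    /* Assumption: the parent creates the output so "r+" can open it. */
    if ((out = fopen(outfile, "w")) == NULL)
        return -1;
    fclose(out);

    bs = size / nprocs;
    for (j = 0; j < nprocs; j++) {
        long start = j * bs;
        /* The last child also takes the remainder. */
        long len = (j == nprocs - 1) ? size - start : bs;
        pid_t pid = fork();

        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(copy_block(infile, outfile, start, len));
        launched++;
    }

    /* Reap every child and check its exit status. */
    for (; launched > 0; launched--) {
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    return failed ? -1 : 0;
}
```

Calling parallel_copy("in.dat", "out.dat", 4) should leave out.dat as a byte-for-byte copy of in.dat. No locking is used, on the assumption that the blocks never overlap.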

--
Andrew.

comp.unix.programmer FAQ: see <URL: http://www.erlenstar.demon.co.uk/unix/>
or <URL: http://www.whitefang.com/unix/>

Kaz Kylheku

Oct 6, 2001, 4:40:46 PM
In article <23af61c2.01100...@posting.google.com>, Robert Dodier wrote:
>The problem is that with more than 1 process, the processes step
>on each other's toes. I guess this must be because all i/o must
>eventually go from/to the same physical file. I believe what's
>directly causing trouble is a shared offset -- I want each process
>to seek to its block in the input or output, and then read and
>write from there; it doesn't seem to work that way, though.

The inherited file descriptors of a UNIX child process share
the same file object with the parent by reference. This is
similar to file descriptors duplicated with the dup*() functions.
If you don't want that, you must open the file multiple times.
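
A small sketch of the sharing described above, using plain descriptors rather than stdio. The function name parent_offset_after_child_seek is made up for this illustration. The child seeks to offset 100, either on the descriptor it inherited or on one it opened itself, and the parent then reports where its own descriptor ended up.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's file offset after the child
   has seeked to 100, or -1 on error. */
off_t parent_offset_after_child_seek(const char *path, int child_reopens)
{
    int fd, status;
    pid_t pid;
    off_t pos;

    assert(path != NULL);
    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: use the inherited descriptor, or a fresh open(). */
        int cfd = child_reopens ? open(path, O_RDONLY) : fd;

        if (cfd < 0 || lseek(cfd, 100, SEEK_SET) < 0)
            _exit(1);
        _exit(0);
    }

    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* Ask where the parent's descriptor now points, without moving it. */
    pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

With child_reopens set to 0 the parent sees offset 100, because the inherited descriptor refers to the same open file. With child_reopens set to 1 the parent's offset stays at 0.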

Robert Dodier

Oct 6, 2001, 11:18:57 PM
Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:

> Robert> I want several processes (created by fork) to read one file
> Robert> and write another file. Each process is assigned one block in
> Robert> the input file and one block in the output file -- the blocks
> Robert> don't overlap.
>
> Robert> The problem is that with more than 1 process, the processes
> Robert> step on each other's toes. I guess this must be because all
> Robert> i/o must eventually go from/to the same physical file.
>
> No.

[...]


> In the example code where you open the files after the fork, the
> problem is that you're using append mode for the output file; all
> writes in append mode occur at the _current_ end of file, not at the
> offset that you've set with fseek. Use "r+" mode instead.

Thanks very much for your help, Andrew. Indeed, changing "a" to "r+"
fixes the problem. I also removed the flock calls. The correct output
is generated.

Having each process open the file gives each its own offset -- that
much I understand. However, I guess I still don't understand how this
can make the program work correctly -- what guarantees that when the
output buffer is dumped (within fputc) that the output goes at the
offset imagined by the process which called fputc?

Thanks again for your help. You are very perceptive to catch the
"a" versus "r+" problem!

best,
Robert Dodier

Andrew Gierth

Oct 7, 2001, 12:36:40 AM
to Robert Dodier
>>>>> "Robert" == Robert Dodier <rob...@athenesoft.com> writes:

Robert> Having each process open the file gives each its own offset
Robert> -- that much I understand. However, I guess I still don't
Robert> understand how this can make the program work correctly --
Robert> what guarantees that when the output buffer is dumped (within
Robert> fputc) that the output goes at the offset imagined by the
Robert> process which called fputc?

In the case where a file was opened by the stdio library, and the
underlying file open isn't being shared with anything else, then the
stdio library's idea of what the underlying file offset is will always
be accurate.

Problems only arise when more than one stdio FILE* is referring to a
given underlying file open at any given time; then things get
confused.

Patrick Rabau

Oct 12, 2001, 7:58:25 PM
k...@ashi.footprints.net (Kaz Kylheku) wrote in message news:<i%Jv7.46333$ob.12...@news1.rdc1.bc.home.com>...

As an alternative, you can replace the use of the stdio library with
calls to pread() and pwrite(). These take the file offset as an
argument and leave the descriptor's own offset untouched, so the seek
and the read/write are combined in one atomic operation.
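
A minimal sketch of that approach, assuming pread() and pwrite() are available on the target systems. The names pcopy and copy_range are hypothetical. Because no call depends on the shared file offset, the parent can open both files once and let every child use the inherited descriptors.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in a child: copy `len` bytes at `start` using explicit offsets. */
static int copy_range(int in_fd, int out_fd, off_t start, off_t len)
{
    char buf[4096];
    off_t done = 0;

    while (done < len) {
        off_t left = len - done;
        size_t want = left < (off_t)sizeof buf ? (size_t)left : sizeof buf;
        ssize_t got = pread(in_fd, buf, want, start + done);
        ssize_t put = 0;

        if (got <= 0)
            return 1;              /* error or unexpected end of file */
        while (put < got) {        /* pwrite may write less than asked */
            ssize_t n = pwrite(out_fd, buf + put, (size_t)(got - put),
                               start + done + put);
            if (n <= 0)
                return 1;
            put += n;
        }
        done += got;
    }
    return 0;
}

/* Hypothetical driver: returns 0 on success, -1 on failure. */
int pcopy(const char *infile, const char *outfile, int nprocs)
{
    struct stat st;
    int in_fd, out_fd, j, status, launched = 0, failed = 0;
    off_t bs;

    assert(nprocs > 0);
    in_fd = open(infile, O_RDONLY);
    /* No O_APPEND here: the writes must land at the given offsets. */
    out_fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in_fd < 0 || out_fd < 0 || fstat(in_fd, &st) != 0) {
        if (in_fd >= 0) close(in_fd);
        if (out_fd >= 0) close(out_fd);
        return -1;
    }

    bs = st.st_size / nprocs;
    for (j = 0; j < nprocs; j++) {
        off_t start = (off_t)j * bs;
        /* The last child also takes the remainder. */
        off_t len = (j == nprocs - 1) ? st.st_size - start : bs;
        pid_t pid = fork();

        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(copy_range(in_fd, out_fd, start, len));
        launched++;
    }

    /* Reap every child and check its exit status. */
    for (; launched > 0; launched--) {
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    close(in_fd);
    close(out_fd);
    return failed ? -1 : 0;
}
```

The difference from the stdio version is that sharing one open of each file across the fork is harmless here, since the shared offset is never read or changed.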

Patrick