I/O in application programs.

Fred

Oct 16, 2000

I am learning Unix. In the past I have written many
programs, most of which I coded in IBM mainframe assembly
language. Also I have programmed other machines, usually in
assembly. But assembly is not the only programming notation
I have used, and I (like nearly everyone else) use C when I
work with Unix.

I find that some of my past experience is transferable
to Unix. For example, the Unix handling of signals (and
especially signal masks) is much like the way hardware
handles interrupts. But a 100% transfer is not possible, nor
do I expect that. Therefore I am not surprised that a
difficulty has arisen. It pertains to I/O. There are a
number of ways in which I could try to explain this
difficulty, but the best way is by example, using assembly
language as a point of reference.

Suppose that we wish to write a simple application
program that repeatedly reads a record from a file,
computationally modifies the data in the record, and then
writes the changed record to another file. The program uses
an input buffer to read the data, a computational buffer for
modifying the data, and an output buffer to write the data.
In its basic schema, this program may be described in the
following sequence of steps. (For simplicity, I omit error
handling).

Step 1: Open the files.

Step 2: Commence an input operation on the input file.

Step 3: Tell the operating system to suspend this program's
execution until such moment as the most recently-initiated
input operation is finished.

Step 4: Test the most recently-completed input operation, to
determine whether EOF has been encountered. If so, go to
step 12.

Step 5: Move all data from the input buffer to the
computational buffer.

Step 6: Commence an input operation on the input file.

Step 7: Modify the data in the computational buffer.

Step 8: Tell the operating system to suspend this program's
execution until such moment as the most recently-initiated
output operation (if any) is finished.

Step 9: Move all data from the computational buffer to the
output buffer.

Step 10: Commence an output operation on the output file.

Step 11: Go to step 3.

Step 12: Tell the operating system to suspend this program's
execution until such moment as the most recently-initiated
output operation (if any) is finished.

Step 13: Close both files.

Step 14: Stop.


An IBM systems programmer might do I/O using the X'9C'
instruction, but the typical IBM application programmer uses
the EXCP macro to accomplish steps 2, 6, and 10. He uses the
WAIT macro to accomplish steps 3, 8, and 12. However, no one
need know IBM mainframe assembly language in order to answer my Unix
question. He need only understand what I attempt to
accomplish by the fourteen steps listed above. I run three
kinds of procedure, which are input, output, and compute. I
attempt to run them concurrently, and to coordinate them so
that they do not interfere with one another. For example, I
do not want to alter the contents of the output buffer during
the time interval in which the output driver is obtaining
data from it. And I do not want to move data from the input
buffer to the computational buffer at a moment when the most
recent input operation has not finished filling the buffer.

Note that the above model does not require the
application program to handle any interrupts or to process
any signals. All such work is left in the hands of the
operating system. As far as I know, the above model is
consistent with standard techniques employed by application
programmers around the world. Can it be used by a C
programmer who works in the Unix environment? Probably so.
But I do not know how.

I assume that steps 2 and 6 can be implemented using the
Unix read call. I assume that step 10 can be implemented
using the Unix write call. I assume that my program can use
the O_NDELAY flag or the O_NONBLOCK flag (perhaps in
combination with other flags) to avoid "getting stuck" in
those steps. It appears, therefore, that steps 2, 6, and 10
present no real obstacle to implementation of the above
scheme.

But what is the Unix analog of IBM's WAIT macro? In
other words, how can I implement steps 3, 8, and 12? At
first glance it appears that I can do so via the Unix select
function, or perhaps poll. But can I? For guidance on this
point, let us turn to W. Richard Stevens's book entitled
Advanced Programming in the UNIX Environment (Addison-Wesley,
1992). This is not the most recent book on the
subject, but perhaps that's just as well. Not everyone has
the most recent release of Unix. If I want my program to be
portable, I should make it compatible with systems of
moderate age.

In section 12.5 of his book, Stevens treats the topic of
"I/O Multiplexing." Therein Stevens carefully cautions his
reader that "I/O multiplexing is not yet part of POSIX." In
other words, reference to the POSIX specification will not
answer my questions. I must determine how things work in the
real world. And because I wish to maximize the portability
of my program, I must not limit my inquiry to the workings of
my specific Unix platform. Rather, I must try to learn how
most Unix platforms do this kind of work.

At first glance, Stevens appears to offer real guidance.
The first sentence of section 12.5.1 says, "The select
function lets us do I/O multiplexing under both SVR4 and
4.3+BSD." That seems promising. Stevens goes on to explain
how file descriptors can be grouped into sets, so that one
call to the select function can be used on more than one
descriptor. Also he explains that each call can specify
three sets, one of which corresponds to each of three
conditions. Those explanations are quite clear. Then he
says, "A positive return value specifies the number of
descriptors that are ready. In this case the only bits left
on in the three descriptor sets are the bits corresponding to
the descriptors that are ready." From his remarks, I gather
that the select function will (optionally) suspend execution
of the application program until such moment as there is a
change in one or more of these bits.

So far, so good. I say this because, although the
above-quoted passages may appear cryptic when lifted out of
context (as I have done), they are clear enough when read in
combination with other things Stevens says about I/O
multiplexing. The problem comes a couple of paragraphs
later, where he says, "We now need to be more specific about
what 'ready' means." Stevens then proceeds to define "ready"
so that the word has no genuine meaning. He says, for
instance, that "A descriptor in the write set is considered
ready if a write to that descriptor won't block."

In passing, I wonder how the kernel can know for sure
that a future I/O operation won't block, without knowing the
details of the operation. What about an attempt to write a
truly humongous number of bytes? It seems to me that such an
attempt might block, unless the driver has a truly humongous
buffer. However, that is merely an aside. My real problem
is as follows.

No doubt it is nice to know that the next call to write
will not block, but that's not the same as knowing that the
previous call to write is finished. Similarly it is nice to
know that the next call to read will not block, but that's
not the same as knowing that the previous call to read is
finished, and much less is it the same as knowing that the
read did not encounter an EOF condition. In other words, the
select function offers a prediction about the future, but
that is not what I need at steps 3, 8, and 12. What I need
is not a prediction about the future, but rather a statement
about the present. For example, when my program arrives at
step 3, either the input operation is finished, or it's not.
If it's finished, I can safely copy data from the input
buffer to the computational buffer. If it's not finished, I
need to go to sleep until such time as it is finished.
Apparently, neither select nor poll meets this need.

Yes, Stevens says that the select function enables I/O
multiplexing under both SVR4 and 4.3+BSD. But he includes in
section 12.5 no example program showing how this can be
accomplished.

Does anyone know?

If so, please drop me a line at f_the...@yahoo.com.

Thank you.

mars...@hotmail.com

Oct 16, 2000
In article <39EB25B2...@yahoo.com>,
Fred <f_the...@yahoo.com> wrote:

>(snip...)


>
> I am learning Unix. In the past I have written many
> programs, most of which I coded in IBM mainframe assembly
> language.

> (snip...)


> In its basic schema, this program may be described in the
> following sequence of steps. (For simplicity, I omit error
> handling).
>
> Step 1: Open the files.

open(), fopen(), etc.

> Step 2: Commence an input operation on the input file.

read()

> Step 3: Tell the operating system to suspend this program's
> execution until such moment as the most recently-initiated
> input operation is finished.

This is the default behaviour of read() and its ilk.

> Step 4: Test the most recently-completed input operation, to
> determine whether EOF has been encountered. If so, go to
> step 12.

Return value from read(), feof() for stream I/O.

> Step 5: Move all data from the input buffer to the
> computational buffer.
>
> Step 6: Commence an input operation on the input file.

Why do you want to do this at this time?

> Step 7: Modify the data in the computational buffer.
>
> Step 8: Tell the operating system to suspend this program's
> execution until such moment as the most recently-initiated
> output operation (if any) is finished.

Finished with respect to what? Copied out of your space?
Written to the hard drive? By default, write() won't return
until the data are safely tucked away in disc I/O buffers.

> Step 9: Move all data from the computational buffer to the
> output buffer.
>
> Step 10: Commence an output operation on the output file.
>
> Step 11: Go to step 3.
>
> Step 12: Tell the operating system to suspend this program's
> execution until such moment as the most recently-initiated
> output operation (if any) is finished.

fflush(), fclose(), close(), etc.

> Step 13: Close both files.
>
> Step 14: Stop.

> (snip...)

Unless there is a lot you're not telling, what you really
need is to change your mindset. It is simply not necessary to
asynchronously overlap I/O operations. The OS does it for you.

Laura Halliday VE7LDH "Que les nuages soient notre
Grid: CN89mg pied a terre..." - Hospital/Shafte



Grant Edwards

Oct 16, 2000
In article <39EB25B2...@yahoo.com>, Fred wrote:

>In section 12.5 of his book, Stevens treats the topic of "I/O
>Multiplexing." Therein Stevens carefully cautions his reader
>that "I/O multiplexing is not yet part of POSIX." In other
>words, reference to the POSIX specification will not answer my
>questions. I must determine how things work in the real world.
>And because I wish to maximize the portability of my program, I
>must not limit my inquiry to the workings of my specific Unix
>platform. Rather, I must try to learn how most Unix platforms
>do this kind of work.

I don't think that Unix applications typically try to overlap
input/compute/output.

[ regarding select ]

>the descriptors that are ready." From his remarks, I gather
>that the select function will (optionally) suspend execution
>of the application program until such moment as there is a
>change in one or more of these bits.

Correct.

>So far, so good. I say this because, although the above-quoted
>passages may appear cryptic when lifted out of context (as I
>have done), they are clear enough when read in combination with
>other things Stevens says about I/O multiplexing. The problem
>comes a couple of paragraphs later, where he says, "We now need
>to be more specific about what 'ready' means." Stevens then
>proceeds to define "ready" so that the word has no genuine
>meaning. He says, for instance, that "A descriptor in the
>write set is considered ready if a write to that descriptor
>won't block."

If you're talking about regular file I/O, files are _always_
ready in Unix. Using select on a file-descriptor that is
associated with a regular file is not a useful thing to do.

File-descriptors associated with things like TCP connections
and serial ports may be not-ready, and select is useful for
them.

>In passing, I wonder how the kernel can know for sure that a
>future I/O operation won't block, without knowing the details
>of the operation.

It can't. If you try to write a block of data larger than can
be buffered, and you've got non-blocking I/O enabled, then the
driver will take as much as it can. You have to look at the
return value of write() to see how much data was actually
"consumed" by the driver. If you try to read more data than
is available from a non-blocking descriptor, then it will give you
however much it has available. You've got to check the return
value from read() to see how many bytes you received.

NB: A file descriptor is "ready" if there is at least 1 byte of
data to be read, or room for at least 1 byte of data. (Or
there's an error, or it's closed, or ....).

>No doubt it is nice to know that the next call to write
>will not block, but that's not the same as knowing that the
>previous call to write is finished.

How do you define "finished?" Generally, you just leave the
file-descriptors in blocking mode and don't worry about it --
read() won't return until it's read as much data as you
requested, and write() won't return until it's written all of
the data you gave it.

>Similarly it is nice to know that the next call to read will
>not block, but that's not the same as knowing that the previous
>call to read is finished, and much less is it the same as
>knowing that the read did not encounter an EOF condition. In
>other words, the select function offers a prediction about the
>future, but that is not what I need at steps 3, 8, and 12. What
>I need is not a prediction about the future, but rather a
>statement about the present. For example, when my program
>arrives at step 3, either the input operation is finished, or
>it's not. If it's finished, I can safely copy data from the
>input buffer to the computational buffer. If it's not
>finished, I need to go to sleep until such time as it is
>finished. Apparently, neither select nor poll meets this need.

Unless you're writing a multi-threaded program, the easiest
thing to do is leave the descriptors in blocking mode and do
read(), compute(), write() in sequence. read() and write() both
return the number of bytes read/written. If you really want to
overlap the read/compute/write operations, you're going to have
to have multiple threads. In that case, use blocking read/write
calls in the I/O threads and semaphores for inter-thread
synchronisation.

I'd recommend that you become proficient at single-threaded
Unix application programming before you try to do a
multiple-threaded implementation.

--
Grant Edwards grante Yow! I'm using my X-RAY
at VISION to obtain a rare
visi.com glimpse of the INNER
WORKINGS of this POTATO!!

The Proximate Cluebat

Oct 16, 2000
In article <8sfpio$3nd$1...@nnrp1.deja.com>, mars...@hotmail.com
wrote:
> In article <39EB25B2...@yahoo.com>,

> Fred <f_the...@yahoo.com> wrote:
> > In its basic schema, this program may be described in the
> > following sequence of steps. (For simplicity, I omit error
> > handling).
> >
> > Step 1: Open the files.
> open(), fopen(), etc.

Step 1.5: Lock the files to prevent race conditions or file
hosement. Use a shared lock (flock(fd, LOCK_SH)) when reading and
an exclusive lock (flock(fd, LOCK_EX)) when writing. (fd here is
your file descriptor.)

[ snip several steps -- no, Mr. Mainframe, you don't want to do I/O
like that under Unix, but that's already been discussed elsewhere ]

Step 12.5: Unlock the files. (flock(fd, LOCK_UN)).

> > Step 13: Close both files.


Because you, the programmer, do not know when another process might
want to I/O your files, nor do you know that only one copy of your
program could be running at once, it is your responsibility to lock
and to check for locks. Unix file locks are "advisory" -- meaning
that they are not mandatory -- however if your program and another
interfere with one another and yours is the one disregarding locks,
yours is the one with the bug.

Note that flock(2) is the 4.4BSD file locking mechanism, supported
by Linux and BSD-derived Unix systems; if you need to run on a
vanilla POSIX implementation as well, use lockf(3) instead.

Note also that you *can* achieve mandatory locks under Linux, but
you very well may not want to. See Documentation/mandatory.txt in
your Linux kernel source tree.

--
The Proximate Cluebat <curr...@out.of.order>
