
what is "unbuffered" about low level I/O read() system call


sachin...@gmail.com

Jul 7, 2018, 8:42:06 AM
Hello,
I am new to Unix Programming and came across various low level I/O calls and Standard I/O calls for my project work.

Low-level I/O calls are referred to as unbuffered I/O, whereas standard I/O calls are referred to as buffered I/O.

I am unable to figure out what is so "unbuffered" about the read() call, as described in Chapter 3 of "Advanced Programming in the UNIX Environment" by W. Richard Stevens.

The declaration of read(), as given in one of the standard Unix programming books, takes a pointer to a buffer as its 2nd parameter and the buffer size as its 3rd:

ssize_t read (int fildes, void *buff, size_t nbytes);

So it looks like buffered I/O, just like the standard I/O calls fread/fgets, which also accept a pointer to a buffer as one of their parameters.

Any help in this regard is appreciated.

Regards,
Sachin


Nicolas George

Jul 7, 2018, 8:57:32 AM
sachin...@gmail.com wrote in message
<c1e36f94-f931-488d...@googlegroups.com>:
> I am unable to figure out, what is so "unbuffered" about read() call
> as I read about this in one of the text "Advanced Unix Programming
> Environment" from W. Richard Stevens Chapter 3.

Each call to read() results directly in a system call to read on the
file descriptor. For special files, like devices, it can have strange
consequences. For regular files, it is a normal read, but system calls
are expensive.

Compare to calls to fread(), which result in actual system calls to fill
a buffer within the FILE structure, but then read directly from that
buffer.

If you do a thousand read()s of size 1, it results in a thousand system
calls. If you do a thousand fread()s of size 1, it results probably in a
single system call to read maybe 4k, and then a thousand simple copies
from the internal buffer to the result buffer.
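
A minimal sketch of the comparison above, assuming a POSIX system. The function names `count_bytes_read` and `count_bytes_fread` are made up for illustration. Both read a file one byte per call; only the first pays for a system call on every byte.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Reads the file one byte per read() call, so every iteration is a
   system call. Returns the number of bytes read, or -1 on error. */
long count_bytes_read(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    long total = 0;
    char c;
    ssize_t n;
    while ((n = read(fd, &c, 1)) > 0)
        total += n;
    close(fd);
    return n < 0 ? -1 : total;
}

/* Reads the file one byte per fread() call. stdio refills its internal
   buffer with far fewer read() system calls behind the scenes. */
long count_bytes_fread(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long total = 0;
    char c;
    while (fread(&c, 1, 1, fp) == 1)
        total++;
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : total;
}
```

On Linux, running a program built from each function under `strace -c` should make the difference in the number of read system calls visible.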

learner

Jul 7, 2018, 9:36:39 AM
Thanks Nicolas,
I think this helps me understand it better. So in standard I/O there is implicit buffering through the FILE structure, which is missing in the case of a direct read() call, right? So is it right to say that standard I/O is more efficient than direct low-level I/O calls, and that we should prefer it?

Regards,
Sachin

Marcel Mueller

Jul 7, 2018, 10:32:57 AM
On 07.07.18 15.36, learner wrote:
> So in standard I/O there is an implicit buffering through FILE structure, which is missing in case of direct read() call right?.

exactly

> So is it right to say that standard I/O is efficient compared to direct I/O call and we should use it over direct low level I/O.

Mostly yes, but it depends. E.g. when reading from a tape device the
block size (i.e. the number of bytes read with one system call) is quite
essential. In this case using fread could have unwanted effects.


Marcel

Kaz Kylheku

Jul 7, 2018, 11:58:50 AM
On 2018-07-07, sachin...@gmail.com <sachin...@gmail.com> wrote:
> Hello,
> I am new to Unix Programming and came across various low level I/O calls and Standard I/O calls for my project work.
>
> For low level I/O calls it is referred as unbuffered I/O whereas for standard I/O call it is referred as buffered I/O.
>
> I am unable to figure out, what is so "unbuffered" about read() call as I read about this in one of the text "Advanced Unix Programming Environment" from W. Richard Stevens Chapter 3.

The read interface isn't "unbuffered". It's abstracted from the issue of
buffering. Buffering is device dependent. If you read a byte from a
file, for sure the kernel buffers a whole page of it, so that a subsequent
read doesn't access the storage device.

Another kind of device doesn't buffer. If you read just half of a UDP
datagram from a socket, the other half is tossed.

In other words, the abstraction layer *itself* doesn't buffer; it just
passes control to underlying file/device drivers, which have varying
amounts and kinds of buffering. (Or even device-specific ways to
manipulate buffers: for instance with TTY devices, we can use a special
function to flush unread bytes.)

A buffered I/O abstraction layer, such as the C standard I/O library,
is called buffered because it itself provides buffering independently
of the underlying layers.

Siri Cruise

Jul 7, 2018, 12:45:11 PM
In article <9ad2d89c-3162-4ed0...@googlegroups.com>,
learner <sachin...@gmail.com> wrote:

> I think this makes me understand it better. So in standard I/O there is an
> implicit buffering through FILE structure, which is missing in case of direct
> read() call right?. So is it right to say that standard I/O is efficient
> compared to direct I/O call and we should use it over direct

Depends. stdio makes system calls internally, but usually less frequently than if you
call read and write directly. However the kernel doesn't know about stdio buffers; if you
abort without flushing the output buffer, the output you need to diagnose the abort
can evaporate. You can change stdio buffering to alleviate this....mostly. All
stdio has hidden state that can become corrupted during an abort, making stdio
unusable. When write completes, the kernel has the output with guaranteed
delivery, and the kernel state is protected from userland state.
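
To illustrate the point that the kernel knows nothing about stdio buffers, here is a small sketch assuming a POSIX system and a message shorter than BUFSIZ. The names `kernel_size` and `show_flush_effect` are hypothetical. It reports the file size as the kernel sees it before and after fflush().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns the size of the file as the kernel currently sees it, or -1. */
static long kernel_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Writes msg through stdio and reports what the kernel has seen
   before and after fflush(). Returns 0 on success, -1 on error. */
int show_flush_effect(const char *path, const char *msg,
                      long *before, long *after)
{
    assert(path && msg && before && after);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    /* Request full buffering with a library-allocated buffer. */
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);
    fputs(msg, fp);               /* lands in the stdio buffer only */
    *before = kernel_size(path);
    fflush(fp);                   /* stdio now hands the data to the kernel */
    *after = kernel_size(path);
    fclose(fp);
    return 0;
}
```

With typical implementations `*before` is 0 and `*after` is the message length. A process killed between the two measurements would leave an empty file behind.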

I do the following when I get fatal signal. I avoid the C library as much as
possible until critical information is written to the kernel:

static const char *signame[] = {"SIG0", ..., "SIG31"};
static char m1[] = "=====signal received ";
char m2[2] = {'0'+sig/10, '0'+sig%10};
static char m3[] = "=";
static char m4[] = "=====\n";
static char m5[] = "===============\n";
write(2, m1, (sizeof m1)-1);
write(2, m2, (sizeof m2));
write(2, m3, (sizeof m3)-1);
write(2, signame[sig], strlen(signame[sig]));
write(2, m4, (sizeof m4)-1);
signal(sig, SIG_DFL);
void *bindump[500]; int n = backtrace(bindump, 500);
backtrace_symbols_fd(bindump, n, 2);
write(2, m5, (sizeof m5)-1);
...
report error with stdio or whatever
...
report global variables
...
exit(sig)

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
I'm saving up to buy the Donald a blue stone This post / \
from Metebelis 3. All praise the Great Don! insults Islam. Mohammed

learner

Jul 7, 2018, 1:13:25 PM
On Saturday, July 7, 2018 at 8:02:57 PM UTC+5:30, Marcel Mueller wrote:
> On 07.07.18 15.36, learner wrote:
> > So in standard I/O there is an implicit buffering through FILE structure, which is missing in case of direct read() call right?.
>
> exactly
>

Thanks Marcel.
> > So is it right to say that standard I/O is efficient compared to direct I/O call and we should use it over direct low level I/O.
>
> Mostly yes, but it depends. E.g. when reading from a tape device the
> block size (i.e. the number of bytes read with one system call) is quite
> essential. In this case using fread could have unwanted effects.
>

I did not quite understand this, as I know little about tape devices. I will be using files on disk. What is so specific about block size for a tape device? Doesn't the standard I/O library use the optimal I/O block size, so that fread should be fine?
>
> Marcel

learner

Jul 7, 2018, 1:19:52 PM
On Saturday, July 7, 2018 at 9:28:50 PM UTC+5:30, Kaz Kylheku wrote:
> On 2018-07-07, sachin...@gmail.com <sachin...@gmail.com> wrote:
> > Hello,
> > I am new to Unix Programming and came across various low level I/O calls and Standard I/O calls for my project work.
> >
> > For low level I/O calls it is referred as unbuffered I/O whereas for standard I/O call it is referred as buffered I/O.
> >
> > I am unable to figure out, what is so "unbuffered" about read() call as I read about this in one of the text "Advanced Unix Programming Environment" from W. Richard Stevens Chapter 3.
>
> The read interface isn't "unbuffered". It's abstracted from the issue of
> buffering. Buffering is device dependent. If you read a byte from a
> file, for sure the kernel buffers a whole page of it, so that a read
> read doesn't access the storage device.
>
Thanks Kaz for explanation.
> Another kind of device doesn't buffer. If you read just half of a UDP
> datagram from a socket, the other half is tossed.
>
I did not understand this. Is there any reference I can follow?

Kaz Kylheku

Jul 7, 2018, 2:40:02 PM
On 2018-07-07, learner <sachin...@gmail.com> wrote:
> On Saturday, July 7, 2018 at 9:28:50 PM UTC+5:30, Kaz Kylheku wrote:
>> On 2018-07-07, sachin...@gmail.com <sachin...@gmail.com> wrote:
>> > Hello,
>> > I am new to Unix Programming and came across various low level I/O calls and Standard I/O calls for my project work.
>> >
>> > For low level I/O calls it is referred as unbuffered I/O whereas for standard I/O call it is referred as buffered I/O.
>> >
>> > I am unable to figure out, what is so "unbuffered" about read() call as I read about this in one of the text "Advanced Unix Programming Environment" from W. Richard Stevens Chapter 3.
>>
>> The read interface isn't "unbuffered". It's abstracted from the issue of
>> buffering. Buffering is device dependent. If you read a byte from a
>> file, for sure the kernel buffers a whole page of it, so that a read
>> read doesn't access the storage device.
>>
> Thanks Kaz for explanation.
>> Another kind of device doesn't buffer. If you read just half of a UDP
>> datagram from a socket, the other half is tossed.
>>
> I did not understand this. Is there any reference I can follow?

Some devices will throw away data that is not read. For instance,
raw record-oriented tape drives in classic Unix. If tape blocks are 512
bytes and you do a 300 byte read, the device will read a block, give
you 300 bytes of it, and throw away the rest. Datagram sockets are like
that. If a 500 byte packet is received and you read (or recv/recvfrom)
using a 200 byte buffer, you lose the remaining 300 bytes. The remaining
bytes are not kept for you in a buffer where you could pick them up by
a subsequent read.

That is different compared to reading from a stream-like device (file,
serial port, stream socket).

The read API doesn't shield you from such device differences.
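
The datagram case described above can be tried locally. This is a sketch, assuming a Unix-domain datagram socket pair behaves like the UDP socket in the description and that `MSG_DONTWAIT` is available (it is on Linux and the BSDs). The function name `datagram_truncation_demo` is made up.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends one 500-byte datagram over a local socket pair, reads it with a
   200-byte buffer, then attempts a second, non-blocking read.
   Stores both results; returns 0 on success, -1 on setup failure. */
int datagram_truncation_demo(ssize_t *first, ssize_t *second)
{
    assert(first != NULL && second != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    char out[500];
    memset(out, 'a', sizeof out);
    if (send(sv[0], out, sizeof out, 0) != (ssize_t)sizeof out) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    char in[200];
    *first = recv(sv[1], in, sizeof in, 0);             /* gets 200 bytes */
    *second = recv(sv[1], in, sizeof in, MSG_DONTWAIT); /* nothing is left */
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

The second call fails instead of returning the remaining 300 bytes. With a stream socket or a regular file those bytes would still be waiting.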

Nicolas George

Jul 7, 2018, 3:10:50 PM
learner wrote in message
<9ad2d89c-3162-4ed0...@googlegroups.com>:
> So is it right to say that standard I/O is efficient compared to
> direct I/O call and we should use it over direct low level I/O..

No, it is not right. If it was right, only one API would have been
provided. The choice is a matter of balance between control and
simplicity. Choose whichever is best for your own use case.

Joe Pfeiffer

Jul 7, 2018, 7:34:12 PM
It depends on what you're doing. If you want to read 1000 individual
bytes, as in the previous response, it's likely that standard IO will be
more efficient. If you want to read a 4K record into an array, it's
likely low-level IO will be more efficient.

Joe Pfeiffer

Jul 7, 2018, 7:39:11 PM
Though this is really more detail than the OP needs for his question.
Moving from file IO to device IO or networking introduces a *host* of
additional issues that don't need to be considered here.

Joe Pfeiffer

Jul 7, 2018, 7:41:13 PM
It's been decades since I had anything to do with tape, but if the
standard IO library read anything other than an integer number of tape
blocks (assuming the user hadn't messed with buffer sizes or something)
I would regard it as a bug.

Barry Margolin

Jul 7, 2018, 7:49:20 PM
In article <dec7d278-2355-43d8...@googlegroups.com>,
learner <sachin...@gmail.com> wrote:

> On Saturday, July 7, 2018 at 8:02:57 PM UTC+5:30, Marcel Mueller wrote:
> > On 07.07.18 15.36, learner wrote:
> > > So in standard I/O there is an implicit buffering through FILE structure,
> > > which is missing in case of direct read() call right?.
> >
> > exactly
> >
>
> Thanks Marcel.
> > > So is it right to say that standard I/O is efficient compared to direct
> > > I/O call and we should use it over direct low level I/O.
> >
> > Mostly yes, but it depends. E.g. when reading from a tape device the
> > block size (i.e. the number of bytes read with one system call) is quite
> > essential. In this case using fread could have unwanted effects.
> >
>
> I did not quite understand this as I am not much aware of tape device.

Magnetic tape is what you used to back up your mainframes 30 years ago.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Barry Margolin

Jul 7, 2018, 7:54:39 PM
In article <1befges...@pfeifferfamily.net>,
But the difference in the latter case is likely to be very small. The
two operations are essentially:

Low-level I/O:

Read from disk to kernel buffer
Copy from kernel buffer to application buffer

Standard I/O:

Read from disk to kernel buffer
Copy from kernel buffer to stdio buffer
Copy from stdio buffer to application buffer

The last step, copying 4KB between two buffers, is very fast. The
only real bottleneck is the first step of reading from the disk (and the
OS often does read-ahead so there's no delay waiting for this).

Ian Pilcher

Jul 8, 2018, 11:11:48 AM
On 07/07/2018 08:36 AM, learner wrote:
> I think this makes me understand it better. So in standard I/O there
> is an implicit buffering through FILE structure, which is missing in
> case of direct read() call right?. So is it right to say that
> standard I/O is efficient compared to direct I/O call and we should
> use it over direct low level I/O.

Reading the setbuf(3) man page might be helpful.

Also keep in mind that your operating system will generally do its
own buffering of data written with write(2), so data written to a file
with fwrite(3) will generally be buffered in (at least) 2 different
places. (See O_DIRECT and O_SYNC in the open(2) man page.)
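
The two layers of buffering mentioned above can be sketched as follows, assuming a POSIX system. The function name `write_durably` is hypothetical. fflush() empties the stdio buffer into the kernel, and fsync() asks the kernel to push its own cached copy to the storage device.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes text to path and pushes it through both buffering layers.
   Returns 0 on success, -1 on any failure. */
int write_durably(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    size_t len = strlen(text);
    int ok = fwrite(text, 1, len, fp) == len  /* copy into the stdio buffer */
          && fflush(fp) == 0                  /* stdio buffer -> kernel */
          && fsync(fileno(fp)) == 0;          /* kernel cache -> storage */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Most programs only need the fflush() step (or just fclose()). The fsync() call matters when the data must survive a power loss.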

--
========================================================================
Ian Pilcher arequ...@gmail.com
-------- "I grew up before Mark Zuckerberg invented friendship" --------
========================================================================

Scott Lurndal

Jul 9, 2018, 10:32:02 AM
learner <sachin...@gmail.com> writes:
>On Saturday, July 7, 2018 at 6:27:32 PM UTC+5:30, Nicolas George wrote:
>
>> > I am unable to figure out, what is so "unbuffered" about read() call
>> > as I read about this in one of the text "Advanced Unix Programming
>> > Environment" from W. Richard Stevens Chapter 3.
>>
>> Each call to read() results directly in a system call to read on the
>> file descriptor. For special files, like devices, it can have strange
>> consequences. For regular files, it is a normal read, but system calls
>> are expensive.
>>
>> Compare to calls to fread(), which result in actual system calls to fill
>> a buffer within the FILE structure, but then read directly from that
>> buffer.
>>
>> If you do a thousand read()s of size 1, it results in a thousand system
>> calls. If you do a thousand fread()s of size 1, it results probably in a
>> single system call to read maybe 4k, and then a thousand simple copies
>> from the internal buffer to the result buffer.
>
>Thanks Nicolas,
>I think this makes me understand it better. So in standard I/O there is an
>implicit buffering through FILE structure, which is missing in case of direct
>read() call right?. So is it right to say that standard I/O is efficient
>compared to direct I/O call and we should use it over direct low level I/O.

Actual physical I/O requests are often constrained in both size and
starting offset. For example, disk reads always start at the beginning
of a sector (typically 512 or 4096 bytes) and read an integral number of
sectors.

Because the read system call doesn't have those constraints, even the read
system call data will be buffered (in kernel DMA buffers and/or the page cache
depending on your operating system) so that the data can be copied into
the correct user-space buffer by the read system call.

In the days of the Stevens books, the read system call could be used to directly
read from disk into the application (bypassing any kernel buffering) by using
the raw disk device (e.g. /dev/rc0d0p0 instead of /dev/c0d0p0). If the starting
address of the application buffer wasn't correctly aligned on a
sector boundary, the read system call would fail. This was the most efficient
form of disk I/O and was heavily used by e.g. Oracle and other database management
systems.

Linux supports an open(2) flag (O_DIRECT) which provides similar (but not identical)
capabilities.

The mmap(2) system call avoids the read system call entirely and maps the file
directly to the application address space using the standard paging structures.
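
As an illustration of the mmap(2) route, here is a minimal sketch assuming a POSIX system and a regular file. The function name `count_lines_mmap` is made up. It counts newlines without a single read() call; the pages are faulted in by the kernel as the loop touches them.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts '\n' bytes in a file by mapping it instead of calling read().
   Returns the count, or -1 on error. */
long count_lines_mmap(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {            /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            count++;
    munmap((void *)p, len);
    return count;
}
```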

Scott Lurndal

Jul 9, 2018, 10:36:15 AM
Tape blocks can be variable length, depending on the media. DECtape had fixed block sizes,
but most standard 7-track, 9-track and 18-track magnetic tapes might start with a couple
of 80-byte ASCII records (volume header, file header), then a series of data blocks
(perhaps 80 byte records blocked 9 to a block), then a tape mark (terminating the file),
another 80-byte HDR1 record, more file data, a tape-mark then a second tape mark to
terminate the volume. The hardware would terminate a read after
the block had been consumed, regardless of the size of the receiving buffer in memory. If
the receiving buffer was too small, a short read would occur (handling of which is OS
dependent).

Scott Lurndal

Jul 9, 2018, 10:37:10 AM
Barry Margolin <bar...@alum.mit.edu> writes:
>In article <dec7d278-2355-43d8...@googlegroups.com>,
> learner <sachin...@gmail.com> wrote:
>
>> On Saturday, July 7, 2018 at 8:02:57 PM UTC+5:30, Marcel Mueller wrote:
>> > On 07.07.18 15.36, learner wrote:
>> > > So in standard I/O there is an implicit buffering through FILE structure,
>> > > which is missing in case of direct read() call right?.
>> >
>> > exactly
>> >
>>
>> Thanks Marcel.
>> > > So is it right to say that standard I/O is efficient compared to direct
>> > > I/O call and we should use it over direct low level I/O.
>> >
>> > Mostly yes, but it depends. E.g. when reading from a tape device the
>> > block size (i.e. the number of bytes read with one system call) is quite
>> > essential. In this case using fread could have unwanted effects.
>> >
>>
>> I did not quite understand this as I am not much aware of tape device.
>
>Magnetic tape is what you used to backup your mainframes 30 years ago.

Used for far more than backups. Watching a 12-unit sort-merge (with drives
that supported read-backwards and write-backwards) was impressive.

Scott Lurndal

Jul 9, 2018, 10:39:19 AM
Barry Margolin <bar...@alum.mit.edu> writes:
>In article <1befges...@pfeifferfamily.net>,
> Joe Pfeiffer <pfei...@cs.nmsu.edu> wrote:

>> It depends on what you're doing. If you want to read 1000 individual
>> bytes, as in the previous response, it's likely that standard IO will be
>> more efficient. If you want to read a 4K record into an array, it's
>> likely low-level IO will be more efficient.
>
>But the difference in the latter case is likely to be very small. The
>two operations are essentially:
>
>Low-level I/O:
>
>Read from disk to kernel buffer
>Copy from kernel buffer to application buffer
>
>Standard I/O:
>
>Read from disk to kernel buffer
>Copy from kernel buffer to stdio buffer
>Copy from stdio buffer to application buffer
>
>The last step of copying 4KB from between two buffers is very fast.

Actually, it's horribly slow, pollutes the caches with data that may
never be used, and is avoided in most high-performance I/O applications.

Hence things like user-mode RDMA.

Rainer Weikusat

Jul 9, 2018, 12:08:23 PM
learner <sachin...@gmail.com> writes:
> On Saturday, July 7, 2018 at 6:27:32 PM UTC+5:30, Nicolas George wrote:
>> > I am unable to figure out, what is so "unbuffered" about read() call
>> > as I read about this in one of the text "Advanced Unix Programming
>> > Environment" from W. Richard Stevens Chapter 3.
>>
>> Each call to read() results directly in a system call

[...]

>> Compare to calls to fread(), which result in actual system calls to fill
>> a buffer within the FILE structure, but then read directly from that
>> buffer.

[...]

> So is it right to say that standard I/O is efficient compared to
> direct I/O call and we should use it over direct low level I/O.

stdio is supposed to be an abstraction providing an efficient way to use
character stream semantics based on the block-based system call
interface. Compare these two programs for counting the number of lines
which can be read from stdin:

------
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char c;
    size_t nr;
    unsigned count;

    count = 0;
    while ((nr = read(0, &c, 1) > 0))
        if (c == '\n') ++count;

    printf("%u\n", count);

    return 0;
}
-------

vs

------
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    unsigned count;
    int c;

    count = 0;
    while ((c = getchar()) != EOF)
        if (c == '\n') ++count;

    printf("%u\n", count);

    return 0;
}
------

stdio also provides formatted output of non-character data (like the line
count above).

Ben Bacarisse

Jul 9, 2018, 3:23:36 PM
Rainer Weikusat <rwei...@talktalk.net> writes:
<snip>
> stdio is supposed to be an abstraction providing an efficent way to use
> character stream semantics based on the block-based system call
> interface. Compare these two program for counting the number of lines
> which can be read from stdin:
>
> ------
> #include <stdio.h>
> #include <unistd.h>
>
> int main(void)
> {
> char c;
> size_t nr;
> unsigned count;
>
> count = 0;
> while ((nr = read(0, &c, 1) > 0))

You have a typo with the parentheses.

But, more significantly, read returns an ssize_t so that it can signal
an error using -1. The assignment to nr loses that information but,
since you don't use nr, you can just write:

char c;
unsigned count = 0;
while (read(0, &c, 1) > 0)
if (c == '\n') ++count;


> if (c == '\n') ++count;
>
> printf("%u\n", count);
>
> return 0;
> }
> -------
>
> vs
>
> ------
> #include <stdio.h>
> #include <unistd.h>

(No need for unistd.h here.)

> int main(void)
> {
> unsigned count;
> int c;
>
> count = 0;
> while ((c = getchar()) != EOF)
> if (c == '\n') ++count;
>
> printf("%u\n", count);
>
> return 0;
> }
> ------
>
> stdio also provides formatted output of non-character data (like the line
> count above).

stdio may also be more portable. There could be C implementations that
don't provide the Unix-like read and write calls, but they should all
provide the stdio interface.

--
Ben.
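
The point about ssize_t can be sketched as a block-reading line counter that keeps the error return of read() instead of folding it into a boolean. This assumes a POSIX system. The function name `count_lines_fd` is made up, and retrying on EINTR is one common policy rather than the only reasonable one.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Counts newlines on fd, reading in blocks. Returns the count,
   or -1 if read() reported an error other than EINTR. */
long count_lines_fd(int fd)
{
    assert(fd >= 0);
    char buf[4096];
    long count = 0;
    for (;;) {
        ssize_t nr = read(fd, buf, sizeof buf);  /* signed: may be -1 */
        if (nr < 0) {
            if (errno == EINTR)
                continue;                        /* interrupted, retry */
            return -1;
        }
        if (nr == 0)
            return count;                        /* end of file */
        for (ssize_t i = 0; i < nr; i++)
            if (buf[i] == '\n')
                count++;
    }
}
```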

Miquel van Smoorenburg

Jul 9, 2018, 5:42:16 PM
In article <psK0D.117492$7H4....@fx12.iad>,
Scott Lurndal <sl...@pacbell.net> wrote:
>>The last step of copying 4KB from between two buffers is very fast.
>
>Actually, it's horribly slow, pollutes the caches with data that may
>never be used, and is avoided in most high-performance I/O applications.
>Hence things like user-mode RDMA.

Well yes, if all that your application does is to read data from
an I/O device at 20 gigabytes/second, every copy counts.

But for reading an .mp4 file from disk in order to decode
and play it on screen, the time that the copy takes is basically
lost in the noise. Don't worry about it.

Mike.

Rainer Weikusat

Jul 10, 2018, 7:22:11 AM
Ben Bacarisse <ben.u...@bsb.me.uk> writes:
> Rainer Weikusat <rwei...@talktalk.net> writes:
> <snip>
>> stdio is supposed to be an abstraction providing an efficent way to use
>> character stream semantics based on the block-based system call
>> interface. Compare these two program for counting the number of lines
>> which can be read from stdin:
>>
>> ------
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>> char c;
>> size_t nr;
>> unsigned count;
>>
>> count = 0;
>> while ((nr = read(0, &c, 1) > 0))
>
> You have a typo with the parentheses. But, more significantly, read
> returns an ssize_t so that it can signal an error using -1. The
> assignment to nr loses that information but,

Actually, no: nr will be 1 if read returned a value > 0 and 0
otherwise. That's arguably accidentally correct but correct
nevertheless. But as the point of the example was just to demonstrate
systemcall/byte vs stdio character streams, that's not really important.

The point of stdio is (or rather was) mainly that people don't have to
write code like

----
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    int nr;
    unsigned count;

    count = 0;
    while ((nr = read(0, buf, sizeof buf)) > 0)
        do
            if (buf[--nr] == '\n') ++count;
        while (nr);

    printf("%u\n", count);

    return 0;
}
----

and don't even need to understand why they should have been doing this.

The most important drawback is that stdio-I/O doesn't happen in
real-time.

[...]

> stdio many also be more portable. There could be C implementations that
> don't provide the Unix-like read and write calls, but they should all
> provide the stdio interface.

Or less portable: Depending on 'historicalness' of an implementation,
the total number of open stdio streams may be restricted to some weird,
small number, eg, 256.

Scott Lurndal

Jul 10, 2018, 9:49:33 AM
"Miquel van Smoorenburg" <miq...@netscum.invalid> writes:
>In article <psK0D.117492$7H4....@fx12.iad>,
>Scott Lurndal <sl...@pacbell.net> wrote:
>>>The last step of copying 4KB from between two buffers is very fast.
>>
>>Actually, it's horribly slow, pollutes the caches with data that may
>>never be used, and is avoided in most high-performance I/O applications.
>>Hence things like user-mode RDMA.
>
>Well yes, if all that your application does is to read data from
>an I/O device at 20 gigabytes/second, every copy counts.

Around here, the machines (hundreds of them) are running at
70-80% cpu utilization (simulation farms). Every cycle matters.

>
>But for reading an .mp4 file from disk in order to decode
>and play it on screen, the time that the copy takes is basically
>lost in the noise. Don't worry about it.

This is, of course, the rationale for most programmatic bloat :-)

Barry Margolin

Jul 10, 2018, 11:31:09 AM
In article <JP21D.465411$bS4.3...@fx01.iad>,
On the other hand, "premature optimization is the root of all evil".

Rainer Weikusat

Jul 10, 2018, 12:17:20 PM
Barry Margolin <bar...@alum.mit.edu> writes:
> sc...@slp53.sl.home (Scott Lurndal) wrote:
>> "Miquel van Smoorenburg" <miq...@netscum.invalid> writes:

[...]

>> >But for reading an .mp4 file from disk in order to decode
>> >and play it on screen, the time that the copy takes is basically
>> >lost in the noise. Don't worry about it.
>>
>> This is, of course, the rationale for most programmatic bloat :-)
>
> On the other hand, "premature optimization is the root of all evil".

Out-of-context quotes are the root of much nonsense: This is from TAOCP
and specifically refers to trying to write an 'optimal' machine code
program from scratch instead of focussing on creating a working one
first.

This makes a lot of sense because the effect of many machine code
'optimizations' is minuscule and the working program might be fast
enough (and working -- another very nice property).

Valentin Nechayev

Jul 15, 2018, 10:50:02 AM
Scott Lurndal <sc...@slp53.sl.home> wrote to comp.unix.programmer:

>>But the difference in the latter case is likely to be very small. The
>>two operations are essentially:
>>
>>Low-level I/O:
>>
>>Read from disk to kernel buffer
>>Copy from kernel buffer to application buffer
>>
>>Standard I/O:
>>
>>Read from disk to kernel buffer
>>Copy from kernel buffer to stdio buffer
>>Copy from stdio buffer to application buffer
>>
>>The last step of copying 4KB from between two buffers is very fast.

SL> Actually, it's horribly slow, pollutes the caches with data that may
SL> never be used, and is avoided in most high-performance I/O applications.

Really,
1) fread() isn't obliged to do this additional copying, it only _can_.
Reading directly into the application buffer (after first handing over the data
already held in the stdio buffer) is a variant that is actually used.
2) When doing per-character processing, the copying time is interleaved with the
actual processing and so doesn't affect the total time on a typical modern processor.
(Yep, these cases are fundamentally different.)

SL> Hence things like user-mode RDMA.

Things like RDMA avoid yet another copy, the one from the device to a kernel
buffer, by transferring directly into a userspace buffer.


-netch-

Scott Lurndal

Jul 16, 2018, 9:41:41 AM
ne...@netch.kiev.ua (Valentin Nechayev) writes:
>Scott Lurndal <sc...@slp53.sl.home> wrote to comp.unix.programmer:
>
>>>But the difference in the latter case is likely to be very small. The
>>>two operations are essentially:
>>>
>>>Low-level I/O:
>>>
>>>Read from disk to kernel buffer
>>>Copy from kernel buffer to application buffer
>>>
>>>Standard I/O:
>>>
>>>Read from disk to kernel buffer
>>>Copy from kernel buffer to stdio buffer
>>>Copy from stdio buffer to application buffer
>>>
>>>The last step of copying 4KB from between two buffers is very fast.
>
>SL> Actually, it's horribly slow, pollutes the caches with data that may
>SL> never be used, and is avoided in most high-performance I/O applications.
>
>Really,
>1) fread() isn't obliged to do this additional copying, it only _can_.
>Direct passing to application buffer (after filling of already got data
>from stdio buffer) is a used variant.

This doesn't follow. Data is first DMA'd from disk to a kernel buffer.
The kernel then copies it from the DMA buffer to the STDIO
buffer (the first copy). Data is then copied from the STDIO buffer to
the buffer provided to the fread call. getc() will avoid the
copy by fetching directly from the STDIO buffer, but with a significant
performance penalty for accessing byte-by-byte.

But fread() must copy the data to the buffer indicated by the
first argument to fread.

Barry Margolin

Jul 17, 2018, 10:46:24 AM
In article <mg13D.253450$IF4.1...@fx14.iad>,
In some cases fread() can skip the step of reading into the stdio buffer.

Suppose the stdio buffer is 4K and you ask to read 8K. Rather than doing
2 iterations of:

read(4K) into stdio buffer
copy to caller's buffer

it could simply read(8K) directly into caller's buffer.

If it already has something buffered, it can copy that from the stdio
buffer to the caller, then perform a direct read() of the rest.
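
The bypass described above can be sketched with a toy buffered reader. This is not how any particular C library implements fread(); `struct rbuf`, `rb_read` and `RB_SIZE` are hypothetical names, and a POSIX read() is assumed. Buffered bytes are drained first, and a remaining request at least as large as the buffer goes straight into the caller's memory.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define RB_SIZE 4096

/* Hypothetical miniature of a stdio-style input buffer. */
struct rbuf {
    int fd;
    size_t pos, len;        /* unread bytes are buf[pos..len) */
    char buf[RB_SIZE];
    unsigned long direct;   /* number of reads that bypassed buf */
};

/* Reads up to n bytes into dst; returns the number of bytes delivered
   (short on end of file or error). */
size_t rb_read(struct rbuf *rb, void *dst, size_t n)
{
    assert(rb != NULL && dst != NULL);
    char *out = dst;
    size_t done = 0;
    while (done < n) {
        size_t avail = rb->len - rb->pos;
        if (avail > 0) {                       /* drain what is buffered */
            size_t k = avail < n - done ? avail : n - done;
            memcpy(out + done, rb->buf + rb->pos, k);
            rb->pos += k;
            done += k;
        } else if (n - done >= RB_SIZE) {      /* large rest: skip the buffer */
            ssize_t r = read(rb->fd, out + done, n - done);
            if (r <= 0)
                break;
            rb->direct++;
            done += (size_t)r;
        } else {                               /* small rest: refill buffer */
            ssize_t r = read(rb->fd, rb->buf, RB_SIZE);
            if (r <= 0)
                break;
            rb->pos = 0;
            rb->len = (size_t)r;
        }
    }
    return done;
}
```

The caller sees the same bytes in the same order either way; only the number of copies differs.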

Scott Lurndal

Jul 17, 2018, 11:51:52 AM
If one is reading that much with fread, one should probably
be using read directly (or mmap) rather than relying on undefined
behaviour.

james...@alumni.caltech.edu

Jul 17, 2018, 5:00:41 PM
On Tuesday, July 17, 2018 at 11:51:52 AM UTC-4, Scott Lurndal wrote:
> Barry Margolin <bar...@alum.mit.edu> writes:
...
> >In some cases fread() can skip the step of reading into the stdio buffer.
> >
> >Suppose the stdio buffer is 4K and you ask to read 8K. Rather than doing
> >2 iterations of:
> >
> >read(4K) into stdio buffer
> >copy to caller's buffer
> >
> >it could simply read(8K) directly into caller's buffer.
> >
> >If it already has something buffered, it can copy that from the stdio
> >buffer to the caller, then perform a direct read() of the rest.
>
> If one is reading that much with fread, one should probably
> be using read directly (or mmap) rather than relying on undefined
> behaviour.

There's nothing undefined about the behavior - it's not even unspecified.
It's just a permitted optimization.

Rainer Weikusat

Jul 17, 2018, 5:19:55 PM
The fread description from the C99 definition is

The fread function reads, into the array pointed to by ptr, up
to nmemb elements whose size is specified by size, from the
stream pointed to by stream. For each object, size calls are
made to the fgetc function and the results stored, in the order
read, in an array of unsigned char exactly overlaying the object

This means it's even unclear if this attempt at transparent
turd-polishing is at least permitted.

JFTR: My opinion on this is that calls to fread or fwrite are
essentially mistaken and should be avoided. The read/write interface is
just fine for reading data in fixed-size blocks > 1.



James Kuyper

Jul 17, 2018, 10:51:40 PM
Of course it is - the difference between what is specified by the
standard (multiple calls to fgetc()) and what is done with this
optimization (direct transfer of an entire block of data) has no impact
on what the standard calls "the observable behavior" (5.1.2.3p6) of the
program, and it is therefore a permitted optimization. It makes the data
transfer faster, but that's not "observable behavior". No data is
written to the internal buffer, but that's also not "observable
behavior". The FILE object must (directly or indirectly) contain
pointers/counters that keep track of what's in the buffer and what's in
the file, and those need to be updated accordingly, and they will
contain different values than they would after a string of fgetc()
calls, but that's also not "observable behavior".
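
As a rough illustration of the wording under discussion, the description can be modelled as a literal loop over fgetc(). The name fread_via_fgetc is hypothetical; this is a sketch of the abstract behaviour, not how any particular C library implements fread().

```c
#include <assert.h>
#include <stdio.h>

/* Model of the C99 wording: for each element, `size` calls to fgetc(). */
size_t fread_via_fgetc(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    unsigned char *out = ptr;
    size_t elems = 0;

    assert(stream != NULL);
    if (size == 0 || nmemb == 0)
        return 0;

    for (; elems < nmemb; elems++) {
        for (size_t i = 0; i < size; i++) {
            int c = fgetc(stream);
            if (c == EOF)
                return elems;   /* a partial element is not counted */
            *out++ = (unsigned char)c;
        }
    }
    return elems;
}
```

An implementation that fills the same array with the same bytes and returns the same count by some other route gives a program the same results as this loop.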

William Ahern

Jul 18, 2018, 4:45:10 AM
Error persistence semantics that allow you to lazily check ferror and feof
are useful when interleaving I/O among several file handles or in similar
complex scenarios (e.g. nested loops). For years I didn't appreciate the
benefit, but then it hit me one day while diagnosing a dropped ETIMEDOUT
condition on a socket that resulted in lost data. Now I often duplicate the
semantics in my own APIs.

Using FILE handles allows callers to make use of fmemopen and open_memstream
(or GNU fopencookie or BSD funopen), permitting one to unify stream and
string APIs with a standard data type.

Large block reads are often premature optimization. fgetc is an
underutilized function. I've seen many ad hoc parsers manually do buffering
and pointer walking (often including buggy boundary logic) when using stdio
and fgetc would have dramatically simplified the code without loss of
performance. I submitted a patch to OpenBSD Make earlier this year to fix a
bug that failed to fold lines with trailing comments. The bug was inside a
routine that manually buffered to optimize the skipping of comments. My
patch fixed the immediate bug but I had half a mind to remove the manual
buffering as the necessary behavior was lost inside the tangle of pointer
and token conditionals. Instead I decided to let sleeping dogs lie.
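
To make the fgetc point concrete, here is a small sketch of a line reader that leaves all buffering to stdio. The line format is a simplified, hypothetical one (it is not the OpenBSD Make grammar), and read_logical_line is an invented name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one logical line into out (capacity cap, always NUL-terminated).
 * Hypothetical format: '#' starts a comment that runs to the end of the
 * physical line; a backslash directly before the newline joins the next
 * physical line, even when it sits inside a comment.
 * Returns the length stored, or -1 at end of input.
 */
long read_logical_line(FILE *fp, char *out, size_t cap)
{
    size_t n = 0;
    int c, in_comment = 0, seen_any = 0;

    assert(fp != NULL && out != NULL && cap > 0);

    while ((c = fgetc(fp)) != EOF) {
        seen_any = 1;
        if (c == '\\') {
            int next = fgetc(fp);
            if (next == '\n') {     /* folded line: keep reading */
                in_comment = 0;
                continue;
            }
            if (next != EOF)
                ungetc(next, fp);   /* not a fold: put it back */
        } else if (c == '\n') {
            break;                  /* end of the logical line */
        } else if (c == '#') {
            in_comment = 1;
        }
        if (!in_comment && n + 1 < cap)
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return (n == 0 && !seen_any) ? -1 : (long)n;
}
```

There is no buffer boundary to get wrong here: the one-character lookahead is handled by ungetc().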

William Ahern

Jul 18, 2018, 4:48:25 AM
James Kuyper <james...@alumni.caltech.edu> wrote:
> On 07/17/2018 05:19 PM, Rainer Weikusat wrote:
<snip>
>> The fread description from the C99 definition is
>>
>> The fread function reads, into the array pointed to by ptr, up
>> to nmemb elements whose size is specified by size, from the
>> stream pointed to by stream. For each object, size calls are
>> made to the fgetc function and the results stored, in the order
>> read, in an array of unsigned char exactly overlaying the object
>>
>> This means it's even unclear if this attempt at transparent
>> turd-polishing is at least permitted.
>
> Of course it is - the difference between what is specified by the
> standard (multiple calls to fgetc()) and what is done with this
> optimization (direct transfer of an entire block of data) has no impact
> on what the standard calls "the observable behavior" (5.1.2.3p6) of the
> program, and it is therefore a permitted optimization. It makes the data
> transfer faster, but that's not "observable behavior". No data is
> written to the internal buffer, but that's also not "observable
> behavior". The FILE object must (directly or indirectly) contain
> pointers/counters that keep track of what's in the buffer and what's in
> the file, and those need to be updated accordingly, and they will
> contain different values than they would after a string of fgetc()
> calls, but that's also not "observable behavior".

In POSIX doesn't fread behave as if calling flockfile and funlockfile when
entering and leaving the function, respectively? That makes the behavior
different than a series of fgetc calls in the presence of threading, and
particularly noticeable on line-buffered stderr.

James Kuyper

Jul 18, 2018, 7:52:17 AM
Yes. The description for flockfile() and funlockfile() says "All
functions that reference (FILE *) objects, except those with names
ending in _unlocked, shall behave as if they use flockfile() and
funlockfile() internally to obtain ownership of these (FILE *) objects."
In the past, I've done very little multi-threaded code (that's going to
change in the near future), so I don't tend to think of such issues.
Still, it's perfectly permissible for the optimization described to be
performed between the flockfile() and funlockfile() calls. Note, in
particular, that this same requirement applies to fgetc() itself, so the
optimization is even more valuable in this context.
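
For code that wants this effect explicitly, POSIX lets the caller take the stream lock once and use the _unlocked variants inside it. A minimal sketch, with read_block_locked as a hypothetical helper name:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>

/* Read up to n bytes while holding the stream lock for the whole block. */
size_t read_block_locked(FILE *fp, unsigned char *dst, size_t n)
{
    size_t got = 0;
    int c;

    assert(fp != NULL && dst != NULL);

    flockfile(fp);              /* one lock acquisition for the block */
    while (got < n && (c = getc_unlocked(fp)) != EOF)
        dst[got++] = (unsigned char)c;
    funlockfile(fp);
    return got;
}
```

Other threads using the same stream cannot interleave their reads with this block, which a plain loop of fgetc() calls does not guarantee.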

Rainer Weikusat

Jul 18, 2018, 11:36:36 AM
William Ahern <wil...@25thandClement.com> writes:
> Rainer Weikusat <rwei...@talktalk.net> wrote:

[...]

>> JFTR: My opinion on this is that calls to fread or fwrite are
>> essentially mistaken and should be avoided. The read/write interface is
>> just fine for reading data in fixed-size blocks > 1.
>
> Error persistence semantics that allow you to lazily check ferror and feof
> are useful when interleaving I/O among several files handles or in similar
> complex scenarios (e.g. nested loops).

Beyond "a program I believed to be more complicated than a mere average
program once used ...", I have no idea what this is supposed to
mean. Could you perhaps describe an example of this?

> Using FILE handles allows callers to make use of fmemopen and open_memstream
> (or GNU fopencookie or BSD funopen), permitting one to unify stream and
> string APIs with a standard data type.

Using char * arithmetic on an array is just as unified (NB:
I'm not making a statement re: "Unification -- good or bad?").

> Large block reads are often premature optimization.

"Ergo: Don't stdio just because believe it will magically improve
performance."?

> fgetc is an underutilized function. I've seen many ad hoc parsers
> manually do buffering and pointer walking (often including buggy
> boundary logic) when using stdio and fgetc would have dramatically
> simplified the code without loss of performance.

I don't dispute that (minus the technicality that a simple
implementation of a stdio-style interface on top of a simple buffering
layer is IMHO preferable over the inherited Wolpertinger since they
added 'mandatory thread-safety for dummies' to that such that people who
don't need it have to put effort into avoiding it).

But - despite persistent rumours to the contrary - binary data other than
text without character set information exists and real-time
communication is also sometimes needed. stdio is ill-suited for both
situations.

William Ahern

Jul 18, 2018, 12:45:10 PM
Rainer Weikusat <rwei...@talktalk.net> wrote:
> William Ahern <wil...@25thandClement.com> writes:
>> Rainer Weikusat <rwei...@talktalk.net> wrote:
>
> [...]
>
>>> JFTR: My opinion on this is that calls to fread or fwrite are
>>> essentially mistaken and should be avoided. The read/write interface is
>>> just fine for reading data in fixed-size blocks > 1.
>>
>> Error persistence semantics that allow you to lazily check ferror and feof
>> are useful when interleaving I/O among several files handles or in similar
>> complex scenarios (e.g. nested loops).
>
> Beyond "a program I believed to be more complicated than a mere average
> program once used ...", I have no idea what this is suppose to
> mean. Could you perhaps describe an example of this?

Here's a simpler example from code I just ran across 10 minutes ago:

n = fprintf(stdout, "%c %.2fM ", what, tomega(size));
frepc(' ', 12 - MIN(n, 12), stdout); // fputc loop
fprintf(stdout, "%s\n", path);
if (ferror(stdout)) { ... }

Rather than check for errors 3 separate times, I can reliably wait until
after the series of I/O operations to check that they all completed
successfully.

>> Using FILE handles allows callers to make use of fmemopen and open_memstream
>> (or GNU fopencookie or BSD funopen), permitting one to unify stream and
>> string APIs with a standard data type.
>
> Using char * arithmetic on an array is just as unified (NB:
> I'm not making a statement re: "Unification -- good or bad?").

The caller or callee may need to compose input or output--i.e. generate
formatted strings, concatenate string pieces, etc in a series of operations,
perhaps across separate routines. If the API supports a FILE handle as a
source or sink then code can be made simpler by using the standard
facilities.

Example function from an RTP/RTCP library of mine,
which is used to format RTCP packets for tracing and logging:

void rtcp_print(struct rtcp_packet *pkt, FILE *fp) {
    union rtcp_any *any = 0;

    fprintf(fp, "[RTCP: %zu bytes]\n", pkt->size);

    rtcp_foreach(&any, pkt) {
        rtcp_any_print(pkt, any, fp);
    }
} /* rtcp_print() */

Of course, the functions for actual stream processing don't rely on stdio
facilities. But it was convenient to not need to duplicate any buffer
management complexity for the above routine.
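
A sketch of how a FILE-based printer doubles as a string formatter through POSIX open_memstream(). The struct sample type and both function names are hypothetical stand-ins, not part of the RTP library mentioned above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record type, standing in for a packet or similar. */
struct sample { const char *name; unsigned value; };

/* Printer written once against FILE *, in the style of the routine above. */
void sample_print(const struct sample *s, FILE *fp)
{
    assert(s != NULL && fp != NULL);
    fprintf(fp, "[%s: %u]\n", s->name, s->value);
}

/* Same printer, but producing a malloc'd string via open_memstream(). */
char *sample_to_string(const struct sample *s)
{
    char *text = NULL;
    size_t len = 0;
    FILE *fp = open_memstream(&text, &len);

    if (fp == NULL)
        return NULL;
    sample_print(s, fp);
    if (fclose(fp) != 0) {      /* fclose() finalizes text and len */
        free(text);
        return NULL;
    }
    return text;                /* caller frees */
}
```

The formatting code exists once; whether the output lands in a log file, on stderr or in a heap string is decided by the caller.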

Actually, the rtp library doesn't do any I/O other than the ancillary
diagnostic formatting routines like above. And it's a single file library.
Reusing a non-standard buffer manipulation library--like I have in my
toolbox, and as many other programmers do--would have been a needless and
annoying dependency.

I've also found that using stdio facilities eases language interoperability
as most languages can readily share FILE handles.

>> Large block reads are often premature optimization.
>
> "Ergo: Don't stdio just because believe it will magically improve
> performance."?
>
>> fgetc is an underutilized function. I've seen many ad hoc parsers
>> manually do buffering and pointer walking (often including buggy
>> boundary logic) when using stdio and fgetc would have dramatically
>> simplified the code without loss of performance.
>
> I don't dispute that (minus the technicality that a simple
> implementation of a stdio-style interface on top of a simple buffering
> layer is IMHO preferable over the inherited Wolpertinger since they
> added 'mandatory thread-safety for dummies' to that such that people who
> don't need it have to put effort into avoiding it).
>
> But - despite persisten rumours to the contrary - binary data other than
> text without character set information exists and real-time
> communication is also sometimes needed. stdio is ill-suited for both
> situations.

Sure. My contention is that your statement

JFTR: My opinion on this is that calls to fread or fwrite are
essentially mistaken and should be avoided. The read/write interface is
just fine for reading data in fixed-size blocks > 1.

was too extreme. There are plenty of reasonable uses of stdio facilities.

Rainer Weikusat

Jul 18, 2018, 4:16:11 PM
William Ahern <wil...@25thandClement.com> writes:
> Rainer Weikusat <rwei...@talktalk.net> wrote:
>> William Ahern <wil...@25thandClement.com> writes:
>>> Rainer Weikusat <rwei...@talktalk.net> wrote:
>>
>> [...]
>>
>>>> JFTR: My opinion on this is that calls to fread or fwrite are
>>>> essentially mistaken and should be avoided. The read/write interface is
>>>> just fine for reading data in fixed-size blocks > 1.
>>>
>>> Error persistence semantics that allow you to lazily check ferror and feof
>>> are useful when interleaving I/O among several files handles or in similar
>>> complex scenarios (e.g. nested loops).
>>
>> Beyond "a program I believed to be more complicated than a mere average
>> program once used ...", I have no idea what this is suppose to
>> mean. Could you perhaps describe an example of this?
>
> Here's a simpler example from code I just ran across 10 minutes ago:
>
> n = fprintf(stdout, "%c %.2fM ", what, tomega(size));
> frepc(' ', 12 - MIN(n, 12), stdout); // fputc loop
> fprintf(stdout, "%s\n", path);
> if (ferror(stdout)) { ... }
>
> Rather than check for errors 3 separate times, I can reliably wait until
> after the series of I/O operations to check that they all completed
> successfully.

Or not: If stdout is not line-buffered, ie, redirected to a file, the
test might end up being premature and the I/O error happen at some
unspecified, later time. This would need

fflush(stdout); /* stdio buffering model doesn't work here */

Also, the issue that every stdio-call might end up doing I/O and thus,
fail, wouldn't exist if stdio wasn't being used, hence, the "error
persistence semantics" are a cure for a built-in disease.
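
The timing issue can be reproduced with a pipe whose read end is already closed. This is only a sketch: deferred_error_demo is a hypothetical name, and it assumes a POSIX system where a stream created by fdopen() on a pipe is fully buffered.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write three pieces to a pipe whose read end is already closed.
 * Stores ferror() as seen before the flush in *before and after the
 * flush in *after. Returns 0 if the setup worked, -1 otherwise.
 * Note: SIGPIPE stays ignored for the rest of the process.
 */
int deferred_error_demo(int *before, int *after)
{
    int fds[2];
    FILE *out;

    assert(before != NULL && after != NULL);
    signal(SIGPIPE, SIG_IGN);       /* get EPIPE instead of being killed */
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                  /* nobody will ever read */

    out = fdopen(fds[1], "w");
    if (out == NULL) {
        close(fds[1]);
        return -1;
    }

    fprintf(out, "%c %.2fM ", 'A', 1.5);
    fputc(' ', out);
    fprintf(out, "%s\n", "some/path");

    *before = ferror(out) != 0;     /* data is typically still buffered */
    fflush(out);                    /* forces the write(), which fails */
    *after = ferror(out) != 0;      /* indicator stays set until cleared */

    fclose(out);
    return 0;
}
```

On typical implementations the first check reports no error and the second one does, so a single deferred ferror() is only meaningful once the buffer has been flushed.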

>>> Using FILE handles allows callers to make use of fmemopen and open_memstream
>>> (or GNU fopencookie or BSD funopen), permitting one to unify stream and
>>> string APIs with a standard data type.
>>
>> Using char * arithmetic on an array is just as unified (NB:
>> I'm not making a statement re: "Unification -- good or bad?").
>
> The caller or callee may need to compose input or output--i.e. generate
> formatted strings, concatenate string pieces, etc in a series of operations,
> perhaps across separate routines. If the API supports a FILE handle as a
> source or sink then code can be made simpler by using the standard
> facilities.

[...]

> Reusing a non-standard buffer manipulation library--like I have in my
> toolbox, and as many other programmers do--would have been a needless and
> annoying dependency.

I usually use a "non-standard buffer manipulation library" called perl
whenever that's feasible. It's even reasonably easy to (runtime) link arbitrary,
compiled code into a Perl application, this just needs a little research
beyond "run h2xs because we've decreed that this is all you'll ever
need!" :-).

If it's not feasible, I do buffer management in C without a library for
that. This can get a bit cumbersome at times but it's perfectly doable.

[...]

>>> fgetc is an underutilized function.

[...]

>> I don't dispute that

[...]

>> But - despite persisten rumours to the contrary - binary data other than
>> text without character set information exists and real-time
>> communication is also sometimes needed. stdio is ill-suited for both
>> situations.
>
> Sure. My contention is that your statement
>
> JFTR: My opinion on this is that calls to fread or fwrite are
> essentially mistaken and should be avoided. The read/write interface is
> just fine for reading data in fixed-size blocks > 1.
>
> was too extreme. There are plenty of reasonable uses of stdio facilities.

IMHO, the only reasonable use for fread and fwrite is 'target
practice': Facilities the underlying layer already provided shouldn't be
reimplemented on top of the upper abstraction layer. Given enough time,
this will end up just like a Microsoft networking protocol stack, ie, an
arbitrary stack of abstraction layers mutually reimplementing what the
abstraction layer below was supposed to abstract away.

That's maybe a good occupational therapy for CS graduates but lousy
software design.

Rainer Weikusat

Jul 18, 2018, 5:21:30 PM
In an environment which supports symbol interposition and function call
tracing, that's certainly observable behaviour.

The only argument in favour of fread/ fwrite I can think of would be an
application doing 'small writes' of non-char data items, say, because it
uses a 24-bit representation for abstract 'characters'. The same
performance argument as for character stream I/O would then obviously
apply. But such a situation would be better handled by providing an
interface similar to the char-based one.

Someone using stdio to read relatively large blocks of binary data is
just misusing the facility: That's what read/ write already provide, so
why not just use that? I consider this preferable to adding
special-casing for "library misuse" to stdio itself in order to hide the
effects of this misuse as hard as possible.

Kaz Kylheku

Jul 18, 2018, 5:52:27 PM
On 2018-07-18, Rainer Weikusat <rwei...@talktalk.net> wrote:
> Someone using stdio to read relatively large blocks of binary data is
> just misusing the facility: That's what read/ write already provide, so
> why not just use that?

The ISO C language doesn't define any binary block I/O other than
fread and fwrite; a reliance on read and write means that the program
is POSIX C.

> I consider this preferable to adding
> special-casing for "library misuse" to stdio itself in order to hide the
> effects of this misuse as hard as possible.

Language implementations should handle the portable features well,
rather than steering users toward extensions whenever they ask for
half-decent performance.
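
For reference, a block copy that stays within ISO C might look like the following sketch. The name copy_stream and the 8192-byte block size are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Copy everything from in to out in blocks; 0 on success, -1 on error. */
int copy_stream(FILE *in, FILE *out)
{
    unsigned char block[8192];
    size_t n;

    assert(in != NULL && out != NULL);

    while ((n = fread(block, 1, sizeof block, in)) > 0) {
        if (fwrite(block, 1, n, out) != n)
            return -1;
    }
    if (ferror(in) || fflush(out) == EOF)
        return -1;
    return 0;
}
```

Nothing here depends on file descriptors, so the same source builds on hosted implementations that have no read() or write().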

James Kuyper

Jul 18, 2018, 8:34:23 PM
OK - please identify which item under 5.1.2.3p6 connects to symbol
interposition or function call tracing.

I wasn't referring to "behavior which is observable", I was very
specifically and quite explicitly referring to "observable behavior" as
that term is defined by the C standard. 5.1.2.3p6 lists three categories
of behavior, identifying them as the ONLY kinds of behavior which must
actually match between the abstract machine and the actual behavior of a
program generated by a conforming implementation of C.

It then goes on to define the term "observable behavior" to provide a
convenient term for referring to those kinds of behavior. However, the
standard never uses the term "observable behavior", it merely defines
it. Therefore, if you feel you're getting hung up on the fact that many
kinds of behavior are in fact observable, but don't qualify as
"observable behavior" according to the C standard, ignore that term
entirely. Just talk about "the kinds of behavior described in 5.1.2.3p6".

The key point is that, when an optimization produces a program where
those kinds of behavior are all precisely as required by the standard,
the fact that there are other kinds of behavior, which can be observed,
which don't match the description provided by the standard, is
unimportant. Such an optimization is entirely permissible, and does not
render the implementation non-conforming.

In particular, the sequence of standard library function calls that
occur when a program is executed is not one of the behaviors listed in
5.1.2.3p6. Therefore, despite the fact that the standard defines the
behavior of fread() in terms of repeated calls to fgetc(), and despite
the fact that it's perfectly feasible to trace the code and determine
whether or not it actually calls fgetc(), a fully conforming
implementation of fread() doesn't need to ever actually call fgetc(). As
long as the behaviors listed in 5.1.2.3p6 are the same for the final
executable as if it did make repeated calls to fgetc(), it's perfectly
free to use some entirely different method of producing those behaviors.

Rainer Weikusat

Jul 19, 2018, 10:13:59 AM
James Kuyper <james...@alumni.caltech.edu> writes:
> On 07/18/2018 05:21 PM, Rainer Weikusat wrote:
>> James Kuyper <james...@alumni.caltech.edu> writes:
>>> On 07/17/2018 05:19 PM, Rainer Weikusat wrote:

[...]

>>>> The fread function reads, into the array pointed to by ptr, up
>>>> to nmemb elements whose size is specified by size, from the
>>>> stream pointed to by stream. For each object, size calls are
>>>> made to the fgetc function and the results stored, in the order
>>>> read, in an array of unsigned char exactly overlaying the object
>>>>
>>>> This means it's even unclear if this attempt at transparent
>>>> turd-polishing is at least permitted.
>>>
>>> Of course it is - the difference between what is specified by the
>>> standard (multiple calls to fgetc()) and what is done with this
>>> optimization (direct transfer of an entire block of data) has no impact
>>> on what the standard calls "the observable behavior" (5.1.2.3p6) of the
>>> program, and it is therefore a permitted optimization. It makes the data
>>> transfer faster, but that's not "observable behavior".
>>
>> In an environment which supports symbol interposition and function call
>> tracing, that's certainly observable behaviour.
>
> OK - please identify which item under 5.1.2.3p6 connects to symbol
> interposition or function call tracing.
>
> I wasn't referring to "behavior which is observable",

But I was.

[...]

> The key point is that an optimization that produces a program where
> those kinds of behavior are all precisely as required by the standard,
> the fact that there are other kinds of behavior, which can be observed,
> which don't match the description provided by the standard, is
> unimportant.

That's something somebody considers less important than a more
complicated implementation of a function which shouldn't even exist in
order to hide the fact that it does exist for a use case which shouldn't
exist, either. This so-called 'optimization' does nothing but
transparently short-circuit a facility which is of no use for the
problem it's being applied to (or rather "tries to ....as hard as
possible").

> Such an optimization is entirely permissible, and does not
> render the implementation non-conforming.

It's also 'entirely permissible' to provide a read function (insofar as the C
standard goes) which executes an equivalent of

kill(rand(), SIGSEGV);
raise(SIGILL);

This doesn't mean it's also sensible, both on its own and within an
environment where a function named read usually has another behaviour.

> In particular, the sequence of standard library function calls that
> occur when a program is executed is not one of the behaviors listed in
> 5.1.2.3p6. Therefore, despite the fact that the standard defines the
> behavior of fread() in terms of repeated calls to fgetc(), and despite
> the fact that it's perfectly feasible to trace the code and determine
> whether or not it actually calls fgetc(), a fully conforming
> implementation of fread() doesn't need to ever actually call fgetc().

Insofar as the implementation is only used with strictly conforming C code,
this is true. But POSIX incorporates ISO-C, other standards define
other behaviour of real implementations of POSIX, and conforming C code is
perfectly free to make use of features outside of ISO-C.

"I'm writing code targeting a POSIX system which uses ELF for binary
applications. POSIX defines fread in terms of fgetc, ELF supports symbol
interposition, ergo, I should be able to change the behaviour of fread
by providing a custom fgetc" seems like a perfectly reasonable demand to
me.

No, I don't still believe in documentation. But I used to in the past.

Scott Lurndal

Jul 19, 2018, 10:35:01 AM
Kaz Kylheku <157-07...@kylheku.com> writes:
>On 2018-07-18, Rainer Weikusat <rwei...@talktalk.net> wrote:
>> Someone using stdio to read relatively large blocks of binary data is
>> just misusing the facility: That's what read/ write already provide, so
>> why not just use that?
>
>The ISO C language doesn't define any binary block I/O other than
>fread and fwrite; a reliance on read and write means that the program
>is POSIX C.

If you were writing this on a newsgroup without 'unix' in the name,
you might have a valid point.

james...@alumni.caltech.edu

Jul 19, 2018, 11:42:36 AM
I was the one who brought the concept of "observable behavior" into this discussion, and made it precisely clear which definition of that term I was referring to. To respond to my message by saying "that's certainly observable behavior", while using the term "observable behavior" in a different sense, makes your response a non-sequitur, and should at least have been accompanied by a note of some kind that you were switching topics. It is "observable behavior" as defined by the C standard, that is relevant to the question of which optimizations are permitted by that standard; "behavior which is observable" but which fails to match that definition is irrelevant to that question.

...
> > In particular, the sequence of standard library function calls that
> > occur when a program is executed is not one of the behaviors listed in
> > 5.1.2.3p6. Therefore, despite the fact that the standard defines the
> > behavior of fread() in terms of repeated calls to fgetc(), and despite
> > the fact that it's perfectly feasible to trace the code and determine
> > whether or not it actually calls fgetc(), a fully conforming
> > implementation of fread() doesn't need to ever actually call fgetc().
>
> Insofar the implementation is only used with strictly conforming C code,
> this is true. But POSIX incorporates ISO-C and other standards define
> other behaviour of real implementation of POSIX and conforming C code is
> prefectly free to make of features outside of ISO-C.
>
> "I'm writing code targetting a POSIX system which uses ELF for binary
> applications. POSIX defines fread in terms of fgetc, ELF support symol
> interposition, ergo, I should be able to change the behaviour of fread
> by providing a custom fgetc" seems like a perfectly reasonable demand to
> me.

It might seem reasonable to you, but the POSIX standard doesn't agree. Section 2.2.2 says:

"The following identifiers are reserved regardless of the inclusion of headers:
...
3. All identifiers in the table below are reserved for use as identifiers with external linkage."

One of the identifiers in that table is fgetc. That section goes on to say:

"Applications shall not declare or define identifiers with the same name as an identifier reserved in the same context."

Rainer Weikusat

Jul 19, 2018, 12:20:32 PM
james...@alumni.caltech.edu writes:
> On Thursday, July 19, 2018 at 10:13:59 AM UTC-4, Rainer Weikusat wrote:

[...]

>> > In particular, the sequence of standard library function calls that
>> > occur when a program is executed is not one of the behaviors listed in
>> > 5.1.2.3p6. Therefore, despite the fact that the standard defines the
>> > behavior of fread() in terms of repeated calls to fgetc(), and despite
>> > the fact that it's perfectly feasible to trace the code and determine
>> > whether or not it actually calls fgetc(), a fully conforming
>> > implementation of fread() doesn't need to ever actually call fgetc().
>>
>> Insofar the implementation is only used with strictly conforming C code,
>> this is true. But POSIX incorporates ISO-C and other standards define
>> other behaviour of real implementation of POSIX and conforming C code is
>> prefectly free to make of features outside of ISO-C.
>>
>> "I'm writing code targetting a POSIX system which uses ELF for binary
>> applications. POSIX defines fread in terms of fgetc, ELF support symol
>> interposition, ergo, I should be able to change the behaviour of fread
>> by providing a custom fgetc" seems like a perfectly reasonable demand to
>> me.
>
> It might seem reasonable to you, but the POSIX standard doesn't agree. Section 2.2.2 says:
>
> "The following identifiers are reserved regardless of the inclusion of headers:
> ...
> 3. All identifiers in the table below are reserved for use as identifiers with external linkage."
>
> One of the identifiers in that table is fgetc. That section goes on to say:
>
> "Applications shall not declare or define identifiers with the same
> name as an identifier reserved in the same context."

You might perhaps have noted that the 'replacement fgetc' was subject to
yet another standard document covering something POSIX doesn't
cover. OTOH, it's at least good to know that POSIX prohibits alternate
library implementations of malloc. Maybe that'll cause them to crumble
...

Rainer Weikusat

Jul 19, 2018, 2:57:04 PM
Not really. But it's not difficult to guess: The "least requirements on a
conforming implementation" as defined by C11 (quoted from a public
draft):

,----
| - Accesses to volatile objects are evaluated strictly according to the
| rules of the abstract machine.
|
| - At program termination, all data written into files shall be
| identical to the result that execution of the program according to
| the abstract semantics would have produced.
|
| - The input and output dynamics of interactive devices shall take place
| as specified in 7.21.3. The intent of these requirements is that
| unbuffered or line-buffered output appear as soon as possible, to
| ensure that prompting messages actually appear prior to a program
| waiting for input.
|
| This is the observable behavior of the program.
`----

It follows that I cannot possibly be writing this text as the behaviour of
the program I'm using to do so is not observable, it's running atop a
window system without observable behaviour and a kernel which doesn't
have observable behaviour, either. Not to mention that I couldn't post
it if I could write it, maybe because of some "quantum fluctuation" which
temporarily caused reality to become real and the C standard fictional,
as all of the internet lacks observable behaviour as well.

And this starts to become pretty absurd since all of this applies to
your text as well: It's not part of the observable behaviour of anything
insofar as the 'least requirements on a conforming C implementation' are
concerned.

james...@alumni.caltech.edu

Jul 19, 2018, 6:02:50 PM
Yes, really - I explicitly cited 5.1.2.3p6 as the place where the
standard defines the meaning of that term.

> .... But it's not difficult to guess:

Why do you have to guess? I explicitly cited 5.1.2.3p6, the same clause
you quote below:

> ... The "least requirements on a
> conforming implementation" as defined by C11 (quoted from a public
> draft):
>
> ,----
> | - Accesses to volatile objects are evaluated strictly according to the
> | rules of the abstract machine.
> |
> | - At program termination, all data written into files shall be
> | identical to the result that execution of the program according to
> | the abstract semantics would have produced.
> |
> | - The input and output dynamics of interactive devices shall take place
> | as specified in 7.21.3. The intent of these requirements is that
> | unbuffered or line-buffered output appear as soon as possible, to
> | ensure that prompting messages actually appear prior to a program
> | waiting for input.
> |
> | This is the observable behavior of the program.
> `----
>
> It follows that I cannot possibly be writing this text as the behaviour of
> the program I'm using to do so is not observable, it's running atop a
> window system without observable behaviour and a kernel which doesn't
> have observable behaviour, either. Not to mention that I couldn't post
> it if I could write it, maybe because of some "quantum fluctuation" which
> temporarily causeds reality to become real and the C standard fictional,
> as all of the internet lacks observable behaviour as well.

The term "observable behavior" is simply a piece of jargon defined by
the C standard, whose meaning is not derivable by applying the
conventional meanings of the individual English words that make up the
term. That's not particularly unusual - most of the pieces of jargon
defined by the standard have that same characteristic. In fact, if that
weren't the case for any particular term, then there would be no need
for the C standard to provide a definition for that term - the ordinary
English meaning would be sufficient.

It's a term whose sole purpose is to provide a more convenient way of
referring to the behaviors listed in 5.1.2.3p6 than "the behaviors
listed in 5.1.2.3p6". The standard only defines the term, and never uses
it. In particular, definition of that term imposes no requirements of
its own - it certainly does NOT, as you ridiculously suggest, prohibit
the observability of behaviors not on that list.

What the standard does require is that these behaviors be produced when
a program with defined behavior is translated and executed by a
conforming implementation of C. By identifying these as the "least"
requirements, and not including any other behaviors in that list, what
the standard is doing is permitting an implementation to optimize a
program so that its behavior differs from the behavior of the abstract
machine, but only if the difference is confined to behaviors not on that
list.
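
To make that concrete, here is a minimal sketch in C. The function names
are mine, invented purely for illustration. Nothing in the loop of
sum_plain() is on the 5.1.2.3p6 list, so an implementation is free to
replace the loop with a closed-form computation or a precomputed
constant. In sum_volatile() every update of "total" is an access to a
volatile object, so those accesses have to be evaluated according to the
rules of the abstract machine, even though the returned value is the
same. Whether a particular compiler actually performs the first
transformation is up to that compiler.

```c
/* Sums 1..n with an ordinary loop. No volatile access, no file output and
   no interactive I/O happens here, so the loop itself is not one of the
   behaviors a conforming implementation is required to reproduce. */
unsigned long sum_plain(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Same arithmetic, but "total" is volatile: each read and write of it
   must be carried out as the abstract machine would carry it out. */
unsigned long sum_volatile(unsigned long n)
{
    volatile unsigned long total = 0;
    for (unsigned long i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Both functions return the same value for the same argument; the only
difference is how much freedom the implementation has in getting there.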

Rainer Weikusat

Jul 20, 2018, 10:21:30 AM
Because I specifically referred to C99 which doesn't contain the trailing
'silly statement'. Hence, I had to guess that you were probably
referring to a later version.

>> ... The "least requirements on a
>> conforming implementation" as defined by C11 (quoted from a public
>> draft):

[...]

>> | This is the observable behavior of the program.
>> `----
>>
>> It follows that I cannot possibly be writing this text as the behaviour of
>> the program I'm using to do so is not observable,

[...]

> The term "observable behavior" is simply a piece of jargon defined by
> the C standard, whose meaning is not derivable by applying the
> conventional meanings of the individual English words that make up the
> term.

Considering that the C standard doesn't define the meaning of 'this', 'is',
'the', 'observable', 'behaviour', 'of' and 'the', this statement is obviously
untrue. It also couldn't possibly be true as the text is written in
English, hence, it can't be the sole definition of this language:
'Observable behaviour' is used with its English meaning and it's applied
to a list of observable changes of an environment a strictly conforming
C program could cause to happen.

[...]

> What the standard does require is that these behaviors be produced when
> a program with defined behavior is translated and executed by a
> conforming implementation of C. By identifying these as the "least"
> requirements, and not including any other behaviors in that list, what
> the standard is doing is permitting an implementation to optimize a
> program so that its behavior differs from the behavior of the abstract
> machine, but only if the difference is confined to behaviors not on that
> list.

The 'least' means 'a conforming implementation may be as feeble and
useless as that', IOW, it doesn't prohibit arbitrary nonsense
implementation of, say, fread[*], provided they don't change the
observable behaviour of a strictly conforming C program. But conforming
C programs can (and usually will) have other (and more) kinds of
observable behaviour and it's perfectly valid to evaluate the merits of
whatever kind of "Yes we can!"-hackery in the context of that,
especially if its usefulness for any purpose is as dubious as here.

[*] Instead of doing a read directly to the user-supplied buffer, it
could also do a write from that in order to use UUCP to contact a
computer in Antarctica which is only powered on once every six
months and not return until a reply was received.

This might also be useful for a particular application.

james...@alumni.caltech.edu

Jul 20, 2018, 11:16:04 AM
Which "silly statement" are you referring to?

> >> ... The "least requirements on a
> >> conforming implementation" as defined by C11 (quoted from a public
> >> draft):
>
> [...]
>
> >> | This is the observable behavior of the program.
> >> `----
> >>
> >> It follows that I cannot possibly be writing this text as the behaviour of
> >> the program I'm using to do so is not observable,
>
> [...]
>
> > The term "observable behavior" is simply a piece of jargon defined by
> > the C standard, whose meaning is not derivable by applying the
> > conventional meanings of the individual English words that make up the
> > term.
>
> Considering that the C standard doesn't define the meaning of 'this', 'is',
> 'the', 'observable', 'behaviour', 'of' and 'the', this statement is obviously
> untrue.

What in the world are you talking about? 3p1 explains how terms used in the C standard are supposed to be interpreted:
"For the purposes of this International Standard, the following definitions apply. ..."

It's referring to the definitions that appear in section 3.

"... Other terms are defined where they appear in italic type or on the left side of a syntax rule.
Terms explicitly defined in this International Standard are not to be presumed to refer implicitly to similar terms defined elsewhere. Terms not defined in this International Standard are to be interpreted according to ISO/IEC 2382−1. Mathematical symbols not defined in this International Standard are to be interpreted according to ISO 31−11."

The term "observable behavior" appears in italic type in 5.1.2.3p6, marking it as a definition, overriding any other meaning that you might expect the phrase to have. That makes it a very different case from "'this', 'is', 'the', 'observable', 'behaviour', 'of' and 'the',", all of which are supposed to be interpreted according to normal English usage.
Note: since the C standard started out as an ANSI standard, it was originally written in US English, and continues to be written in that language. As a result, it contains many occurrences of "behavior", but not a single occurrence of "behaviour".

> ... It also couldn't possibly be true as the text is written in
> English, hence, it can't be the sole definition of this language:
> 'Observable behaviour' is used with its English meaning

No, it is not. "observable behavior" is explicitly identified as a special term that is defined by 5.1.2.3p6.

> [...]
>
> > What the standard does require is that these behaviors be produced when
> > a program with defined behavior is translated and executed by a
> > conforming implementation of C. By identifying these as the "least"
> > requirements, and not including any other behaviors in that list, what
> > the standard is doing is permitting an implementation to optimize a
> > program so that its behavior differs from the behavior of the abstract
> > machine, but only if the difference is confined to behaviors not on that
> > list.
>
> The 'least' means 'a conforming implementation may be as feeble and
> useless as that', IOW, it doesn't prohibit arbitrary nonsense
> implementation of, say, fread[*], provided they don't change the
> observable behaviour of a strictly conforming C program. But conforming
> C programs can (and usually will) have other (and more) kinds of
> observable behaviour and it's perfectly valid to evaluate the merits of
> whatever kind of "Yes we can!"-hackery in the context of that,
> especially if its usefulness for any purpose is as dubious as here.

That brings up a point I've been wondering about. What precisely do you find so dubious about this optimization? It allows fread() to be essentially just a wrapper for read(), executing a whole lot faster for large blocks than it could be if it were actually required to call fgetc() for each and every byte. I see no corresponding disadvantage. What disgusts you so much about it?
Surely it's not the supposedly lost opportunity to customize the behavior of fread() by providing an alternative definition for fgetc()? You mentioned that issue, but you aren't really claiming that this is a common need, are you?
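
For comparison, here is roughly what a literal reading of the fread()
description (one fgetc() call per byte) would look like. This is only a
sketch: fread_bytewise is a name I made up, and I'm not claiming that
any real C library is written this way. The optimization under
discussion lets an implementation deliver the same bytes and the same
return value without making those per-byte calls.

```c
#include <stdio.h>

/* Hypothetical reference version of fread(): every byte is fetched with
   its own fgetc() call, and only complete elements are counted. */
size_t fread_bytewise(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    unsigned char *out = ptr;
    size_t done;

    if (size == 0 || nmemb == 0)
        return 0;

    for (done = 0; done < nmemb; done++) {
        for (size_t i = 0; i < size; i++) {
            int c = fgetc(stream);
            if (c == EOF)
                return done;        /* a partial element is not counted */
            *out++ = (unsigned char)c;
        }
    }
    return done;
}
```

Reading the same file once with fread() and once with fread_bytewise()
should fill the buffer with the same contents; the difference lies only
in how many function calls it takes to get them there.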

Rainer Weikusat

Jul 20, 2018, 12:12:48 PM
The specific silly statement this digression happens to be about.

[...]

>> >> ... The "least requirements on a
>> >> conforming implementation" as defined by C11 (quoted from a public
>> >> draft):
>>
>> [...]
>>
>> >> | This is the observable behavior of the program.
>> >> `----
>> >>
>> >> It follows that I cannot possibly be writing this text as the behaviour of
>> >> the program I'm using to do so is not observable,
>>
>> [...]
>>
>> > The term "observable behavior" is simply a piece of jargon defined by
>> > the C standard, whose meaning is not derivable by applying the
>> > conventional meanings of the individual English words that make up the
>> > term.
>>
>> Considering that the C standard doesn't define the meaning of 'this', 'is',
>> 'the', 'observable', 'behaviour', 'of' and 'the', this statement is obviously
>> untrue.
>
> What in the world are you talking about?

I explained this in the part of the text you deleted: The 'observable
behaviour' of a strictly-conforming C program, as defined by the C
standard. This behaviour is called 'observable' because it happens to
be ... well ... observable.

But feel free to assume the word was chosen because its English meaning
is "extremely high mountain made out of abandoned shoes" or whatever
other 'custom interpretation' happens to suit you.

[more word-dancing cut]

>> > What the standard does require is that these behaviors be produced when
>> > a program with defined behavior is translated and executed by a
>> > conforming implementation of C. By identifying these as the "least"
>> > requirements, and not including any other behaviors in that list, what
>> > the standard is doing is permitting an implementation to optimize a
>> > program so that its behavior differs from the behavior of the abstract
>> > machine, but only if the difference is confined to behaviors not on that
>> > list.
>>
>> The 'least' means 'a conforming implementation may be as feeble and
>> useless as that', IOW, it doesn't prohibit arbitrary nonsense
>> implementation of, say, fread[*], provided they don't change the
>> observable behaviour of a strictly conforming C program. But conforming
>> C programs can (and usually will) have other (and more) kinds of
>> observable behaviour and it's perfectly valid to evaluate the merits of
>> whatever kind of "Yes we can!"-hackery in the context of that,
>> especially if its usefulness for any purpose is as dubious as here.
>
> That brings up a point I've been wondering about. What precisely do
> you find so dubious about this optimization? It allows fread() to be
> essentially just a wrapper for read(), executing a whole lot faster
> for large blocks than it could be if it were actually required to call
> fgetc() for each and every byte.

I already wrote this as well.

james...@alumni.caltech.edu

Jul 20, 2018, 1:56:54 PM
That's not clear enough, but it allows me to make a guess. My best guess
is that the "silly statement" you're referring to is "This is the
_observable behavior_ of the program.". That is NOT what this digression
is about. Everything I've said about the permissibility of this
optimization was based solely upon the text that precedes that
statement. That statement merely defines a term that's convenient for
discussion of those requirements. In fact, since you seem hung up about
that term, I've tried to avoid using it when referring to those
requirements.

I've pointed out some key facts several times. You've never addressed
those facts, but your comments would make more sense if you've rejected
them for some reason. Would you care to make it clear whether or not you
have rejected them, and if so, why? To make things easier, I'll break it
up into little pieces. Could you please identify which ones you agree
with, and which ones you disagree with? When you identify one you
disagree with, would you care to explain why?

1. In 5.1.2.3p6, the phrase "observable behavior" is italicized.

2. Per 3p1, putting a term in italic type is used by the C standard for,
among other things, identifying things outside of clause 3 that
constitute the official C standard definition of that term.

3. Per 3p1, when the C standard provides a definition for a term, that
term should NOT be interpreted in the context of the C standard using
any other meaning that it might ordinarily have.

4. Therefore, that "silly statement" constitutes the official definition
of the term "observable behavior".

5. As a definition, that "silly statement" CANNOT impose any
requirements of its own. The relevant requirements are imposed by the
preceding statements in 5.1.2.3p6, not by that one.

6. In particular, 5.1.2.3p6 should NOT be interpreted as requiring that
the specified list of behaviors be observable, nor as prohibiting the
observability of behaviors not on the list.

7. As a result, the "silly statement" is irrelevant to the point I was
talking about, which is what the requirements governing optimization
are. You consider that "silly statement" to be the main topic of this
digression, only because you've misinterpreted it as imposing a silly
requirement.

8. To interpret "observable behavior" in that fashion requires
interpreting the individual words of that phrase as ordinary English
words, which 3p1 prohibits.

Kaz Kylheku

Jul 20, 2018, 2:20:27 PM
I don't have a point? Nobody is justified in using fread and fwrite in
an application, even if it uses POSIX extensions; it is abuse of stdio?


--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1

Rainer Weikusat

Jul 20, 2018, 3:03:34 PM
Kaz Kylheku <157-07...@kylheku.com> writes:
> On 2018-07-19, Scott Lurndal <sc...@slp53.sl.home> wrote:
>> Kaz Kylheku <157-07...@kylheku.com> writes:
>>>On 2018-07-18, Rainer Weikusat <rwei...@talktalk.net> wrote:
>>>> Someone using stdio to read relatively large blocks of binary data is
>>>> just misusing the facility: That's what read/ write already provide, so
>>>> why not just use that?
>>>
>>>The ISO C language doesn't define any binary block I/O other than
>>>fread and fwrite; a reliance on read and write means that the program
>>>is POSIX C.
>>
>> If you were writing this on a newsgroup without 'unix' in the name,
>> you might have a valid point.
>
> I don't have a point? Nobody is justified in using fread and fwrite in
> an application, even if it uses POSIX extensions; it is abuse of stdio?

Yes. The stdio buffering model is geared towards applications working
with "lines" read from or written to "text files". This is awkward/
somewhat complex to implement with read/ write. But that's not the case
for fixed-size I/O of binary data. A program targeting systems which
provide read/ write should use these for such cases.
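
A minimal sketch of what that looks like for a fixed-size block,
assuming a POSIX system. read_full is a made-up name, not a standard
function; the only thing it adds on top of read is a loop that copes
with short reads and EINTR.

```c
#include <errno.h>
#include <unistd.h>

/* Try to read exactly len bytes from fd into buf.
   Returns the number of bytes read (less than len only at end of file),
   or -1 if read() failed with an error other than EINTR. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The data goes straight into the caller's buffer with no intermediate
copy, which is the point of using read for this kind of I/O.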

Rainer Weikusat

Jul 21, 2018, 1:30:47 PM
james...@alumni.caltech.edu writes:
> On Friday, July 20, 2018 at 12:12:48 PM UTC-4, Rainer Weikusat wrote:
>> james...@alumni.caltech.edu writes:

[...]

>> > Which "silly statement" are you referring to?
>>
>> The specific silly statement this digression happens to be about.
>
> That's not clear enough, but it allows me to make a guess. My best guess
> is that the "silly statement" you're referring to is "This is the
> _observable behavior_ of the program."

Exactly. I consider it a silly statement as standardized C has existed
for more than 20 years without it and it adds nothing of value to the text.

[...]

> I've pointed out some key facts several times. You've never addressed
> those facts, but your comments would make more sense if you've rejected
> them for some reason.

To the contrary, they then wouldn't make any sense at all: The C
standard defines the observable behaviour of strictly conforming C
code. It doesn't (and cannot and isn't meant to) define the observable
behaviour of conforming C code. Depending on the real execution etc
environment, this is subject to other specifications and also, to common
practice.

Implementing fread in the described way _does_ change the observable
behaviour of a program (otherwise, no one could possibly claim the
change would be an improvement) and what the C standard happens to call
'observable behaviour of a program', namely, see above, the observable
behaviour of strictly conforming C code, is irrelevant here.

As a side note: The C text very likely uses the phrase observable
behaviour because of its English meaning. Otherwise, they could have
chosen "concrete-stuffed chicken" instead.








james...@alumni.caltech.edu

Jul 21, 2018, 6:51:51 PM
On Saturday, July 21, 2018 at 1:30:47 PM UTC-4, Rainer Weikusat wrote:
> james...@alumni.caltech.edu writes:
> > On Friday, July 20, 2018 at 12:12:48 PM UTC-4, Rainer Weikusat wrote:
> >> james...@alumni.caltech.edu writes:
>
> [...]
>
> >> > Which "silly statement" are you referring to?
> >>
> >> The specific silly statement this digression happens to be about.
> >
> > That's not clear enough, but it allows me to make a guess. My best guess
> > is that the "silly statement" you're referring to is "This is the
> > _observable behavior_ of the program."
>
> Exactly. I consider it a silly statement as standardized C has existed
> for more than 20 years without it and it adds nothing of value to the text.

The only thing it adds is a convenient term for referring to "the kinds
of behavior listed in section 5.1.2.3p6".

You didn't bother identifying which of the numbered statements I gave
you disagreed with. I therefore choose to assume that you're disagreeing
with all of them.

...
> As a side note: The C text very likely uses the phrase observable
> behaviour because of its English meaning.

I very much doubt that. The C++ standard has been using the term
"observable behavior" since early drafts of the 1998 standard for the
same purpose. I think the authors of the C standard thought it would be
a good idea to incorporate the same concept. The original C++ definition
was very different from the current one, but the current version is
worded in a way quite similar to the way it is worded in the C standard,
which is a strong indication it's meant to be interpreted the same way
in both standards:

Section 1.9:
"8 The least requirements on a conforming implementation are:
(8.1) — Access to volatile objects are evaluated strictly according to
the rules of the abstract machine.
(8.2) — At program termination, all data written into files shall be
identical to one of the possible results that execution of the program
according to the abstract semantics would have produced.
(8.3) — The input and output dynamics of interactive devices shall take
place in such a fashion that prompting output is actually delivered
before a program waits for input. What constitutes an interactive device
is implementation-defined.
These collectively are referred to as the observable behavior of the
program. [ Note: More stringent correspondences between abstract and
actual semantics may be defined by each implementation. — end note ]"

The key difference is that the C++ standard actually uses the term
"observable behavior", whereas the C standard only defines the term
without ever using it:

"The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations.
In particular, they need not copy or emulate the structure of the
abstract machine. Rather, conforming implementations are required to
emulate (only) the observable behavior of the abstract machine as
explained below." (1.9p1).

If "observable behavior" had been intended to mean "behavior which is
observable", that clause would be pretty meaningless - it would allow no
significant difference between the behavior of a conforming
implementation and the behavior of the abstract machine, whereas it's
hard for a sane person to read 1.9 of the C++ standard without realizing
that the key point of that clause is to explicitly allow such differences
- while restricting the permitted differences to things that aren't
"observable behavior".

Rainer Weikusat

Jul 22, 2018, 6:39:22 AM
james...@alumni.caltech.edu writes:
> On Saturday, July 21, 2018 at 1:30:47 PM UTC-4, Rainer Weikusat wrote:
>> james...@alumni.caltech.edu writes:
>> > On Friday, July 20, 2018 at 12:12:48 PM UTC-4, Rainer Weikusat wrote:
>> >> james...@alumni.caltech.edu writes:
>>
>> [...]
>>
>> >> > Which "silly statement" are you referring to?
>> >>
>> >> The specific silly statement this digression happens to be about.
>> >
>> > That's not clear enough, but it allows me to make a guess. My best guess
>> > is that the "silly statement" you're referring to is "This is the
>> > _observable behavior_ of the program."
>>
>> Exactly. I consider it a silly statement as standardized C has existed
>> for more than 20 years without it and it adds nothing of value to the text.
>
> The only thing it adds is a convenient term for referring to "the kinds
> of behavior listed in section 5.1.2.3p6".

The only thing it practically adds is an opportunity for sophistic
confusion as it now becomes 'problematic' to discuss the observable
behaviour of a program: Whenever I write that I don't disagree with or
dispute anything written in the C standard but that I'm writing about
something entirely different this text doesn't apply to, you 'choose' to
ignore that and to hark back to the 'authoritative English language
definition' of your choice (C11), ie, your effective standpoint is that
the term 'observable behaviour' must not be used to describe anything
except what the C standard happens to use it for.

[...]

>> As a side note: The C text very likely uses the phrase observable
>> behaviour because of its English meaning.
>
> I very much doubt that.

Some people also "very much doubt" that stuff they presently can't see
actually exists. Usually, they're being paid for that.

james...@alumni.caltech.edu

Jul 23, 2018, 1:41:38 PM
On Sunday, July 22, 2018 at 6:39:22 AM UTC-4, Rainer Weikusat wrote:
> james...@alumni.caltech.edu writes:
> > On Saturday, July 21, 2018 at 1:30:47 PM UTC-4, Rainer Weikusat wrote:
> >> james...@alumni.caltech.edu writes:
....
> >> > That's not clear enough, but it allows me to make a guess. My best guess
> >> > is that the "silly statement" you're referring to is "This is the
> >> > _observable behavior_ of the program."
> >>
> >> Exactly. I consider it a silly statement as standardized C has existed
> >> for more than 20 years without it and it adds nothing of value to the text.
> >
> > The only thing it adds is a convenient term for referring to "the kinds
> > of behavior listed in section 5.1.2.3p6".
>
> The only thing it practically adds is an opportunity for sophistic
> confusion as it now becomes 'problematic' to discuss the observable
> behaviour of a program: Whenever I write that I don't disagree with or
> dispute anything written in the C standard but that I'm writing about
> something entirely different this text doesn't apply to, you 'choose' to
> ignore that and to hark back to the 'authoritative English language
> definition' of your choice (C11), ie, your effective standpoint is that
> the term 'observable behaviour' must not be used to describe anything
> except what the C standard happens to use it for.

This discussion started out as being about whether or not a given
optimization is permitted. If you've wandered off into discussing
something else, you failed to make that clear.
The kinds of behavior that are listed in 5.1.2.3p6 are very relevant to
such a discussion, because those kinds of behaviors are the only ones
that determine whether or not the optimized code meets the "least
requirements", and therefore whether or not the optimization is
permitted.

However, in a C context, I can't imagine any useful purpose served by
discussing whether or not something is "behavior which is observable",
in the ordinary English meaning of the phrase. So, what opportunities
for meaningful discussion about C are lost by the C standard defining
the term "observable behavior" to refer to the kinds of behavior listed
in 5.1.2.3p6, rather than "behavior which is observable"?

I know for a fact that, during the period when the current C++ standard
defined the term "observable behavior", and the current C standard did
not, I often had to discuss similar issues in both the C and C++ context
(often in the very same message), and the fact that the C++ standard
defined that term often made those discussions a lot easier to write
when I was writing about C++ than when writing about C.

> >> As a side note: The C text very likely uses the phrase observable
> >> behaviour because of its English meaning.
> >
> > I very much doubt that.
>
> Some people also "very much doubt" that stuff they presently can't see
> actually exists. Usually, they're being paid for that.

I notice that you didn't say anything about the reasons I gave for that
doubt. I presume that means that you couldn't come up with anything to
say about them that would support your point of view. If you could have
come up with something, that would have made a much better response than
the one you actually made.

james...@alumni.caltech.edu

Jul 23, 2018, 1:54:48 PM
On Saturday, July 21, 2018 at 1:30:47 PM UTC-4, Rainer Weikusat wrote:
...
> To the contrary, they then wouldn't make any sense at all: The C
> standard defines the observable behaviour of strictly conforming C
> code. It doesn't (and cannot and isn't meant to) define the observable
> behaviour of conforming C code. Depending on the real execution etc
> environment, this is subject to other specifications and also, to common
> practice.

Oddly enough, the committee appears to disagree with you about that. By
definition, the standard imposes no requirements of any kind when a
program has undefined behavior (3.4.3p1). An implementation has no
obligation to even accept a program that's not strictly conforming
(4p6). The only program that any implementation is obligated to
translate and execute correctly is the infamous "one program" that must
exist in order to meet the requirements of 5.2.4.1p1.

However, if an implementation chooses to accept, translate, and allow
the execution of a given program, and if that program fails to qualify
as strictly conforming only because it contains instances of unspecified
behavior, and is otherwise correct, it is called by the standard a
"correct program", and the standard explicitly requires (4p3) that the
program must act in accordance with the requirements of 5.1.2.3,
precisely the section I've been talking about.

Rainer Weikusat

Jul 23, 2018, 2:23:21 PM
james...@alumni.caltech.edu writes:

[...]

>> >> As a side note: The C text very likely uses the phrase observable
>> >> behaviour because of its English meaning.
>> >
>> > I very much doubt that.
>>
>> Some people also "very much doubt" that stuff they presently can't see
>> actually exists. Usually, they're being paid for that.
>
> I notice that you didn't say anything about the reasons I gave for that
> doubt. I presume that means that you couldn't come up with anything to
> say about them that would support your point of view.

It's more that I don't care.

Rainer Weikusat

Jul 23, 2018, 2:53:55 PM
Something which is totally off-topic here I do care about (privately):

https://www.youtube.com/watch?v=cHmdLtXdrlE

This adds just as much or as little of value to the discussion as
continuing this word fencing for the sake of it would.
