On 22/07/15 11:37, Chris Ahlstrom wrote:
> Jasen Betts wrote this copyrighted missive and expects royalties:
>
>> On 2015-07-21, Chris Ahlstrom <OFee...@teleworm.us> wrote:
...
>> in one terminal
>>
>> $ while yes AAAAAAAAAAAAAAA | dd conv=notrunc bs=20K count=100 of=a
>> do sleep 0.01
>> yes BBBBBBBBBBBBBBB | dd conv=notrunc bs=20K count=100 of=a
>> sleep 0.01
>> done
>>
>> in another, rsync the file "a" over a slow connection
>>
>> I got lines with BBB then with AAA then with AAA again
>>
>> $ od -ta a
>> 0000000 B B B B B B B B B B B B B B B nl
>> *
>> 1540000 A A A A A A A A A A A A A A A nl
>> *
>> 1560000 B B B B B B B B B B B B B B B nl
>> *
>> 2220000 A A A A A A A A A A A A A A A nl
>> *
>> 6070000
>
> What does that prove? dd is opening and closing "a" a number of times.
No, it doesn't prove that; it shows, with a high probability, that you
know nothing about *nix systems and how they work.
If ``dd'' is opening and closing file ``a'' a number of times, is that
standard practice? If so, then ``yes'' should also be opening and
closing its output a number of times. BUT once ``yes'' closes its
output, it can never re-open the pipe. This means there is probably
another explanation to fit the facts.
It is more likely that:
The writing of ``dd'' and the reading of ``rsync'' have been interleaved,
with ``rsync'' at times reading further into the file than ``dd'' has
managed to overwrite.
Possible proof:
The first fact to note is that there is a long run of Bs, then a short
run of As, before the Bs recommence.
Second is that no indication is given of (a) the total number of
processors the machine has, nor (b) how many processes are actually
running on it - there are at least 3 programs (``yes'', ``dd'' and
``rsync'') plus the kernel's network handling. If there are fewer
processors than running programs, then at some stage a running program
will have to stop (sleep/be suspended) temporarily while the processor
is used to run another program.
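Both unknowns can be checked on one's own machine. A minimal sketch, assuming a Linux system: getconf is POSIX, although the _NPROCESSORS_ONLN variable is a widely supported extension, and /proc/loadavg is Linux-only.

```shell
# Processors currently online (glibc, musl and macOS getconf all accept this name).
ncpu=$(getconf _NPROCESSORS_ONLN)
echo "online processors: $ncpu"
# Linux only: the 4th field of /proc/loadavg is "runnable/total" scheduling
# entities; when runnable exceeds $ncpu, some programs must wait for a turn.
if [ -r /proc/loadavg ]; then
  awk '{ print "runnable/total:", $4 }' /proc/loadavg
fi
```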
Third is that disk access is SLOW. As a result, data is buffered in
memory until it is physically written to disk.
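One rough way to see the buffering is to time the same write with and without a forced flush. This sketch assumes GNU dd (the 1M suffix and conv=fsync); the temporary file is hypothetical. On /tmp backed by tmpfs, the two timings will look alike.

```shell
f=$(mktemp)
# Plain write: dd returns once the 16 MiB is in the kernel's page cache.
dd if=/dev/zero of="$f" bs=1M count=16 2>&1 | tail -n 1
# conv=fsync (GNU dd): dd also waits until the data has reached the disk.
dd if=/dev/zero of="$f" bs=1M count=16 conv=fsync 2>&1 | tail -n 1
# Another process reading the file sees the full size in either case.
size=$(wc -c < "$f")
echo "size seen by a reader: $size bytes"
rm -f "$f"
```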
Fourth is that programs avoid unnecessary overheads so that they
complete as fast as possible. Opening and closing files incurs overhead,
which means that ``dd'' is very unlikely to keep closing and reopening
the output file.
Fifth is that an unnamed pipe between programs is a circular buffer of
memory of limited size (64 KiB by default on Linux).
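The pipe's capacity can be measured by filling one whose reader never reads. A sketch, assuming GNU dd on Linux: oflag=nonblock makes a write to a full pipe fail with EAGAIN instead of putting dd to sleep, so dd stops as soon as the pipe is full. Without that flag, dd would simply block.

```shell
err=$(mktemp)
# The reader (sleep) never reads, so the pipe only ever fills up.
dd if=/dev/zero bs=4096 count=64 oflag=nonblock 2>"$err" | sleep 1
# dd's final statistics include "N+0 records out": N full 4 KiB blocks fitted.
out=$(grep 'records out' "$err")
full=${out%%+*}
echo "pipe accepted $full blocks = $((full * 4096)) bytes before filling"
rm -f "$err"
```

With the default 64 KiB capacity, this should report 16 blocks.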
Sixth is that programs will not [in general, especially if a file is
[very] large] read the whole of a file at once, but read a buffer-full
at a time (which is then processed before the next buffer-full is read);
the optimum buffer size depends upon various factors (e.g. the size of
data that can be physically read in one go from a disk drive, the
reliability of packet sizes across a network, etc.).
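dd makes the buffer-at-a-time reading visible in its record counts. A minimal sketch: the temporary file is hypothetical, and `head -c` is a common extension rather than POSIX.

```shell
f=$(mktemp)
# About 100 kB of the same 16-byte lines used in the quoted test loop.
yes AAAAAAAAAAAAAAA | head -c 100000 > "$f"
# dd reads one 20 KiB block at a time: 4 full blocks, then 1 partial block.
rec=$(dd if="$f" of=/dev/null bs=20K 2>&1 | sed -n 1p)
echo "$rec"
rm -f "$f"
```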
How can these facts account for the output?
First consider the output of ``yes'' and the input of ``dd'' - unless
the latter is extracting data from the pipe at least as fast as the
former is putting it in, at some stage the former will block waiting
for space in the pipe, and when it blocks it is put to sleep. The same
applies to the latter: if it tries to read from an empty pipe (e.g.
because the former was suspended by the scheduler as the processor was
needed for another program), it will block waiting for data and be put
to sleep.
Next, ``dd'' will write to file ``a'' as fast as it can read data from
its input and fill its internal buffer (block). If too much disk [write]
buffering builds up, it too might be blocked until the [write] buffers
are written to disk (to limit how much unwritten data is held in
memory).
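How dd fills its block from a pipe can be checked directly. A sketch, assuming GNU dd: a read from a pipe may return fewer than bs bytes, and dd counts each such short read as a partial record (the "+N" figure). The GNU-only iflag=fullblock keeps reading until each block is full.

```shell
# Plain dd: short reads from the pipe become partial records towards count=100.
yes AAAAAAAAAAAAAAA | dd bs=20K count=100 of=/dev/null 2>&1 | sed -n 1p
# With iflag=fullblock every one of the 100 records is a full 20 KiB block.
fb=$(yes AAAAAAAAAAAAAAA | dd bs=20K count=100 iflag=fullblock of=/dev/null 2>&1 | sed -n 1p)
echo "$fb"
```

The first count varies from run to run; the second should always be 100 full records.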
Now, how does ``rsync'' fit in? Under *nix there is [generally] no
file locking UNLESS the programs mutually agree to it: locks are
advisory (e.g. flock() or fcntl()), and a program that never asks for a
lock is not stopped by one. So, as soon as ``rsync'' starts to read file
``a'' it will get whatever happens to be there at the position it has
reached in the file. If the data is not buffered in memory, it may get
suspended whilst it waits for the data to be read from disk; and it may
also get suspended whilst it waits for network data it has sent to be
acknowledged by the other end.
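The advisory nature of *nix locks can be shown with util-linux's flock(1). This sketch is Linux-specific and the file is hypothetical. cat stands in for a reader that never asks for the lock, as rsync does here.

```shell
f=$(mktemp)
printf 'AAAA\n' > "$f"
flock "$f" sleep 2 &      # holder: takes an exclusive advisory lock for 2 s
sleep 0.5                 # give the holder time to acquire it
plain=$(cat "$f")         # a reader that never asks for the lock is not stopped
# A cooperating reader asks for the lock without waiting (-n) and is refused.
if flock -n "$f" true; then refused=no; else refused=yes; fi
echo "plain read: $plain; cooperating reader refused: $refused"
wait
rm -f "$f"
```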
So, if the machine has only 1 or 2 processors (or other running programs
bring the total number of programs running above the number of
processors), then at some stage only 1 or 2 of ``yes'', ``dd'' and
``rsync'' can be running at any moment. Suppose ``dd'' manages to
overwrite the first 01540000 bytes of file ``a'' with Bs before it gets
suspended; then ``rsync'' runs and reads the first 01560000 bytes of
file ``a'' before it gets suspended; then ``dd'' runs again and writes a
further 0460000 bytes of Bs before being suspended; and finally
``rsync'' resumes and reads more than another 0440000 bytes. (All the
figures are octal, since od prints offsets in octal by default.) What
would the output of ``rsync'' look like at this stage? Exactly the above.
QED.
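The arithmetic of this scenario can be checked against the od offsets. POSIX shell arithmetic and printf both read a leading 0 as octal. This is a small sketch. The variable names are illustrative only.

```shell
b1=$(( 01540000 ))              # Bs written by dd before its first suspension
a1=$(( 01560000 - 01540000 ))   # old As that rsync read past the end of those Bs
d2=$(( 02220000 - 01540000 ))   # further Bs written when dd ran again
b2=$(( 02220000 - 01560000 ))   # of those, the Bs rsync read after resuming
for n in "$b1" "$a1" "$d2" "$b2"; do
  printf '%8o octal = %7d bytes\n' "$n" "$n"
done
```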
> rsync just grabbed "a" as it existed just after the previous close.
>
It wasn't closed: either the data was sync'd to disk or the dd process
got suspended, and rsync ran.