
Re: does writing same data back to a file cause an actual disk write?


unruh

Dec 9, 2009, 12:11:32 PM
On 2009-12-09, The Natural Philosopher <t...@invalid.invalid> wrote:
> unruh wrote:
>> On 2009-12-08, The Natural Philosopher <t...@invalid.invalid> wrote:
>>> I've been pondering this, as an adjunct to some messing about with SQL.
>>>
>>> I can't be sure, but it looks like many reads and write-backs of the
>>> same data are not causing disk activity. I was wondering if Linux is
>>> smart enough to compare data in its buffers with write data, and not
>>> flush if it doesn't change..
>>
>> Not unless you use rsync, but then the file is read anyway because rsync
>> needs to test the md5 of the file.
>> Linux has no idea if the data being written is the same. It would have to
>> read both what is already on the disk
>
> Er, no. It's just read that off. It needs to compare the write request
> with its existing disk cache. I just wondered if it did.

It does not know it is in the disk cache, or that it has "just read that
off". It has to do a disk read. It might be that the data is in the
cache, but Linux does not know that. It would be incredibly inefficient
to spend time comparing what is on the disk to what is to be written
before writing. 99% of the time it would have to write anyway, so all
that extra effort would be wasted.

>
>
>
>> I suspect what you are seeing is disk buffering. Ie, the stuff is saved
>> in a buffer. Do a sync and see if the disk light comes on then.
>>
>>
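For anyone who would rather watch a number than the disk light, here is a rough sketch of the "do a sync and see" test. It assumes a Linux box with /proc/meminfo, whose "Dirty:" field reports page-cache memory waiting to be written back. The names parse_meminfo_kb, dirty_kb and demo are made up for this illustration, and the figure is system-wide, so other activity on the machine will add noise.

```python
import os
import tempfile

def parse_meminfo_kb(text, field):
    """Return the kB value of one /proc/meminfo field, or None if it is absent."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if name.strip() == field and parts:
            return int(parts[0])
    return None

def dirty_kb():
    """System-wide dirty page-cache memory in kB (Linux only)."""
    with open("/proc/meminfo") as f:
        return parse_meminfo_kb(f.read(), "Dirty")

def demo(size=8 * 1024 * 1024):
    """Write `size` bytes to a temp file and report Dirty around an fsync."""
    fd, path = tempfile.mkstemp()
    try:
        print("before write:", dirty_kb(), "kB dirty")
        os.write(fd, b"x" * size)   # normally lands in the page cache first
        print("after write: ", dirty_kb(), "kB dirty")
        os.fsync(fd)                # ask the kernel to flush this file's data
        print("after fsync: ", dirty_kb(), "kB dirty")
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling demo() on an otherwise quiet machine should show the dirty figure rise after the write and fall after the fsync. If the temporary directory is RAM-backed (tmpfs), nothing needs writing back and the effect will not show.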

The Natural Philosopher

Dec 9, 2009, 4:36:21 PM
unruh wrote:
> On 2009-12-09, The Natural Philosopher <t...@invalid.invalid> wrote:
>> unruh wrote:
>>> On 2009-12-08, The Natural Philosopher <t...@invalid.invalid> wrote:
>>>> I've been pondering this, as an adjunct to some messing about with SQL.
>>>>
>>>> I can't be sure, but it looks like many reads and write-backs of the
>>>> same data are not causing disk activity. I was wondering if Linux is
>>>> smart enough to compare data in its buffers with write data, and not
>>>> flush if it doesn't change..
>>> Not unless you use rsync, but then the file is read anyway because rsync
>>> needs to test the md5 of the file.
>>> Linux has no idea if the data being written is the same. It would have to
>>> read both what is already on the disk
>> Er, no. It's just read that off. It needs to compare the write request
>> with its existing disk cache. I just wondered if it did.
>
> It does not know it is in the disk cache,

Why not? With 200 Mbyte of cached disk buffers, it doesn't know where
they belong? I dread the thought of them being written to disk
randomly. ;-)


> or that it has "just read that
> off".

Of course it does, or there is no effing point to having a disk cache at
all!

A disk cache is a bit of software that says 'oh good, I read that 5
seconds ago, no need to read it again'.


> It has to do a disk read.

I am absolutely stunned that someone on a Linux group appears to have
absolutely no idea how Linux, or any other OS, actually works.

Are you still using DOS version 1? Every operating system I have come
across since then caches disk; even later versions of CP/M did.


> It might be that the data is in the
> cache, but Linux does not know that.

You are kidding, no? If it's in Linux's cache, WTF is the point of having it
if Linux doesn't know?

Let the Chinese take over the world. They deserve it.

> It would be incredibly inefficient
> to spend time comparing what is on the disk to what is to be written
> before writing.

Again the mind just boggles. You appear to know nothing about RAM speed
versus disk write speed versus disk seek speed, or indeed anything at
ALL about disk caching.

I guess I simply asked a question that is beyond your ken.

> 99% of the time it would have to write anyway, so all
> that extra effort would be wasted.

But 99% of the time it doesn't write anyway, not till much later.

FFS educate yourself

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

My question is whether, when it is about to replace a buffer with new
data, it checks that buffer and does NOT mark it 'dirty' if the data is
identical.

After all, a memory-to-memory compare is about 4-10 times faster than a
disk write, and that's assuming you don't have to do any seeking.
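
To pin down what is being asked, here is the same check sketched in user space: read what is already at that position in the file, compare it in memory, and only issue the write if something differs. write_if_changed is a made-up helper for illustration, not anything Linux or MySQL provides, and whether the kernel does the equivalent on its own buffers is exactly the open question.

```python
def write_if_changed(path, offset, data):
    """Write `data` at `offset` only if the file does not already hold it.

    Returns True if a write was issued, False if it was skipped.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        current = f.read(len(data))   # typically served from the cache if recently read
        if current == data:
            return False              # identical: nothing is written
        f.seek(offset)
        f.write(data)                 # differs (or file is shorter): write it
        return True
```

The cost is one extra read and compare per write. That is cheap when the data is already cached and most writes are redundant, and wasted effort when most writes really do change something.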


Jeremy Nicoll - news posts

Dec 9, 2009, 5:41:49 PM
The Natural Philosopher <t...@invalid.invalid> wrote:

> Jeremy Nicoll - news posts wrote:
> > The Natural Philosopher <t...@invalid.invalid> wrote:
> >
> >> Jeremy Nicoll - news posts wrote:
> >>> The Natural Philosopher <t...@invalid.invalid> wrote:
> >>>
> >>>> Er, no. It's just read that off. It needs to compare the write request
> >>>> with its existing disk cache. I just wondered if it did.

> > > > But what happens if something else changed the data on the disk
> > > > since it "just read it off"?
> >
> >> Then you have a broken computer!
> >>
> > > Only one thing reads data off a disk on a Linux system, or writes it,
> > > and that's the disk driver software.
> >
> > What about disks shared with other systems?
>
> No disk I know of attaches to two computers. Only networked disk
> subsystems do.

Ah well then... Back in the 1980s I was working in IBM mainframe systems
where shared DASD controllers were the norm.

[I know that SCSI was around well before that and it also allowed all the
devices on a SCSI chain to be shared.]

The IBM mainframe I/O system was essentially a set of computers ('the disk
subsystem controllers') separate from the mainframes themselves; the
mainframes formulated I/O requests, handed them to the DASD subsystems, and
then did other things for other users/processes until the I/O system
interrupted the mainframe and said "I've done it". The mainframe would note
this, then despatch whatever task was waiting for the I/O to complete.

Of course the DASD subsystem could have different levels of caching (and
different sorts), so "I've done it" didn't necessarily mean that data to be
written had yet reached a platter, but none of the apps or the OSes
themselves needed to know that. (Configuring the subsystems and choosing
which datasets resided on which subsystems was a separate process where the
need for speed vs absolute certainty that a write had occurred could be
balanced.)

Also of course apps (and indeed parts of the OS) could implement their own
local-to-one-machine caching if they wanted to but there wasn't much point.

When I was last working with these systems it was possible to have DASD
controllers implementing mirroring (etc) between physical disks miles apart,
with the controllers connected on dedicated optical fibre links. So not
only were there shared disks, but there were multiple copies of these shared
volumes, physically well separated.

Bear in mind that these days a single z/OS mainframe can have many
(thousands, I believe) separate copies of Linux running in it at any
one time. I can't believe that that doesn't use shared disks.

--
Jeremy C B Nicoll - my opinions are my own.

Email sent to my from-address will be deleted. Instead, please reply
to newsre...@wingsandbeaks.org.uk replacing "nnn" by "284".

The Natural Philosopher

Dec 9, 2009, 6:10:19 PM
Why?

All those disks have to in some way pass through a single disk system or
there is chaos.

A disk on the end of 100 miles of optical fiber is still connected... if
you want two applications on two sets of hardware to access it, then you
move the intelligence from the computer to the disk drive: It becomes in
effect a computer in its own right attached to a disk drive.

When I talk about a disk drive I don't mean a mainframe cabinet with a
gigabyte of RAM in it, I mean a thing of platters and spindles and
heads. It CAN'T be connected to more than one device. It would be like
having a mother-in-law with a steering wheel as well, in the back seat.


Marten Kemp

Dec 9, 2009, 7:25:33 PM

z/VM mainframe, actually, but the rest of the description is correct.
I think the limit on Linux-for-zSeries images is somewhere in the
200-300 range due to the real memory requirements for setting up
hipersocket virtual network connections. By the way, don't use
'zLinux' please, because IIRC zLinux is a trademark whose owner
wouldn't sell it to IBM.

--
-- Marten Kemp (Fix ISP to reply)
You can't help being ignorant 'cause there's always
something you don't know; what you can't be is stupid.


Jerry Peters

Dec 11, 2009, 5:05:07 PM
The Natural Philosopher <t...@invalid.invalid> wrote:
> Jeremy Nicoll - news posts wrote:
>> Chris Davies <chris-...@roaima.co.uk> wrote:
>>
>>> The Natural Philosopher <t...@invalid.invalid> wrote:
>>>> Er, no. It's just read that off. It needs to compare the write request
>>>> with its existing disk cache. I just wondered if it did.
>>> Jeremy Nicoll - news posts <jn.nntp....@wingsandbeaks.org.uk> wrote:
>>>> But what happens if something else changed the data on the disk since it
>>>> "just read it off"?
>>>
>>> Natural Philosopher is talking about stuff inside the kernel. Jeremy
>>> is talking about user-level stuff. The two are pretty different and I
>>> think this thread is talking at cross purposes.
>>
>> I am not talking about user stuff. My computing degree was focussed on OS
>> internals and I worked for years as an MVS systems programmer.
>>
> well we were talking about different things, to be sure, but unusually,
> it hasn't degenerated, and it has been interesting.
>
> I mean my point was that at some point in the chain, there has to be a
> single arbiter of what goes on to a disk, as it only has one set of
> heads and platters!
>
> I didn't know that _disks_ with enough intelligence to allow two
> machines access to that single point existed.
>
> Obviously it's a cinch if it's a disk attached to a networked computer of
> some sort.
>
>
> Anyway, my enquiry assumed a typical PC Linux machine. It had to do with
> MySQL multiple redundant updates on a table not showing much disk
> activity. It transpires that MySQL doesn't write what it doesn't need
> to, and that's a useful bit of knowledge.
>
> It transpires that Linux almost certainly DOES write what it doesn't
> need to, because at some level the cost of checking is expensive and the
> chance of its being the case is low.
>
> The thought of two computers sharing a common disk horrifies me though.
> It drives a cart and horses through all caching mechanisms. The only
> logical thing would be to implement caching in the drive itself.
>
> At which point the drive IS the computer in question.
>
SCSI bus attached to 2 PCs. A disk with 2 partitions on it attached to
the SCSI bus. Each partition is used by one and only one of the PCs.
The disk is now shared, although the data is not.

Jerry
